I have two large (~100 GB) text files that must be iterated through simultaneously.
Zip works well for smaller files but I found out that it's actually making a list of lines from my two files. This means that every line gets stored in memory. I don't need to do anything with the lines more than once.
handle1 = open('filea', 'r'); handle2 = open('fileb', 'r')
for i, j in zip(handle1, handle2):
do something with i and j.
write to an output file.
no need to do anything with i and j after this.
Is there an alternative to zip() that acts as a generator that will allow me to iterate through these two files without using >200GB of ram?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…