Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
384 views
in Technique[技术] by (71.8m points)

python - zip() alternative for iterating through two iterables

I have two large (~100 GB) text files that must be iterated through simultaneously.

Zip works well for smaller files but I found out that it's actually making a list of lines from my two files. This means that every line gets stored in memory. I don't need to do anything with the lines more than once.

handle1 = open('filea', 'r'); handle2 = open('fileb', 'r')

for i, j in zip(handle1, handle2):
    do something with i and j.
    write to an output file.
    no need to do anything with i and j after this.

Is there an alternative to zip() that acts as a generator that will allow me to iterate through these two files without using >200GB of ram?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

itertools has a function izip that does that

from itertools import izip
for i, j in izip(handle1, handle2):
    ...

If the files are of different sizes you may use izip_longest, as izip will stop at the smaller file.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...