python - Split large text file(around 50GB) into multiple files

Question

Welcome To Ask or Share your Answers For Others

python - Split large text file(around 50GB) into multiple files

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Split large text file(around 50GB) into multiple files

I would like to split a large text file around size of 50GB into multiple files. Data in the files are like this-[x= any integer between 0-9]

xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
...............
...............

There might be few billions of lines in the file and i would like write for example 30/40 millions per file. I guess the steps would be-

I've to open the file
then using readline() have to read the file line by line and write at the same time to a new file
and as soon as it hits the maximum number of lines it will create another file and starts writing again.

I'm wondering, how to put all these steps together in a memory efficient and faster way. I've seen some examples in stack but none of them totally helping what i exactly need. I would really appreciate if anyone could help me out.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T01:06:52+0000

This working solution uses split command available in shell. Since the author has already accepted a possibility of a non-python solution, please do not downvote.

First, I created a test file with 1000M entries (15 GB) with

awk 'BEGIN{for (i = 0; i < 1000000000; i++) {print "123.123.123.123"} }' > t.txt

Then I used split:

split --lines=30000000 --numeric-suffixes --suffix-length=2 t.txt t

It took 5 min to produce a set of 34 small files with names t00-t33. 33 files are 458 MB each and the last t33 is 153 MB.

Categories

python - Split large text file(around 50GB) into multiple files

python - Split large text file(around 50GB) into multiple files

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags