
0 votes
246 views
in Technique by (71.8m points)

Python: Read multiple large CSVs at the same time

I have 9 large CSVs (12GB each) with exactly the same column structure and row order, just different values in each file. I need to go through the CSVs row by row and compare the data inside them, but they are far too large to store in memory. Maintaining row order is highly important because the row position is used as an index for comparing the data between CSVs, so appending the tables together isn't ideal.

I'd rather avoid 9 nested "with open() as csv:" blocks using DictReader, which seems very messy.

I've tried to use pandas and concatenate:

files = [list_of_csv_paths]
result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

but it simply tries to load all the data into memory, and I don't have nearly enough RAM. Changing pd.read_csv to use a specific chunksize returns a TypeError.
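
For reference, I believe the TypeError comes from pd.concat being handed the iterator objects that read_csv returns once chunksize is set; a minimal sketch of what I think happens (reusing the files list above):

import pandas as pd

# With chunksize set, read_csv returns a TextFileReader iterator rather
# than a DataFrame, and pd.concat accepts only Series/DataFrame objects.
readers = [pd.read_csv(f, chunksize=10 ** 5) for f in files]  # files as above
result = pd.concat(readers, ignore_index=True)  # raises TypeError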

I've seen that Dask could possibly be used for this, but I'm not experienced with Dask.

I'm open to any suggestions.

Question from: https://stackoverflow.com/questions/65888056/python-read-multiple-large-csvs-at-the-same-time


1 Reply

0 votes
by (71.8m points)

I think this might be a good start: reading by chunks, where chunksize is the number of lines per chunk according to the documentation. That should be the best way to read huge files. You can also try threading to process the chunks faster.

Simple example:

import pandas as pd

chunksize = 10 ** 8  # rows per chunk; tune this to the RAM you have available
for chunk in pd.read_csv(filename, chunksize=chunksize):  # filename = path to one CSV
    process(chunk)  # placeholder for your per-chunk logic
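
To extend this to all nine files while keeping rows aligned, one option is to zip one chunk iterator per file; a minimal sketch, assuming the identical row order stated in the question, with hypothetical paths and a placeholder compare function:

import pandas as pd

files = ["data_1.csv", "data_2.csv", "data_3.csv"]  # hypothetical paths, up to nine
chunksize = 10 ** 5  # rows per chunk, per file; tune to available RAM

readers = [pd.read_csv(f, chunksize=chunksize) for f in files]
for chunks in zip(*readers):
    # chunks[i] is the same row range from file i, because every file
    # shares one row order; only one chunk per file is in memory at a time.
    compare(*chunks)  # placeholder for the row-wise comparison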

Check the skiprows and nrows parameters as well. The next example reads lines 1000 to 2000 (with chunksize instead of nrows you would get an iterator back, not a DataFrame).

Example:

df = pd.read_csv('file.csv', sep=',', header=None, skiprows=1000, nrows=1000)  # rows 1000-1999
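
If you would rather stay with the csv module, contextlib.ExitStack avoids the nine nested "with open()" blocks the question mentions; a minimal sketch, again with hypothetical paths and a placeholder compare function:

import csv
from contextlib import ExitStack

files = ["data_1.csv", "data_2.csv", "data_3.csv"]  # hypothetical paths, up to nine

with ExitStack() as stack:
    readers = [csv.reader(stack.enter_context(open(f, newline=""))) for f in files]
    for rows in zip(*readers):
        # rows holds one line (a list of strings) from each file, all at the
        # same position, so memory use stays at one row per file.
        compare(*rows)  # placeholder for the row-wise comparison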

