I have 9 large CSVs (12 GB each) with exactly the same column structure and row order, just different values in each file.
I need to go through the CSVs row by row and compare the data inside them, but they are far too large to hold in memory.
Maintaining row order is important because the row position is used as the index for comparing data between the CSVs, so appending the tables together isn't ideal.
I'd rather avoid 9 nested "with open() as f:" blocks using DictReader, which seems very messy.
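To show roughly what I mean by reading the files in lock-step without that nesting, this is the kind of thing I've sketched (the paths and the compare step are placeholders, and I haven't settled on it):

import csv
from contextlib import ExitStack

paths = ["a.csv", "b.csv", "c.csv"]  # placeholders for the 9 file paths

with ExitStack() as stack:
    # ExitStack keeps all 9 files open without 9 levels of nesting
    files = [stack.enter_context(open(p, newline="")) for p in paths]
    readers = [csv.DictReader(f) for f in files]

    # zip pulls one row from each reader at a time, so only 9 rows
    # are in memory at once; enumerate gives the shared row position
    for i, rows in enumerate(zip(*readers)):
        # rows is a tuple of 9 dicts, all for row position i
        ...  # compare the 9 rows here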
I've tried to use pandas and concatenate:
import pandas as pd

files = [list_of_csv_paths]
result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
but it simply tries to load all the data into memory and I don't have nearly enough RAM.
Changing pd.read_csv to use a specific chunksize raises a TypeError, presumably because read_csv then returns an iterator (TextFileReader) rather than a DataFrame that concat will accept.
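For what it's worth, this is the lock-step chunked read I was trying to get to with chunksize: each file becomes an iterator of DataFrames and zip walks them together (paths, chunk size, and the comparison are placeholders; I haven't got it working):

import pandas as pd

paths = ["a.csv", "b.csv", "c.csv"]  # placeholders for the 9 file paths
chunk_rows = 100_000  # rows per chunk, tuned to available RAM

# with chunksize set, read_csv returns an iterator of DataFrames
readers = [pd.read_csv(p, chunksize=chunk_rows) for p in paths]

offset = 0
for chunks in zip(*readers):
    # chunks is a tuple of 9 DataFrames covering the same row range,
    # since the files share row order and length
    ...  # compare the chunks here; offset + local index gives the row position
    offset += len(chunks[0])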
I've seen that Dask could possibly be used for this, but I'm not experienced with Dask.
I'm open to any suggestions.
question from:
https://stackoverflow.com/questions/65888056/python-read-multiple-large-csvs-at-the-same-time