python - Writing large Pandas Dataframes to CSV file in chunks

Question

Posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

How do I write out a large data file to a CSV file in chunks?

I have a set of large data files (1M rows x 20 columns). However, only 5 or so of those columns are of interest to me.

I want to make things easier by making copies of these files that contain only the columns of interest, so that I have smaller files to work with for post-processing. My plan is to read each file into a DataFrame and then write it out to a CSV file.

I've been looking into reading large data files into a DataFrame in chunks. However, I haven't been able to find anything on how to write the data out to a CSV file in chunks.
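
For reference, reading in chunks while keeping only the columns of interest can be sketched as below, assuming a comma-separated file. `usecols` makes pandas parse only the named columns, and `chunksize` turns `read_csv` into an iterator of DataFrames instead of one large frame. The in-memory data and the `OTHER` column are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for a large CSV file with an extra column
raw = io.StringIO(
    "TIME,STUFF,OTHER\n"
    "0,a,x\n"
    "1,b,y\n"
    "2,c,z\n"
)

# usecols limits parsing to the columns of interest;
# chunksize=2 yields DataFrames of at most 2 rows each
reader = pd.read_csv(raw, usecols=["TIME", "STUFF"], chunksize=2)

chunk_shapes = [chunk.shape for chunk in reader]
print(chunk_shapes)  # [(2, 2), (1, 2)]
```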

Here is what I'm trying now, but it doesn't append to the CSV file; each chunk overwrites the previous one:

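import os
import pandas as pd

with open(os.path.join(folder, filename), 'r') as src:
    # sep='' looks garbled in the post; set it to the file's real delimiter
    df = pd.read_csv(src, sep='', skiprows=(0, 1, 2), header=0, chunksize=1000)
    for chunk in df:
        # to_csv defaults to mode='w', so each chunk overwrites the file
        chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                     columns=['TIME', 'STUFF'])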
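
The loop above does not append because `DataFrame.to_csv` opens its target with `mode='w'` by default, so every call truncates the file and only the last chunk survives. A small self-contained reproduction, using a temporary directory and made-up data in place of the real files:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"TIME": [0, 1, 2, 3], "STUFF": list("abcd")})

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "new_file.csv")
    # Split the frame into two 2-row chunks, mimicking chunksize=2
    for start in range(0, len(df), 2):
        chunk = df.iloc[start:start + 2]
        # Default mode='w': each call overwrites the file
        chunk.to_csv(out_path, index=False)
    result = pd.read_csv(out_path)

print(result)  # only the last chunk (TIME 2 and 3) remains
```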

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:34:47+0000

Solution:

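import os
import pandas as pd

# chunks: the iterator returned by pd.read_csv(..., chunksize=1000), as in the question
header = True
for chunk in chunks:
    # mode='a' appends each chunk; `columns` replaces the old, removed `cols` argument
    chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                 header=header, columns=['TIME', 'STUFF'], mode='a')
    # Only the first chunk writes the header row
    header = False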
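
A self-contained version of this approach is sketched below. It writes a hypothetical sample file, reads it back with `chunksize`, and appends each chunk with `mode='a'`, writing the header only once. Two things are additions, not part of the original answer: any existing output file is removed first, because append mode would otherwise add to a file left over from an earlier run, and `index=False` keeps the DataFrame index from being written as an extra column.

```python
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    src_path = os.path.join(folder, "data.csv")
    out_path = os.path.join(folder, "new_file_data.csv")

    # Hypothetical source file: 5 rows, plus one column we do not need
    pd.DataFrame({
        "TIME": range(5),
        "STUFF": list("abcde"),
        "OTHER": [0.0] * 5,
    }).to_csv(src_path, index=False)

    # Start from a clean output file, because mode='a' appends to whatever exists
    if os.path.exists(out_path):
        os.remove(out_path)

    header = True
    for chunk in pd.read_csv(src_path, chunksize=2):
        chunk.to_csv(out_path, header=header, columns=["TIME", "STUFF"],
                     mode="a", index=False)
        header = False  # only the first chunk writes the header row

    result = pd.read_csv(out_path)

print(result.columns.tolist())  # ['TIME', 'STUFF']
print(len(result))              # 5
```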

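Notes:

mode='a' opens the output file in append mode, so each chunk is added to the end of the file instead of replacing it.
The column header is written only for the first chunk; every later chunk is written with header=False.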
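
A variant worth considering, and a swapped-in technique rather than the answer's own, is to open the output file once and pass the open file object to `to_csv`. The file is truncated a single time when it is opened, so leftovers from earlier runs are not an issue, and `enumerate` replaces the separate `header` flag. The sample data and file name here are hypothetical.

```python
import io
import os
import tempfile
import pandas as pd

# Hypothetical source data, read in chunks of 2 rows
raw = io.StringIO("TIME,STUFF,OTHER\n0,a,x\n1,b,y\n2,c,z\n")

with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "new_file.csv")
    # pandas recommends newline="" for text file objects passed to to_csv
    with open(out_path, "w", newline="") as out:
        reader = pd.read_csv(raw, usecols=["TIME", "STUFF"], chunksize=2)
        for i, chunk in enumerate(reader):
            # Write the header only for the first chunk
            chunk.to_csv(out, header=(i == 0), index=False)
    result = pd.read_csv(out_path)

print(result)
```

Because pandas does not close a file object it did not open, every `to_csv` call continues writing where the previous one stopped.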
