The error shows that the machine does not have enough memory to read the entire CSV into a DataFrame at once. Assuming you do not need the whole dataset in memory at the same time, one way to avoid the problem is to process the CSV in chunks, by specifying the chunksize parameter:
import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)  # each chunk is a DataFrame with up to chunksize rows
The chunksize parameter specifies the number of rows read per chunk. (The last chunk may, of course, contain fewer than chunksize rows.)
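If process needs to combine results across chunks, a common pattern is to keep a small running aggregate rather than storing the chunks themselves. A minimal sketch, assuming a hypothetical data.csv with a category column:

import pandas as pd

chunksize = 10 ** 6
counts = pd.Series(dtype="int64")

for chunk in pd.read_csv("data.csv", chunksize=chunksize):
    # count categories in this chunk and merge into the running total
    counts = counts.add(chunk["category"].value_counts(), fill_value=0)

print(counts.sort_values(ascending=False))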
pandas >= 1.2

read_csv with chunksize returns a context manager, to be used like so:
chunksize = 10 ** 6
with pd.read_csv(filename, chunksize=chunksize) as reader:
    for chunk in reader:
        process(chunk)
See GH38225
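One practical benefit of the context-manager form is that the underlying file handle is released even if you stop iterating early. A minimal sketch, assuming a hypothetical data.csv with an id column:

import pandas as pd

chunksize = 10 ** 6
matches = []

with pd.read_csv("data.csv", chunksize=chunksize) as reader:
    for chunk in reader:
        # keep only the rows of interest; stop once enough are found
        matches.append(chunk[chunk["id"] == 42])
        if sum(len(m) for m in matches) >= 10:
            break  # leaving the with-block still closes the file cleanly

result = pd.concat(matches) if matches else pd.DataFrame()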