You could read the CSV in chunks. Since pd.read_csv returns an iterator when the chunksize
parameter is specified, you can use itertools.takewhile
to read only as many chunks as you need, without reading the whole file.
import itertools as IT
import pandas as pd
chunksize = 10 ** 5
chunks = pd.read_csv(filename, chunksize=chunksize, header=None, names=['A', 'B'])  # with header=None, explicit column names are needed for chunk['B'] to work; adjust the names to your file
chunks = IT.takewhile(lambda chunk: chunk['B'].iloc[-1] < 10, chunks)  # keep reading while the last 'B' value in each chunk is still below 10
df = pd.concat(chunks)
mask = df['B'] < 10
df = df.loc[mask]
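For example, here is a minimal, self-contained sketch of how this behaves, using io.StringIO as a stand-in for the real file and a tiny chunksize so that several chunks get read (the sample values and the column name 'B' are assumptions purely for illustration):
import io
import itertools as IT
import pandas as pd
csv_data = io.StringIO("A,B\n1,1\n2,3\n3,5\n4,8\n5,12\n6,20\n")
chunks = pd.read_csv(csv_data, chunksize=2)  # two rows per chunk
chunks = IT.takewhile(lambda chunk: chunk['B'].iloc[-1] < 10, chunks)
df = pd.concat(chunks)
print(df)  # only the first two chunks are kept; iteration stops at the chunk ending with B == 20
Since takewhile stops consuming the reader as soon as a chunk fails the test, only the needed portion of the file is actually parsed.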
Or, to avoid having to use df.loc[mask]
to remove unwanted rows from the last chunk, perhaps a cleaner solution would be to define a custom generator:
import itertools as IT
import pandas as pd
def valid(chunks):
    for chunk in chunks:
        mask = chunk['B'] < 10
        if mask.all():
            # every row in this chunk qualifies; pass it through unchanged
            yield chunk
        else:
            # this chunk contains the cutoff: keep only its qualifying rows, then stop
            yield chunk.loc[mask]
            break
chunksize = 10 ** 5
chunks = pd.read_csv(filename, chunksize=chunksize, header=None, names=['A', 'B'])  # again, explicit names are needed with header=None; adjust to your file
df = pd.concat(valid(chunks))
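As a quick check, the same kind of in-memory sample (again an assumption purely for illustration) shows that the generator keeps the qualifying rows from the boundary chunk and then stops reading:
import io
import pandas as pd
csv_data = io.StringIO("A,B\n1,1\n2,3\n3,5\n4,8\n5,9\n6,20\n7,2\n")
chunks = pd.read_csv(csv_data, chunksize=2)
df = pd.concat(valid(chunks))  # valid() is the generator defined above
print(df)  # rows with B == 1, 3, 5, 8, 9; the B == 20 row and everything after it are never read into df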