Is there a built-in way to use read_csv to read only the first n lines of a file without knowing the length of the lines ahead of time? I have a large file that takes a long time to read, and occasionally only want to use the first, say, 20 lines to get a sample of it (and prefer not to load the full thing and take the head of it).
If I knew the total number of lines I could do something like footer_lines = total_lines - n and pass this to the skipfooter keyword arg. My current solution is to manually grab the first n lines with Python and StringIO it to pandas:
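    import pandas as pd
    from io import StringIO  # Python 3 location; the Python 2 StringIO module is gone
    from itertools import islice

    n = 20
    with open('big_file.csv', 'r') as f:
        # islice stops after n lines; f.readlines(n) would treat n as a
        # byte-size hint, not a line count
        head = ''.join(islice(f, n))
    df = pd.read_csv(StringIO(head))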
It's not that bad, but is there a more concise, 'pandasic' (?) way to do it with keywords or something?
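For what it's worth, read_csv does have an nrows keyword that seems to cover this case. Below is a minimal sketch, not a definitive answer: the read_head helper and the in-memory sample are made up for illustration and stand in for big_file.csv. One difference from the manual approach above: nrows counts data rows, so the header line is not included in the count.

```python
import io

import pandas as pd


def read_head(source, n=20, **kwargs):
    """Hypothetical helper: read only the first n data rows of a CSV source."""
    # nrows tells pandas to stop parsing after n data rows (header excluded)
    return pd.read_csv(source, nrows=n, **kwargs)


# Small in-memory stand-in for big_file.csv (assumed layout: header + 100 rows)
sample = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(100)))

df = read_head(sample, n=20)
print(df.shape)
```

With a real file the call would simply be pd.read_csv('big_file.csv', nrows=20), so no manual line slicing or StringIO is needed.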