If you know what the header startswith:
def skip_to(fle, line,**kwargs):
if os.stat(fle).st_size == 0:
raise ValueError("File is empty")
with open(fle) as f:
pos = 0
cur_line = f.readline()
while not cur_line.startswith(line):
pos = f.tell()
cur_line = f.readline()
f.seek(pos)
return pd.read_csv(f, **kwargs)
Demo:
In [18]: cat test.txt
1,2
3,4
The,header
foo,bar
foobar,foo
In [19]: df = skip_to("test.txt","The,header", sep=",")
In [20]: df
Out[20]:
The header
0 foo bar
1 foobar foo
By calling .tell
we keep track of where the pointer is for the previous line so when we hit the header we seek back to that line and just pass the file object to pandas.
Or using the junk if they all started with something in common:
def skip_to(fle, junk,**kwargs):
if os.stat(fle).st_size == 0:
raise ValueError("File is empty")
with open(fle) as f:
pos = 0
cur_line = f.readline()
while cur_line.startswith(junk):
pos = f.tell()
cur_line = f.readline()
f.seek(pos)
return pd.read_csv(f, **kwargs)
df = skip_to("test.txt", "junk",sep="")
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…