Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
294 views
in Technique[技术] by (71.8m points)

python - Read CSV into a dataFrame with varying row lengths using Pandas

So I have a CSV that looks a bit like this:

1 | 01-01-2019 | 724
2 | 01-01-2019 | 233 | 436
3 | 01-01-2019 | 345
4 | 01-01-2019 | 803 | 933 | 943 | 923 | 954
5 | 01-01-2019 | 454
...

And when I try to use the following code to generate a dataFrame..

df = pd.read_csv('data.csv', header=0, engine='c', error_bad_lines=False)

It only adds rows with 3 columns to the df (rows 1, 3 and 5 from above)

The rest are considered 'bad lines' giving me the following error:

Skipping line 17467: expected 3 fields, saw 9

How do I create a data frame that includes all data in my csv, possibly just filling in the empty cells with null? Or do I have to declare the max row length prior to adding to the df?

Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If using only pandas, read in lines, deal with the separator after.

import pandas as pd

df = pd.read_csv('data.csv', header=None, sep='
')
df = df[0].str.split('s|s', expand=True)

   0           1    2     3     4     5     6
0  1  01-01-2019  724  None  None  None  None
1  2  01-01-2019  233   436  None  None  None
2  3  01-01-2019  345  None  None  None  None
3  4  01-01-2019  803   933   943   923   954
4  5  01-01-2019  454  None  None  None  None

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...