Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Import a CSV with an inconsistent number of columns per row while keeping the original header, using pandas

How can I read a CSV like the one below and keep the original column names? One idea is to append generic column names to the end of the header, depending on the maximum number of columns found in the body of the CSV.

a,b,c
1,2,3
1,2,3,
1,2,3,4

A simple read_csv call does not work, because the rows with four fields are skipped:

tempfile = pd.read_csv(
    path,
    index_col=None,
    sep=',',
    header=0,
    error_bad_lines=False,
    encoding='unicode_escape',
    warn_bad_lines=True,
)
b'Skipping line 3: expected 3 fields, saw 4
Skipping line 4: expected 3 fields, saw 4
'

I need a result like this:

a,b,c,x1
1,2,3,NA
1,2,3,NA
1,2,3,4

Question from: https://stackoverflow.com/questions/65911172/import-csv-with-inconsistent-count-of-columns-per-row-with-original-header-use-p


1 Reply


One approach is to read just the header row first, then pass those column names, together with your extra generic names, to pandas via the names parameter. For example:

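import csv

import pandas as pd

filename = "input.csv"

# Read only the header row to get the original column names.
with open(filename, newline="") as f_input:
    header = next(csv.reader(f_input))

# Append generic names x1..x9 to cover rows that are wider than the header.
header += [f'x{n}' for n in range(1, 10)]

tempfile = pd.read_csv(
    filename,
    index_col=None,
    sep=',',
    skiprows=1,      # skip the original header line
    names=header,    # use the extended header instead
    encoding='unicode_escape',
    # pandas >= 1.3; older versions use error_bad_lines=False, warn_bad_lines=True
    on_bad_lines='warn',
)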

skiprows=1 tells pandas to skip the original header line, and names supplies the full list of column names to use. Rows with fewer fields than names are padded with NaN.

The header would then contain:

['a', 'b', 'c', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']
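
The fixed range of nine extra names above leaves empty columns (x2 to x9 for the sample data), whereas the question asks for only as many generic columns as the widest row needs. A minimal sketch of one way to do that, assuming the file is small enough to scan twice: a first pass with csv.reader finds the widest row, and the header is padded to that width. The sample data is the one from the question, read from an in-memory string instead of a file; the names data, max_fields, and names are illustrative only.

```python
import csv
import io

import pandas as pd

# Sample data from the question; in practice this would be a file on disk.
data = "a,b,c\n1,2,3\n1,2,3,\n1,2,3,4\n"

# First pass: read every row to get the header and the widest row.
rows = list(csv.reader(io.StringIO(data)))
header = rows[0]
max_fields = max(len(row) for row in rows)

# Pad the header with generic names (x1, x2, ...) up to the widest row.
extra = max_fields - len(header)
names = header + [f"x{n}" for n in range(1, extra + 1)]

# Second pass: let pandas parse the body using the padded header.
df = pd.read_csv(io.StringIO(data), skiprows=1, names=names)
print(df)
```

Here the resulting columns are a, b, c, x1, and x1 is NaN for the first two rows. If scanning the file twice is not acceptable, another option is to keep the fixed list of extra names and then drop the unused columns with df.dropna(axis=1, how='all').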
