I have an input file where every value is stored as a string.
The data is in a CSV file, with each entry wrapped in double quotes.
Example file:
"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
There are only six columns. What options do I need to pass to pandas read_csv to read this correctly?
I am currently trying:
import pandas as pd
df = pd.read_csv(file, quotechar='"')
but this gives me the error message:
CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 14
This suggests that it is ignoring the '"' quote character and treating every comma as a field separator.
However, on line 3, columns 3 through 6 should be strings that contain commas: "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD".
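One way to see where the 14 comes from is to split line 3 with Python's standard csv module. It only treats a quote as an opening quote when the quote is the very first character of a field, and the pandas C parser appears to behave the same way here. This is a diagnostic sketch under the assumption that the space after each comma in the example file is what hides the opening quotes.

```python
import csv
import io

# Line 3 of the example file, copied verbatim (note the space after each comma).
line = '"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"'

# Default dialect: a field that starts with a space is unquoted, so the
# quote characters become literal text and every inner comma splits a field.
naive = next(csv.reader(io.StringIO(line)))
print(len(naive))  # 14

# skipinitialspace=True drops the space after each delimiter, so every field
# starts with '"' and the commas inside the quotes are kept.
fixed = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(len(fixed))  # 6
print(fixed)       # ['AM', '08', '1,2,3', 'PR,SD,SD', 'PR,SD,SD', 'PR,SD,SD']
```

The count of 14 matches the error message: 1 field for "AM", 1 for "08", 3 for "1,2,3" and 3 for each of the three "PR,SD,SD" values.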
How do I get pandas.read_csv to parse this correctly?
Thanks.
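A minimal sketch of a read_csv call that should handle the example file, assuming the stray spaces after the commas are the only problem: skipinitialspace=True tells the parser to skip whitespace after each delimiter, so the quotes are recognized again. Passing dtype=str is an extra assumption, based on every value being a string, so that values like "07" are not turned into integers.

```python
import io

import pandas as pd

# The example file from the question, inlined so the sketch runs on its own.
# In practice you would pass the file path instead of this StringIO.
data = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", "07", "1", "SD", "SD", "CR"
"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
"AM", "01", "2", "SD", "SD", "SD"
'''

df = pd.read_csv(
    io.StringIO(data),
    quotechar='"',          # already the default, shown for clarity
    skipinitialspace=True,  # ignore the space that follows each comma
    dtype=str,              # keep every value as a string, e.g. "07" not 7
)
print(df)
```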