I have a file.csv with ~15k rows that looks like this:
SAMPLE_TIME, POS, OFF, HISTOGRAM
2015-07-15 16:41:56, 0-0-0-0-3, 1, 2,0,5,59,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,
2015-07-15 16:42:55, 0-0-0-0-3, 1, 0,0,5,9,0,0,0,0,0,2,0,0,0,50,0,
2015-07-15 16:43:55, 0-0-0-0-3, 1, 0,0,5,5,0,0,0,0,0,2,0,0,0,0,4,0,0,0,
2015-07-15 16:44:56, 0-0-0-0-3, 1, 2,0,5,0,0,0,0,0,0,2,0,0,0,6,0,0,0,0
I want to import it into a pandas.DataFrame, with an arbitrary name given to each column that doesn't have a header, something like this:
SAMPLE_TIME, POS, OFF, HISTOGRAM 1 2 3 4 5 6
2015-07-15 16:41:56, 0-0-0-0-3, 1, 2, 0, 5, 59, 4, 0, 0,
2015-07-15 16:42:55, 0-0-0-0-3, 1, 0, 0, 5, 0, 6, 0, nan
2015-07-15 16:43:55, 0-0-0-0-3, 1, 0, 0, 5, 0, 7, nan nan
2015-07-15 16:44:56, 0-0-0-0-3, 1, 2, 0, 5, 0, 0, 2, nan
So far I have not managed to import it. I tried different solutions, such as passing a specific header, but still no joy. The only way I was able to make it work was to add a header manually to the .csv file, which kind of defeats the purpose of automation!
Then I tried this:
lines=list(csv.reader(open('file.csv')))
header, values = lines[0], lines[1:]
This correctly reads the file, giving me a list, values, of ~15k elements. Each element is a list of strings, where each string is a correctly parsed data field from the file. But when I try to do this:
data = {h:v for h,v in zip (header, zip(*values))}
df = pd.DataFrame.from_dict(data)
or this:
data2 = {h:v for h,v in zip (str(xrange(16)), zip(*values))}
df2 = pd.DataFrame.from_dict(data)
Then the columns without headers disappear and the column order is completely mixed up. Any idea of a possible solution?
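For reference, the likely cause is that zip stops at its shortest argument: zip(header, ...) yields only as many pairs as there are header names (four here), and zip(*values) is cut to the length of the shortest row. A plain dict in Python 2 also does not keep insertion order. One possible way around this, as a minimal sketch and not a definitive solution, is to pad every row to the widest row and generate names for the extra columns. The function name read_ragged_csv and the shortened sample rows are made up for illustration, and the sketch assumes that empty fields at the end of a row come only from trailing commas.

```python
import csv
import io

import pandas as pd


def read_ragged_csv(handle):
    """Read a CSV whose data rows may be longer than the header row.

    Hypothetical helper: extra columns are named "1", "2", ... and
    short rows are padded with None (shown as missing by pandas).
    """
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(handle, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]

    rows = []
    for row in reader:
        # Assumption: empty trailing fields are artifacts of trailing commas
        while row and row[-1].strip() == "":
            row.pop()
        if row:
            rows.append(row)

    # Width of the widest row, never narrower than the header itself
    width = max([len(header)] + [len(row) for row in rows])

    # Keep the real header names, then number the unnamed columns
    extra = [str(i) for i in range(1, width - len(header) + 1)]
    padded = [row + [None] * (width - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=header + extra)


# Shortened, illustrative rows in the same shape as file.csv
sample = io.StringIO(
    "SAMPLE_TIME, POS, OFF, HISTOGRAM\n"
    "2015-07-15 16:41:56, 0-0-0-0-3, 1, 2,0,5,59,0,\n"
    "2015-07-15 16:42:55, 0-0-0-0-3, 1, 0,0,5,\n"
)
df = read_ragged_csv(sample)
print(df)
```

For the real file, the same function could be called on an open file handle, e.g. with open('file.csv') as f: df = read_ragged_csv(f). All values come back as strings, so the numeric columns would still need converting, for example with pd.to_numeric.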