I have a data file, apples.csv, that has headers like:
"id","str1","str2","str3","num1","num2"
I read it into a dataframe with pandas:
apples = pd.read_csv('apples.csv', sep=",")
Then I do some processing on it, but ignore that (I have it all commented out, and my overall issue still occurs, so that processing is irrelevant here).
I then save it out:
apples.to_csv('bananas.csv',columns=["id","str1","str2","str3","num1","num2"])
Now, looking at bananas.csv, its headers are:
,id,str1,str2,str3,num1,num2
The quotes are gone (which I don't really care about, since it doesn't affect anything in the file), and there is now that leading comma.
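A minimal reproduction, assuming a small in-memory stand-in for apples.csv (the row values are invented for illustration): `DataFrame.to_csv` writes the row index by default (`index=True`), and because the default index that `read_csv` creates has no name, its header cell is empty. That empty first header field is the leading comma, and the index labels 0, 1, ... become an extra first field in every data row. The quotes disappear because `to_csv` defaults to `quoting=csv.QUOTE_MINIMAL`, which only quotes fields that need it.

```python
import io

import pandas as pd

# Hypothetical stand-in for apples.csv; the row values are invented.
raw = io.StringIO(
    '"id","str1","str2","str3","num1","num2"\n'
    '1,"a","b","c",1.5,2\n'
    '2,"d","e","f",3.5,4\n'
)
apples = pd.read_csv(raw, sep=",")

cols = ["id", "str1", "str2", "str3", "num1", "num2"]

# With no path argument, to_csv returns the CSV text instead of writing a file.
default_out = apples.to_csv(columns=cols)
lines = default_out.splitlines()

print(lines[0])  # ,id,str1,str2,str3,num1,num2  (empty index header -> leading comma)
print(lines[1])  # 0,1,a,b,c,1.5,2               (index label 0 is the extra field)
```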
The data rows now have an additional column as well, so it writes out 7 columns. But if I do:
print(len(apples.columns))
Immediately prior to saving, it shows 6 columns...
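A sketch of why the counts disagree, again using a hypothetical in-memory stand-in for apples.csv: `apples.columns` lists only the data columns, while the index that `read_csv` attaches is stored separately, so `len(apples.columns)` is 6 even though `to_csv` writes 7 fields per row. Passing `index=False` to `to_csv` leaves the index out; alternatively, reading with `index_col="id"` turns `id` into the index, which `to_csv` then writes as the first column under its own header.

```python
import io

import pandas as pd

# Hypothetical stand-in for apples.csv; the row values are invented.
CSV_TEXT = (
    '"id","str1","str2","str3","num1","num2"\n'
    '1,"a","b","c",1.5,2\n'
    '2,"d","e","f",3.5,4\n'
)
cols = ["id", "str1", "str2", "str3", "num1", "num2"]

apples = pd.read_csv(io.StringIO(CSV_TEXT))
print(len(apples.columns))  # 6: the index is not counted among the columns
print(list(apples.index))   # [0, 1]: the default index read_csv created

# Option 1: keep the default index in memory but leave it out of the file.
no_index = apples.to_csv(columns=cols, index=False)
print(no_index.splitlines()[0])  # id,str1,str2,str3,num1,num2
print(no_index.splitlines()[1])  # 1,a,b,c,1.5,2

# Option 2: use "id" as the index, so it is written as the first column
# under its own header instead of an unnamed 0, 1, ... column.
by_id = pd.read_csv(io.StringIO(CSV_TEXT), index_col="id")
by_id_out = by_id.to_csv()
print(by_id_out.splitlines()[0])  # id,str1,str2,str3,num1,num2
```

The second option only makes sense if `id` uniquely identifies rows; it also means `"id"` has to be left out of any `columns=` list passed to `to_csv`, since it is no longer a column.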
I normally work in Java/Perl/R and am less experienced with Python, particularly pandas, so I am not sure whether this is a case of "yeah, it just does that" or what the issue is. I have spent an amusingly long time trying to figure this out and cannot find an answer by searching.
How can I stop it from prepending that comma, and, maybe just as important, why is it doing it?