I am trying to make a .csv file in a format that is both minimally human-readable and also easily pandas-readable. That means columns should be neatly separated so you can easily identify to which column each value belongs. Problem is, padding the file with whitespace breaks some pandas functionality. So far what I've got is
work ,roughness ,unstab ,corr_c_w ,u_star ,c_star
us ,True ,True ,-0.39 ,0.35 ,-.99
wang ,False , ,-0.5 , ,
cheng , ,True , , ,
watanabe, , , ,0.15 ,-.80
If I take out all the whitespace from the .csv above and read it directly with pd.read_csv, it works perfectly: the first two columns come out as booleans and the others as floats. However, without the whitespace it is not human-readable at all. When I read the above .csv with
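For reference, this is what the whitespace-free version looks like when parsed (I'm using io.StringIO here in place of the actual bibrev.csv file, but the content is the same): the float columns come out numeric and the empty cells become NaN.

```python
import io

import pandas as pd

# The same table with all padding whitespace removed; stands in for bibrev.csv
csv_text = (
    "work,roughness,unstab,corr_c_w,u_star,c_star\n"
    "us,True,True,-0.39,0.35,-.99\n"
    "wang,False,,-0.5,,\n"
    "cheng,,True,,,\n"
    "watanabe,,,,0.15,-.80\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.dtypes)  # float columns are float64; empty cells are NaN
```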
pd.read_csv('bibrev.csv', index_col=0)
it doesn't work, because all the columns are read as strings that, obviously, include the whitespace. When I use
pd.read_csv('bibrev.csv', index_col=0, skipinitialspace=True)
then it kind of works: floats are read as floats and missing values are read as NaNs, which is a big improvement. However, the column names and the boolean columns are still strings containing whitespace.
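A minimal reproduction of that partial fix (again with io.StringIO standing in for bibrev.csv): skipinitialspace only strips whitespace *after* each delimiter, so the trailing spaces that sit before the commas survive in the column names, the index, and the True/False cells, which is why those columns stay object dtype.

```python
import io

import pandas as pd

# The padded table as written in the question; stands in for bibrev.csv
csv_text = (
    "work ,roughness ,unstab ,corr_c_w ,u_star ,c_star\n"
    "us ,True ,True ,-0.39 ,0.35 ,-.99\n"
    "wang ,False , ,-0.5 , ,\n"
    "cheng , ,True , , ,\n"
    "watanabe, , , ,0.15 ,-.80\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col=0, skipinitialspace=True)
print(df.columns.tolist())  # names keep their trailing space, e.g. 'roughness '
print(df.dtypes)            # float columns parse, boolean columns stay object
```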
Is there any method of reading that .csv directly with pandas? Or maybe a way to change the .csv format a bit and still get a clean read while keeping it human-readable?
PS: I am trying to avoid reading everything into Python as strings, stripping the whitespace and then feeding it to pandas; I'm also trying to avoid defining functions and passing them to pandas through the converters keyword.