I'm using read_csv to read CSV files into pandas DataFrames. My CSV files contain a large number of decimal/float values. The numbers are encoded using the European decimal notation:
1.234.456,78
This means that '.' is used as the thousands separator and ',' is the decimal mark.
Pandas 0.8 provides a read_csv argument called 'thousands' to set the thousands separator. Is there an additional argument to set the decimal mark as well? If not, what is the most efficient way to parse a European-style decimal number?
Currently I'm using a string replace on every value, which I suspect carries a significant performance penalty. The code I'm using is this:
# Change the decimal mark from ',' to '.' and convert to float
f = lambda x: float(x.replace(',', '.'))
df['MyColumn'] = df['MyColumn'].map(f)
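For comparison, here is a minimal sketch of what a parse-time solution could look like. It assumes a pandas release newer than 0.8 in which read_csv accepts a 'decimal' argument alongside 'thousands'. The sample data, the second column name, and the ';' delimiter are made up for illustration (a comma delimiter would clash with the decimal mark).

```python
import io

import pandas as pd

# Hypothetical sample data in European notation; ';' is assumed as the
# field delimiter because ',' is already taken by the decimal mark.
csv_text = (
    "MyColumn;Other\n"
    "1.234.456,78;1,5\n"
    "2.000,10;0,25\n"
)

# thousands='.' strips the grouping dots, decimal=',' treats the comma as
# the decimal mark, so the columns are parsed straight into floats.
df = pd.read_csv(io.StringIO(csv_text), sep=";", thousands=".", decimal=",")

print(df.dtypes)
print(df)
```

If this works on the installed version, no per-value string replacement is needed after loading.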
Any help is appreciated.