I have a .csv with the following structure:
date_begin,date_end,name,name_code,active_accounts,transaction_amount,transaction_count
1/1/2008,1/31/2008,Name_1,1001,"123,456","$7,890,123.45","67,890"
2/1/2008,2/29/2008,Name_1,1001,"43,210","$987,654.32","109,876"
3/1/2008,3/31/2008,Name_1,1001,"485,079","$1,265,789,433.98","777,888"
...
12/1/2008,12/31/2008,Name_1,1001,"87,543","$432,098,987","87,987"
1/1/2008,1/31/2008,Name_2,1002,"268,456","$890,123.45","97,890"
2/1/2008,2/29/2008,Name_2,1002,"53,210","$987,654.32","109,876"
...
etc
I am trying to read it into a pandas DataFrame using the following code:
import pandas as pd
data = pd.read_csv('my_awesome_csv.csv', parse_dates=[[0, 1]],
                   infer_datetime_format=True)
This works fine, except that I would like to control the data type of each column. When I check the dtypes in the interpreter, I find that the quoted numbers are not recognized as numbers, whether they are dollar amounts or plain counts.
In [10]: data.dtypes
Out[10]:
date_begin_date_end object
name object
name_code int64
active_accounts object # Problem, I want this to be a number
transaction_amount object # Ditto, I want this to be a number (it's a dollar amount)
transaction_count object # Still a number!
dtype: object
I have looked through the pandas CSV documentation but haven't found how to declare numeric types for values that are stored in the csv as strings with commas and dollar signs. My ultimate goal is to be able to do arithmetic on the values in these columns.
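One direction that might work, as a minimal sketch rather than a confirmed solution: `read_csv` accepts a `thousands` argument, which should cover the comma-separated counts, and a `converters` entry could strip the dollar sign from the amount column. The `parse_dollars` helper and the in-memory sample below are my own illustration (hypothetical names), and the sketch assumes every amount is a non-empty string shaped like the rows above.

```python
import io

import pandas as pd

# Two sample rows copied from above; an in-memory buffer stands in for the real file.
csv_text = '''date_begin,date_end,name,name_code,active_accounts,transaction_amount,transaction_count
1/1/2008,1/31/2008,Name_1,1001,"123,456","$7,890,123.45","67,890"
2/1/2008,2/29/2008,Name_1,1001,"43,210","$987,654.32","109,876"
'''


def parse_dollars(text):
    """Hypothetical helper: turn a string such as '$7,890,123.45' into a float."""
    return float(text.replace('$', '').replace(',', ''))


data = pd.read_csv(
    io.StringIO(csv_text),
    thousands=',',  # treat ',' inside numeric fields as a thousands separator
    converters={'transaction_amount': parse_dollars},  # strip '$' and ','
    parse_dates=['date_begin', 'date_end'],  # parse the two date columns separately
)

print(data.dtypes)
print(data['transaction_amount'].sum())
```

If that holds up on the full file, the count columns and the amount column should all support ordinary arithmetic. Blank or malformed amounts would make the converter raise, so those would need separate handling.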
Any thoughts?