I am trying to convert one column of my dataframe to datetime. Following the discussion here https://github.com/dask/dask/issues/863 I tried the following code:
import dask.dataframe as dd
df['time'].map_partitions(pd.to_datetime, columns='time').compute()
But I am getting the following error message:
ValueError: Metadata inference failed, please provide `meta` keyword
What exactly should I put under meta? Should I put a dictionary of ALL the columns in df, or only the 'time' column? And what type should I put? I have tried dtype and datetime64, but neither of them works so far.
Thank you and I appreciate your guidance,
Update
I will include here the new error messages:
1) Using Timestamp
df['trd_exctn_dt'].map_partitions(pd.Timestamp).compute()
TypeError: Cannot convert input to Timestamp
2) Using datetime and meta
meta = ('time', pd.Timestamp)
df['time'].map_partitions(pd.to_datetime, meta=meta).compute()
TypeError: to_datetime() got an unexpected keyword argument 'meta'
3) Just using to_datetime without meta: gets stuck at 2%
In [14]: df['trd_exctn_dt'].map_partitions(pd.to_datetime).compute()
[ ] | 2% Completed | 2min 20.3s
Also, I would like to be able to specify the date format, as I would do in pandas:
pd.to_datetime(df['time'], format='%m%d%Y')
Update 2
After updating to Dask 0.11, I no longer have problems with the meta keyword. Still, I can't get it past 2% on a 2GB dataframe.
df['trd_exctn_dt'].map_partitions(pd.to_datetime, meta=meta).compute()
[ ] | 2% Completed | 30min 45.7s
Update 3
It worked better this way:
def parse_dates(df):
    return pd.to_datetime(df['time'], format='%m/%d/%Y')

df.map_partitions(parse_dates, meta=meta)
I'm not sure whether it's the right approach or not