I have strings like '03-21-2019'
that I want to convert to the native Python datetime object: that is, of the datetime.datetime
type. The conversion is easy enough through pandas
:
import pandas as pd
import datetime as dt
date_str = '03-21-2019'
pd_Timestamp = pd.to_datetime(date_str)
py_datetime_object = pd_Timestamp.to_pydatetime()
print(type(py_datetime_object))
with the result
<class 'datetime.datetime'>
This is precisely what I want, since I want to compute timedelta
's by subtracting one of these from another - perfectly well-defined in the native Python datetime.datetime
class. However, my data is in a pd.DataFrame
. When I try the following code:
import pandas as pd
import datetime as dt
df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(lambda x:
pd.to_datetime(x).to_pydatetime())
print(type(df['Date'].iloc[0]))
the result is
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
This is the WRONG type, and I can't for the life of me figure out why only part of the lambda
expression is getting evaluated (that is, string-to-pandas-Timestamp), and not the last part (that is, pandas-Timestamp-to-datetime.datetime). It doesn't work if I define the function explicitly, either, instead of using a lambda
expression:
import pandas as pd
import datetime as dt
def to_native_datetime(date_str: str) -> dt.datetime:
return pd.to_datetime(date_str).to_pydatetime()
df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))
The result is the same as before. It's definitely doing part of the function, as the result is not a string anymore. But I want the native Python datetime.datetime
object, and I see no way of getting it. This looks like a bug in pandas
, but I'm certainly willing to see it as user error on my part.
Why can't I get the native datetime.datetime
object out of a pandas.DataFrame
string column?
I have looked at this thread and this one, but neither of them answer my question.
[EDIT]: Here's something even more bizarre:
import pandas as pd
import datetime as dt
def to_native_datetime(date_str: str) -> dt.datetime:
return dt.datetime.strptime(date_str, '%m-%d-%Y')
df = pd.DataFrame(columns=['Date'])
df.loc[0] = ['03-21-2019']
df['Date'] = df['Date'].apply(to_native_datetime)
print(type(df['Date'].iloc[0]))
Here I'm not even using pandas
to convert the string, and I STILL get a
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
out of it!
Many thanks for your time!
[FURTHER EDIT]: Apparently, in this thread, in Nehal J Wani's answer, it comes out that pandas
automatically converts back to its native datetime format when you assign into a pd.DataFrame
. This is not what I wanted to hear, but apparently, I'm going to have to convert on-the-fly when I read out of the pd.DataFrame
.
See Question&Answers more detail:
os