I have two columns:
         date   age
0  2016-01-05  47.0
1  2016-01-05  43.0
2  2016-01-05  28.0
3  2016-01-05  46.0
4  2016-01-04  39.0
What I want is another column, dob, holding the date minus age years:
         date   age         dob
0  2016-01-05  47.0  1969-01-05
1  2016-01-05  43.0  1973-01-05
2  2016-01-05  28.0  1988-01-05
3  2016-01-05  46.0  1970-01-05
4  2016-01-04  39.0  1977-01-04
This seems simple enough, but the obvious df['date'] - df['age'].astype('timedelta64[Y]') gives:
0 1969-01-04 14:27:36
1 1973-01-04 13:44:24
2 1988-01-05 05:02:24
3 1970-01-04 20:16:48
4 1977-01-03 13:01:12
Why the additional time component? Even pd.to_timedelta(df['age'], unit='Y') gives the same result, with an additional warning that unit='Y' is deprecated.
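A likely explanation for the time component, sketched here under the assumption that the 'Y' timedelta unit is treated as an average Gregorian year of 365.2425 days (31,556,952 seconds) rather than a calendar year. The constant name below is made up for illustration. Subtracting 47 such average years from 2016-01-05 reproduces the first row of the output above:

```python
import pandas as pd

# Assumption: a 'Y' timedelta is an average Gregorian year,
# i.e. 365.2425 days * 86,400 s/day = 31,556,952 seconds.
SECONDS_PER_AVG_YEAR = 31_556_952

date = pd.Timestamp("2016-01-05")

# 47 average years is 17166.3975 days, but the calendar span from
# 1969-01-05 to 2016-01-05 is exactly 17166 days, so the result
# lands 0.3975 days (9h 32m 24s) before midnight on 1969-01-05.
result = date - pd.Timedelta(seconds=47 * SECONDS_PER_AVG_YEAR)
print(result)  # 1969-01-04 14:27:36
```

Because a fixed-length "year" cannot account for where the leap days actually fall, any fixed-length timedelta approach will drift like this.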
Further, df['date'] - pd.DateOffset(years=df['age']) throws (understandably):
TypeError: cannot convert the series to <class 'int'>
I can use apply with the second option, df['date'] - df['age'].apply(lambda a: pd.DateOffset(years=a)), to get the correct result in a roundabout way, but this (understandably) raises PerformanceWarning: Adding/subtracting array of DateOffsets to DatetimeArray not vectorized.
What is a good (pythonic and vectorized) solution here?
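One possible vectorized approach, offered as a sketch rather than a definitive answer: rebuild the date from its year, month, and day components after subtracting the age from the year. This assumes the ages are whole numbers with no missing values; the intermediate frame name parts is made up for illustration. A date of Feb 29 that maps to a non-leap year has no valid counterpart, so errors="coerce" turns it into NaT here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-05", "2016-01-05", "2016-01-05", "2016-01-05", "2016-01-04"]
    ),
    "age": [47.0, 43.0, 28.0, 46.0, 39.0],
})

# Subtract whole years on the calendar: shift only the year component
# and keep month and day unchanged. Assumes integer-valued, non-null ages.
parts = pd.DataFrame({
    "year": df["date"].dt.year - df["age"].astype(int),
    "month": df["date"].dt.month,
    "day": df["date"].dt.day,
})

# pd.to_datetime assembles datetimes from year/month/day columns.
# Invalid combinations (e.g. Feb 29 in a non-leap year) become NaT.
df["dob"] = pd.to_datetime(parts, errors="coerce")
print(df)
```

This avoids both the per-row DateOffset objects and the fixed-length year timedelta, at the cost of needing an explicit decision about Feb 29.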