If you are landing here because you were searching for a vectorized, fast and correct solution to the problem of adding a variable number of months to a Series
of Timestamps
, then read on.
In some problems, we do indeed want to add actual months (the way pd.offsets.DateOffset(months=x)
works), i.e.: 2021-01-31 + 1 month --> 2021-02-28
, and not just "30 days". But trying to use pd.offsets.DateOffset
directly raises a warning (PerformanceWarning: Adding/subtracting object-dtype array to DatetimeArray not vectorized
). For example:
dates + df['months'].apply(lambda m: pd.offsets.DateOffset(months=m))
dates + months * pd.offsets.DateOffset(months=1)
, which additionally is wrong in some cases (e.g. 2015-07-29 + 59 months
should be 2020-06-29
, not 2020-06-28
).
Instead, we can do a bit of arithmetic ourselves and get a vectorized solution:
# note: not timezone-aware
def vadd_months(dates, months):
ddt = dates.dt
m = ddt.month - 1 + months
mb = pd.to_datetime(pd.DataFrame({
'year': ddt.year + m // 12,
'month': (m % 12) + 1,
'day': 1})) + (dates - dates.dt.normalize())
me = mb + pd.offsets.MonthEnd()
r = mb + (ddt.day - 1) * pd.Timedelta(days=1)
r = np.minimum(r, me)
return r
Usage for the OP example
df['new_timestamp'] = vadd_months(df['timestamp'], df['delta_in_months'])
Speed
n = int(100_000)
df = pd.DataFrame({
'timestamp': pd.Series(pd.to_datetime(np.random.randint(
pd.Timestamp('2000').value,
pd.Timestamp('2020').value,
n
))).dt.floor('1s'),
'months': np.random.randint(0, 120, n),
})
%%time
newts = vadd_months(df['timestamp'], df['months'])
# CPU times: user 52.3 ms, sys: 4.01 ms, total: 56.3 ms
Verification
Check with the (non-vectorized) direct use of pd.offsets.DateOffset
:
import warnings
%%time
with warnings.catch_warnings():
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
check = df['timestamp'] + df['months'].apply(lambda m: pd.offsets.DateOffset(months=m))
# CPU times: user 2.41 s, sys: 43.9 ms, total: 2.45 s
>>> newts.equals(check)
True
Note that vadd_months
is 40x faster than the non-vectorized version, and there are no warnings to catch.