python - Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

Question

Welcome To Ask or Share your Answers For Others

python - Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

I am trying to do the following but is seems that vectorized operations in this mode are not supported.

import pandas as pd
df=pd.DataFrame([[2017,1,15,1],
             [2017,1,15,2],
             [2017,1,15,3],
             [2017,1,15,4],
             [2017,1,15,5],
             [2017,1,15,6],
             [2017,1,15,7]],
             columns=['year','month','day','month_offset'])
df['date']=df.apply(lambda g: pd.datetime(g.year,g.month,g.day),axis=1)
df['offset']=df.apply(lambda g: pd.offsets.MonthEnd(g.month_offset),axis=1)
df['date_offset']=df.date+df.offset

This is the warning returned for last statement in the code snippet:

C:Python3.5.2.3WinPython-64bit-3.5.2.3python-3.5.2.amd64libsite-packagespandascoreops.py:533: PerformanceWarning: Adding/subtracting array of DateOffsets to Series not vectorized "Series not vectorized", PerformanceWarning)

I would like to this to work as a vectorized operation because of the performance benefits.

Thanks.

EDIT

To end, comparison of methods following on from @john-zwinck:

import time
import pandas as pd
import numpy as np

df=pd.DataFrame([[2017,1,1,1],
             [2017,1,1,2],
             [2017,1,1,3],
             [2017,1,1,4],
             [2017,1,1,5],
             [2017,1,1,6],
             [2017,1,1,7]],
             columns=['year','month','day','month_offset'])

df['mydate']=df.apply(lambda g: 
pd.datetime(g.year,g.month,g.day),axis=1)
start_time=time.time()
df['pandas_offset']=df.apply(lambda g: g.mydate + 
pd.offsets.MonthEnd(g.month_offset),axis=1)
end_time=time.time()
print('Method1 {} seconds'.format(end_time-start_time))

start_time=time.time()
df['numpy_offset']=(df.mydate.values.astype('M8[M]')+ 
df.month_offset.values * np.timedelta64(1, 'M')).astype('M8[D]') - 
np.timedelta64(1, 'D')
end_time=time.time()
print('Method3 with numpy vectorization {} seconds'.format(end_time-
start_time))

The result:

index year  month  day  month_offset     mydate    offset1      final
0  2017      1    1             1 2017-01-01 2017-01-31 2017-01-31
1  2017      1    1             2 2017-01-01 2017-02-28 2017-02-28
2  2017      1    1             3 2017-01-01 2017-03-31 2017-03-31
3  2017      1    1             4 2017-01-01 2017-04-30 2017-04-30
4  2017      1    1             5 2017-01-01 2017-05-31 2017-05-31
5  2017      1    1             6 2017-01-01 2017-06-30 2017-06-30
6  2017      1    1             7 2017-01-01 2017-07-31 2017-07-31


runfile('C:/bitbucket/test/vector_dates.py', wdir='C:/bitbucket/test')
Method 1 0.003999948501586914 seconds
Method 2 with numpy vectorization 0.0009999275207519531 seconds

Clearly numpy much faster

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:40:18+0000

A truly vectorized way to do this is to construct an array of numpy.timedelta64 from month_offset, add this to the array of dates, then subtract numpy.timedelta64(1, 'D') to go back to the last day of the previous month.

Solutions using apply(lambda) are likely to be much slower. And as the warning said, some Pandas date offset operations are not vectorized. If your data are large, it's better to avoid them. The NumPy facilities like busday_offset() and timedelta64 are fully performant.

Categories

python - Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

python - Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

EDIT

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags