I am looking to pass a pandas dataframe to a custom function as efficiently as possible.
The biggest speed gains come from vectorization and from getting rid of the per-row overhead of DataFrame.apply.
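As an illustration of that contrast, here is a minimal sketch using the question's sample DataFrame. The expression A + 2*B - C is an arbitrary example chosen only for this sketch. The apply version calls the lambda once per row, passing each row as a Series. The column version evaluates the whole expression as a few NumPy array operations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10 * 3).reshape((10, 3)), columns=['A', 'B', 'C'])

# Row-wise: the lambda runs once per row, which is where apply's overhead comes from.
slow = df.apply(lambda row: row['A'] + 2 * row['B'] - row['C'], axis=1)

# Vectorized: one expression over whole columns, evaluated by NumPy.
fast = df['A'] + 2 * df['B'] - df['C']

print(np.array_equal(slow.to_numpy(), fast.to_numpy()))
```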
It seems to me that some custom functions don't really lend themselves to vectorization (for example, if the custom function contains a for loop). Does this sound right?
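A for loop does not always rule out vectorization. If the loop runs over a small, fixed range and each iteration can act on a whole column, the loop can stay while the per-element work moves into NumPy. The polynomial sum below (1 + x + x**2 + x**3) and the names poly_scalar and poly_vectorized are hypothetical, used only for illustration. Loops in which each step depends on the result for the previous row are the harder case. They usually do not map onto array expressions this way unless NumPy provides a dedicated function, such as np.cumsum for a running total.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10 * 3).reshape((10, 3)), columns=['A', 'B', 'C'])

def poly_scalar(x, n=4):
    # Scalar version: called once per element, so the loop body runs len(column) * n times.
    total = 0
    for k in range(n):
        total += x ** k
    return total

def poly_vectorized(x, n=4):
    # Same loop, but each iteration works on the whole array,
    # so the Python-level loop body runs only n times.
    total = np.zeros_like(x)
    for k in range(n):
        total += x ** k
    return total

slow = [poly_scalar(v) for v in df['A']]
fast = poly_vectorized(df['A'].to_numpy())
print(np.array_equal(slow, fast))
```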
Can someone help me set the columns of a DataFrame to the outputs of a function using this notation:
df['newA'], df['newB'], df['newC']= test(df.A, df.B, df.C)
The alternative options below produce results, but I can't find an efficient way to put those results into the DataFrame the way the line above would.
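import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10 * 3).reshape((10, 3)), columns=['A', 'B', 'C'])

def test(x, y, z):
    # Placeholder branch: `if` needs a scalar x, so test works one element at a time
    if x >= 30:
        pass
    return x, y, z

# Option 1: with a single object otype, np.vectorize returns one object
# array whose elements are the (x, y, z) tuples
vectest = np.vectorize(test, otypes=[np.ndarray])
result = vectest(df.A, df.B, df.C)

# Option 2: map calls test once per row, giving a list of (x, y, z) tuples
result = list(map(test, df.A, df.B, df.C))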
All of this is to avoid a crude loop or using apply.
question from:
https://stackoverflow.com/questions/65838896/numpy-vectorize-alternative-to-get-rid-of-pandas-apply