Say I have the following dataframe:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
...                    'var1': np.random.randint(0, 100, 4),
...                    'var2': np.random.randint(0, 100, 4),
...                    'weights': np.random.randint(0, 10, 4)})
>>> df
  category  var1  var2  weights
0        a    37    36        7
1        a    47    20        1
2        b    33     7        6
3        b    16     6        8
I can calculate the weighted average of 'var1' within each category like this:
>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a    38.250000
b    23.285714
dtype: float64
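For reference, `np.average` with a `weights` argument computes $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$. Plugging in the rows shown above (these particular numbers come from the random draw in this session) reproduces both results:

```latex
\bar{x}_{w,a} = \frac{37 \cdot 7 + 47 \cdot 1}{7 + 1} = \frac{306}{8} = 38.25,
\qquad
\bar{x}_{w,b} = \frac{33 \cdot 6 + 16 \cdot 8}{6 + 8} = \frac{326}{14} \approx 23.285714
```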
However, I'd like to write the function so that the column to average ('var1', 'var2', or both) is chosen when I apply it to the grouped object, rather than being hard-coded as 'var1' inside the function.
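One possible approach, sketched here as an assumption rather than the only way: `GroupBy.apply` forwards extra positional and keyword arguments to the function, so the column name can become a parameter. The helper name `weighted_avg` and its `weight_col` default are hypothetical, and the data is a fixed copy of the rows above rather than random values. Selecting the value columns before `apply` keeps the grouping column out of the frames passed to the function.

```python
import numpy as np
import pandas as pd

# Fixed copy of the example rows shown above (instead of random values)
df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                   'var1': [37, 47, 33, 16],
                   'var2': [36, 20, 7, 6],
                   'weights': [7, 1, 6, 8]})

def weighted_avg(g, col, weight_col='weights'):
    """Hypothetical helper: weighted mean of column `col` within one group `g`."""
    return np.average(g[col], weights=g[weight_col])

# Exclude the grouping column from the per-group frames passed to apply()
grouped = df.groupby('category')[['var1', 'var2', 'weights']]

# Extra arguments to apply() are passed through to weighted_avg
var1_avg = grouped.apply(weighted_avg, 'var1')
var2_avg = grouped.apply(weighted_avg, col='var2')
print(var1_avg)
print(var2_avg)
```

Each call returns a Series indexed by category, so the same function now works for any column chosen at apply time.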
Just as I can get an unweighted average of both columns like this:
>>> Grouped[['var1','var2']].mean()
          var1  var2
category            
a         42.0  28.0
b         24.5   6.5
I'm wondering if there is a parallel way to do that with weighted averages.
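A sketch of one vectorized alternative, under the assumption that the weights column is numeric with no missing values and a nonzero total per group: multiply every value column by the weights, sum numerator and weights per group, and divide. This applies the same sum(w*x)/sum(w) formula as `np.average` but does not call it; the function name `grouped_weighted_mean` is hypothetical. It returns a DataFrame shaped like the output of `Grouped[['var1','var2']].mean()`.

```python
import pandas as pd

# Fixed copy of the example rows shown above (instead of random values)
df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                   'var1': [37, 47, 33, 16],
                   'var2': [36, 20, 7, 6],
                   'weights': [7, 1, 6, 8]})

def grouped_weighted_mean(frame, by, cols, weight_col='weights'):
    """Hypothetical helper: per-group weighted mean of each column in `cols`."""
    w = frame[weight_col]
    # Multiply each value column by its row's weight (aligned on the row index)
    weighted = frame[cols].mul(w, axis=0)
    # Numerator: sum of w*x per group; denominator: sum of w per group
    num = weighted.groupby(frame[by]).sum()
    den = w.groupby(frame[by]).sum()
    # Divide each row of the numerator by that group's total weight
    return num.div(den, axis=0)

result = grouped_weighted_mean(df, 'category', ['var1', 'var2'])
print(result)
```

Because the columns are passed as a list, the same call handles one column or several, and no per-group Python function is executed.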