python - Pandas Group Weighted Average of Multiple Columns

Question

Welcome To Ask or Share your Answers For Others

python - Pandas Group Weighted Average of Multiple Columns

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Pandas Group Weighted Average of Multiple Columns

Say I have the following dataframe:

>>> df=pd.DataFrame({'category':['a','a','b','b'],
... 'var1':np.random.randint(0,100,4),
... 'var2':np.random.randint(0,100,4),
... 'weights':np.random.randint(0,10,4)})
>>> df
  category  var1  var2  weights
0        a    37    36        7
1        a    47    20        1
2        b    33     7        6
3        b    16     6        8

I can calculate the weighted average of a 'var1' as such:

>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a    38.250000
b    23.285714
dtype: float64

However I am wondering if there is a way I can write my function and apply it to my grouped object such that I can specify when applying it, which column I want to calculate for (or both). Rather than have 'var1' written into my function, I'd like to be able to specify when applying the function.

Just as I can get an unweighted average of both columns like this:

>>> Grouped[['var1','var2']].mean()
          var1  var2
category            
a         42.0  28.0
b         24.5   6.5

I'm wondering if there is a parallel way to do that with weighted averages.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:35:19+0000

You can apply and return both averages:

In [11]: g.apply(lambda x: pd.Series(np.average(x[["var1", "var2"]], weights=x["weights"], axis=0), ["var1", "var2"]))
Out[11]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571

You could write this slightly cleaner as a function:

In [21]: def weighted(x, cols, w="weights"):
             return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)

In [22]: g.apply(weighted, ["var1", "var2"])
Out[22]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571

Categories

python - Pandas Group Weighted Average of Multiple Columns

python - Pandas Group Weighted Average of Multiple Columns

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags