python - pandas GroupBy aggregate only one column

posted Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I have a DataFrame of the following form:

>>> import numpy as np
>>> import pandas as pd
>>> sales = pd.DataFrame({'seller_id':list('AAAABBBB'),'buyer_id':list('CCDECDEF'),
...                       'amount':np.random.randint(10,20,size=(8,))})
>>> sales = sales[['seller_id','buyer_id','amount']]
>>> sales
  seller_id buyer_id  amount
0         A        C      18
1         A        C      15
2         A        D      11
3         A        E      12
4         B        C      16
5         B        D      18
6         B        E      16
7         B        F      19

For each seller, I would like to calculate the share of the total sale amount taken up by its largest buyer. I have code that does this, but I have to keep resetting the index and grouping again, which is wasteful. There has to be a better way. I would like a solution where I can aggregate one column at a time and keep the others grouped. Here's my current code:

>>> gr2 = sales.groupby(['buyer_id','seller_id'])
>>> seller_buyer_level = gr2['amount'].sum() # sum over different purchases
>>> seller_buyer_level_reset = seller_buyer_level.reset_index('buyer_id')
>>> gr3 = seller_buyer_level_reset.groupby(seller_buyer_level_reset.index)
>>> result = gr3['amount'].max() / gr3['amount'].sum()

>>> result
seller_id
A    0.589286
B    0.275362

I simplified a bit. In reality I also have a time period column, so I want to do this at the seller and time period level. That's why in gr3 I'm grouping by the multi-index (in this example, it appears as a single index). I thought there would be a solution where, instead of reducing and regrouping, I could aggregate away only one index level of the group and leave the others grouped, but I couldn't find it in the documentation or online. Any ideas?

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:05:56+0000

Here's a one-liner, but it resets the index once, too:

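# Wrapped in parentheses so the chained calls can span several lines.
(sales.groupby(['seller_id','buyer_id']).sum()
      .reset_index(level=1).groupby(level=0)
      .apply(lambda x: x.amount.max()/x.amount.sum()))
#seller_id
#A    0.509091
#B    0.316667
#dtype: float64
# (presumably from a different random draw of 'amount', so the values
#  differ from the result shown in the question)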
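
As a possible alternative that avoids resetting the index at all, the summed Series can be grouped again by the index level(s) you want to keep, so that only buyer_id is aggregated away. This is a minimal sketch, not a definitive answer: the amounts are the fixed values printed in the question rather than random ones, and the 'period' column (name and values) is hypothetical, added only to illustrate the seller-and-time-period case mentioned there.

```python
import pandas as pd

# Amounts copied from the table printed in the question (not random here).
sales = pd.DataFrame({
    'seller_id': list('AAAABBBB'),
    'buyer_id': list('CCDECDEF'),
    'amount': [18, 15, 11, 12, 16, 18, 16, 19],
})

# Sum over repeated purchases; the result is a Series indexed by a
# (seller_id, buyer_id) MultiIndex.
seller_buyer = sales.groupby(['seller_id', 'buyer_id'])['amount'].sum()

# Group the Series by the index level to keep; buyer_id is aggregated away
# without any reset_index call.
by_seller = seller_buyer.groupby(level='seller_id')
result = by_seller.max() / by_seller.sum()

# Hypothetical extension: a 'period' column (assumed name and values).
sales_p = sales.assign(period=[1, 1, 2, 2, 1, 1, 2, 2])
totals_p = sales_p.groupby(['seller_id', 'period', 'buyer_id'])['amount'].sum()

# Keep both seller_id and period grouped; only buyer_id is reduced.
by_seller_period = totals_p.groupby(level=['seller_id', 'period'])
result_p = by_seller_period.max() / by_seller_period.sum()
```

On the question's data, result should reproduce the 0.589286 (A) and 0.275362 (B) shown there. The same pattern extends to any number of retained levels by listing them in level=[...].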
