Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Pandas groupby aggregation with percentages

I have the following dataframe:

import pandas as pd
import numpy as np
np.random.seed(123)
n = 10
df = pd.DataFrame({"val": np.random.randint(1, 10, n), 
                   "cat": np.random.choice(["X", "Y", "Z"], n)})

   val cat
0    3   Z
1    3   X
2    7   Y
3    2   Z
4    4   Y
5    7   X
6    2   X
7    1   X
8    2   X
9    1   Y

I want to know what percentage of the total val column sum each category (X, Y, and Z) accounts for. I can aggregate df like this:

total_sum = df.val.sum()
#32
s = df.groupby("cat").val.sum().div(total_sum)*100

#this is the desired result in % of total val
cat
X    46.875  #15/32
Y    37.500  #12/32
Z    15.625  #5/32
Name: val, dtype: float64
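A minimal sketch of the same aggregation written as one chain, assuming the sample data is exactly the table printed above (rebuilt explicitly here so it does not depend on the random seed). Instead of computing total_sum separately, pipe() divides the group sums by their own total. Note that groupby drops rows whose key is NaN by default (dropna=True), so this matches df.val.sum() only when every row has a category.

```python
import pandas as pd

# Rebuild the sample frame from the table shown in the question
df = pd.DataFrame({
    "val": [3, 3, 7, 2, 4, 7, 2, 1, 2, 1],
    "cat": ["Z", "X", "Y", "Z", "Y", "X", "X", "X", "X", "Y"],
})

# Sum "val" per category, then divide by the sum of those group totals.
# pipe() keeps the normalisation inside the chain, so no separate
# total_sum variable is needed.
pct = (
    df.groupby("cat")["val"]
      .sum()
      .pipe(lambda s: s / s.sum() * 100)
)
print(pct)
```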

However, I find it rather surprising that pandas seemingly does not have a percentage/frequency function, something like df.groupby("cat").val.freq(), to use instead of df.groupby("cat").val.sum() or df.groupby("cat").val.mean(). I assumed this was a common operation, and Series.value_counts implements it with normalize=True, but for groupby aggregation I cannot find anything similar. Am I missing something here, or is there indeed no out-of-the-box function?
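One point worth separating: value_counts(normalize=True) gives each category's share of the row count, not its share of another column's sum, so it is not a drop-in answer here. The sketch below contrasts the two on the sample data. share_of_total is a hypothetical helper name for illustration, not a pandas API; it simply wraps the groupby-sum-then-divide pattern from the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "val": [3, 3, 7, 2, 4, 7, 2, 1, 2, 1],
    "cat": ["Z", "X", "Y", "Z", "Y", "X", "X", "X", "X", "Y"],
})

# Share of *rows* per category; the magnitudes in "val" are ignored.
row_share = df["cat"].value_counts(normalize=True).sort_index()


def share_of_total(frame, by, col, percent=True):
    """Hypothetical helper: each group's share of the total of `col`.

    groupby drops NaN keys by default (dropna=True), so the shares are
    relative to the rows that actually have a group key.
    """
    sums = frame.groupby(by)[col].sum()
    share = sums / sums.sum()
    return share * 100 if percent else share


# Share of the *val sum* per category
val_share = share_of_total(df, "cat", "val")

print(row_share)  # values: X 0.5, Y 0.3, Z 0.2
print(val_share)  # values: X 46.875, Y 37.5, Z 15.625
```

The two results differ because X has the most rows (5 of 10) but its rows carry smaller values than Y's on average.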

