Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
278 views
in Technique[技术] by (71.8m points)

python - Normalize DataFrame by group

Let's say that I have some data generated as follows:

N = 20
m = 3
data = np.random.normal(size=(N,m)) + np.random.normal(size=(N,m))**3

and then I create some categorization variable:

indx = np.random.randint(0,3,size=N).astype(np.int32)

and generate a DataFrame:

import pandas as pd
df = pd.DataFrame(np.hstack((data, indx[:,None])), 
             columns=['a%s' % k for k in range(m)] + [ 'indx'])

I can get the mean value, per group as:

df.groubpy('indx').mean()

What I'm unsure of how to do is to then subtract the mean off of each group, per-column in the original data, so that the data in each column is normalized by the mean within group. Any suggestions would be appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
In [10]: df.groupby('indx').transform(lambda x: (x - x.mean()) / x.std())

should do it.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...