Let's say that I have some data generated as follows:
N = 20
m = 3
data = np.random.normal(size=(N,m)) + np.random.normal(size=(N,m))**3
and then I create some categorization variable:
indx = np.random.randint(0,3,size=N).astype(np.int32)
and generate a DataFrame:
import pandas as pd
df = pd.DataFrame(np.hstack((data, indx[:,None])),
columns=['a%s' % k for k in range(m)] + [ 'indx'])
I can get the mean value, per group as:
df.groubpy('indx').mean()
What I'm unsure of how to do is to then subtract the mean off of each group, per-column in the original data, so that the data in each column is normalized by the mean within group. Any suggestions would be appreciated.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…