You pretty much figured out all the pieces; you just need to combine them:
>>> df.groupby('ID')[['Val1','Val2']].corr()
             Val1      Val2
ID
A  Val1  1.000000  0.500000
   Val2  0.500000  1.000000
B  Val1  1.000000  0.385727
   Val2  0.385727  1.000000
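(If you want something self-contained to experiment with, a stand-in frame with the same long format, an ID column plus the two value columns, could look like the sketch below. The data here is made up, so the correlations you get from it won't match the numbers above.)

import numpy as np
import pandas as pd

# Made-up long-format data: one row per observation, grouped by 'ID'.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'ID':   ['A'] * 10 + ['B'] * 10,
    'Val1': rng.normal(size=20),
    'Val2': rng.normal(size=20),
})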
In your case, printing out a 2x2 matrix for each ID is excessively verbose. I don't see an option to print a scalar correlation instead of the whole matrix, but if you only have two variables you can do something simple like this:
>>> df.groupby('ID')[['Val1','Val2']].corr().iloc[0::2,-1]
ID
A  Val1    0.500000
B  Val1    0.385727
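(If all you want is that single number per group, a small alternative sketch, not part of the slicing approach above, is to compute Series.corr inside the groupby; it gives the same values as the sliced matrix:)

>>> df.groupby('ID')[['Val1','Val2']].apply(lambda g: g['Val1'].corr(g['Val2']))
ID
A    0.500000
B    0.385727
dtype: float64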
For the more general case of 3+ variables
For 3 or more variables, producing concise output is less straightforward, but you could do something like this:
groups = ['Val1', 'Val2', 'Val3', 'Val4']

# For each variable, keep only its pairings with the variables that come
# after it, so each pair shows up once. (DataFrame.append is gone in recent
# pandas, so collect the pieces in a list and concat once at the end.)
pieces = []
for i in range(len(groups) - 1):
    pieces.append(df.groupby('ID')[groups].corr().stack()
                    .loc[:, groups[i], groups[i+1]:].reset_index())
df2 = pd.concat(pieces)
df2.columns = ['ID', 'v1', 'v2', 'corr']
df2.set_index(['ID', 'v1', 'v2']).sort_index()
Note that if we didn't have the groupby element, it would be straightforward to use an upper- or lower-triangle function from NumPy. But with the groupby present, there isn't a more elegant way to produce concise output, as far as I can tell.
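(For reference, here is a sketch of what I mean by the triangle approach in the ungrouped case, reusing df and groups from above; np.triu_indices just picks out the cells strictly above the diagonal of a single correlation matrix.)

import numpy as np
import pandas as pd

# Without the groupby: one correlation matrix, keep each pair once by
# taking only the positions above the diagonal.
corr = df[groups].corr()
iu = np.triu_indices(len(corr), k=1)
pairs = pd.Series(corr.values[iu],
                  index=pd.MultiIndex.from_arrays(
                      [corr.index[iu[0]], corr.columns[iu[1]]],
                      names=['v1', 'v2']))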