r - Show correlations as an ordered list, not as a large matrix

Question

Welcome To Ask or Share your Answers For Others

r - Show correlations as an ordered list, not as a large matrix

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Show correlations as an ordered list, not as a large matrix

I've a data frame with 100+ columns. cor() returns remarkably quickly, but tells me far too much, especially as most columns are not correlated. I'd like it to just tell me column pairs and their correlation, ideally ordered.

In case that doesn't make sense here is an artificial example:

df = data.frame(a=1:10,b=20:11*20:11,c=runif(10),d=runif(10),e=runif(10)*1:10)
z = cor(df)

z looks like this:

           a          b           c           d          e
a  1.0000000 -0.9966867 -0.38925240 -0.35142452  0.2594220
b -0.9966867  1.0000000  0.40266637  0.35896626 -0.2859906
c -0.3892524  0.4026664  1.00000000  0.03958307  0.1781210
d -0.3514245  0.3589663  0.03958307  1.00000000 -0.3901608
e  0.2594220 -0.2859906  0.17812098 -0.39016080  1.0000000

What I'm looking for is a function that will instead tell me:

a:b -0.9966867 
b:c  0.4026664
d:e -0.39016080  
a:c -0.3892524 
b:d  0.3589663
a:d -0.3514245 
b:e -0.2859906
a:e  0.2594220 
c:e  0.17812098
c:d  0.03958307

I have a crude way to get rid of some of the noise:

z[abs(z)<0.5]=0

then scan looking for non-zero values. But it is far inferior to the desired output above.

UPDATE: Based on the answers received, and some trial and error, here is the solution I went with:

z[lower.tri(z,diag=TRUE)]=NA  #Prepare to drop duplicates and meaningless information
z=as.data.frame(as.table(z))  #Turn into a 3-column table
z=na.omit(z)  #Get rid of the junk we flagged above
z=z[order(-abs(z$Freq)),]    #Sort by highest correlation (whether +ve or -ve)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:58:30+0000

I always use

zdf <- as.data.frame(as.table(z))
zdf
#    Var1 Var2     Freq
# 1     a    a  1.00000
# 2     b    a -0.99669
# 3     c    a -0.14063
# 4     d    a -0.28061
# 5     e    a  0.80519

Then use subset(zdf, abs(Freq) > 0.5) to select significant values.

Categories

r - Show correlations as an ordered list, not as a large matrix

r - Show correlations as an ordered list, not as a large matrix

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags