Calculating correlation between columns of R data frame

Question

Posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I have a large data table containing 2 sets of 4 paired observations, the first few lines of which are as below:

   a1  a2  a3  a4  b1  b2  b3  b4
1 480 770 601 953 469 750 588 944
2   0   0   0   0   0   0   0   0
3   3  13   9  12   3  12   9  12
4   0   2   4   3   0  14   3   2
5   0   0  11   0   0   0  11   0
6 165 292 162 313 180 368 116 368
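For anyone who wants to try the answers below, here is a minimal sketch that rebuilds only the six rows shown above as a data frame called `df` (the name the answer uses). The values are copied from the table; the real table is much larger.

```r
# Rebuild the six example rows shown above (values copied from the table)
df <- data.frame(
  a1 = c(480, 0, 3, 0, 0, 165),
  a2 = c(770, 0, 13, 2, 0, 292),
  a3 = c(601, 0, 9, 4, 11, 162),
  a4 = c(953, 0, 12, 3, 0, 313),
  b1 = c(469, 0, 3, 0, 0, 180),
  b2 = c(750, 0, 12, 14, 0, 368),
  b3 = c(588, 0, 9, 3, 11, 116),
  b4 = c(944, 0, 12, 2, 0, 368)
)
str(df)  # 6 obs. of 8 numeric variables
```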

These are gene-expression counts from two different RNA-seq analysis pipelines, 'a' and 'b': columns a1 and b1 are the results of analyzing the same sample (1) with the two pipelines, and likewise a2 and b2, and so on. Each row (1-6) is a different gene. I want to find out whether specific genes show particularly poor agreement between the pipelines, i.e. for each row, the correlation between the paired columns 1 & 5, 2 & 6, 3 & 7 and 4 & 8. I can do this manually for a single row using the cor.test function, e.g. for the data in the first row:

cor.test(c(480,770,601,953), c(469,750,588,944))$estimate
      cor 
0.9997302
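Since only the coefficient is needed, plain `cor()` gives the same number without running the significance test. A quick check on the first row (values copied from the table above):

```r
x <- c(480, 770, 601, 953)  # row 1, pipeline a (a1..a4)
y <- c(469, 750, 588, 944)  # row 1, pipeline b (b1..b4)

r_cor  <- cor(x, y)                        # just the Pearson coefficient
r_test <- unname(cor.test(x, y)$estimate)  # same coefficient, computed alongside a t-test
print(c(cor = r_cor, cor.test = r_test))
```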

But for the life of me, I can't figure out how to do this in an automated fashion across the data table (i.e. returning a vector of correlation coefficients, one per row). I could probably do some sort of for loop, but that seems like an ugly solution and not the "R way."

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:05:36+0000

You could use apply to compute a row-wise correlation. Set MARGIN to 1 to apply your function to each row. Because cor.test returns a full test object for every row, the result is a list; you can then use lapply to extract only the correlation estimates from it.
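As a toy illustration of what MARGIN does (the 2 x 3 matrix here is made up purely for demonstration): MARGIN = 1 calls the function once per row, MARGIN = 2 once per column. Note that apply() first converts a data frame to a matrix, so all columns need to be numeric, which they are in this table.

```r
# A made-up 2 x 3 matrix, filled column by column: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, MARGIN = 1, FUN = sum)  # one result per row
col_sums <- apply(m, MARGIN = 2, FUN = sum)  # one result per column

print(row_sums)  # → 9 12
print(col_sums)  # → 3 7 11
```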

Here is the code for your example:

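# For each row (gene), correlate pipeline a (columns 1-4) with pipeline b (columns 5-8)
l <- apply(X = df, MARGIN = 1, FUN = function(x) cor.test(x[1:4], x[5:8]))
# Keep only the correlation coefficient from each cor.test() result
lapply(X = l, FUN = function(x) x$estimate)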
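If you only want the vector of coefficients the question asks for, one option is to have the function return just the number; apply() then returns a plain numeric vector instead of a list of test objects. This sketch uses cor() rather than cor.test(), assuming the p-values are not needed, and rebuilds the six example rows as `df`. Rows with no variation (row 2 is all zeros) have no defined correlation and come out as NA. The 0.5 cut-off for flagging genes is an arbitrary placeholder, not a recommended threshold.

```r
# Rebuild the six example rows from the question
df <- data.frame(
  a1 = c(480, 0, 3, 0, 0, 165), a2 = c(770, 0, 13, 2, 0, 292),
  a3 = c(601, 0, 9, 4, 11, 162), a4 = c(953, 0, 12, 3, 0, 313),
  b1 = c(469, 0, 3, 0, 0, 180), b2 = c(750, 0, 12, 14, 0, 368),
  b3 = c(588, 0, 9, 3, 11, 116), b4 = c(944, 0, 12, 2, 0, 368)
)

# One coefficient per row (gene): pipeline a = columns 1:4, pipeline b = columns 5:8.
# Returning a single number per row makes apply() give back a plain numeric vector.
r <- apply(df, MARGIN = 1, FUN = function(x) {
  # An all-zero row has zero variance, so cor() returns NA (and warns); silence the warning
  suppressWarnings(cor(x[1:4], x[5:8]))
})
print(round(r, 4))

# Genes ordered from worst to best agreement (NA rows are placed last)
print(order(r))
# Genes below a hypothetical cut-off of 0.5
print(which(r < 0.5))
```

On the example rows, gene 4 stands out (r of roughly 0.10), while every other row with a defined correlation is above 0.95.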

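Correlating whole columns (e.g. a2 against b2 across all genes) doesn't need apply at all: with MARGIN = 2 each call receives a single column, so x[2] and x[6] would be two single values from that column, not columns 2 and 6. Index the columns directly instead, or use mapply to go through all four pipeline pairs:

# Correlation of column a2 with column b2 across all genes
cor.test(df[[2]], df[[6]])$estimate
# All four pairs at once (a1-b1, a2-b2, a3-b3, a4-b4)
mapply(function(i, j) cor(df[[i]], df[[j]]), 1:4, 5:8)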
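Back to the row-wise problem: for a very large table, a loop-free alternative is to compute Pearson's r for every row at once with matrix arithmetic (centre each row, then use rowSums). This is a sketch that rebuilds the six example rows as `df`; it implements the standard Pearson formula rather than anything specific to cor.test(), and the last lines only check it against an apply() result. All-zero rows give NaN (0/0) here instead of NA.

```r
# Rebuild the six example rows from the question
df <- data.frame(
  a1 = c(480, 0, 3, 0, 0, 165), a2 = c(770, 0, 13, 2, 0, 292),
  a3 = c(601, 0, 9, 4, 11, 162), a4 = c(953, 0, 12, 3, 0, 313),
  b1 = c(469, 0, 3, 0, 0, 180), b2 = c(750, 0, 12, 14, 0, 368),
  b3 = c(588, 0, 9, 3, 11, 116), b4 = c(944, 0, 12, 2, 0, 368)
)

a <- as.matrix(df[, 1:4])  # pipeline a: one row per gene, one column per sample
b <- as.matrix(df[, 5:8])  # pipeline b: same genes and samples, same order

# Centre every row on its own mean (the rowMeans() vector is recycled down the columns)
ac <- a - rowMeans(a)
bc <- b - rowMeans(b)

# Pearson r per row: sum of cross-products over the product of root sums of squares
r_vec <- rowSums(ac * bc) / sqrt(rowSums(ac^2) * rowSums(bc^2))
print(round(r_vec, 4))

# Sanity check against the apply() version (row 2 excluded: NaN here, NA there)
r_apply <- apply(df, 1, function(x) suppressWarnings(cor(x[1:4], x[5:8])))
print(all.equal(unname(r_vec[-2]), unname(r_apply[-2])))
```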
