I have a large data table containing 2 sets of 4 paired observations, the first few lines of which are as below:
a1 a2 a3 a4 b1 b2 b3 b4
1 480 770 601 953 469 750 588 944
2 0 0 0 0 0 0 0 0
3 3 13 9 12 3 12 9 12
4 0 2 4 3 0 14 3 2
5 0 0 11 0 0 0 11 0
6 165 292 162 313 180 368 116 368
These are gene-expression counts from two different RNA-seq analysis pipelines, 'a' and 'b': columns a1 and b1 are the results of analyzing the same sample (1) with the two pipelines, likewise a2 and b2, etc. Each row (1-6) is a different gene. I want to find out whether specific genes show particularly poor pairwise correlation, i.e. for each gene, the correlation between its values in columns 1-4 and its values in columns 5-8 (pairing column 1 with 5, 2 with 6, 3 with 7, and 4 with 8). I can do this manually using the cor.test function, e.g. for the data in the first row:
cor.test(c(480,770,601,953), c(469,750,588,944))$estimate
cor
0.9997302
But for the life of me, I can't figure out how to do this in an automated fashion across the data table (i.e. returning a vector of correlation coefficients, one per row). I could probably write some sort of for loop, but that seems like an ugly solution and not the "R way."
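
For reference, here is a minimal sketch of what I mean by "one coefficient per row". It rebuilds the six rows above as a data frame (the name dat is just a placeholder, not my real object), assumes the 'a' columns are 1-4 and the 'b' columns are 5-8, and uses cor rather than cor.test, on the assumption that only the Pearson estimate is needed. It is one possible approach, not necessarily the best one.

```r
# Rebuild the six example rows shown above ('dat' is a placeholder name).
dat <- data.frame(
  a1 = c(480, 0, 3, 0, 0, 165),
  a2 = c(770, 0, 13, 2, 0, 292),
  a3 = c(601, 0, 9, 4, 11, 162),
  a4 = c(953, 0, 12, 3, 0, 313),
  b1 = c(469, 0, 3, 0, 0, 180),
  b2 = c(750, 0, 12, 14, 0, 368),
  b3 = c(588, 0, 9, 3, 11, 116),
  b4 = c(944, 0, 12, 2, 0, 368)
)

# Split into the two pipelines; row i of 'a' pairs with row i of 'b'.
a <- as.matrix(dat[, 1:4])
b <- as.matrix(dat[, 5:8])

# One Pearson correlation per gene (row). Rows with zero variance,
# such as the all-zero gene, give NA; the warning cor() raises for
# them is suppressed here.
row_cor <- vapply(
  seq_len(nrow(dat)),
  function(i) suppressWarnings(cor(a[i, ], b[i, ])),
  numeric(1)
)

row_cor
```

The first element should match the cor.test estimate shown above, and order(row_cor) would then list the genes from the poorest correlation upward, with NA rows last.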