I have a CSV that looks like this:
gene,stem1,stem2,stem3,b1,b2,b3,special_col
foo,20,10,11,23,22,79,3
bar,17,13,505,12,13,88,1
qui,17,13,5,12,13,88,3
And as data frame it looks like this:
In [17]: import pandas as pd
In [20]: df = pd.read_table("http://dpaste.com/3PQV3FA.txt",sep=",")
In [21]: df
Out[21]:
gene stem1 stem2 stem3 b1 b2 b3 special_col
0 foo 20 10 11 23 22 79 3
1 bar 17 13 505 12 13 88 1
2 qui 17 13 5 12 13 88 3
What I want to do is to perform pearson correlation from last column (special_col
) with every columns between gene
column and special column
, i.e. colnames[1:number_of_column-1]
At the end of the day we will have length 6 data frame.
Coln PearCorr
stem1 0.5
stem2 -0.5
stem3 -0.9999453506011533
b1 0.5
b2 0.5
b3 -0.5
The above value is computed manually:
In [27]: import scipy.stats
In [39]: scipy.stats.pearsonr([3, 1, 3], [11,505,5])
Out[39]: (-0.9999453506011533, 0.0066556395400007278)
How can I do that?
See Question&Answers more detail:
os