python - pandas columns correlation with statistical significance

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

Given a pandas DataFrame, df, what is the best way to get the correlation between its columns 1 and 2?

I want rows with NaN to be excluded from the calculation, as pandas' built-in correlation already does. I also want a p-value or a standard error, which the built-in does not provide.

SciPy seems to be tripped up by the NaNs, though I believe it does report significance.

Data example:

     1           2
0    2          NaN
1    NaN         1
2    1           2
3    -4          3
4    1.3         1
5    NaN         NaN

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:47:36+0000

To calculate all the p-values at once, you can use the calculate_pvalues function (code below):

df = pd.DataFrame({'A':[1,2,3], 'B':[2,5,3], 'C':[5,2,1], 'D':['text',2,3] })
calculate_pvalues(df)

The output has the same layout as that of corr(), but holds p-values instead of correlation coefficients:

            A       B       C
    A       0  0.7877  0.1789
    B  0.7877       0  0.6088
    C  0.1789  0.6088       0

Details:

Column D is automatically ignored as it contains text.
Rows with a NaN in any column are dropped before the p-values are computed.
p-values are rounded to 4 decimals.
You can subset to indicate exact columns: calculate_pvalues(df[['A','B','C']])

Following is the code of the function:

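from scipy.stats import pearsonr
import pandas as pd

def calculate_pvalues(df):
    # Drop rows with any NaN, then keep only the numeric columns.
    df = df.dropna().select_dtypes(include='number')
    # Square frame of p-values, labelled by column name on both axes.
    pvalues = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)
    for r in df.columns:
        for c in df.columns:
            # pearsonr returns (correlation, p-value); keep the p-value.
            # .loc avoids chained assignment, which can silently fail.
            pvalues.loc[r, c] = round(pearsonr(df[r], df[c])[1], 4)
    return pvalues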
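
For the two-column case in the question, a smaller helper may be enough. The following is a minimal sketch, not part of the original answer: the name corr_with_pvalue is made up for illustration, and it assumes the columns are labelled with the integers 1 and 2 as in the data example. It drops only the rows where either of the two chosen columns is NaN, then passes the remaining pairs to scipy.stats.pearsonr.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def corr_with_pvalue(df, col_x, col_y):
    """Return (r, p_value, n) using only rows where both columns are present.

    Hypothetical helper for illustration; the name is not from the original answer.
    """
    # Keep just the two columns of interest and drop rows with a NaN in either.
    pair = df[[col_x, col_y]].dropna()
    # pearsonr returns the correlation coefficient and the two-sided p-value.
    r, p_value = pearsonr(pair[col_x], pair[col_y])
    return r, p_value, len(pair)


# The data example from the question, assuming integer column labels 1 and 2.
df = pd.DataFrame({
    1: [2, np.nan, 1, -4, 1.3, np.nan],
    2: [np.nan, 1, 2, 3, 1, np.nan],
})

r, p_value, n = corr_with_pvalue(df, 1, 2)
print(f"r={r:.4f}, p={p_value:.4f}, n={n}")
```

Only rows 2, 3 and 4 of the example have both values present, so n is 3 here. With so few points the p-value carries little weight, which is one reason to return n alongside it.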
