python - Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?

Question



posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)


I'm trying to do some time series analysis in Python using NumPy.

I have two medium-sized series, with 20k values each, and I want to check the sliding correlation between them.

corrcoef returns a matrix of auto-correlation/correlation coefficients. That is not useful by itself in my case, because one of the series contains a lag.

The correlate function (in mode="full") returns a 40k-element array that DOES look like the kind of result I'm aiming for (the peak value is as far from the center of the array as the lag would indicate), but the values are all strange: up to 500, when I was expecting something from -1 to 1.

I can't just divide it all by the max value; I know the max correlation isn't 1.

How could I normalize the "cross-correlation" (correlation in "full" mode) so that the return values are the correlation at each lag step instead of those very large, strange values?



1 Reply

深蓝 · Answer 1 · 2021-10-23T17:46:42+0000

You are looking for normalized cross-correlation. This option isn't available yet in NumPy, but a patch is waiting for review that does just what you want. I would think it shouldn't be too hard to apply. Most of the patch is docstring changes. The only lines of code that it adds are

if normalize:
    a = (a - mean(a)) / (std(a) * len(a))
    v = (v - mean(v)) /  std(v)

where a and v are the input NumPy arrays whose cross-correlation you are computing. It shouldn't be hard either to add these lines to your own copy of NumPy or to make a copy of the correlate function and add the lines there. I would personally do the latter if I chose to go this route.

Another, quite possibly better, alternative is to apply the normalization to the input vectors yourself before you pass them to correlate. It's up to you which way you would like to do it.
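
As a minimal sketch of that second alternative, the normalization lines from the patch can be applied to the inputs before calling np.correlate. The function name normalized_xcorr and the synthetic test signal are made up for illustration, and the sketch assumes two 1-D series of equal length with nonzero variance.

```python
import numpy as np

def normalized_xcorr(a, v, mode="full"):
    """Hypothetical helper: normalize the inputs, then call np.correlate."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Same normalization as the lines quoted from the patch.
    a = (a - a.mean()) / (a.std() * len(a))
    v = (v - v.mean()) / v.std()
    return np.correlate(a, v, mode=mode)

# Synthetic example (an assumption, not the asker's data):
# y is x delayed by 25 samples, plus some noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
true_lag = 25
y = np.roll(x, true_lag) + 0.5 * rng.standard_normal(2000)

cc = normalized_xcorr(y, x)

# In "full" mode, index 0 corresponds to lag -(len(x) - 1)
# and the last index to lag len(y) - 1.
lags = np.arange(-(len(x) - 1), len(y))
best_lag = lags[np.argmax(cc)]
print(best_lag, cc.max())
```

With equal-length inputs every value of cc lies in [-1, 1]. Note that values at large lags are sums over fewer overlapping samples, so they shrink toward zero rather than being a per-lag Pearson coefficient computed on the overlap only.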

By the way, this does appear to be the correct normalization as per the Wikipedia page on cross-correlation, except that it divides by len(a) rather than (len(a)-1). I feel that the discrepancy is akin to the difference between the population standard deviation and the sample standard deviation, and in my opinion it won't make much of a difference.
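
The size of that discrepancy can be checked numerically. The sketch below uses made-up data and relies on np.std defaulting to the population standard deviation (ddof=0). It compares the zero-lag value of the normalized correlation with np.corrcoef for three choices of divisor.

```python
import numpy as np

# Synthetic, illustrative data only.
rng = np.random.default_rng(1)
a = rng.standard_normal(500)
v = 0.6 * a + rng.standard_normal(500)
n = len(a)

# Reference Pearson coefficient.
r_ref = np.corrcoef(a, v)[0, 1]

# Population std (ddof=0) with division by n, as in the patch.
a0 = (a - a.mean()) / (a.std() * n)
v0 = (v - v.mean()) / v.std()
r_pop = np.correlate(a0, v0, mode="valid")[0]  # equal lengths: zero lag only

# Sample std (ddof=1) with division by n - 1.
a1 = (a - a.mean()) / (a.std(ddof=1) * (n - 1))
v1 = (v - v.mean()) / v.std(ddof=1)
r_samp = np.correlate(a1, v1, mode="valid")[0]

# Mixing the two conventions: sample std but division by n.
a2 = (a - a.mean()) / (a.std(ddof=1) * n)
r_mixed = np.correlate(a2, v1, mode="valid")[0]

print(r_ref, r_pop, r_samp, r_mixed)
```

Both consistent conventions reproduce the corrcoef value at zero lag. Only the mixed convention is off, by a factor of (n - 1) / n, which is negligible for series with thousands of samples.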
