python - Why Sklearn TruncatedSVD's explained variance ratios are not in descending order?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Why are the explained variance ratios of sklearn.decomposition.TruncatedSVD not ordered like the singular values?

My code is below:

import numpy as np
from sklearn.decomposition import TruncatedSVD

# 4 samples, 14 binary features (not centered)
X = np.array([[1,1,1,1,0,0,0,0,0,0,0,0,0,0],
              [0,0,1,1,1,1,1,1,1,0,0,0,0,0],
              [0,0,0,0,0,0,1,1,1,1,1,1,0,0],
              [0,0,0,0,0,0,0,0,0,0,1,1,1,1]])
svd = TruncatedSVD(n_components=4)
svd.fit(X)
print(svd.explained_variance_ratio_)
print(svd.singular_values_)

and the results:

[0.17693405 0.46600983 0.21738089 0.13967523]
[3.1918354  2.39740372 1.83127499 1.30808033]

I heard that a singular value means how much the component can explain data, so I expected the explained variance ratios to follow the same order as the singular values. But the ratios are not in descending order.

Can someone explain why this happens?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:25:55+0000

I heard that a singular value means how much the component can explain data

This holds for PCA, but it is not exactly true for (truncated) SVD. Quoting from a relevant GitHub thread from 2014, back when an explained_variance_ratio_ attribute was not even available for TruncatedSVD:

preserving the variance is not the exact objective function of truncated SVD without centering

So, the singular values themselves are indeed sorted in descending order, but this does not necessarily hold for the corresponding explained variance ratios if the data are not centered.
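To see where the ordering breaks, it helps to recompute the ratios by hand. The sketch below uses plain NumPy rather than scikit-learn, and it assumes that explained_variance_ratio_ is the variance of each projected component divided by the total variance of the input features (my understanding of how TruncatedSVD derives it, not something stated in the quoted thread). The variable names are illustrative only.

```python
import numpy as np

# X data from the question (4 samples, 14 features, not centered)
X = np.array([[1,1,1,1,0,0,0,0,0,0,0,0,0,0],
              [0,0,1,1,1,1,1,1,1,0,0,0,0,0],
              [0,0,0,0,0,0,1,1,1,1,1,1,0,0],
              [0,0,0,0,0,0,0,0,0,0,1,1,1,1]], dtype=float)

# Plain SVD; s is returned in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Projection of the samples onto each component (equal to X @ Vt.T)
projected = U * s

# Assumed definition: per-component variance over total feature variance
component_var = projected.var(axis=0)
total_var = X.var(axis=0).sum()
ratio = component_var / total_var

# Each component's variance is s_i**2 / n minus its squared mean;
# without centering, the mean term differs per component and can reorder things
n = X.shape[0]
mean_penalty = projected.mean(axis=0) ** 2

print(s)                        # singular values, descending
print(ratio)                    # explained variance ratios, not necessarily descending
print(s**2 / n - mean_penalty)  # same values as component_var
```

If the assumption about the definition holds, this should reproduce the numbers from the question: the first component has the largest singular value, but also the largest mean, which is subtracted out when the variance is taken.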

But if we center the data first, the explained variance ratios do come out sorted in descending order, in correspondence with the singular values themselves (note that StandardScaler, used below, also scales each feature to unit variance in addition to centering):

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD

sc = StandardScaler()
Xs = sc.fit_transform(X) # X data from the question here

svd = TruncatedSVD(n_components=4)
svd.fit(Xs)

print(svd.explained_variance_ratio_)
print(svd.singular_values_)

Result:

[4.60479851e-01 3.77856541e-01 1.61663608e-01 8.13905807e-66]
[5.07807756e+00 4.59999633e+00 3.00884730e+00 8.21430014e-17]
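Why centering restores the ordering can be checked by hand as well. Once the column means are subtracted, every component with a nonzero singular value has zero mean, so its variance is just s_i**2 / n and the ratio reduces to s_i**2 / sum(s**2), which is monotone in the singular values. The sketch below uses mean-centering only, in plain NumPy (an assumption made for simplicity; StandardScaler additionally scales each feature), so its numbers will differ from the result above.

```python
import numpy as np

# X data from the question
X = np.array([[1,1,1,1,0,0,0,0,0,0,0,0,0,0],
              [0,0,1,1,1,1,1,1,1,0,0,0,0,0],
              [0,0,0,0,0,0,1,1,1,1,1,1,0,0],
              [0,0,0,0,0,0,0,0,0,0,1,1,1,1]], dtype=float)

# Center each feature (column) at zero
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
projected = U * s

# Ratio computed from variances, as in the uncentered case
ratio = projected.var(axis=0) / Xc.var(axis=0).sum()

# Ratio computed directly from the singular values
ratio_from_s = s**2 / np.sum(s**2)

print(ratio)
print(ratio_from_s)  # agrees with ratio up to floating-point error
```

The last singular value is numerically zero here because centering four samples leaves at most three independent directions.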

For the mathematical & computational differences between centered and non-centered data in PCA & SVD calculations, see How does centering make a difference in PCA (for SVD and eigen decomposition)?

Regarding the use of TruncatedSVD itself, here is user ogrisel again (scikit-learn contributor) in a relevant answer in Difference between scikit-learn implementations of PCA and TruncatedSVD:

In practice TruncatedSVD is useful on large sparse datasets which cannot be centered without making the memory usage explode.

So, it is not clear why you chose TruncatedSVD here; unless your dataset is so large that centering it causes memory issues, I guess you should use PCA instead.
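The memory concern in the quoted remark is easy to see on a toy matrix: subtracting the column means turns almost every zero into a nonzero value, so a matrix that was cheap to store in sparse form would have to be stored densely. A small sketch, with a dense NumPy array standing in for a sparse matrix (the shape and values are made up for illustration):

```python
import numpy as np

# Mostly-zero matrix: one nonzero entry per column
X_sparse_like = np.zeros((1000, 50))
X_sparse_like[np.arange(50), np.arange(50)] = 1.0

nnz_before = np.count_nonzero(X_sparse_like)

# Centering shifts every entry of a column by that column's (nonzero) mean
X_centered = X_sparse_like - X_sparse_like.mean(axis=0)
nnz_after = np.count_nonzero(X_centered)

print(nnz_before, nnz_after)
```

On a dataset of this size the difference is harmless, which is why PCA is the simpler choice for the small matrix in the question.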
