I have a data set of 9 samples (rows) with 51608 variables (columns), and I keep getting an error whenever I try to scale it:
This works fine:
pca = prcomp(pca_data)
However,
pca = prcomp(pca_data, scale = T)
gives
> Error in prcomp.default(pca_data, center = T, scale = T) :
cannot rescale a constant/zero column to unit variance
Obviously it's a little hard to post a reproducible example. Any ideas what the deal could be?
Looking for constant columns:
library(magrittr)  # provides %>%
# Count the distinct values in each column, then tabulate those counts
sapply(1:ncol(pca_data), function(x){
  unique(pca_data[, x]) %>% length
}) %>% table
Output:
.
2 3 4 5 6 7 8 9
3892 4189 2124 1783 1622 2078 5179 30741
So there are no constant columns. Same with NAs:
is.na(pca_data) %>% sum
>[1] 0
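The error message is about variance, so a more direct check than counting unique values might be to look at the per-column standard deviations themselves. This is only a sketch on a small made-up matrix (`toy` and its column names are hypothetical stand-ins for `pca_data`), and it assumes the data can be treated as a numeric matrix:

```r
# Toy matrix standing in for pca_data (hypothetical values): 9 rows, 4 columns
set.seed(1)
toy <- cbind(a = rnorm(9),
             b = rep(3, 9),          # constant column, sd is exactly 0
             c = c(rep(0, 8), 1),    # mostly zeros but not constant
             d = rnorm(9))

# Standard deviation of every column; with centering on, scale() divides by these
col_sd <- apply(toy, 2, sd)

# Columns whose standard deviation is zero or not finite cannot be rescaled
bad <- which(col_sd == 0 | !is.finite(col_sd))
print(bad)
```

Running the same two lines on the real data (`apply(pca_data, 2, sd)` followed by the `which(...)` call) would list any columns that scaling cannot handle, if there are any.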
This works fine:
pca_data = scale(pca_data)
But then afterwards both still give the exact same error:
pca = prcomp(pca_data)
pca = prcomp(pca_data, center = F, scale = F)
So why can't I manage to get a scaled PCA on this data? OK, let's make 100% sure that it's not constant.
pca_data = pca_data + rnorm(nrow(pca_data) * ncol(pca_data))
Same errors. Is the data numeric?
sapply( 1:nrow(pca_data), function(row){
sapply(1:ncol(pca_data), function(column){
!is.numeric(pca_data[row, column])
})
} ) %>% sum
Still the same errors. I'm out of ideas.
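One thing the checks above do not cover is infinite values, since `is.na()` returns FALSE for `Inf` and `-Inf`. A possible whole-matrix check, sketched on a hypothetical toy matrix (`toy` is made up, not the real data):

```r
# Toy data standing in for pca_data (hypothetical values)
toy <- cbind(a = c(1, 2, 3), b = c(0.5, Inf, 2))

# Coerce to a matrix so the type check applies to the whole object at once
m <- as.matrix(toy)

# TRUE if the matrix storage is numeric (a single check, no nested sapply)
numeric_ok <- is.numeric(m)

# is.finite() is FALSE for NA, NaN, Inf and -Inf
all_finite <- all(is.finite(m))
n_nonfinite <- sum(!is.finite(m))

print(c(numeric_ok = numeric_ok, all_finite = all_finite))
print(n_nonfinite)
```

Whether non-finite values are actually present in `pca_data` is not known here; this only shows how one could rule them out.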
Edit: more details, and a hack that at least works around it.
Later, I was still having a hard time clustering this data, e.g.:
Error in hclust(d, method = "ward.D") :
NaN dissimilarity value in intermediate results.
Trimming values under a certain cutoff (e.g. < 1) to zero had no effect. What finally worked was removing all columns that had more than x zeros. It worked when keeping only columns with at most 6 zeros, but allowing 7 or more gave errors. I don't know whether this points to a general problem or whether the filter just happened to catch one problematic column. I'd still be happy to hear any ideas about why, because this should work just fine as long as no variable is all zeros (or constant in some other way).
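For reference, the zero-count filter described above could look roughly like this. It is a sketch on a small made-up matrix; `toy`, its column names, and `max_zeros` are hypothetical, and only the threshold of 6 comes from what worked on the real data:

```r
# Toy matrix standing in for pca_data (hypothetical values): 9 rows, 3 columns
toy <- cbind(few_zeros  = c(0, 0, 1, 2, 3, 4, 5, 6, 7),   # 2 zeros
             many_zeros = c(0, 0, 0, 0, 0, 0, 0, 1, 2),   # 7 zeros
             mid_zeros  = c(0, 0, 0, 0, 0, 0, 1, 2, 3))   # 6 zeros

# Keep only columns with at most this many zeros
max_zeros <- 6

# Number of exact zeros in each column
n_zeros <- colSums(toy == 0)

# drop = FALSE keeps the result a matrix even if only one column survives
filtered <- toy[, n_zeros <= max_zeros, drop = FALSE]

# Scaled PCA on the filtered columns
pca <- prcomp(filtered, center = TRUE, scale. = TRUE)
print(colnames(filtered))
```

On the real data this would be `pca_data[, colSums(pca_data == 0) <= 6, drop = FALSE]`.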