Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - PAM cluster visualization from dissimilarity measure using factoextra package

This question was initially posted on Cross Validated but was closed for being "off-topic." I've since encountered the same issue(s) and am wondering how it can be addressed programmatically.

Using the factoextra package from R, I am looking to visualize some cluster analyses using the fviz_cluster() function; specifically, I am encountering issues after performing PAM (i.e., cluster::pam).

NOTE: the data set contains only numeric features with no missing values, and it has been scaled and centered prior to clustering.

My process currently looks as follows:

library(cluster)
library(factoextra)

data -> df # `data` stands in for the scaled, centered numeric data set
factoextra::get_dist(df, method = "spearman") -> dist_mtx

cluster::pam(
    x = dist_mtx, # dissimilarity matrix
    k = 4, # number of clusters
    diss = TRUE, # flag indicating that x is a dissimilarity matrix
    # metric = "euclidean", # ignored since a dissimilarity matrix is used
    pamonce = FALSE # default for original algo
    ) -> 
    pam_res

The PAM method can take a while depending on the size of the data set, but the output is

an object of class "pam" representing the clustering. See ?pam.object for details

This is where I'm running into issues because of the backend code of fviz_cluster. If I do the following:

fviz_cluster(
    object = pam_res, 
    # data = df, 
    geom = "point"
    )

I get an error stating:

Error in array(x, c(length(x), 1L), if (!is.null(names))) list(names(x), : 'data' must be of vector type, was NULL

The documentation states that the data argument is only required when visualizing kmeans or DBSCAN. The aforementioned code chunk still does not work even if the data is included in the fviz_cluster function.

One workaround, provided in this SO response, is to append the actual data to the resulting "pam" object (i.e., df -> pam_res$data). Although this works, I am wondering whether it changes the resulting visualization. The fviz_cluster function doesn't seem able to use both a dissimilarity matrix AND a data set to produce an image, so is my dissimilarity matrix being ignored when I add the data to the object?
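To make the concern concrete, here is a minimal sketch that contrasts the workaround with a plot built from the dissimilarities themselves. It rests on several assumptions that are not part of the original post: the built-in mtcars data stands in for df, the Spearman dissimilarity is rebuilt in base R as 1 minus the row-wise correlation (which is how I understand get_dist to define it), and classical MDS via stats::cmdscale is swapped in as one possible way to get 2-D coordinates that respect dist_mtx. As far as I can tell, fviz_cluster lays points out from a PCA of the attached data and takes only the cluster labels from the "pam" object, but that should be checked against the package source.

```r
library(cluster)

# Stand-in data (assumption): scaled and centered numeric columns of mtcars
df <- scale(mtcars)

# Spearman correlation-based dissimilarity between rows, built with base R
dist_mtx <- as.dist(1 - cor(t(df), method = "spearman"))

# PAM on the dissimilarity matrix, as in the question
pam_res <- pam(x = dist_mtx, k = 4, diss = TRUE)

# With diss = TRUE the result carries no raw data, which is what
# fviz_cluster() appears to trip over
print(is.null(pam_res$data))

# Classical MDS: 2-D coordinates that approximate dist_mtx itself
coords <- cmdscale(dist_mtx, k = 2)
plot_df <- data.frame(
  Dim1 = coords[, 1],
  Dim2 = coords[, 2],
  cluster = factor(pam_res$clustering)
)

# Base-graphics scatter plot, coloured by the PAM cluster assignment
plot(plot_df$Dim1, plot_df$Dim2,
     col = as.integer(plot_df$cluster), pch = 19,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")

# The workaround from the question, run only if factoextra is installed:
# the layout here comes from the attached data, the colours from PAM
if (requireNamespace("factoextra", quietly = TRUE)) {
  pam_with_data <- pam_res
  pam_with_data$data <- df
  print(factoextra::fviz_cluster(pam_with_data, geom = "point"))
}
```

In both plots the cluster labels come from the PAM run on dist_mtx, so the dissimilarity matrix is not ignored for the grouping. Only the point positions differ: the MDS plot derives them from the dissimilarities, while the workaround derives them from the raw features.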

Any ideas would be much appreciated!

Cheers

Question from: https://stackoverflow.com/questions/65944625/pam-cluster-visualization-from-dissimilarity-measure-using-factoextra-package
