r - Select the most dissimilar individual using cluster analysis

Posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I want to cluster my data into, say, 5 clusters, and then select the 50 most dissimilar individuals from the whole data set, taking them from each cluster in proportion to its size. For example, if cluster one contains 100 individuals, cluster two 200, cluster three 400, cluster four 200, and cluster five 100, I have to select 5 from the first cluster, 10 from the second, 20 from the third, 10 from the fourth, and 5 from the fifth.

Data example:

     mydata<-matrix(nrow=100,ncol=10,rnorm(1000, mean = 0, sd = 1))

What I have done so far is cluster the data and rank the individuals within each cluster, then export the result to Excel and continue from there. That has become a problem since my data has become really big.

I would appreciate any help or suggestions on how to do this in R.

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:16:12+0000

I'm not sure whether this is exactly what you are looking for, but maybe it helps:

mydata<-matrix(nrow=100, ncol=10, rnorm(1000, mean = 0, sd = 1))
rownames(mydata) <- paste0("id", 1:100) # some id for identification


# calculate the dissimilarity (distance) matrix, then cut the tree into 5 clusters
sim <- dist(mydata, diag = TRUE, upper = TRUE)
cl <- cutree(hclust(sim), 5)

# combine results, take sum to aggregate dissimilarity
res <- data.frame(id=rownames(mydata),
                  cluster=cl, dis_sim=rowSums(as.matrix(sim)))
# order, lowest overall dissimilarity will be first
res <- res[order(res$dis_sim), ] 


# split object
reslist <- split(res, f=res$cluster)


## takes the three items with the highest overall dissimilarity in each cluster
lapply(reslist, tail, n=3) 

## returns the IDs with the highest overall dissimilarity, top 20% of each cluster
lapply(reslist, function(x, p) tail(x, round(nrow(x)*p)), p=0.2)
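
The question asks for a fixed total (50) split across clusters in proportion to their size, rather than a fixed percentage per cluster. A minimal sketch of that allocation follows, assuming the same summed-distance score as above. The names `n_total`, `groups`, `sizes`, `quota` and `picked` are illustrative and not part of the original answer, and `n_total` is set to 10 only because the example data has 100 rows.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
mydata <- matrix(rnorm(1000, mean = 0, sd = 1), nrow = 100, ncol = 10)
rownames(mydata) <- paste0("id", 1:100)

# distance matrix, 5 clusters, and summed distance per individual
sim <- dist(mydata)
cl  <- cutree(hclust(sim), 5)
res <- data.frame(id = rownames(mydata), cluster = cl,
                  dis_sim = rowSums(as.matrix(sim)))

n_total <- 10  # hypothetical target; the question uses 50 out of 1000

# proportional allocation: each cluster's share of n_total, rounded
groups <- split(res, res$cluster)
sizes  <- vapply(groups, nrow, integer(1))
quota  <- round(n_total * sizes / sum(sizes))

# within each cluster, keep the quota rows with the highest dis_sim
picked <- do.call(rbind, Map(
  function(x, k) head(x[order(-x$dis_sim), ], k),
  groups, quota
))
```

Because each quota is rounded separately, `sum(quota)` can differ slightly from `n_total`. With the cluster sizes from the question (100, 200, 400, 200, 100) and a target of 50, the shares are whole numbers, so the quotas come out as 5, 10, 20, 10 and 5.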
