Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
179 views
in Technique[技术] by (71.8m points)

r - Select the most dissimilar individual using cluster analysis

I want to cluster my data to say 5 clusters, then we need to select 50 individuals with most dissimilar relationship from all the data. That means if cluster one contains 100, two contains 200, three contains 400, four contains 200, and five 100, I have to select 5 from the first cluster + 10 from the second cluster + 20 from the third + 10 from the fourth + 5 from the fifth.

Data example:

     mydata<-matrix(nrow=100,ncol=10,rnorm(1000, mean = 0, sd = 1))

What I did till now is clustering the data and rank the individuals within each cluster, then export it to excel and go from there … That has become became a problem since my data has became really big.

I will appreciate any help or suggestion on how to apply the previous in R .

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I′m not sure if it is exactly what you are searching, but maybe it helps:

mydata<-matrix(nrow=100, ncol=10, rnorm(1000, mean = 0, sd = 1))
rownames(mydata) <- paste0("id", 1:100) # some id for identification


# cluster objects and calculate dissimilarity matrix
cl <- cutree(hclust(
  sim <- dist(mydata, diag = TRUE, upper=TRUE)), 5) 

# combine results, take sum to aggregate dissimilarity
res <- data.frame(id=rownames(mydata),
                  cluster=cl, dis_sim=rowSums(as.matrix(sim)))
# order, lowest overall dissimilarity will be first
res <- res[order(res$dis_sim), ] 


# split object
reslist <- split(res, f=res$cluster)


## takes first three items with highest overall dissim.
lapply(reslist, tail, n=3) 

## returns id′s with highest overall dissimilarity, top 20% 
lapply(reslist, function(x, p) tail(x, round(nrow(x)*p)), p=0.2)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...