r - Clustering - how to find the nearest to a cluster

Question

Welcome To Ask or Share your Answers For Others

r - Clustering - how to find the nearest to a cluster

posted Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

r - Clustering - how to find the nearest to a cluster

Hints I got as to a different question puzzled me quite a bit.

I got an exercise, actually part of a larger exercise:

Cluster some data, using hclust (done)
Given a totally new vector, find out to which of the clusters you got in 1 it is nearest.

According to the excercise, this should be done in quite short a time.

However, after weeks I am puzzled whether this can be done at all, as apparently all I really get from hclust is a tree - and not, as I assumed, a number of clusters.

As I suppose I was unclear:

Say, for instance, I feed hclust a matrix which consists of 15 1x5 Vectors, 5 times (1 1 1 1 1 ), 5 times (2 2 2 2 2) and 5 times (3 3 3 3 3). This should give me three quite distinct clusters of size 5, anyone can easily do that by hand. Is there a command to use so that I can actually find out from the program that there are 3 such clusters in my hclust-object and what they contain?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:21:19+0000

You'll have to think about what the right metric is to define closeness to the cluster. Building on the example in the hclust doc, here's a way to compute the means for each cluster and then measure the distance between the new data point and the set of means.

# Leave out one state
A <-USArrests
B <-A[rownames(A)!="Kentucky",]
KY <- A[rownames(A)=="Kentucky",]

# Put the B data into 10 clusters
hc   <- hclust(dist(B), "ave")
memb <- cutree(hc, k = 10)
B$cluster = memb[rownames(B)==names(memb)]

# Compute the averages over the clusters
M <-aggregate( .~cluster, data=B, FUN=mean)
M$cluster=NULL

# Now add the hold out state to the set of averages
M <-rbind(M,KY)

# Compute the distance between the clusters and the hold out state.
# This is a pretty silly way to do this but it works.
D <- as.matrix(dist(as.matrix(M),diag=TRUE,upper=TRUE))["Kentucky",]
names(D) = rownames(M)
KYclust  = which.min(D[-length(D)])
memb[memb==KYclust]

# Now cluster the full set of states and compare the results.  
hc   <- hclust(dist(A), "ave")
memb <- cutree(hc, k = 10)
a=memb[which(names(memb)=="Kentucky")]
memb[memb==a]

Categories

r - Clustering - how to find the nearest to a cluster

r - Clustering - how to find the nearest to a cluster

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags