Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


machine learning - Assign new data point to cluster in kernel k-means (kernlab package in R)?

I have a question about the kkmeans function in the kernlab package of R. I am new to this package, so please forgive me if I'm missing something obvious here.

I would like to assign a new data point to one of a set of clusters that were created using kernel k-means with the function 'kkmeans'. With regular clustering, one would do this by calculating the Euclidean distance between the new data point and the cluster centroids, and choosing the cluster with the closest centroid. In kernel k-means, one must do this in the feature space.

Take the example used in the kkmeans description:

library(kernlab)
data(iris)

sc <- kkmeans(as.matrix(iris[,-5]), centers=3)

Say that I have a new data point here, which I would like to assign to the closest cluster created above in sc.

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
     5.0         3.6          1.2         0.4 

Any tips on how to do this? Your help is much appreciated.


1 Reply


Kernel k-means uses a kernel function to measure the similarity of objects. In ordinary k-means you loop through all centroids and select the one that minimizes the distance (under the chosen metric) to the given data point. With the kernel method (the default kernel function in kkmeans is the radial basis function, RBF), you loop through the centroids and select the one that maximizes the kernel function value (in the case of RBF) or minimizes the kernel-induced distance (for any kernel). A detailed description of converting a kernel to a distance measure is provided here. In general, the distance induced by a kernel K can be calculated through d^2(a,b) = K(a,a)+K(b,b)-2K(a,b). For RBF, K(x,x)=1 for all x, so you can just maximize K(a,b) instead of minimizing the whole expression K(a,a)+K(b,b)-2K(a,b).
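To make the equivalence concrete, here is a minimal base-R sketch. It does not use kernlab: the rbf function, the value sigma = 0.5, and the three centroids are all assumptions chosen for illustration, not what kkmeans would estimate. It only checks that, for an RBF kernel, picking the largest K(a,b) and picking the smallest d^2(a,b) select the same centroid.

```r
# Hand-written RBF kernel, k(a, b) = exp(-sigma * ||a - b||^2).
# sigma = 0.5 is an arbitrary illustrative value (assumption).
rbf <- function(a, b, sigma = 0.5) exp(-sigma * sum((a - b)^2))

# Squared distance induced by a kernel k:
# d^2(a, b) = k(a, a) + k(b, b) - 2 * k(a, b)
kernel_dist2 <- function(a, b, k) k(a, a) + k(b, b) - 2 * k(a, b)

x <- c(5.0, 3.6, 1.2, 0.4)

# Hypothetical centroids (one per row), made up for this example.
centroids <- rbind(c(5.0, 3.4, 1.5, 0.2),
                   c(5.9, 2.8, 4.3, 1.3),
                   c(6.6, 3.0, 5.6, 2.0))

k_vals  <- apply(centroids, 1, function(ctr) rbf(x, ctr))
d2_vals <- apply(centroids, 1, function(ctr) kernel_dist2(x, ctr, rbf))

by_kernel   <- which.max(k_vals)   # centroid with the largest kernel value
by_distance <- which.min(d2_vals)  # centroid with the smallest induced distance
```

Because rbf(a, a) is always 1, d2_vals equals 2 - 2 * k_vals, so the two selections always agree for this kernel. For a kernel where K(x,x) is not constant, only the distance form is safe.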

To get the kernel function from a kkmeans object, you can use the kernelf function:

> data(iris)
> sc <- kkmeans(as.matrix(iris[,-5]), centers=3)
> K = kernelf(sc)

So, for your example:

> c=centers(sc)
> x=c(5.0, 3.6, 1.2, 0.4)
> K(x,c[1,])
             [,1]
[1,] 1.303795e-11
> K(x,c[2,])
             [,1]
[1,] 8.038534e-06
> K(x,c[3,])
          [,1]
[1,] 0.8132268
> which.max( c( K(x,c[1,]), K(x,c[2,]), K(x,c[3,]) ) )
[1] 3

The closest centroid is c[3,] = (5.032692, 3.401923, 1.598077, 0.3115385) in the sense of the kernel function used.
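One caveat: the centroids used above live in the input space (they have the same four coordinates as the data), whereas kernel k-means forms its clusters around means in the feature space, and the feature-space mean of a cluster is in general not the image of its input-space mean. So comparing against those centroids is an approximation. A stricter alternative is to measure the feature-space distance to each cluster mean using the cluster's member points, via K(x,x) - (2/n) * sum_i K(x,x_i) + (1/n^2) * sum_i sum_j K(x_i,x_j). The sketch below is base R only; the rbf function, sigma = 0.5, and the use of iris$Species as stand-in cluster labels are assumptions made so the example runs without a fitted model.

```r
# Hand-written RBF kernel; sigma = 0.5 is an illustrative assumption.
rbf <- function(a, b, sigma = 0.5) exp(-sigma * sum((a - b)^2))

# Squared feature-space distance from a new point x to the mean of each cluster.
# X: data matrix (rows = points), labels: cluster label per row, k: kernel function.
feature_dist2 <- function(x, X, labels, k) {
  sapply(sort(unique(labels)), function(cl) {
    Xc <- X[labels == cl, , drop = FALSE]
    n  <- nrow(Xc)
    # Kernel values between x and every member of the cluster.
    k_x <- apply(Xc, 1, function(xi) k(x, xi))
    # Sum of all pairwise kernel values within the cluster.
    k_cc <- sum(sapply(seq_len(n), function(i)
      sum(apply(Xc, 1, function(xj) k(Xc[i, ], xj)))))
    k(x, x) - 2 * mean(k_x) + k_cc / n^2
  })
}

X      <- as.matrix(iris[, -5])
labels <- as.integer(iris$Species)  # stand-in for cluster assignments (assumption)
x      <- c(5.0, 3.6, 1.2, 0.4)

d2       <- feature_dist2(x, X, labels, rbf)
assigned <- which.min(d2)           # index of the closest cluster in feature space
```

With a real kkmeans fit you would presumably pass the fitted kernel instead of rbf, wrapped so it returns a plain number, e.g. function(a, b) as.numeric(K(a, b)) with K = kernelf(sc), and replace labels with the cluster memberships stored in sc. The within-cluster term does not depend on x, so it can be computed once per cluster and reused if many new points need to be assigned.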

