Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Combine model.clusterCenters and model.transform of k-means in Scala Spark

Update: I want to achieve this in Scala Spark, not PySpark, so the answer suggested in the question 'Convert KMeans "centers" output to PySpark dataframe' does not work here (there is no tolist method).

I have trained a k-means model. With model.clusterCenters, I get the vector of each cluster center; the return type is Array[Vector]. With model.transform, I get the predicted cluster index for each sample vector as a new column.

Suppose the training DataFrame looks like this:

+------------+--------------------+
|rest column |            features|
+------------+--------------------+
|     1646177|[231.8,232.1,233....|
|     1646177|[232.2,234.2,234....|
|     1646178|[241.1,234.1,244....|
|     ...    |...                 |
+------------+--------------------+

After model.transform(), I get the following:

+------------+--------------------+----------+
|rest column |            features|prediction|
+------------+--------------------+----------+
|     1646177|[231.8,232.1,233....|        01|
|     1646177|[232.2,234.2,234....|        01|
|     1646178|[232.1,234.1,234....|        02|
|     ...    |...                 |       ...|
+------------+--------------------+----------+

From model.clusterCenters, I get an array of vectors like the following:

[230.99036144578324,231.08433734939757,231.3566265060241...]
[160.6,159.9,177.2...]
[69.3,70.1,70.6...]
...

Here the number of cluster centers corresponds to the number of unique values (01, 02, ...) in the "prediction" column of the DataFrame generated by model.transform.

What I want to achieve is to combine the two. The expected result should look like this:

+------------+--------------------+----------+---------------+
|rest column |            features|prediction|cluster centers|
+------------+--------------------+----------+---------------+
|     1646177|[231.8,232.1,233....|        01|[123,456,789...|
|     1646177|[232.2,234.2,234....|        01|[123,456,789...|
|     1646178|[232.1,234.1,234....|        02|[232,243,223...|
|     ...    |...                 |       ...|...            |
+------------+--------------------+----------+---------------+

Any suggestion is appreciated!

Question from: https://stackoverflow.com/questions/65913728/combine-model-clustercenters-and-model-transfrom-of-k-means-in-scala-spark


1 Reply

Waiting for answers
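
One possible approach, offered as a sketch rather than a definitive answer: since the "prediction" value written by model.transform is the position of the matching center in model.clusterCenters, the centers can be turned into a small lookup DataFrame and joined back on "prediction". The toy data, the column names "id" and "cluster_center", the object name, and the settings k = 2 and seed = 1 are assumptions made for illustration; they are not taken from the question.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ClusterCentersJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cluster-centers-join")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy rows standing in for the question's DataFrame (made-up values;
    // "id" plays the role of the "rest column").
    val df = Seq(
      (1646177L, Vectors.dense(231.8, 232.1, 233.0)),
      (1646177L, Vectors.dense(232.2, 234.2, 234.0)),
      (1646178L, Vectors.dense(69.3, 70.1, 70.6))
    ).toDF("id", "features")

    // k and seed are arbitrary choices for this sketch.
    val model = new KMeans()
      .setK(2)
      .setSeed(1L)
      .setFeaturesCol("features")
      .fit(df)

    // Adds the integer "prediction" column.
    val predictions = model.transform(df)

    // Pair each center with its array index, which is the cluster id
    // that appears in "prediction", and build a lookup DataFrame.
    val centers = model.clusterCenters.zipWithIndex
      .map { case (center, idx) => (idx, center) }
      .toSeq
      .toDF("prediction", "cluster_center")

    // The lookup table has only k rows, so a broadcast hint is reasonable.
    val result = predictions.join(broadcast(centers), Seq("prediction"), "left")
    result.show(truncate = false)

    spark.stop()
  }
}
```

The join keeps everything in the DataFrame API and avoids serializing the centers into a closure. An alternative would be a UDF that indexes into the centers array by the prediction value; which of the two performs better likely depends on the data size and the number of clusters.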
