You can use a UDF:
udf(lambda vs: Vectors.dense(vs), VectorUDT())
In Spark < 2.0 import:
from pyspark.mllib.linalg import Vectors, VectorUDT
In Spark 2.0+ import:
from pyspark.ml.linalg import Vectors, VectorUDT
Please note that these classes are not compatible, despite having identical implementations.
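For reference, here is a minimal end-to-end sketch of the UDF approach, assuming Spark 2.0+, an active SparkSession bound to spark, and an array&lt;double&gt; input column named features (the example DataFrame itself is hypothetical):
from pyspark.sql.functions import udf
from pyspark.ml.linalg import Vectors, VectorUDT

# Hypothetical DataFrame with an array<double> column called "features"
df = spark.createDataFrame(
    [(1, [0.0, 1.0, 2.0]), (2, [3.0, 4.0, 5.0])],
    ("id", "features"))

as_vector = udf(lambda vs: Vectors.dense(vs), VectorUDT())

df.withColumn("features_vector", as_vector("features")).show()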
It is also possible to extract the individual features and assemble them with VectorAssembler. Assuming the input column is called features:
from pyspark.ml.feature import VectorAssembler
n = ... # Size of features
assembler = VectorAssembler(
inputCols=["features[{0}]".format(i) for i in range(n)],
outputCol="features_vector")
assembler.transform(df.select(
"*", *(df["features"].getItem(i) for i in range(n))
))
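As a usage sketch, reusing the hypothetical DataFrame from the UDF example above and assuming n = 3 to match its three-element arrays: getItem(i) produces columns named features[0], features[1], ..., which is why they line up with the generated inputCols.
n = 3  # assumed array size, matching the hypothetical DataFrame above
expanded = df.select(
    "*", *(df["features"].getItem(i) for i in range(n)))
assembler = VectorAssembler(
    inputCols=["features[{0}]".format(i) for i in range(n)],
    outputCol="features_vector")
assembler.transform(expanded).select("id", "features_vector").show()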