You can specify the conversion explicitly using struct and array:
import pyspark.sql.functions as F
df.printSchema()
# root
#  |-- features: struct (nullable = false)
#  |    |-- value: double (nullable = false)
df2 = df.withColumn(
    'features',
    F.struct(
        F.array(F.col('features')['value']).alias('values')
    )
)
df2.printSchema()
# root
#  |-- features: struct (nullable = false)
#  |    |-- values: array (nullable = false)
#  |    |    |-- element: double (containsNull = false)