Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes · 159 views
in Technique by (71.8m points)

How to convert DataFrame columns from struct<value:double> to struct<values:array<double>> in pyspark?

I have a DataFrame with this structure:

root
 |-- features: struct (nullable = true)
 |    |-- value: double (nullable = true)

and I want to convert the value field of type double to a values field of type array<double>. How can I do that?

question from:https://stackoverflow.com/questions/65844357/how-to-convert-dataframe-columns-from-structvaluedouble-to-structvaluesarra


1 Reply

0 votes
by (71.8m points)

You can build the new column explicitly using F.struct and F.array:

import pyspark.sql.functions as F

df.printSchema()
#root
# |-- features: struct (nullable = false)
# |    |-- value: double (nullable = false)

df2 = df.withColumn(
    'features',
    F.struct(
        F.array(F.col('features')['value']).alias('values')
    )
)

df2.printSchema()
#root
# |-- features: struct (nullable = false)
# |    |-- values: array (nullable = false)
# |    |    |-- element: double (containsNull = false)

