Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

apache spark - PySpark: add a new object to a nested field if it does not exist

Schema

 root
     |-- userId: string (nullable = true)
     |-- languageknowList: array (nullable = true)
     |    |-- element: struct (containsNull = false)
     |    |    |-- code: string (nullable = false)
     |    |    |-- description: string (nullable = false)
     |    |    |-- name: string (nullable = false)

The DataFrame has two columns, userId and languageknowList. Every user should know English, so if English is not present in a user's languageknowList, I have to add it.

English
code: 10
description: English Language
name: English

Can anyone please help me?

question from:https://stackoverflow.com/questions/65600994/pyspark-add-new-object-in-nested-field-if-not-exist


1 Reply


You can build the English entry as a struct, wrap it in a one-element array, and concatenate that to the existing column whenever the array does not already contain it:

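import pyspark.sql.functions as F

# Struct for the English entry; field names and order match the schema above
english = F.struct(F.lit('10').alias('code'),
                   F.lit('English Language').alias('description'),
                   F.lit('English').alias('name')
                  )

df2 = df.withColumn(
    'languageknowList',
    F.when(
        # True only when no element equals the full English struct
        ~F.array_contains(F.col('languageknowList'), english),
        # Append English as a one-element array
        F.concat(
            F.col('languageknowList'),
            F.array(english)
        )
    ).otherwise(
        # English already present (or the list is null): keep the column as is
        F.col('languageknowList')
    )
)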
