Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Add a JSON object field to a JSON array field in a DataFrame using Scala

Is there a way to add a JSON object to an existing array of JSON objects?

I have a dataframe:

+-------------------------+---------------------------------------------------------+------------+
|   name                  |       hit_songs                                         |  column1   |
+-------------------------+---------------------------------------------------------+------------+
|{"HomePhone":"34567002"} | [{"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}] | value1     |
|{"HomePhone":"34567011"} | [{"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}] |  value2    |
+-------------------------+---------------------------------------------------------+------------+ 

I want a resulting dataframe as:

+-----------------------------------------------------------------------------------+------------+
|   name                                                                            |  column1   |
+-----------------------------------------------------------------------------------+------------+
|[ {"HomePhone":"34567002"},{"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"} ] | value1     |
|[ {"HomePhone":"34567011"},{"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"} ] | value2     |
+-----------------------------------------------------------------------------------+------------+

1 Reply


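Use the array_union function (available since Spark 2.4).

The name column is of type string, while hit_songs is an array of strings. array_union needs two arrays, so first wrap name with array to turn it into a one-element array.

See the code below.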

scala> df.show(false)
+------------------------+-------------------------------------------------------+
|name                    |hit_songs                                              |
+------------------------+-------------------------------------------------------+
|{"HomePhone":"34567002"}|[{"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}]|
|{"HomePhone":"34567011"}|[{"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}]|
+------------------------+-------------------------------------------------------+


scala> // array_union takes two array columns, so wrap the name string column in array(...) before the union with hit_songs.
scala> df.withColumn("name",array_union(array($"name"),$"hit_songs")).show(false)
+---------------------------------------------------------------------------------+-------------------------------------------------------+
|name                                                                             |hit_songs                                              |
+---------------------------------------------------------------------------------+-------------------------------------------------------+
|[{"HomePhone":"34567002"}, {"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}]|[{"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}]|
|[{"HomePhone":"34567011"}, {"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}]|[{"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}]|
+---------------------------------------------------------------------------------+-------------------------------------------------------+
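
The target dataframe in the question keeps column1 and no longer has hit_songs, which the transcript above does not show. A minimal, self-contained sketch of that step, assuming the columns are a string, an array of strings, and a string. The sample rows are rebuilt by hand here, and the name `merged` is only illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_union, col}

// Local session for the sketch; inside spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("merge-json-columns").getOrCreate()
import spark.implicits._

// Sample data rebuilt from the question (assumed types: string, array<string>, string).
val df = Seq(
  ("""{"HomePhone":"34567002"}""",
   Seq("""{"Phonetypecode":"PTC001"}""", """{"Phonetypecode":"PTC002"}"""),
   "value1"),
  ("""{"HomePhone":"34567011"}""",
   Seq("""{"Phonetypecode":"PTC021"}""", """{"Phonetypecode":"PTC022"}"""),
   "value2")
).toDF("name", "hit_songs", "column1")

// Wrap name in a one-element array, union it with hit_songs, then drop the source array column.
val merged = df
  .withColumn("name", array_union(array(col("name")), col("hit_songs")))
  .drop("hit_songs")

merged.show(false)
```

withColumn replaces name in place, so the column order stays name, column1 after the drop.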

To append a further string column (dammy in this example), nest a second array_union:

scala> df.show(false)
+------------------------+-------------+-------------------------------------------------------+
|name                    |dammy        |hit_songs                                              |
+------------------------+-------------+-------------------------------------------------------+
|{"HomePhone":"34567002"}|{"aaa":"aaa"}|[{"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}]|
|{"HomePhone":"34567011"}|{"bbb":"bbb"}|[{"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}]|
+------------------------+-------------+-------------------------------------------------------+


scala> df.printSchema
root
 |-- name: string (nullable = true)
 |-- dammy: string (nullable = true)
 |-- hit_songs: array (nullable = true)
 |    |-- element: string (containsNull = true)


scala> df.withColumn("name",array_union(array_union(array($"name"),$"hit_songs"),array($"dammy"))).show(false)

+------------------------------------------------------------------------------------------------+-------------+-------------------------------------------------------+
|name                                                                                            |dammy        |hit_songs                                              |
+------------------------------------------------------------------------------------------------+-------------+-------------------------------------------------------+
|[{"HomePhone":"34567002"}, {"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}, {"aaa":"aaa"}]|{"aaa":"aaa"}|[{"Phonetypecode":"PTC001"},{"Phonetypecode":"PTC002"}]|
|[{"HomePhone":"34567011"}, {"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}, {"bbb":"bbb"}]|{"bbb":"bbb"}|[{"Phonetypecode":"PTC021"},{"Phonetypecode":"PTC022"}]|
+------------------------------------------------------------------------------------------------+-------------+-------------------------------------------------------+
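
One thing to be aware of: array_union returns the union without duplicates, so an object that appears twice is kept only once. If duplicates should survive, concat (which accepts array columns since Spark 2.4) may be a better fit, and it takes any number of arrays in one call instead of nesting. A sketch with a made-up row in which dammy repeats the value of name; the column names `unioned` and `concatenated` are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_union, col, concat}

// Local session for the sketch; inside spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("union-vs-concat").getOrCreate()
import spark.implicits._

// Hypothetical row: dammy holds the same JSON string as name.
val dup = Seq(
  ("""{"HomePhone":"34567002"}""",
   """{"HomePhone":"34567002"}""",
   Seq("""{"Phonetypecode":"PTC001"}"""))
).toDF("name", "dammy", "hit_songs")

val compared = dup
  // Nested array_union: the repeated object is dropped.
  .withColumn("unioned",
    array_union(array_union(array(col("name")), col("hit_songs")), array(col("dammy"))))
  // concat: every element is kept, in argument order.
  .withColumn("concatenated",
    concat(array(col("name")), col("hit_songs"), array(col("dammy"))))

compared.select("unioned", "concatenated").show(false)
```

Here unioned ends up with two elements and concatenated with three, so the choice depends on whether repeated objects carry meaning in the data.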

