To extract the content of `payload` from your JSON, you can use `get_json_object`. To build the new output, you can use the built-in functions `struct` and `to_json`.
Given a DataFrame:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell

val df = Seq("""{"metadata": "whatever", "payload": {"catKey": 1}}""").toDF("value").as[String]
df.show(false)
```

```
+--------------------------------------------------+
|value                                             |
+--------------------------------------------------+
|{"metadata": "whatever", "payload": {"catKey": 1}}|
+--------------------------------------------------+
```
Then build the `metadata` and `payload` columns:

```scala
val df2 = df
  .withColumn("catVal", lit("category-1")) // whatever your logic is to fill this column
  .withColumn("payload",
    struct(
      // get_json_object returns a string, so catKey is "1" (a string) here
      get_json_object(col("value"), "$.payload.catKey").as("catKey"),
      col("catVal").as("catVal")
    )
  )
  .withColumn("metadata", get_json_object(col("value"), "$.metadata"))
  .select("metadata", "payload")
df2.show(false)
```

```
+--------+---------------+
|metadata|payload        |
+--------+---------------+
|whatever|[1, category-1]|
+--------+---------------+
```
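Because `get_json_object` always returns a string, `catKey` inside the `payload` struct is a string column, so it ends up quoted once the row is serialized to JSON. If the numeric type matters, one alternative is to parse `value` with `from_json` and an explicit schema. This is a sketch that assumes the schema is known in advance; `TypedPayload`, `inputSchema`, and `transform` are illustrative names, and `lit("category-1")` is again a stand-in for your own `catVal` logic.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object TypedPayload {
  // Assumed (hypothetical) schema: metadata is a string, payload.catKey an integer.
  // Fields present in the JSON but not listed here are ignored by from_json.
  val inputSchema: StructType = StructType(Seq(
    StructField("metadata", StringType),
    StructField("payload", StructType(Seq(StructField("catKey", IntegerType))))
  ))

  def transform(df: DataFrame): DataFrame =
    df
      // Parse the whole JSON string once, keeping catKey as an integer.
      .withColumn("parsed", from_json(col("value"), inputSchema))
      .withColumn("catVal", lit("category-1")) // placeholder for your own logic
      .select(
        to_json(struct(
          col("parsed.metadata").as("metadata"),
          struct(
            col("parsed.payload.catKey").as("catKey"),
            col("catVal").as("catVal")
          ).as("payload")
        )).as("value")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-payload").getOrCreate()
    import spark.implicits._
    val df = Seq("""{"metadata": "whatever", "payload": {"catKey": 1}}""").toDF("value")
    transform(df).show(false)
    spark.stop()
  }
}
```

For the sample row this should produce `{"metadata":"whatever","payload":{"catKey":1,"catVal":"category-1"}}`, with `catKey` unquoted. The trade-off is that you have to maintain the schema, whereas `get_json_object` only needs a path.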
Finally, serialize both columns back into a single JSON string column with `to_json`:

```scala
val df3 = df2.select(to_json(struct(col("metadata"), col("payload"))).as("value"))
df3.show(false)
```

```
+----------------------------------------------------------------------+
|value                                                                 |
+----------------------------------------------------------------------+
|{"metadata":"whatever","payload":{"catKey":"1","catVal":"category-1"}}|
+----------------------------------------------------------------------+
```
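If you don't need the intermediate columns, the same output can also be built in one Spark SQL expression, using `named_struct` (which takes explicit field names) in place of `struct` plus aliases. This is a sketch of an equivalent formulation, not a required form; `SingleExpression` and `transform` are illustrative names, and the `'category-1'` literal stands in for your `catVal` logic.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleExpression {
  // Builds {"metadata": ..., "payload": {"catKey": ..., "catVal": ...}} in a single selectExpr.
  // 'category-1' is a placeholder for whatever logic fills catVal.
  def transform(df: DataFrame): DataFrame =
    df.selectExpr(
      """to_json(named_struct(
        |  'metadata', get_json_object(value, '$.metadata'),
        |  'payload', named_struct(
        |    'catKey', get_json_object(value, '$.payload.catKey'),
        |    'catVal', 'category-1'
        |  )
        |)) AS value""".stripMargin
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("single-expression").getOrCreate()
    import spark.implicits._
    val df = Seq("""{"metadata": "whatever", "payload": {"catKey": 1}}""").toDF("value")
    transform(df).show(false)
    spark.stop()
  }
}
```

Since it still relies on `get_json_object`, `catKey` stays a quoted string, so for the sample row the result should match `df3`: `{"metadata":"whatever","payload":{"catKey":"1","catVal":"category-1"}}`.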