scala - Create new Dataframe with empty/null field values

Question

Welcome To Ask or Share your Answers For Others

scala - Create new Dataframe with empty/null field values

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

scala - Create new Dataframe with empty/null field values

I am creating a new Dataframe from an existing dataframe, but need to add new column ("field1" in below code) in this new DF. How do I do so? Working sample code example will be appreciated.

val edwDf = omniDataFrame 
  .withColumn("field1", callUDF((value: String) => None)) 
  .withColumn("field2",
    callUdf("devicetypeUDF", (omniDataFrame.col("some_field_in_old_df")))) 

edwDf
  .select("field1", "field2")
  .save("odsoutdatafldr", "com.databricks.spark.csv");

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:52:17+0000

It is possible to use lit(null):

import org.apache.spark.sql.functions.{lit, udf}

case class Record(foo: Int, bar: String)
val df = Seq(Record(1, "foo"), Record(2, "bar")).toDF

val dfWithFoobar = df.withColumn("foobar", lit(null: String))

One problem here is that the column type is null:

scala> dfWithFoobar.printSchema
root
 |-- foo: integer (nullable = false)
 |-- bar: string (nullable = true)
 |-- foobar: null (nullable = true)

and it is not retained by the csv writer. If it is a hard requirement you can cast column to the specific type (lets say String), with either DataType

import org.apache.spark.sql.types.StringType

df.withColumn("foobar", lit(null).cast(StringType))

or string description

df.withColumn("foobar", lit(null).cast("string"))

or use an UDF like this:

val getNull = udf(() => None: Option[String]) // Or some other type

df.withColumn("foobar", getNull()).printSchema
root
 |-- foo: integer (nullable = false)
 |-- bar: string (nullable = true)
 |-- foobar: string (nullable = true)

A Python equivalent can be found here: Add an empty column to spark DataFrame

Categories

scala - Create new Dataframe with empty/null field values

scala - Create new Dataframe with empty/null field values

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags