pyspark - unable to convert null value to 0

Question

posted Oct 6, 2021 in Technique by 深蓝 (71.8m points)

I'm working with Databricks and I don't understand why I can't convert null values to 0 in what seems to be a regular integer column.

I've tried these two options:

@udf(IntegerType())
def null_to_zero(x):
  """
  Helper function to transform Null values to zeros
  """
  return 0 if x == 'null' else x

and later:

.withColumn("col_test", null_to_zero(col("col")))

Everything is returned as null.

The second option, .na.fill(value=0, subset=["col"]), simply has no effect.

What am I missing here? Is this behavior of null values specific to Databricks?

question from:https://stackoverflow.com/questions/66055870/unable-to-convert-null-value-to-0

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:09:25+0000

Spark nulls are represented in Python as None, not as the string 'null', so the comparison x == 'null' never matches a null value. For your case it's better to use the coalesce function instead, like this (example based on the docs):

from pyspark.sql.functions import coalesce, lit
cDf = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
cDf.withColumn("col_test", coalesce(cDf["a"], lit(0.0))).show()

This gives the desired behavior:

+----+----+--------+
|   a|   b|col_test|
+----+----+--------+
|null|null|     0.0|
|   1|null|     1.0|
|null|   2|     0.0|
+----+----+--------+

If you need more complex logic, then you can use when/otherwise, with condition on null:

from pyspark.sql.functions import when, lit
cDf.withColumn("col_test", when(cDf["a"].isNull(), lit(0.0)).otherwise(cDf["a"])).show()
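
To see why the null check in the question's UDF never fires, here is a minimal plain-Python sketch of the same comparison, run without Spark. The function names null_to_zero_buggy and null_to_zero_fixed are made up for illustration. It only models the null check itself and says nothing about how Spark handles the UDF's declared return type.

```python
def null_to_zero_buggy(x):
    # Mirrors the check from the question: compares against the string 'null'.
    return 0 if x == 'null' else x


def null_to_zero_fixed(x):
    # A Spark null reaches a Python UDF as None, so test for None directly.
    return 0 if x is None else x


# Stand-in for the values of one column, with None where Spark has null.
values = [None, 1, None, 2]

buggy = [null_to_zero_buggy(v) for v in values]
fixed = [null_to_zero_fixed(v) for v in values]

print(buggy)  # [None, 1, None, 2]
print(fixed)  # [0, 1, 0, 2]
```

The same `is None` test could be used inside the UDF, but the built-in coalesce or when/otherwise shown above avoids a Python UDF altogether.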
