Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pyspark - unable to convert null value to 0

I'm working with Databricks and I don't understand why I'm not able to convert null values to 0 in what seems like a regular integer column.

I've tried these two options:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

@udf(IntegerType())
def null_to_zero(x):
  """
  Helper function to transform Null values to zeros
  """
  return 0 if x == 'null' else x

and later:

.withColumn("col_test", null_to_zero(col("col")))

and everything is returned as null.

The second option, .na.fill(value=0, subset=["col"]), simply has no impact.

What am I missing here? Is this a specific behavior of null values in Databricks?

question from:https://stackoverflow.com/questions/66055870/unable-to-convert-null-value-to-0


1 Reply


The nulls are represented as None, not as the string 'null'. For your case it's better to use the coalesce function instead, like this (example based on the docs):

from pyspark.sql.functions import coalesce, lit
cDf = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
cDf.withColumn("col_test", coalesce(cDf["a"], lit(0.0))).show()

This will give you the desired behavior:

+----+----+--------+
|   a|   b|col_test|
+----+----+--------+
|null|null|     0.0|
|   1|null|     1.0|
|null|   2|     0.0|
+----+----+--------+

If you need more complex logic, you can use when/otherwise (also from pyspark.sql.functions), with a condition on null:

from pyspark.sql.functions import when
cDf.withColumn("col_test", when(cDf["a"].isNull(), lit(0.0)).otherwise(cDf["a"])).show()
