In PySpark, nulls are represented as None, not as the string "null". For your case it's better to use the coalesce function instead, like this (example based on the docs):
from pyspark.sql.functions import coalesce, lit
cDf = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
cDf.withColumn("col_test", coalesce(cDf["a"], lit(0.0))).show()
This gives you the desired behavior:
+----+----+--------+
|   a|   b|col_test|
+----+----+--------+
|null|null|     0.0|
|   1|null|     1.0|
|null|   2|     0.0|
+----+----+--------+
If you need more complex logic, then you can use when/otherwise, with condition on null:
from pyspark.sql.functions import when, lit
cDf.withColumn("col_test", when(cDf["a"].isNull(), lit(0.0)).otherwise(cDf["a"])).show()