You need to cast the column low to class date, and then you can use datediff() in combination with lit().
Using Spark 2.2 or later, to_date() accepts a format string directly:
from pyspark.sql.functions import datediff, to_date, lit
df.withColumn("test",
datediff(to_date(lit("2017-05-02")),
to_date("low","yyyy/MM/dd"))).show()
+----------+----+------+-----+
|       low|high|normal| test|
+----------+----+------+-----+
|1986/10/15|   z|  null|11157|
|1986/10/15|   z|  null|11157|
|1986/10/15|   c|  null|11157|
|1986/10/15|null|  null|11157|
|1986/10/16|null|   4.0|11156|
+----------+----+------+-----+
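As a quick sanity check on the test column, the same day counts can be reproduced without Spark. This is only a minimal sketch using Python's standard library. days_between is a hypothetical helper written for illustration, not part of PySpark, and it assumes the same argument order as datediff(end, start).

```python
from datetime import date, datetime


def days_between(end: str, start: str, start_fmt: str = "%Y/%m/%d") -> int:
    """Return the number of days from `start` to `end`.

    `end` is an ISO date string (yyyy-MM-dd), like the literal passed to lit().
    `start` is parsed with `start_fmt`, mirroring the "yyyy/MM/dd" pattern
    used for the low column.
    """
    end_date = date.fromisoformat(end)
    start_date = datetime.strptime(start, start_fmt).date()
    return (end_date - start_date).days


print(days_between("2017-05-02", "1986/10/15"))  # 11157
print(days_between("2017-05-02", "1986/10/16"))  # 11156
```

Both values match the test column shown above, which suggests the format string is being applied to low as intended.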
Before Spark 2.2, to_date() does not accept a format string, so we need to convert the low column to class timestamp first:
from pyspark.sql.functions import datediff, to_date, lit, unix_timestamp
df.withColumn("test",
datediff(to_date(lit("2017-05-02")),
to_date(unix_timestamp('low', "yyyy/MM/dd").cast("timestamp")))).show()