I'm using SparkSQL on PySpark to store some PostgreSQL tables into DataFrames, and then build a query that generates several time series based on start and stop columns of type date.

Suppose that my_table contains:
   start    |    stop
------------+------------
 2000-01-01 | 2000-01-05
 2012-03-20 | 2012-03-23
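
For context, this is roughly how I load the table into a DataFrame over JDBC (the host, database name, and credentials below are placeholders, not my real connection details):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("generate-series-example").getOrCreate()

# Load the PostgreSQL table into a DataFrame; connection options are placeholders.
my_table = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "password")
    .option("driver", "org.postgresql.Driver")
    .load()
)
my_table.createOrReplaceTempView("my_table")  # make it queryable via spark.sql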
In PostgreSQL this is easy to do:
SELECT generate_series(start, stop, '1 day'::interval)::date AS dt FROM my_table
and it will generate this table:
dt
------------
2000-01-01
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2012-03-20
2012-03-21
2012-03-22
2012-03-23
But how can I do this using plain SparkSQL? Is it necessary to use UDFs, or can it be done with DataFrame methods?
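
A minimal sketch of one possible approach, assuming Spark 2.4 or later, where the built-in sequence function accepts date arguments with an interval step and explode flattens the resulting array into rows (the sample data is inlined here so the snippet is self-contained):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the JDBC-loaded table: two rows with date-typed start/stop columns.
df = spark.createDataFrame(
    [("2000-01-01", "2000-01-05"), ("2012-03-20", "2012-03-23")],
    ["start", "stop"],
).selectExpr("CAST(start AS date) AS start", "CAST(stop AS date) AS stop")
df.createOrReplaceTempView("my_table")

# sequence() builds an array of dates from start to stop (inclusive, 1-day step),
# and explode() turns each array element into its own row.
result = spark.sql("""
    SELECT explode(sequence(start, stop, interval 1 day)) AS dt
    FROM my_table
""")
result.show()

The same thing can be expressed through the DataFrame API, e.g. F.explode(F.expr("sequence(start, stop, interval 1 day)")), so on recent Spark versions no UDF appears to be necessary.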