python - PySpark Numeric Window Group By

Question

Welcome To Ask or Share your Answers For Others

python - PySpark Numeric Window Group By

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - PySpark Numeric Window Group By

I'd like to be able to have Spark group by a step size, as opposed to just single values. Is there anything in spark similar to PySpark 2.x's window function for numeric (non-date) values?

Something along the lines of:

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([10, 11, 12, 13], "integer").toDF("foo")
res = df.groupBy(window("foo", step=2, start=10)).count()

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:03:31+0000

You can reuse timestamp one and express parameters in seconds. Tumbling:

from pyspark.sql.functions import col, window

df.withColumn(
    "window",
    window(
         col("foo").cast("timestamp"), 
         windowDuration="2 seconds"
    ).cast("struct<start:bigint,end:bigint>")
).show()

# +---+-------+              
# |foo| window|
# +---+-------+
# | 10|[10,12]|
# | 11|[10,12]|
# | 12|[12,14]|
# | 13|[12,14]|
# +---+-------+

Rolling one:

df.withColumn(
    "window", 
    window(
        col("foo").cast("timestamp"),
        windowDuration="2 seconds", slideDuration="1 seconds"
     ).cast("struct<start:bigint,end:bigint>")
).show()

# +---+-------+
# |foo| window|
# +---+-------+
# | 10| [9,11]|
# | 10|[10,12]|
# | 11|[10,12]|
# | 11|[11,13]|
# | 12|[11,13]|
# | 12|[12,14]|
# | 13|[12,14]|
# | 13|[13,15]|
# +---+-------+

Using groupBy and start:

w = window(col("foo").cast("timestamp"), "2 seconds").cast("struct<start:bigint,end:bigint>")
start = w.start.alias("start")
df.groupBy(start).count().show()

+-----+-----+
|start|count|
+-----+-----+
|   10|    2|
|   12|    2|
+-----+-----+

Categories

python - PySpark Numeric Window Group By

python - PySpark Numeric Window Group By

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags