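You need to specify an empty window if you want to get the maximum of count_trans across all rows of df2. With `over ()`, the max is computed over the whole table while each id keeps its own row: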
df2 = df.groupBy('id').count().toDF(*['id','count_trans'])
df3 = df2.selectExpr('*', 'count_trans / max(count_trans) over () as count_trans_norm')
Or, if you prefer the PySpark DataFrame API to a SQL expression:
from pyspark.sql import functions as F, Window
df3 = df2.withColumn('count_trans_norm', F.col('count_trans') / F.max(F.col('count_trans')).over(Window.orderBy()))
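To make the effect of the empty window concrete, here is a plain-Python sketch of the same arithmetic. The `rows` data is hypothetical and only stands in for the contents of df2; it does not use Spark.

```python
# Hypothetical per-id counts, standing in for the rows of df2.
rows = [("a", 4), ("b", 2), ("c", 1)]

# An empty window spans every row, so the max is taken once over the whole table.
global_max = max(count for _, count in rows)

# Each row keeps its own columns and gains count / global max.
normalized = [(id_, count, count / global_max) for id_, count in rows]

for row in normalized:
    print(row)
```

The id with the largest count ends up with count_trans_norm equal to 1.0, and every other id gets a value between 0 and 1.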