Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
851 views
in Technique[技术] by (71.8m points)

group by - How to write a function with multiple aggregations like sum, first, collect_set, countDistinct in PySpark?

from pyspark.sql import functions as F

# One aggregate per feature; F.sum avoids shadowing Python's built-in sum().
data = data.groupby(columns_for_groupby).agg(
    F.first('AGE').alias('AGE'),
    F.first('WEIGHT').alias('WEIGHT'),
    F.first('MOBILE').alias('MOBILE'),
    F.sum('HEIGHT').alias('HEIGHT_SUM'),
    F.collect_set('WORK_EXP').alias('WORK_EXP_LIST'),
    F.countDistinct('PLANT').alias('PLANT_COUNT'),
    F.first('DATE').alias('DATE'),
)

My code is currently written in this format. I have many features, each of which falls under one of first, sum, collect_set, or F.countDistinct. I want to write a function that takes a list of columns for each of these aggregations, applies the groupby, and returns the resulting dataframe with the aggregated columns renamed accordingly. I am fairly new to PySpark, so any help would be appreciated. Thanks.



1 Reply

0 votes
by (71.8m points)
Waiting for an expert to answer.


...