I have a pandas DataFrame as below:
id age gender country sales_year
1 None M India 2016
2 23 F India 2016
1 20 M India 2015
2 25 F India 2015
3 30 M India 2019
4 36 None India 2019
I want to group by id and take the latest row as per sales_year, with all non-null elements.
Expected output:
id age gender country sales_year
1 20 M India 2016
2 23 F India 2016
3 30 M India 2019
4 36 None India 2019
In PySpark I would do:
df = df.withColumn('age', f.first('age', True).over(Window.partitionBy("id").orderBy(df.sales_year.desc())))
But I need the same solution in pandas.
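For the single age column, I think the pandas equivalent would be something like the sketch below (an assumption on my side: the nulls are real NaN/None values, not the string 'None'):

import pandas as pd

# sort so the latest sales_year comes first within each id,
# then broadcast the first non-null age back to every row,
# similar to first('age', True) over the ordered window
df = df.sort_values(['id', 'sales_year'], ascending=[True, False])
df['age'] = df.groupby('id')['age'].transform('first')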
EDIT:
This can be the case with all the columns, not just age. I need it to pick up the latest non-null data for every column, for each id that exists.
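For all columns at once, I imagine something along these lines would work (a sketch only, assuming groupby(...).first() takes the first non-null value of each column within an id after sorting by sales_year descending):

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id':         [1, 2, 1, 2, 3, 4],
    'age':        [np.nan, 23, 20, 25, 30, 36],
    'gender':     ['M', 'F', 'M', 'F', 'M', None],
    'country':    ['India'] * 6,
    'sales_year': [2016, 2016, 2015, 2015, 2019, 2019],
})

# sort so the latest sales_year is first within each id,
# then take the first non-null value of every column per id
out = (
    df.sort_values(['id', 'sales_year'], ascending=[True, False])
      .groupby('id', as_index=False)
      .first()
)
print(out)

Is this the right approach, or is there a more idiomatic pandas way?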