I have a data frame containing several groups, and I want to fill the missing values in the score column with the average of the last available previous value and the next available value, i.e. (previous value + next value) / 2.
I want to group by state, school, class and subject, and then fill the values within each group.
If the first value in the score column is missing, it should be filled with the next available value.
If the last value is missing, it should be filled with the previous available value.
These rules need to be applied to each group.
This is a fairly complex data imputation problem. I searched online and found that pandas has some related functionality, e.g.
pandas.core.groupby.DataFrameGroupBy.ffill, but I don't know how to use it in this case.
I am open to solving this in Python, PySpark or SQL.
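For reference, here is a rough pandas sketch of the rule described above, not a definitive solution. It assumes the rows within each group are already in the order that defines "previous" and "next", and that the columns are literally named state, school, class, subject and score. The helper name fill_scores and the sample values are made up for illustration only.

```python
import numpy as np
import pandas as pd


def fill_scores(df, keys=("state", "school", "class", "subject"), col="score"):
    """Fill gaps in `col` per group with the mean of the nearest known neighbours.

    Hypothetical helper: assumes rows are already ordered within each group.
    """
    grouped = df.groupby(list(keys), sort=False)[col]
    prev_val = grouped.ffill()  # last known value at or before each row, per group
    next_val = grouped.bfill()  # next known value at or after each row, per group
    # If only one side exists (leading or trailing gap), use that side for both terms.
    filled = (prev_val.fillna(next_val) + next_val.fillna(prev_val)) / 2
    return df.assign(**{col: filled})


# Made-up sample data: two groups with leading, interior and trailing gaps.
df = pd.DataFrame({
    "state": ["A"] * 5 + ["B"] * 4,
    "school": ["s1"] * 5 + ["s2"] * 4,
    "class": [1] * 5 + [2] * 4,
    "subject": ["math"] * 5 + ["art"] * 4,
    "score": [np.nan, 10, np.nan, np.nan, 20, 5, np.nan, 7, np.nan],
})

result = fill_scores(df)
print(result)
```

With this approach, a run of consecutive missing values all receive the same average of the values on either side of the run, and a group with no known scores at all stays missing. Known values are left unchanged, because the forward-filled and backward-filled values coincide there.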
My data frame looks like this