How can I fill NaNs with the most frequent level in the categorical columns of a pandas DataFrame?
R's randomForest package provides na.roughfix, whose documentation describes the result as:
A completed data matrix or data frame. For numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered.
In pandas, for numeric variables I can fill NaN values with:
df = df.fillna(df.median(numeric_only=True))
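For the categorical columns, one possible approach is to fill each column with its own mode. The snippet below is only a sketch of that idea, not a port of na.roughfix: `rough_fix` is a hypothetical helper name, the sample frame is made up for illustration, and ties are resolved by taking the first value returned by `Series.mode()` rather than at random as R does.

```python
import numpy as np
import pandas as pd


def rough_fix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median for numeric columns, mode for the rest."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() ignores NaN by default and returns values in sorted order,
            # so iloc[0] is a deterministic tie-break (unlike R's random one).
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Made-up example data: one numeric and one categorical column.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 10.0],
    "color": pd.Categorical(["red", "blue", None, "red"]),
})
fixed = rough_fix(df)
print(fixed)
```

A column that is entirely NaN is left unchanged here, since it has neither a median nor a mode.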