python - Imputation of missing values for categories in pandas

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

How can I fill the NaNs in a categorical column of a pandas DataFrame with that column's most frequent level?

The R randomForest package has the na.roughfix option, which its documentation describes as returning: "A completed data matrix or data frame. For numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered."

In pandas, I can fill the NaN values of numeric variables with:

df = df.fillna(df.median(numeric_only=True))

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:45:55+0000

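You can use df = df.fillna(df['Label'].value_counts().index[0]) to fill NaNs with the most frequent value of a single column (here 'Label'). Note that, as written, this applies that one value to the NaNs in every column of df.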
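If only the 'Label' column should be filled, one option is to assign the result back to that column alone. The following is a minimal sketch under that assumption; the sample data and the 'Score' column are hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column name 'Label' follows the answer above.
df = pd.DataFrame({
    "Label": ["a", "b", "a", np.nan, "a"],
    "Score": [1.0, np.nan, 3.0, 4.0, 5.0],
})

# value_counts() ignores NaN by default and sorts by frequency (descending),
# so the first index entry is the most frequent level.
most_frequent = df["Label"].value_counts().index[0]

# Assigning back to the single column leaves NaNs in other columns untouched.
df["Label"] = df["Label"].fillna(most_frequent)
```

After this runs, 'Label' has no missing values, while the NaN in 'Score' is still there for a separate numeric fill such as the median.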

If you want to fill every column with its own most frequent value, you can use:

df = df.apply(lambda x: x.fillna(x.value_counts().index[0]))
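One caveat with the per-column approach: for a column that contains only NaNs, value_counts() is empty and index[0] raises an IndexError. A possible guard is sketched below; the helper name fill_most_frequent and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_most_frequent(col):
    """Fill NaNs in one column with its most frequent value, if it has one."""
    counts = col.value_counts()  # NaN is excluded by default
    if counts.empty:
        # Column is entirely NaN: there is nothing to impute from.
        return col
    return col.fillna(counts.index[0])

# Hypothetical example data, including an all-NaN column.
df = pd.DataFrame({
    "color": ["red", np.nan, "red", "blue"],
    "size": ["S", "M", np.nan, "M"],
    "empty": [np.nan, np.nan, np.nan, np.nan],
})

filled = df.apply(fill_most_frequent)
```

Here 'color' and 'size' are filled with "red" and "M" respectively, and the all-NaN column is returned unchanged instead of raising an error.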

UPDATE 2018-25-10

Since version 0.13.1, pandas provides a mode method for both Series and DataFrame. You can use it to fill the missing values of each column with that column's own most frequent value, like this:

df = df.fillna(df.mode().iloc[0])
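To get closer to what na.roughfix does in R, the median and mode approaches can be combined: medians for numeric columns, modes for everything else. The sketch below is one way to do that, not an official pandas equivalent; the function name rough_fix and the sample data are hypothetical. Unlike R, it does not break ties at random: mode() returns the tied values sorted, so iloc[0] picks the first of them.

```python
import numpy as np
import pandas as pd

def rough_fix(df):
    """Hypothetical helper: medians for numeric columns, modes for the rest."""
    # Column medians, computed on numeric columns only.
    fill_values = df.median(numeric_only=True).to_dict()

    # Most frequent value of each non-numeric column (first row of mode()).
    modes = df.select_dtypes(exclude="number").mode()
    if not modes.empty:
        fill_values.update(modes.iloc[0].to_dict())

    # fillna with a dict fills each column with its own value.
    return df.fillna(fill_values)

# Hypothetical example data.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 60.0],
    "city": ["Oslo", "Rome", np.nan, "Rome"],
})

fixed = rough_fix(df)
```

In this example the missing 'age' becomes the median 40.0 and the missing 'city' becomes "Rome", the most frequent level.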
