I'm working with a really simple dataset that has missing values in both its categorical and its numeric features. To get the most accurate imputation I can, I'm trying to use sklearn.impute.KNNImputer. However, when I run the following code:
```python
from sklearn.impute import KNNImputer

imputer = KNNImputer(n_neighbors=120)
imputer.fit_transform(x_train)
```
I get the error: ValueError: could not convert string to float: 'Private'
That makes sense: KNNImputer can't handle categorical (string) data. But when I try to encode the categorical columns first with OneHotEncoder:
```python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(drop="first")
encoder.fit_transform(x_train[categorical_features])
```
It throws the error: ValueError: Input contains NaN
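One possible way to keep a single KNN pass over both feature types is sketched here under stated assumptions, not as a tested recipe for the real data. It swaps OneHotEncoder for `pandas.get_dummies`, sets every dummy column back to NaN on rows where the original category was missing, lets KNNImputer fill those cells from neighbors, and then takes the category with the largest imputed dummy value. The toy DataFrame, the `age`/`workclass` column names, `numeric_features`, and the helper `one_hot_keep_nan` are all hypothetical. In real use the numeric columns should also be scaled first so they don't dominate the 0/1 dummies in the distance; that step is omitted here for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data standing in for the real x_train
x_train = pd.DataFrame({
    "age": [45, 30, np.nan, 48, 50, 27],
    "workclass": ["Private", "Self-emp", "Private", np.nan, "Private", "Self-emp"],
})
categorical_features = ["workclass"]
numeric_features = ["age"]  # hypothetical name for the numeric column list


def one_hot_keep_nan(df, cols):
    """Hypothetical helper: one-hot encode `cols`, leaving NaN in every
    dummy column on rows where the source category was missing."""
    parts = []
    for col in cols:
        # get_dummies gives all-zero rows for NaN by default (dummy_na=False)
        dummies = pd.get_dummies(df[col], prefix=col, dtype=float)
        # Mark those rows as missing so KNNImputer will fill them
        dummies.loc[df[col].isna(), :] = np.nan
        parts.append(dummies)
    return pd.concat(parts, axis=1)


encoded = one_hot_keep_nan(x_train, categorical_features)
combined = pd.concat([x_train[numeric_features], encoded], axis=1)

# One KNN pass over the numeric and dummy columns together
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(
    imputer.fit_transform(combined), columns=combined.columns, index=combined.index
)

# Decode: for each categorical feature, pick the dummy with the largest value
for col in categorical_features:
    dummy_cols = [c for c in imputed.columns if c.startswith(f"{col}_")]
    imputed[col] = imputed[dummy_cols].idxmax(axis=1).str[len(col) + 1:]
    imputed = imputed.drop(columns=dummy_cols)

print(imputed)
```

The imputed dummy values are neighbor averages (fractions between 0 and 1), which is why the sketch decodes them with `idxmax` instead of reading them as exact one-hot vectors.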
I'd prefer to use KNNImputer even with the categorical data, as I feel I'd be losing some accuracy if I just used a ColumnTransformer and imputed the numeric and categorical data separately. Is there any way to get OneHotEncoder to ignore these missing values? If not, is using a ColumnTransformer or a simpler imputer a better way of tackling this problem?
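For comparison, the ColumnTransformer route can be sketched as follows, assuming scikit-learn 0.22 or later (the release that added KNNImputer). Numeric columns are scaled and KNN-imputed; categorical columns are filled with their most frequent value by SimpleImputer before OneHotEncoder runs, so the encoder never sees NaN. The toy data, the `hours_per_week` column and `numeric_features` are hypothetical. The trade-off is the one raised in the question: categorical gaps are filled without looking at the numeric features, and numeric neighbors are found without the categorical ones.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real x_train
x_train = pd.DataFrame({
    "age": [45, 30, np.nan, 48, 50, 27],
    "hours_per_week": [40, 50, 38, np.nan, 42, 45],
    "workclass": ["Private", "Self-emp", "Private", np.nan, "Private", "Self-emp"],
})
numeric_features = ["age", "hours_per_week"]  # hypothetical
categorical_features = ["workclass"]

# Numeric columns: scale (StandardScaler ignores NaN when fitting), then KNN-impute
numeric_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("impute", KNNImputer(n_neighbors=2)),
])

# Categorical columns: fill with the most frequent category, then one-hot encode
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(drop="first")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_features),
        ("cat", categorical_pipeline, categorical_features),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

x_train_prepared = preprocessor.fit_transform(x_train)
print(x_train_prepared.shape)
```

With `drop="first"`, the `Private` category is dropped here, so the single remaining categorical column is the `Self-emp` indicator; the row with a missing `workclass` is filled with `Private` (the most frequent value) and gets 0 in that column.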
Thanks in advance