The way I usually do it is with a FeatureUnion, using a FunctionTransformer to pull out the relevant columns.
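Important notes:
- You have to define your functions with def at module level. Annoyingly, the standard pickle module can't serialize a lambda (or a partial that wraps one), so you can't use those in FunctionTransformer if you want to pickle your model.
- You need to initialize FunctionTransformer with validate=False so the DataFrame is passed to your function unchanged. This is the default from scikit-learn 0.22 on; older versions default to True and convert the input to a NumPy array, which breaks selecting columns by name.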
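To make the pickling note concrete, here is a minimal sketch using only the standard library. FunctionTransformer keeps the function you give it as an attribute, so the transformer can only be pickled if that function can. The helpers select_cols and can_pickle are hypothetical names introduced just for this illustration.

```python
import pickle
from functools import partial


def get_num_cols(df):
    # Module-level function: pickle stores only its importable name.
    return df[["height", "age"]]


def select_cols(df, cols):
    # Hypothetical helper, used only to build a partial below.
    return df[cols]


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Named function, and a partial wrapping a named function.
def_ok = can_pickle(get_num_cols)
partial_def_ok = can_pickle(partial(select_cols, cols=["height", "age"]))

# Lambda, and a partial wrapping a lambda.
lambda_ok = can_pickle(lambda df: df[["height", "age"]])
partial_lambda_ok = can_pickle(
    partial(lambda df, cols: df[cols], cols=["height", "age"])
)

print(def_ok, partial_def_ok, lambda_ok, partial_lambda_ok)
```

Run as a script on CPython 3, the two named-function cases should pickle and the two lambda cases should not, which is why plain def functions are the safe choice here.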
The full transformer looks something like this:
from sklearn.pipeline import make_union, make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, OrdinalEncoder

def get_text_cols(df):
    return df[['name', 'fruit']]

def get_num_cols(df):
    return df[['height', 'age']]

# OrdinalEncoder instead of LabelEncoder: LabelEncoder only encodes a single 1-D
# target column and fails inside a pipeline on a two-column frame.
vec = make_union(
    make_pipeline(FunctionTransformer(get_text_cols, validate=False), OrdinalEncoder()),
    make_pipeline(FunctionTransformer(get_num_cols, validate=False), MinMaxScaler()),
)
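As a quick usage sketch, this is roughly how the union behaves on a small DataFrame. The sample rows are made up for illustration. The text columns go through OrdinalEncoder here rather than LabelEncoder, since LabelEncoder only handles a single 1-D target column and does not work as a pipeline step on two columns; swap in OneHotEncoder if you don't want an implied ordering.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, OrdinalEncoder


def get_text_cols(df):
    return df[["name", "fruit"]]


def get_num_cols(df):
    return df[["height", "age"]]


vec = make_union(
    make_pipeline(FunctionTransformer(get_text_cols, validate=False), OrdinalEncoder()),
    make_pipeline(FunctionTransformer(get_num_cols, validate=False), MinMaxScaler()),
)

# Made-up sample data with the four columns the selectors expect.
df = pd.DataFrame(
    {
        "name": ["ann", "bob", "cat"],
        "fruit": ["apple", "pear", "apple"],
        "height": [150.0, 180.0, 165.0],
        "age": [20, 40, 30],
    }
)

# Two encoded text columns followed by two scaled numeric columns.
features = vec.fit_transform(df)
print(features.shape)
```

The result is a single array with one row per input row: the encoded text columns come first, then the numeric columns scaled to the 0 to 1 range.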