python - sklearn pipeline - how to apply different transformations on different columns

Question

Welcome To Ask or Share your Answers For Others

python - sklearn pipeline - how to apply different transformations on different columns

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - sklearn pipeline - how to apply different transformations on different columns

I am pretty new to pipelines in sklearn and I am running into this problem: I have a dataset that has a mixture of text and numbers i.e. certain columns have text only and rest have integers (or floating point numbers).

I was wondering if it was possible to build a pipeline where I can for example call LabelEncoder() on the text features and MinMaxScaler() on the numbers columns. The examples I have seen on the web mostly point towards using LabelEncoder() on the entire dataset and not on select columns. Is this possible? If so any pointers would be greatly appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:46:14+0000

The way I usually do it is with a FeatureUnion, using a FunctionTransformer to pull out the relevant columns.

Important notes:

You have to define your functions with def since annoyingly you can't use lambda or partial in FunctionTransformer if you want to pickle your model
You need to initialize FunctionTransformer with validate=False

Something like this:

from sklearn.pipeline import make_union, make_pipeline
from sklearn.preprocessing import FunctionTransformer

def get_text_cols(df):
    return df[['name', 'fruit']]

def get_num_cols(df):
    return df[['height','age']]

vec = make_union(*[
    make_pipeline(FunctionTransformer(get_text_cols, validate=False), LabelEncoder()))),
    make_pipeline(FunctionTransformer(get_num_cols, validate=False), MinMaxScaler())))
])

Categories

python - sklearn pipeline - how to apply different transformations on different columns

python - sklearn pipeline - how to apply different transformations on different columns

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags