python - scikit-learn .predict() default threshold

Question

Welcome To Ask or Share your Answers For Others

python - scikit-learn .predict() default threshold

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - scikit-learn .predict() default threshold

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.

In a binary classification problem, is scikit's classifier.predict() using 0.5 by default? If it doesn't, what's the default method? If it does, how do I change it?

In scikit some classifiers have the class_weight='auto' option, but not all do. With class_weight='auto', would .predict() use the actual population proportion as a threshold?

What would be the way to do this in a classifier like MultinomialNB that doesn't support class_weight? Other than using predict_proba() and then calculation the classes myself.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:56:13+0000

The threshold can be set using clf.predict_proba()

for example:

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state = 2)
clf.fit(X_train,y_train)
# y_pred = clf.predict(X_test)  # default threshold is 0.5
y_pred = (clf.predict_proba(X_test)[:,1] >= 0.3).astype(bool) # set threshold as 0.3

Categories

python - scikit-learn .predict() default threshold

python - scikit-learn .predict() default threshold

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags