I would like to give a practical answer:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score, precision_score

# Imbalanced toy problem: roughly 90% negatives, 10% positives
X, y = make_classification(
    n_classes=2, class_sep=1.5, weights=[0.9, 0.1],
    n_features=20, n_samples=1000, random_state=10
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = LogisticRegression(class_weight="balanced")
clf.fit(X_train, y_train)

# Label a sample positive whenever its predicted probability exceeds THRESHOLD,
# instead of relying on the default 0.5 cut-off that predict() applies
THRESHOLD = 0.25
preds = np.where(clf.predict_proba(X_test)[:, 1] > THRESHOLD, 1, 0)

pd.DataFrame(data=[accuracy_score(y_test, preds), recall_score(y_test, preds),
                   precision_score(y_test, preds), roc_auc_score(y_test, preds)],
             index=["accuracy", "recall", "precision", "roc_auc_score"])
By lowering THRESHOLD from the default 0.5 to 0.25, the trade-off becomes visible: recall goes up while precision goes down, as the sweep below shows.
However, by removing the class_weight argument, the accuracy increases but the recall score drops sharply; the comparison sketch below illustrates this.
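To verify, one can refit the same model without class weighting and compare the two scores (again a sketch continuing from the code above; clf_unweighted and preds_default are names of mine):

# Same model, no class weighting, default decision rule
clf_unweighted = LogisticRegression()
clf_unweighted.fit(X_train, y_train)
preds_default = clf_unweighted.predict(X_test)  # predict() uses the 0.5 threshold
print("accuracy:", accuracy_score(y_test, preds_default))
print("recall:  ", recall_score(y_test, preds_default))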
Refer to the accepted answer for more detail.