I am trying to optimize the hyperparameters of my algorithm with GridSearchCV, but the F1 score I get when I apply the best parameters to the held-out test set is noticeably lower than the best cross-validation score reported by the grid search. I know this could be because of the cross-validation inside the grid search. Is there any way to avoid this difference and get approximately the same results on my predictions as in the grid search?
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.33,
                                                    random_state=42)

gbc = GradientBoostingClassifier()
parameters = {'learning_rate': [0.01, 0.05, 0.1, 0.5, 1],
              'min_samples_split': [2, 5, 10, 20],
              'max_depth': [2, 3, 5, 10]}

clf = GridSearchCV(gbc, parameters, cv=3, scoring='f1')
clf.fit(X_train, y_train)
print("Best parameter (CV score=%0.3f):" % clf.best_score_)
# Best parameter (CV score=0.737)

# Refit on the full training set with the best parameters found above
gbc_tuned = gbc.set_params(**clf.best_params_)
gbc_tuned.fit(X_train, y_train.values.ravel())
ypred_test = gbc_tuned.predict(X_test)
print(f1_score(y_test, ypred_test))
# 0.7008433734939759
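For context, here is a self-contained sketch of the same comparison on synthetic data from `make_classification` (a stand-in for my real dataset, which I can't share): it puts the 3-fold cross-validation F1 on the training set (what GridSearchCV reports) next to the single held-out F1 (what my final prediction step reports), since these two estimates generally won't coincide:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real data
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

gbc = GradientBoostingClassifier(random_state=42)

# Mean 3-fold CV F1 on the training set (the kind of number GridSearchCV reports)
cv_f1 = cross_val_score(gbc, X_train, y_train, cv=3, scoring='f1').mean()

# Single held-out F1 after fitting on the full training set
gbc.fit(X_train, y_train)
test_f1 = f1_score(y_test, gbc.predict(X_test))

print(cv_f1, test_f1)  # the two estimates typically differ
```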