I want to evaluate a regression model built with scikit-learn using cross-validation, and I am getting confused about which of the two functions, cross_val_score and cross_val_predict, I should use.
One option would be:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

cvs = DecisionTreeRegressor(max_depth=depth)
scores = cross_val_score(cvs, predictors, target, cv=cvfolds, scoring='r2')
print("R2-Score: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Another option is to use the CV predictions together with the standard r2_score:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

cvp = DecisionTreeRegressor(max_depth=depth)
predictions = cross_val_predict(cvp, predictors, target, cv=cvfolds)
print("CV R^2-Score: {}".format(r2_score(target, predictions)))
I would assume that both methods are valid and give similar results, but that is only the case for small numbers of folds. While the R² is roughly the same for 10-fold CV, it gets increasingly lower for higher k values in the first version using cross_val_score. The second version is mostly unaffected by the number of folds.
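For reference, here is a minimal self-contained sketch of the comparison I am running. Note that make_regression, the depth of 4, and the fold counts are just stand-ins for my actual data and settings:

# Compare the per-fold averaged R2 (cross_val_score) with the single R2
# computed on the pooled out-of-fold predictions (cross_val_predict).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import r2_score

# Synthetic data as a stand-in for my real predictors/target
predictors, target = make_regression(n_samples=500, n_features=10,
                                     noise=10.0, random_state=0)

for cvfolds in (5, 10, 25, 50):
    model = DecisionTreeRegressor(max_depth=4, random_state=0)

    # Version 1: R2 computed separately on each fold, then averaged
    scores = cross_val_score(model, predictors, target, cv=cvfolds, scoring='r2')

    # Version 2: out-of-fold predictions pooled, then one R2 over all samples
    predictions = cross_val_predict(model, predictors, target, cv=cvfolds)

    print("folds=%2d  mean per-fold R2=%.3f  pooled R2=%.3f"
          % (cvfolds, scores.mean(), r2_score(target, predictions)))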
Is this behavior to be expected, or do I lack some understanding of how CV works in scikit-learn?