Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

cross validation - How to correctly validate a machine learning model?

I'm getting confused when it comes to model validation.

What I've done for 6 different algorithms:

--> Split my dataset 75/25 (training/test) and left the test set untouched.

--> With the training set, I did the following:

  1. Split it into 4 outer folds and performed nested cross-validation. The inner loop was tenfold cross-validation repeated five times, with hyperparameters tuned by a random search of 10 iterations (leave-one-out strategy).
  2. Extracted the metrics (ROC curves, accuracy, specificity, etc.) and got the parameters of the best model.
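
A minimal sketch of the procedure above, in plain Python so it runs without extra libraries. Everything concrete here is an assumption for illustration: synthetic 1-D data, a toy k-nearest-neighbour classifier standing in for one of the 6 algorithms, a search space of odd k from 1 to 29, and the helper names (`kfold`, `splits`, `knn_accuracy`, `inner_search`, `nested_cv`), which are all hypothetical. It does not try to model the "leave-one-out" remark. What it shows is the data flow: hyperparameters are tuned only inside each outer training fold, and the 25% test set is never read.

```python
import random

def kfold(indices, k, rng):
    """Shuffle `indices` and deal them into k nearly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def splits(folds):
    """Yield (train, held_out) index lists, holding out one fold at a time."""
    for i, held_out in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, held_out

def knn_accuracy(X, y, train, test, k):
    """Hypothetical toy model: fit 1-D k-NN on `train`, return accuracy on `test`."""
    hits = 0
    for i in test:
        nearest = sorted(train, key=lambda j: abs(X[j] - X[i]))[:k]
        pred = 1 if 2 * sum(y[j] for j in nearest) > k else 0
        hits += pred == y[i]
    return hits / len(test)

def inner_search(X, y, train, rng, n_iter=10, n_repeats=5, n_inner=10):
    """Random search over k, scored by repeated n_inner-fold CV on `train` only."""
    # Fix the 5 x 10 inner splits once so every candidate is scored on the same folds.
    inner = [s for _ in range(n_repeats) for s in splits(kfold(train, n_inner, rng))]
    candidates = rng.sample(range(1, 30, 2), n_iter)  # 10 random odd values of k

    def cv_score(k):
        return sum(knn_accuracy(X, y, tr, va, k) for tr, va in inner) / len(inner)

    return max(candidates, key=cv_score)

def nested_cv(X, y, train, rng, n_outer=4):
    """Outer loop: score the whole tuning procedure on folds it did not tune on."""
    outer_scores, chosen_ks = [], []
    for tr, te in splits(kfold(train, n_outer, rng)):
        best_k = inner_search(X, y, tr, rng)  # the outer fold `te` is never seen here
        outer_scores.append(knn_accuracy(X, y, tr, te, best_k))
        chosen_ks.append(best_k)
    return outer_scores, chosen_ks

# Synthetic data (assumption): two overlapping 1-D Gaussian classes, 80 points each.
rng = random.Random(0)
X = [rng.gauss(0, 1) for _ in range(80)] + [rng.gauss(2, 1) for _ in range(80)]
y = [0] * 80 + [1] * 80

# 75/25 split: test_idx is set aside and not used anywhere below.
perm = list(range(len(X)))
rng.shuffle(perm)
train_idx, test_idx = perm[:120], perm[120:]

outer_scores, chosen_ks = nested_cv(X, y, train_idx, rng)
print("outer fold accuracies:", outer_scores)
print("k chosen in each outer fold:", chosen_ks)
```

The outer scores estimate how well the whole tuning procedure generalises. The per-fold `chosen_ks` can differ, which is why nested CV by itself does not hand you a single final model. With scikit-learn, the same shape can be written as `cross_val_score(RandomizedSearchCV(est, params, n_iter=10, cv=RepeatedStratifiedKFold(n_splits=10, n_repeats=5)), X_train, y_train, cv=4)`.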

Now this is the problem:

I still have an untouched test set (from the split at the beginning). What should I do with it? Should I apply the best model to it directly and look at the performance? Or should I retrain the best model with the best parameters on the whole training set and then evaluate it on the test set?

Or is everything wrong here?

Question from: https://stackoverflow.com/questions/65649231/how-to-correctly-validate-a-machine-learning-model

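He who fights too long against dragons becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…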

1 Reply

You got it. This is the general rule:

  • Pick the best model out of the 6. I see that K is 4 in your outer loop; 10 or 20 is what is usually recommended.
  • Retrain the best model on all folds together (the entire training set). Trust in that model configuration has already been established by cross-validation.
  • Predict on the test set once, and report that score as the final performance estimate.
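
The three steps above can be sketched the same way. Again this is plain Python with assumed pieces: synthetic data, three hypothetical toy candidates (nearest centroid, k-NN with an assumed already-tuned k = 15, and a majority-class baseline) standing in for the 6 algorithms, and 10-fold CV on the training set as the selection score.

```python
import random

def fit_centroid(X, y):
    """Nearest-centroid classifier: remember each class's mean feature value."""
    means = {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

def fit_knn(X, y, k=15):
    """1-D k-NN majority vote; k=15 stands in for an already-tuned value."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda j: abs(X[j] - x))[:k]
        return 1 if 2 * sum(y[j] for j in nearest) > k else 0
    return predict

def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    top = max(set(y), key=y.count)
    return lambda x: top

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def cv_accuracy(fit, X, y, folds):
    """Mean accuracy when each fold is held out once and the rest is used to fit."""
    scores = []
    for i, held in enumerate(folds):
        tr = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit([X[j] for j in tr], [y[j] for j in tr])
        scores.append(accuracy(model, [X[j] for j in held], [y[j] for j in held]))
    return sum(scores) / len(scores)

# Synthetic data and a 75/25 split (assumption, mirroring the question).
rng = random.Random(1)
X = [rng.gauss(0, 1) for _ in range(80)] + [rng.gauss(2, 1) for _ in range(80)]
y = [0] * 80 + [1] * 80
perm = list(range(len(X)))
rng.shuffle(perm)
X_train, y_train = [X[i] for i in perm[:120]], [y[i] for i in perm[:120]]
X_test, y_test = [X[i] for i in perm[120:]], [y[i] for i in perm[120:]]

candidates = {"centroid": fit_centroid, "knn": fit_knn, "majority": fit_majority}

# 1. Pick the best candidate by CV score, using the training set only.
idx = list(range(len(X_train)))
rng.shuffle(idx)
folds = [idx[i::10] for i in range(10)]  # same 10 folds for every candidate
cv = {name: cv_accuracy(fit, X_train, y_train, folds) for name, fit in candidates.items()}
best = max(cv, key=cv.get)

# 2. Retrain the winner on the entire training set.
final_model = candidates[best](X_train, y_train)

# 3. Predict on the untouched test set, once.
test_acc = accuracy(final_model, X_test, y_test)
print(f"best={best}  cv={cv[best]:.3f}  test={test_acc:.3f}")
```

The test set only enters in step 3, so `test_acc` is an estimate that model selection never saw. If you go back and change the model after looking at it, it is no longer an unbiased estimate.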

...