python - Classification results depend on random_state?

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I want to implement an AdaBoost model using scikit-learn (sklearn). My question is similar to another question, but it is not quite the same. As far as I understand from that question, the random_state variable described in the documentation is for randomly splitting the data into training and test sets. If that is correct, my classification results should not depend on the seed. Is that right? Should I be worried if my classification results turn out to depend on the random_state variable?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:31+0000

Your classification scores will depend on random_state. As @Ujjwal rightly said, it is used for splitting the data into training and test sets. Beyond that, many algorithms in scikit-learn use random_state to select subsets of features, select subsets of samples, determine initial weights, and so on.

For example:

Tree-based estimators (like DecisionTreeClassifier and RandomForestClassifier) use random_state for the random selection of features and samples.
Clustering estimators like KMeans use random_state to initialize the cluster centers.
SVMs use it when shuffling the data for probability estimation.
Some feature selection algorithms also use it for their initial selection.
And many more...
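To see how much the seed matters for the AdaBoost case in the question, one option is to repeat the same split-and-fit with several seeds and compare the scores. The snippet below is only a minimal sketch: the synthetic dataset, the helper name score_with_seed, and the parameter values are assumptions made for illustration, not anything from the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def score_with_seed(seed):
    """Hypothetical helper: split and fit with one seed, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    clf = AdaBoostClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Different seeds can give different splits and therefore different scores.
scores = {seed: score_with_seed(seed) for seed in range(5)}
print(scores)

# Reusing a seed reproduces the same split, the same model and the same score.
repeat = score_with_seed(0)
print(repeat == scores[0])
```

Some spread across seeds is expected. If the spread is large, it usually says more about the size of the test set or the stability of the model than about the seed itself, and cross-validation may give a steadier estimate.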

It's mentioned in the documentation that:

If your code relies on a random number generator, it should never use functions like numpy.random.random or numpy.random.normal. This approach can lead to repeatability issues in tests. Instead, a numpy.random.RandomState object should be used, which is built from a random_state argument passed to the class or function.
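The pattern described in that passage can be sketched with sklearn.utils.check_random_state, which turns None, an int, or an existing RandomState into a numpy.random.RandomState object. The function sample_rows below is a hypothetical helper written for illustration; it is not part of scikit-learn.

```python
import numpy as np
from sklearn.utils import check_random_state


def sample_rows(X, n_rows, random_state=None):
    """Hypothetical helper: pick n_rows rows of X without replacement."""
    # Accepts None, an int seed, or a numpy.random.RandomState instance.
    rng = check_random_state(random_state)
    idx = rng.choice(X.shape[0], size=n_rows, replace=False)
    return X[idx]


X = np.arange(20).reshape(10, 2)

# The same integer seed yields the same rows on every call.
a = sample_rows(X, 3, random_state=42)
b = sample_rows(X, 3, random_state=42)
print(np.array_equal(a, b))
```

Because the randomness flows through the random_state argument rather than through the global numpy.random functions, the caller decides whether a run is reproducible.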

Do read the following questions and answers for better understanding:
