Isn't a random forest with one estimator just a decision tree?
Well, this is a good question, and the answer turns out to be no; the Random Forest algorithm is more than a simple bag of individually-grown decision trees.
Apart from the randomness induced by ensembling many trees, the Random Forest (RF) algorithm also incorporates randomness when building individual trees in two distinct ways, neither of which is present in the simple Decision Tree (DT) algorithm.
The first is the number of features to consider when looking for the best split at each tree node: while DT considers all the features, RF considers a random subset of them, of size equal to the parameter max_features (see the docs).
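To make the first point concrete, here is a minimal sketch in plain Python of how the candidate features at a node differ between the two algorithms. The helper candidate_features is hypothetical and is not scikit-learn's implementation; the square-root subset size is just a commonly used choice assumed for illustration.

```python
import math
import random


def candidate_features(n_features, max_features, rng):
    """Return the feature indices examined at one tree node (illustrative only)."""
    if max_features is None:
        # DT-like behaviour: every feature is a split candidate.
        return list(range(n_features))
    # RF-like behaviour: a fresh random subset, drawn without replacement.
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(0)
n_features = 16
k = max(1, int(math.sqrt(n_features)))  # assumed subset size: sqrt of the feature count

dt_candidates = candidate_features(n_features, None, rng)
# A new subset is drawn at every node, so three nodes get three independent draws.
rf_candidates_per_node = [candidate_features(n_features, k, rng) for _ in range(3)]

print("DT candidates at any node:", dt_candidates)
for node, candidates in enumerate(rf_candidates_per_node):
    print(f"RF candidates at node {node}:", candidates)
```

The point of the sketch is only that the subset is redrawn at each node, so a single RF tree can miss the globally best split feature at a given node even though it sees all features over the course of the tree.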
The second is that, while DT considers the whole training set, a single RF tree considers only a bootstrapped sub-sample of it; from the docs again:
The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).
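As a rough illustration of what this means in practice, the sketch below draws one bootstrap sample using only the standard library (it does not call scikit-learn, and the variable names are made up for the example). Because the draw is with replacement, some rows appear several times and others not at all; for large samples the expected fraction of distinct rows is about 1 - 1/e, i.e. roughly 63%.

```python
import random

rng = random.Random(0)
n_samples = 10_000

# Draw n_samples row indices *with replacement*: same size as the
# original training set, but with repeated and missing rows.
bootstrap_indices = rng.choices(range(n_samples), k=n_samples)

unique_fraction = len(set(bootstrap_indices)) / n_samples
print(f"bootstrap size: {len(bootstrap_indices)}")
print(f"fraction of distinct original rows: {unique_fraction:.3f}")
```

So even with n_estimators=1, a default RF tree is trained on noticeably less distinct data than a plain decision tree.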
The RF algorithm is essentially the combination of two independent ideas: bagging, and random selection of features (see the Wikipedia entry for a nice overview). Bagging is essentially my second point above, but applied to an ensemble; random selection of features is my first point above, and it seems that it had been independently proposed by Tin Kam Ho before Breiman's RF (again, see the Wikipedia entry). Ho had already suggested that random feature selection alone improves performance. This is not exactly what you have done here (you still use the bootstrap sampling idea from bagging, too), but you could easily replicate Ho's idea by setting bootstrap=False in your RandomForestClassifier() arguments. In any case, given this research, the difference in performance is not unexpected.
To exactly replicate the behaviour of a single decision tree with RandomForestClassifier(), you should use both the bootstrap=False and max_features=None arguments, i.e.
clf = RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)
in which case neither bootstrap sampling nor random feature selection will take place, and the performance should be roughly equal to that of a single decision tree.
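If you want to check this yourself, something along the following lines should do. It is only a sketch: the synthetic dataset from make_classification and the random_state values are my own assumptions and not your data, so the exact scores will differ from yours.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Default single-tree RF: bootstrap sample + random feature subsets.
    "RF, 1 tree (defaults)": RandomForestClassifier(
        n_estimators=1, random_state=0),
    # Both sources of per-tree randomness switched off.
    "RF, 1 tree (no bootstrap, all features)": RandomForestClassifier(
        n_estimators=1, max_features=None, bootstrap=False, random_state=0),
}

train_scores, test_scores = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_scores[name] = model.score(X_train, y_train)
    test_scores[name] = model.score(X_test, y_test)
    print(f"{name}: train={train_scores[name]:.3f}, test={test_scores[name]:.3f}")
```

The two trees need not be identical even then, since ties between equally good splits may be broken differently, which is why I say "roughly equal" rather than "equal".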