I've been using sklearn's random forest to compare several models, and I noticed that it gives different results even with the same seed. I tried two approaches: calling random.seed(1234), and passing the built-in random_state=1234 to the random forest.
In both cases the results are not repeatable. What have I missed?
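import random
from sklearn.ensemble import RandomForestClassifier

# 1: seed Python's global random module
random.seed(1234)
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf=10)
# or 2: pass the seed to the estimator
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf=10, random_state=1234)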
Any ideas? Thanks!!
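A small sketch of why approach 1 may not help: scikit-learn draws its randomness from NumPy, not from Python's `random` module, so `random.seed()` leaves NumPy's global generator untouched. The variable names below are illustrative only, and the snippet uses plain NumPy draws as a stand-in for what an estimator with `random_state=None` would consume.

```python
import random

import numpy as np

# Seeding Python's `random` module does not reset NumPy's global generator,
# so two NumPy draws after identical random.seed() calls still differ.
random.seed(1234)
first = np.random.rand(3)
random.seed(1234)
second = np.random.rand(3)
print(np.array_equal(first, second))

# Seeding NumPy's global generator does make the draws repeat.
np.random.seed(1234)
third = np.random.rand(3)
np.random.seed(1234)
fourth = np.random.rand(3)
print(np.array_equal(third, fourth))
```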
EDIT:
Adding a more complete version of my code
import numpy as np
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(max_depth=60, max_features=60,
                             criterion='entropy',
                             min_samples_leaf=3, random_state=seed)
# As described, I tried setting random_state in several ways and still got different results
clf = clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
predicted_prob = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(np.array(y_test), predicted_prob)
auc = metrics.auc(fpr, tpr)
print(auc)
EDIT: It's been quite a while, but I think using NumPy's RandomState (np.random.RandomState) might solve the problem. I haven't tested it myself yet, but if you're reading this, it's worth a shot. It is also generally preferable to use RandomState rather than random.seed().
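A minimal reproducibility check, assuming synthetic data from `make_classification` in place of the real X_train/y_train (which are not shown above). It fixes the seed on both the train/test split and the classifier, since an unseeded split is another common source of run-to-run differences, then fits twice and compares the AUC. The helper name `fit_and_score` and the hyperparameter values are illustrative, not taken from the original code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 1234


def fit_and_score(seed):
    """Build the data, split it, fit a forest, and return the test AUC."""
    # Synthetic stand-in for the real dataset (an assumption of this sketch).
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    # Seed the split as well as the model; both involve random draws.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    clf = RandomForestClassifier(
        n_estimators=50,
        max_depth=5,
        max_features=5,
        min_samples_leaf=10,
        random_state=seed,
    )
    clf.fit(X_train, y_train)
    predicted_prob = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, predicted_prob)


auc_a = fit_and_score(SEED)
auc_b = fit_and_score(SEED)
# With every random step seeded, the two runs should agree exactly.
print(auc_a, auc_b, auc_a == auc_b)
```

If the two values still differ in a real pipeline, the remaining randomness is likely outside the estimator, for example in how the data is shuffled, sampled, or split before fitting.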