You can pass sample weights argument to Random Forest fit method
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted. Splits
that would create child nodes with net zero or negative weight are
ignored while searching for a split in each node. In the case of
classification, splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
In older version there were a preprocessing.balance_weights
method to generate balance weights for given samples, such that classes become uniformly distributed. It is still there, in internal but still usable preprocessing._weights module, but is deprecated and will be removed in future versions. Don't know exact reasons for this.
Update
Some clarification, as you seems to be confused. sample_weight
usage is straightforward, once you remember that its purpose is to balance target classes in training dataset. That is, if you have X
as observations and y
as classes (labels), then len(X) == len(y) == len(sample_wight)
, and each element of sample witght
1-d array represent weight for a corresponding (observation, label)
pair. For your case, if 1
class is represented 5 times as 0
class is, and you balance classes distributions, you could use simple
sample_weight = np.array([5 if i == 0 else 1 for i in y])
assigning weight of 5
to all 0
instances and weight of 1
to all 1
instances. See link above for a bit more crafty balance_weights
weights evaluation function.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…