There are a number of established techniques and best practices for dealing with so-called imbalanced data sets. Some of them, such as synthetic sampling, are even designed to overcome the drawbacks of simpler resampling approaches.
Have a look at the imbalanced-learn package for an easy-to-use implementation.
Another option is to weight your model's loss so that the model "pays more attention" to specific classes.
This is easily done by passing the optional class_weight argument to Keras's fit function.
The class weights themselves can be computed with scikit-learn's compute_class_weight function.
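A minimal sketch of how the two pieces might fit together is shown below. The label array y_train is made up for illustration (90 samples of class 0, 10 of class 1), and model and X_train in the commented-out call are hypothetical placeholders for your own compiled Keras model and training features.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Assumed toy labels: 90 samples of class 0 and 10 samples of class 1.
y_train = np.array([0] * 90 + [1] * 10)

# "balanced" weights each class by n_samples / (n_classes * class_count),
# so rarer classes receive proportionally larger weights.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Keras expects a dict that maps class index to weight.
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)

# Hypothetical usage with an already compiled Keras model:
# model.fit(X_train, y_train, epochs=10, class_weight=class_weight)
```

With these weights, a misclassified minority-class sample contributes more to the loss than a majority-class one, which is the "pay more attention" effect described above.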