Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Balancing a training dataset using Keras

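I have collected a training dataset to train a network model, but the dataset is heavily imbalanced. Is it possible to balance the data using the Keras library, without balancing it manually? (The dataset has two objects: object 1 has 2,000 samples and the other has 15,000.)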

  Asked by Hassan_mohammad (translated from SO)

1 Reply

There are a number of ways and best practices to deal with so-called imbalanced data sets.

  • Upsample the minority class (drawback: possible overfitting to the minority class)

  • Downsample the majority class (drawback: loss of training data, and with it information)
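
As a rough illustration of the two options above, here is a minimal NumPy-only sketch of random upsampling and downsampling. The helper name `random_resample` and the toy data are assumptions made for this example, not part of Keras or any library; the class sizes simply mirror the 2,000 vs 15,000 split from the question.

```python
import numpy as np

def random_resample(X, y, mode="upsample", seed=0):
    """Balance a labelled dataset by randomly resampling each class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    # Target size: the largest class when upsampling, the smallest when downsampling.
    target = counts.max() if mode == "upsample" else counts.min()
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Sample with replacement only when a class has to grow.
        picked.append(rng.choice(idx, size=target, replace=count < target))
    # Shuffle so the classes are interleaved again.
    picked = rng.permutation(np.concatenate(picked))
    return X[picked], y[picked]

# Toy data with the question's class split (the features are random placeholders).
X = np.random.default_rng(42).normal(size=(17000, 4))
y = np.array([0] * 2000 + [1] * 15000)

X_up, y_up = random_resample(X, y, mode="upsample")
X_down, y_down = random_resample(X, y, mode="downsample")
print(np.bincount(y_up), np.bincount(y_down))
```

Upsampling here repeats minority rows verbatim, which is where the overfitting risk comes from; downsampling throws away 13,000 majority rows. Resample only the training split, never the validation or test data.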

There are a number of techniques you can use for this; some even offer ways to mitigate these drawbacks (e.g. synthetic sampling).

Have a look at the imbalanced-learn package for an easy-to-use implementation.

Another option is to weight the loss of your model in order to tell the model that it should "pay more attention" to specific classes.

This can easily be done by passing the optional argument class_weight to the Keras fit function.

The class weights can easily be computed with scikit-learn's compute_class_weight function.
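
A minimal sketch of how the two pieces could fit together, assuming integer class labels and an already compiled Keras model. The label array is a placeholder with the question's class sizes, and `fit_weighted` is a hypothetical helper written for this example, not a Keras function.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Placeholder labels with the question's 2,000 vs 15,000 split.
y_train = np.array([0] * 2000 + [1] * 15000)

classes = np.unique(y_train)
# "balanced" gives each class the weight n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Keras expects a dict that maps class index -> weight.
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)

def fit_weighted(model, X_train, y_train, class_weight, **fit_kwargs):
    """Hypothetical helper: train a compiled Keras model with per-class loss weights."""
    return model.fit(X_train, y_train, class_weight=class_weight, **fit_kwargs)
```

With these class sizes the minority class gets a weight of 4.25 and the majority class roughly 0.57, so each misclassified minority sample contributes about 7.5 times as much to the loss. Unlike resampling, this leaves the dataset itself untouched.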

