Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
887 views
in Technique[技术] by (71.8m points)

tensorflow - Choosing number of Steps per Epoch

If I want to train a model with train_generator, is there a significant difference between choosing

  • 10 Epochs with 500 Steps each

and

  • 100 Epochs with 50 Steps each

Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 datapoints. I figure I can get a smoother graph if I use 100 Epochs, but I want to know first if there is any downside to this

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Based on what you said it sounds like you need a larger batch_size, and of course there are implications with that which could impact the steps_per_epoch and number of epochs.

To solve for jumping-around

  • A larger batch size will give you a better gradient and will help to prevent jumping around
  • You may also want to consider a smaller learning rate, or a learning rate scheduler (or decay) to allow the network to "settle in" as it trains

Implications of a larger batch-size

  • Too large of a batch_size can produce memory problems, especially if you are using a GPU. Once you exceed the limit, dial it back until it works. This will help you find the max batch-size that your system can work with.
  • Too large of a batch size can get you stuck in a local minima, so if your training get stuck, I would reduce it some. Imagine here you are over-correcting the jumping-around and it's not jumping around enough to further minimize the loss function.

When to reduce epochs

  • If your train error is very low, yet your test/validation is very high, then you have over-fit the model with too many epochs.
  • The best way to find the right balance is to use early-stopping with a validation test set. Here you can specify when to stop training, and save the weights for the network that gives you the best validation loss. (I highly recommend using this always)

When to adjust steps-per-epoch

  • Traditionally, the steps per epoch is calculated as train_length // batch_size, since this will use all of the data points, one batch size worth at a time.
  • If you are augmenting the data, then you can stretch this a tad (sometimes I multiply that function above by 2 or 3 etc. But, if it's already training for too long, then I would just stick with the traditional approach.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...