Let me give you a detailed explanation of what is going on.
Calling `model.evaluate` (or `model.test_on_batch`) will invoke `model.make_test_function`, which in turn invokes `model.test_step`, and this function does the following:

```python
y_pred = self(x, training=False)
# Updates stateful loss metrics.
self.compiled_loss(
    y, y_pred, sample_weight, regularization_losses=self.losses)
```
Calling `model.train_on_batch` will invoke `model.make_train_function`, which in turn invokes `model.train_step`, and this function does the following:

```python
with backprop.GradientTape() as tape:
    y_pred = self(x, training=True)
    loss = self.compiled_loss(
        y, y_pred, sample_weight, regularization_losses=self.losses)
```
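To see this difference in isolation, here is a minimal, self-contained sketch (the toy model and data below are stand-ins I made up, not your actual code) that computes the loss for the same batch and the same weights with both flags; with a `Dropout` layer present, the two values differ:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical toy model containing a Dropout layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3),
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x = np.random.rand(16, 10).astype("float32")
y = np.random.randint(0, 3, size=(16,))

# Same weights, same batch; only the training flag differs.
print("loss with training=False:", loss_fn(y, model(x, training=False)).numpy())
print("loss with training=True :", loss_fn(y, model(x, training=True)).numpy())
```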
As you can see from the source code above, the only difference between `model.test_step` and `model.train_step` when computing the loss is whether `training=True` or `training=False` is passed in the forward pass through the model.
Some neural network layers behave differently during training and inference (e.g. `Dropout` and `BatchNormalization` layers), so the `training` argument lets those layers know which of the two "paths" they should take (see the short sketch after this list). For `Dropout`:

- During training, dropout randomly drops out units and correspondingly scales up the activations of the remaining units.
- During inference, it does nothing (since you usually don't want the randomness of dropping out units there).
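A quick standalone illustration of the two paths (this snippet is mine, not from your question):

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 8), dtype="float32")

print(layer(x, training=False).numpy())  # inference path: input passes through unchanged
print(layer(x, training=True).numpy())   # training path: roughly half the units zeroed,
                                         # survivors scaled by 1 / (1 - 0.5) = 2.0
```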
Since you have a dropout layer in your model, the higher loss in training mode is expected.
If you remove the line `layers.Dropout(0.5)` when defining the model, you will see that the losses are nearly identical (i.e. differing only by tiny floating-point precision mismatches), e.g. the outputs of three epochs:
```
Epoch: 1
Pre batch train loss : 1.6852061748504639
train_on_batch loss : 1.6852061748504639
Post batch train loss : 1.6012675762176514
Pre batch train loss : 1.7325702905654907
train_on_batch loss : 1.7325704097747803
Post batch train loss : 1.6512296199798584
Epoch: 2
Pre batch train loss : 1.5149778127670288
train_on_batch loss : 1.5149779319763184
Post batch train loss : 1.4209072589874268
Pre batch train loss : 1.567994475364685
train_on_batch loss : 1.5679945945739746
Post batch train loss : 1.4767804145812988
Epoch: 3
Pre batch train loss : 1.3269715309143066
train_on_batch loss : 1.3269715309143066
Post batch train loss : 1.2274967432022095
Pre batch train loss : 1.3868262767791748
train_on_batch loss : 1.3868262767791748
Post batch train loss : 1.2916004657745361
```
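For completeness, here is a sketch of the kind of per-batch comparison loop that produces output like the above; the model, data, and batch size are hypothetical stand-ins, since the original script is not shown:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    # layers.Dropout(0.5),  # removed, so all three losses nearly match
    layers.Dense(5),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = np.random.rand(64, 20).astype("float32")
y = np.random.randint(0, 5, size=(64,))

for epoch in range(1, 4):
    print("Epoch:", epoch)
    for i in range(0, len(x), 32):
        xb, yb = x[i:i + 32], y[i:i + 32]
        pre = model.evaluate(xb, yb, verbose=0)   # training=False, before the update
        tob = model.train_on_batch(xb, yb)        # training=True, applies one update
        post = model.evaluate(xb, yb, verbose=0)  # training=False, after the update
        print("Pre batch train loss :", pre)
        print("train_on_batch loss :", tob)
        print("Post batch train loss :", post)
```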
Reference:

- Documentation and source code of `tf.keras.Model`
- What does `training=True` mean when calling a TensorFlow Keras model?