python - Nan losses using "Learning Rate Step Decay" Scheduler with Adam Optimizer in Keras?

I have this very deep model:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.regularizers import l2


def get_model2(mask_kind):

    decay = 0.0

    inp_1 = keras.Input(shape=(64, 101, 1), name="RST_inputs")
    x = layers.Conv2D(256, kernel_size=(3, 3), kernel_regularizer=l2(1e-6), strides=(3, 3), padding="same")(inp_1)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2D(128, kernel_size=(3, 3), kernel_regularizer=l2(1e-6), strides=(3, 3), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2D(64, kernel_size=(2, 2), kernel_regularizer=l2(1e-6), strides=(2, 2), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2D(32, kernel_size=(2, 2), kernel_regularizer=l2(1e-6), strides=(2, 2), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Dense(256)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    out1 = layers.Dense(128, name="ls_weights")(x)

    if mask_kind == 1:  # apply the first mask (mask_layer1 / mask_layer2 are custom masking functions defined elsewhere)
        binary_mask = layers.Lambda(mask_layer1, name="lambda_layer1", dtype='float64')(out1)
        print('shape', binary_mask.shape[0])
    elif mask_kind == 2:  # apply the second mask
        binary_mask = layers.Lambda(mask_layer2, name="lambda_layer2", dtype='float64')(out1)
    else:  # apply no mask
        binary_mask = out1

    x = layers.Dense(256)(binary_mask)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Dense(512)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Dense(192)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Reshape((2, 2, 48))(x)
    x = layers.Conv2DTranspose(32, kernel_size=(2, 2), strides=(2, 2), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2DTranspose(64, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2DTranspose(128, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2DTranspose(256, kernel_size=(3, 3), strides=(5, 5), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    soundfield_layer = layers.Conv2DTranspose(1, kernel_size=(1, 1), strides=(1, 1), padding='same')(x)
    # soundfield_layer = layers.Dense(40000, name="sf_vec")(x)

    if mask_kind == 1:
        model = keras.Model(inp_1, [binary_mask, soundfield_layer], name="2_out_model")
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1, decay=decay),  # if needed, set back to 0.001
                      loss=["mse", "mse"], loss_weights=[1, 1])
        # plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
        model.summary()

    else:
        model = keras.Model(inp_1, [binary_mask, soundfield_layer], name="2_out_model")
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1, decay=decay),  # if needed, set back to 0.001
                      loss=["mse", "mse"], loss_weights=[0, 1])
        # plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
        model.summary()

    return model

and I'm trying to use learning rate step decay to see whether I can improve my validation loss during training. I define the scheduler class as follows:

class StepDecay:
    def __init__(self, initAlpha=0.1, factor=0.25, dropEvery=30):
        # store the base initial learning rate, drop factor, and
        # epochs to drop every
        self.initAlpha = initAlpha
        self.factor = factor
        self.dropEvery = dropEvery
    
    def __call__(self, epoch):
        # compute the learning rate for the current epoch
        exp = np.floor((1 + epoch) / self.dropEvery)
        alpha = self.initAlpha * (self.factor ** exp)
        # return the learning rate
        return float(alpha)
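
For reference, this is the learning-rate curve the class produces with the parameters used below (a quick sanity check; the values follow directly from initAlpha * factor ** floor((1 + epoch) / dropEvery)):

schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=30)
for epoch in (0, 29, 59, 89):
    print(epoch, schedule(epoch))
# 0  0.1
# 29 0.025
# 59 0.00625
# 89 0.0015625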

and then I run my training:

schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=30)
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50)
callbacks = [es, LearningRateScheduler(schedule)]

model = get_model2(mask_kind=1)

history = model.fit(X_train, [Y_train, Z_train], validation_data=(X_val, [Y_val, Z_val]), epochs=300,
                    batch_size=32,
                    callbacks=callbacks, verbose=1)

test_loss, _, _ = model.evaluate(X_test, [Y_test, Z_test], verbose=1)
print('Test: %.3f' % test_loss)

but when I train I get "nan" losses:

25/25 [==============================] - 17s 684ms/step - loss: nan - lambda_layer1_loss: nan - conv2d_transpose_4_loss: nan - val_loss: nan - val_lambda_layer1_loss: nan etc....

and I don't understand why. The problem could be the decay argument, which is a parameter of the SGD optimizer but, according to the documentation, does not exist for Adam; yet I get no error for it, so... any ideas?
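
One quick way to check whether that decay argument is actually being picked up (a minimal sketch, assuming a TF 2.x release whose Keras optimizers still accept a decay keyword) is to inspect the optimizer's config:

import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=0.1, decay=0.0)
print(opt.get_config())  # if the keyword is accepted, the returned dict contains a 'decay' entry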

question from:https://stackoverflow.com/questions/65922990/nan-losses-using-learning-rate-step-decay-scheduler-with-adam-optimizer-in-ker

1 Reply

You can play with the parameters to find a good balance, but this is one way to use exponential decay as a callback function with the Adam optimizer.

import tensorflow as tf

LR_MAX = 0.0001
LR_MIN = 0.00001
LR_EXP_DECAY = 0.85

def lrfn(epoch):
    # exponential decay from LR_MAX toward LR_MIN
    lr = (LR_MAX - LR_MIN) * LR_EXP_DECAY ** epoch + LR_MIN
    return lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)
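
With these constants the rate starts at LR_MAX and decays smoothly toward LR_MIN; a quick check of a few values of lrfn (approximate outputs in the comments):

for epoch in (0, 10, 50):
    print(epoch, lrfn(epoch))
# 0  ~1.0e-04  (starts at LR_MAX)
# 10 ~2.8e-05
# 50 ~1.0e-05  (approaches LR_MIN)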

Then simply pass the callback to model.fit, as in the following example.

model.fit(..
          ..
          callbacks = [lr_callback],
          ..
          ..)
