
python - Tensorflow: Multiple loss functions vs Multiple training ops

I am creating a Tensorflow model which predicts multiple outputs (with different activations). I think there are two ways to do this:

Method 1: Create multiple loss functions (one for each output), merge them (using tf.reduce_mean or tf.reduce_sum), and pass the result to a single training op, like so:

final_loss = tf.reduce_mean(loss1 + loss2)
train_op = tf.train.AdamOptimizer().minimize(final_loss)

Method 2: Create multiple training operations and then group them like so:

train_op1 = tf.train.AdamOptimizer().minimize(loss1)
train_op2 = tf.train.AdamOptimizer().minimize(loss2)
final_train_op = tf.group(train_op1, train_op2)

My question is whether one method is advantageous over the other. Is there a third method I don't know about?

Thanks


1 Reply


I want to make a subtle point that I don't think was made in previous answers.

If you were using something like GradientDescentOptimizer, these would be very similar operations. That's because taking gradients is a linear operation, and the gradient of a sum is the same as the sum of the gradients.

But Adam does something special: regardless of the scale of your loss, it scales the gradients so that they're always on the order of your learning rate. If you multiplied your loss by 1000, it wouldn't affect Adam, because the change would be normalized away.
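As a rough illustration of that point (a sketch only; the scale factor and learning rate are arbitrary, not taken from the question), minimizing a rescaled loss with Adam produces nearly the same parameter updates:

# Sketch: with AdamOptimizer, rescaling the loss barely changes the updates,
# because Adam divides each gradient by a running estimate of its magnitude,
# so the constant factor cancels (only epsilon and the first steps differ).
scaled_loss = 1000.0 * final_loss
train_op_scaled = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(scaled_loss)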

So, if your two losses are roughly the same magnitude, then it shouldn't make a difference. If one is much larger than the other, then keep in mind that summing before the minimization will essentially ignore the small one, while making two ops will spend equal effort minimizing both.

I personally like dividing them up, which gives you more control over how much to focus on one loss or the other. For example, if this were multi-task learning and one task were more important to get right than the other, two ops with different learning rates roughly accomplish this.
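A minimal sketch of that idea, assuming loss1 belongs to the more important task (the learning rates here are illustrative, not tuned values):

# Sketch: weight the tasks implicitly by giving each its own optimizer
# and learning rate, then run both updates in one step via tf.group.
train_op1 = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss1)  # primary task
train_op2 = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss2)  # secondary task
final_train_op = tf.group(train_op1, train_op2)

You would then run final_train_op once per batch in your session, just as with a single training op.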
