I would like to know how to apply gradient clipping in TensorFlow when doing distributed training.
Here's my code:
@lazy_property
def optimize(self):
    # train_vars = ...
    optimizer = tf.train.AdamOptimizer(self._learning_rate)
    self.syn_op = tf.train.SyncReplicasOptimizer(optimizer,
                                                 replicas_to_aggregate=self.gradient_merge,
                                                 total_num_replicas=self.worker_count,
                                                 use_locking=True)
    self.sync_replicas_hook = self.syn_op.make_session_run_hook(is_chief=self.is_chief)
    return self.syn_op.minimize(self.cost, var_list=train_vars, global_step=self.global_step)
I've read this answer: How to apply gradient clipping in TensorFlow.
Here is the gradient clipping code from that answer:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)
What should I change to use it in my case?
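My guess is that I need to split minimize into compute_gradients and apply_gradients, called on syn_op rather than on the inner optimizer. The sketch below is untested and assumes SyncReplicasOptimizer exposes the same compute_gradients/apply_gradients interface as the optimizer it wraps:

@lazy_property
def optimize(self):
    # train_vars = ...
    optimizer = tf.train.AdamOptimizer(self._learning_rate)
    self.syn_op = tf.train.SyncReplicasOptimizer(optimizer,
                                                 replicas_to_aggregate=self.gradient_merge,
                                                 total_num_replicas=self.worker_count,
                                                 use_locking=True)
    self.sync_replicas_hook = self.syn_op.make_session_run_hook(is_chief=self.is_chief)
    # Untested sketch: compute gradients through the sync wrapper, clip them, then apply.
    gvs = self.syn_op.compute_gradients(self.cost, var_list=train_vars)
    # Skip variables with no gradient so tf.clip_by_value does not see None.
    capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var)
                  for grad, var in gvs if grad is not None]
    return self.syn_op.apply_gradients(capped_gvs, global_step=self.global_step)

I'm not sure whether clipping at this point is correct, or whether it interferes with how SyncReplicasOptimizer aggregates gradients across replicas.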
question from:
https://stackoverflow.com/questions/65949836/how-to-apply-gradient-clipping-in-tensorflow-when-distributed-training