Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
200 views
in Technique[技术] by (71.8m points)

python - Asynchronous computation in TensorFlow

Recently I've been toying with TensorFlow and I mentioned that the framework is not able to use all my available computational resources. In Convolutional Neural Networks tutorial they mention that

Naively employing asynchronous updates of model parameters leads to sub-optimal training performance because an individual model replica might be trained on a stale copy of the model parameters. Conversely, employing fully synchronous updates will be as slow as the slowest model replica.

Although they mention it in both in the tutorial and in a whitepaper I did not really find a way to do the asynchronous parallel computation on a local machine. Is it even possible? Or is it part of the distributed to-be-released version of TensorFlow. If it is, then how?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Asynchronous gradient descent is supported in the open-source release of TensorFlow, without even modifying your graph. The easiest way to do it is to execute multiple concurrent steps in parallel:

loss = ...

# Any of the optimizer classes can be used here.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

sess = tf.Session()
sess.run(tf.initialize_all_variables())

def train_function():
  # TODO: Better termination condition, e.g. using a `max_steps` counter.
  while True:
    sess.run(train_op)

# Create multiple threads to run `train_function()` in parallel
train_threads = []
for _ in range(NUM_CONCURRENT_STEPS):
  train_threads.append(threading.Thread(target=train_function))

# Start the threads, and block on their completion.
for t in train_threads:
  t.start()
for t in train_threads:
  t.join()

This example sets up NUM_CONCURRENT_STEPS calls to sess.run(train_op). Since there is no coordination between these threads, they proceed asynchronously.

It's actually more challenging to achieve synchronous parallel training (at present), because this requires additional coordination to ensure that all replicas read the same version of the parameters, and that all of their updates become visible at the same time. The multi-GPU example for CIFAR-10 training performs synchronous updates by making multiple copies of the "tower" in the training graph with shared parameters, and explicitly averaging the gradients across the towers before applying the update.


N.B. The code in this answer places all computation on the same device, which will not be optimal if you have multiple GPUs in your machine. If you want to use all of your GPUs, follow the example of the multi-GPU CIFAR-10 model, and create multiple "towers" with their operations pinned to each GPU. The code would look roughly as follows:

train_ops = []

for i in range(NUM_GPUS):
  with tf.device("/gpu:%d" % i):
    # Define a tower on GPU `i`.
    loss = ...

    train_ops.append(tf.train.GradientDescentOptimizer(0.01).minimize(loss))

def train_function(train_op):
  # TODO: Better termination condition, e.g. using a `max_steps` counter.
  while True:
    sess.run(train_op)


# Create multiple threads to run `train_function()` in parallel
train_threads = []
for train_op in train_ops:
  train_threads.append(threading.Thread(target=train_function, args=(train_op,))


# Start the threads, and block on their completion.
for t in train_threads:
  t.start()
for t in train_threads:
  t.join()

Note that you might find it convenient to use a "variable scope" to facilitate variable sharing between the towers.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...