Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - DropoutWrapper being non-deterministic across runs?

At the beginning of my code (outside the scope of a Session), I set the random seeds:

np.random.seed(1)
tf.set_random_seed(1)

This is my dropout definition:

cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=args.keep_prob, seed=1)

In my first experiment, I kept keep_prob=1. All results obtained were deterministic. I'm running this on a multicore CPU.

In my second experiment, I set keep_prob=0.8 and ran the same code twice. Each run executed these statements:

sess.run(model.cost, feed)
sess.run(model.cost, feed)

Results for the first run:

(Pdb) sess.run(model.cost, feed)
4.9555049
(Pdb) sess.run(model.cost, feed)
4.9548969

This is expected behaviour, since DropoutWrapper uses random_uniform.

Results for the second run:

(Pdb) sess.run(model.cost, feed)
4.9551616
(Pdb) sess.run(model.cost, feed)
4.9552417

Why is this sequence not identical to the first run's output, even though I set both an operation-level seed and a graph-level seed?


1 Reply


The answer was already given in the comments, but no one has written it out explicitly yet, so here it is:

dynamic_rnn internally uses tf.while_loop, which can evaluate multiple iterations in parallel (see the documentation on parallel_iterations). In practice, if everything inside the loop body or loop condition depends on the previous iteration's values, nothing can run in parallel. However, computations that do not depend on the previous values can be evaluated in parallel. In your case, the DropoutWrapper at some point contains something like this:

random_ops.random_uniform(noise_shape, ...)

This operation is independent of the loop's previous values, so it can be computed in parallel for all time steps. With such parallel execution, which time step gets which dropout mask is non-deterministic.
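The effect can be illustrated without TensorFlow. The following is only a pure-Python analogy, not what TensorFlow does internally: a single seeded random.Random stands in for the seeded random_uniform op, and the hypothetical helper assign_masks takes the order in which the time steps happen to draw their noise. The names assign_masks, step_order and num_units are made up for this sketch.

```python
import random

def assign_masks(step_order, num_units=4, keep_prob=0.8, seed=1):
    """Draw one dropout mask per time step from a single seeded stream.

    step_order is the (hypothetical) order in which the time steps
    happen to request their random numbers.
    """
    rng = random.Random(seed)  # one stateful generator shared by all steps
    masks = {}
    for t in step_order:
        # Keep a unit when its uniform draw falls below keep_prob.
        masks[t] = [1 if rng.random() < keep_prob else 0
                    for _ in range(num_units)]
    return masks

sequential = assign_masks([0, 1, 2])  # steps draw in time order
reordered = assign_masks([1, 0, 2])   # steps 0 and 1 raced and swapped

# The seeded stream yields the same masks in both cases,
# but they land on different time steps.
print(sequential)
print(reordered)
```

The seed fixes the sequence of masks that is generated, but not which time step consumes which mask, so the cost can differ between runs even though every seed is set.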

