There's a good chance you can get deterministic results if you run your network on the CPU (`export CUDA_VISIBLE_DEVICES=`), with a single thread in the Eigen thread pool (`tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=1))`), one Python thread (no multi-threaded queue runners of the kind you get from ops like `tf.train.batch`), and a single well-defined operation order. Setting `inter_op_parallelism_threads=1` may also help in some scenarios.
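The settings above can be combined roughly as follows. This is a minimal sketch, not a guaranteed recipe: the helper name `make_single_threaded_session` is made up for illustration, and the `tf.compat.v1` prefix is an assumption so that the same code works on newer TensorFlow releases (on TensorFlow 1.x, plain `tf.Session` and `tf.ConfigProto` are the equivalent calls).

```python
import os

# Hide all GPUs so TensorFlow falls back to the CPU. Set this before
# TensorFlow initializes CUDA; doing it before the import is the simplest
# way to be safe, which is why TensorFlow is imported inside the helper.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def make_single_threaded_session():
    """Hypothetical helper: a session with one intra-op and one inter-op thread."""
    import tensorflow as tf  # imported late, after the environment is set

    config = tf.compat.v1.ConfigProto(
        intra_op_parallelism_threads=1,  # one thread inside each op (Eigen pool)
        inter_op_parallelism_threads=1,  # one op scheduled at a time
    )
    return tf.compat.v1.Session(config=config)
```

You would still need to feed data from a single Python thread; this only covers the device and thread-pool settings.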
One issue is that floating point addition/multiplication is non-associative, so one fool-proof way to get deterministic results is to use integer arithmetic or quantized values.
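To make the non-associativity concrete, here is a small plain-Python illustration (no TensorFlow involved). The scale factor of 1000 is an arbitrary choice for the example, standing in for whatever quantization scheme you would actually use.

```python
a, b, c = 0.1, 0.2, 0.3

# Same three numbers, two groupings, two different floating point results.
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False
print(left, right)     # 0.6000000000000001 0.6

# Quantize to integers (here: thousandths). Integer addition is associative,
# so the grouping no longer matters.
SCALE = 1000  # illustrative scale factor, an assumption of this sketch
qa, qb, qc = (round(x * SCALE) for x in (a, b, c))
print((qa + qb) + qc == qa + (qb + qc))  # True
```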
Barring that, you could isolate which operation is non-deterministic and try to avoid using that op. For instance, there's the `tf.add_n` op, which doesn't say anything about the order in which it sums the values, but different orders produce different results.
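The effect of summation order can be seen without TensorFlow. The sketch below sums the same three floats in every possible order and then shows one way around it: an explicit left-to-right chain of additions, which is the plain-Python analogue of replacing an unordered sum with a fixed sequence of pairwise adds. The function name `ordered_sum` and the sample values are made up for illustration.

```python
from functools import reduce
from itertools import permutations
from operator import add

values = [1e16, 1.0, -1e16]

# Sum the same values in every order and collect the distinct results.
results = {reduce(add, order) for order in permutations(values)}
print(sorted(results))  # [0.0, 1.0]


def ordered_sum(xs):
    """Hypothetical helper: always add left to right, in the order given."""
    return reduce(add, xs)


# With the order pinned down, repeated runs give the same answer.
print(ordered_sum(values) == ordered_sum(values))  # True
```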
Getting deterministic results is a bit of an uphill battle because determinism is in conflict with performance, and performance is usually the goal that gets more attention. An alternative to chasing the exact same numbers on reruns is to focus on numerical stability: if your algorithm is stable, you will get reproducible results (i.e., the same number of misclassifications) even though the exact parameter values may be slightly different.
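As a toy illustration of that last point, the sketch below compares two "runs" whose weights differ by a tiny amount and checks the discrete outcome rather than the raw numbers. Everything here (the linear classifier, the weights, the inputs, the size of the perturbation) is invented for the example.

```python
def predict(weights, x):
    """Toy linear classifier: class 1 if the weighted sum is positive."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


def count_errors(weights, inputs, labels):
    return sum(predict(weights, x) != y for x, y in zip(inputs, labels))


# Two runs whose parameters differ only by small numerical noise.
weights_run1 = [0.5, -1.25, 2.0]
weights_run2 = [w + 1e-7 for w in weights_run1]

inputs = [[1.0, 0.2, 0.1], [0.3, 1.0, 0.2], [-1.0, 0.5, 0.9], [2.0, 2.0, 0.1]]
labels = [1, 0, 0, 0]

print(weights_run1 == weights_run2)  # False: the raw numbers differ
print(count_errors(weights_run1, inputs, labels)
      == count_errors(weights_run2, inputs, labels))  # True: same error count
```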