Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - TensorFlow model can only achieve good results on one computer, fails everywhere else

I have a TensorFlow/Keras model that I am training on a synthetic classification task.

When training the model using my laptop, the model achieves 99.9% accuracy and loss values around 1e-8.

However, when I train the model on a different machine, the accuracy plateaus at 80% and the loss is stuck at 3e-1. I have reproduced the failure on my own server and Google Colab.

Now, since the issue appears to be that my laptop is configured differently, I am trying to find out what this difference is.

I have made sure that on both machines:

  • Python version is 3.7
  • NVIDIA driver is 460.x.x
  • CUDA version is 11.2
  • TensorFlow version is 2.4, installed from pip
  • NumPy version is 1.19.5
  • SciPy version is 1.4.1
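
To make a comparison like the one above less error-prone, the versions can be collected by a script and diffed instead of being read off by hand. The following is a minimal sketch using only the standard library; the helper names `collect_environment` and `diff_environments` are hypothetical, and the package list is an assumption that can be extended with any other dependency worth checking.

```python
import platform

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport needed on Python 3.7


def collect_environment(packages=("tensorflow", "numpy", "scipy")):
    """Return a flat dict describing the interpreter and package versions."""
    env = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
        "system": platform.system(),
    }
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = None  # package is not installed in this environment
    return env


def diff_environments(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Print one "key: value" line per entry so the output can be diffed.
    for key, value in sorted(collect_environment().items()):
        print(f"{key}: {value}")
```

Running the script on each machine and passing the two dicts to `diff_environments` lists only the entries that differ. Identical version numbers do not guarantee identical behavior, so an empty diff narrows the search rather than ending it.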

The laptop has an i7-7700HQ and an NVIDIA GeForce GTX 1050 Mobile. The server has a Xeon Silver 4116 and several GPUs: TITAN Xp, TITAN V, GeForce RTX 2080 SUPER, TITAN V (I have tried all of them).

The problem happens both on CPU and GPU. Precision is set to float32 in all cases. The code that is being run is exactly the same.

I cannot share the code, but I can say that it uses tf.math.segment_sum, which is a non-deterministic op (I don't know whether that is relevant).
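
For context, here is a minimal sketch of why a segment sum can give run-to-run differences at all: floating-point addition is not associative, so the order in which values are accumulated changes the last bits of the result. The pure-Python `segment_sum` below is a hypothetical stand-in written for illustration, not TensorFlow's implementation, and it uses Python's double-precision floats rather than float32.

```python
def segment_sum(values, segment_ids, reverse=False):
    """Sum values per segment id, optionally accumulating in reverse order."""
    pairs = list(zip(values, segment_ids))
    if reverse:
        pairs.reverse()  # same inputs, different accumulation order
    totals = {}
    for value, seg in pairs:
        totals[seg] = totals.get(seg, 0.0) + value
    # Return the per-segment totals ordered by segment id.
    return [totals[seg] for seg in sorted(totals)]


values = [0.1, 0.2, 0.3, 1.0, 2.0]
segment_ids = [0, 0, 0, 1, 1]

forward = segment_sum(values, segment_ids)
backward = segment_sum(values, segment_ids, reverse=True)

# Segment 0 differs in the last bit; segment 1 is exact either way.
print(forward)
print(backward)
print(forward == backward)
```

The difference here is about one unit in the last place. Whether rounding noise of that size could account for a gap as large as 99.9% versus 80% accuracy is not something this sketch establishes.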

I am at a complete loss here. I have looked for every discrepancy I could think of between the two configurations, but I could not find any. The fact that this issue also happens on CPU is what really blows my mind.

What could the problem be?

I hope this qualifies as a programming question since it's related to TensorFlow specifically. If not, I apologize in advance and will ask elsewhere.

Thanks

question from:https://stackoverflow.com/questions/66050281/tensorflow-model-can-only-achieve-good-results-on-one-computer-fails-everywhere


