I am training the same model on two different machines, but the trained models are not identical. I have taken the following measures to ensure reproducibility:
# imports needed for the snippet below
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

# seed all random number generators
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)           # seeds the CPU RNG as well, not only CUDA
torch.cuda.manual_seed_all(0)  # seeds every GPU
# make cuDNN deterministic
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
# use no worker processes in the data loader
loader = DataLoader(dataset, num_workers=0)
When I train the same model multiple times on the same machine, the resulting model is always identical. However, the models trained on the two different machines differ. Is this normal? Are there any other tricks I can employ? (The kind of comparison I mean is sketched below.)
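For concreteness, this is roughly how I compare the two trained models, assuming each run saves its weights with torch.save(model.state_dict(), path); the file names here are illustrative:

# minimal sketch: compare two saved checkpoints tensor by tensor
import torch

state_a = torch.load("model_machine_a.pt", map_location="cpu")
state_b = torch.load("model_machine_b.pt", map_location="cpu")

# count parameter tensors that differ bit-for-bit between the checkpoints
mismatched = [k for k in state_a if not torch.equal(state_a[k], state_b[k])]
print(f"{len(mismatched)} of {len(state_a)} tensors differ")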