I have been experimenting with this problem a lot over the past few days.
In the end I haven't solved the mystery of the memory usage described in the question. My guess is that TensorFlow allocates a lot of additional memory while computing the gradients; verifying that would mean digging through the TensorFlow source, which seems too cumbersome right now. You can check how much memory your model is using from the terminal with the following command:
nvidia-smi
From its output you can estimate how much additional memory is still available.
But the solution to this type of problem lies in reducing the batch size. In my case, reducing the batch size to 3 worked; this may vary from model to model.
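As a minimal sketch of what "reducing the batch size" means in the training loop (the dataset shape and the batch size of 3 here are illustrative, not taken from the question):

```python
import numpy as np

# Hypothetical dataset; only batch_size is the knob under discussion.
num_examples, feature_dim = 100, 8
data = np.random.rand(num_examples, feature_dim).astype(np.float32)

def iterate_batches(array, batch_size):
    """Yield consecutive mini-batches; a smaller batch_size means a
    smaller activation/gradient footprint on the GPU per step."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]

# Dropping batch_size from, say, 32 to 3 trades throughput for memory.
batches = list(iterate_batches(data, batch_size=3))
```

Each of these batches would then be fed to one `sess.run` call instead of the whole dataset at once.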
But what if you are using a model whose embedding matrix is so big that you cannot load it into memory?
The solution is to write some painful code.
You have to perform the embedding lookup outside the graph and then feed the looked-up embeddings to the model. In short, for each batch, you pass the looked-up rows to the model (feed them through the feed_dict argument of sess.run()).
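Here is a sketch of the host-side lookup, with NumPy standing in for wherever the big matrix actually lives; the sizes and names (`embedding_matrix`, `embedding_ph`) are illustrative assumptions:

```python
import numpy as np

# The full embedding matrix lives in host RAM (or on disk, e.g. as a
# np.memmap over a file), NOT inside the TensorFlow graph.
vocab_size, embed_dim = 50_000, 64
embedding_matrix = np.zeros((vocab_size, embed_dim), dtype=np.float32)

def lookup(batch_ids):
    """Gather only the rows the current batch actually needs."""
    return embedding_matrix[batch_ids]

batch_ids = np.array([3, 17, 42])
batch_embeddings = lookup(batch_ids)  # small slice, shape (3, 64)

# Inside the training loop you would then feed this small slice into the
# graph through a placeholder (TF 1.x style), something like:
#   sess.run(train_op, feed_dict={embedding_ph: batch_embeddings})
```

Only the tiny per-batch slice ever crosses into GPU memory, which is the whole point of the trick.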
Next you will face a new problem: the embeddings are not trainable this way. The solution is to feed the embeddings through a placeholder
and assign them to a Variable
(say, A
). After each batch of training, the learning algorithm updates the variable A
. Then fetch the updated values of A
from TensorFlow and write them back to your embedding matrix, which lives outside the model. (I told you the process is painful.)
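The round trip can be simulated as follows, with NumPy standing in for the TF ops; the dummy gradient step and all names here are illustrative, not the question's actual model:

```python
import numpy as np

# Host-side embedding matrix (would be huge in practice).
embedding_matrix = np.ones((10, 4), dtype=np.float32)
batch_ids = np.array([2, 5, 7])

# 1. Look up this batch's rows and feed them, via a placeholder,
#    into a Variable A inside the graph (here: a plain array copy).
A = embedding_matrix[batch_ids].copy()

# 2. The optimizer updates A; a dummy gradient step stands in for
#    whatever sess.run(train_op, ...) would actually compute.
grad = np.full_like(A, 0.1)
A -= 0.5 * grad

# 3. Fetch the updated A back out of the graph and scatter it into the
#    host-side matrix, so the next epoch sees the trained rows.
embedding_matrix[batch_ids] = A
```

Steps 1-3 repeat for every batch, which is why this approach is slow as well as painful.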
Now your next question should be: what if even the embedding lookup for a single batch is too big to feed to the model? This is a fundamental problem that you cannot avoid. That's why the NVIDIA GTX 1080, 1080 Ti and NVIDIA TITAN Xp differ so much in price even though the 1080 and 1080 Ti run at higher clock frequencies: the TITAN Xp simply carries more memory.
Battle the dragon too long and you become the dragon yourself; gaze too long into the abyss and the abyss gazes back into you…