I am training a model using the TF Keras API. The issue I am having is that I am unable to maximise GPU usage: it is under-utilised in both memory and processing.
When profiling the model, I can see a lot of operations labelled _Send, which I assume is data hopping between the GPU and the CPU.
Since I am using Keras, I am not placing variables on devices directly, so I am not clear on why this is occurring or how to optimise it.
Another interesting side effect is that larger batches make training slower, with long waits for the GPU to get data from the CPU.
The profiler also suggests:
59.4 % of the total step time sampled is spent on 'Kernel Launch'. It could be due to CPU contention with tf.data. In this case, you may try to set the environment variable TF_GPU_THREAD_MODE=gpu_private.
I have set this environment variable at the top of the notebook, with no effect. I am also not clear on how to check whether it is actually being picked up.
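For reference, here is a minimal sketch of how the variable can be set in the first notebook cell. It assumes the variable has to be in place before TensorFlow initialises the GPU, so it is set before the first `import tensorflow` in the kernel. The name `tf_already_imported` is only illustrative, and the check confirms the ordering, not that the runtime honours the setting.

```python
import os
import sys

# Assumption: TensorFlow reads TF_GPU_THREAD_MODE when it sets up the GPU,
# so the variable must exist before TensorFlow is imported in this kernel.
tf_already_imported = "tensorflow" in sys.modules  # illustrative name

os.environ["TF_GPU_THREAD_MODE"] = "gpu_private"

if tf_already_imported:
    # Setting the variable now may be too late; a kernel restart would be needed.
    print("TensorFlow was imported before the variable was set; restart the kernel.")
else:
    print("TF_GPU_THREAD_MODE =", os.environ["TF_GPU_THREAD_MODE"])
```

If the first message is printed, the variable was probably set too late in the session, which might explain seeing no change in the profile.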
Your help here would be greatly appreciated. I have read all the available guides in the TensorFlow docs.
question from:
https://stackoverflow.com/questions/65875612/tensorflow-gpu-profiling