Update: This question relates to Google Colab's "Notebook settings: Hardware accelerator: GPU" option. It was written before the "TPU" option was added.
After reading several excited announcements about Google Colaboratory providing a free Tesla K80 GPU, I tried to run a fast.ai lesson on it. It never completed because it quickly ran out of memory, so I started investigating why.
The bottom line is that the "free Tesla K80" is not equally free for everyone: some users get only a small slice of it.
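I connect to Google Colab from the West Coast of Canada, and I get only 0.5GB of what is supposed to be 24GB of GPU RAM (a K80 board carries 24GB split across two GPUs; a Colab session sees one of them, with about 11.4GB). Other users get access to the full 11GB.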
Clearly, 0.5GB of GPU RAM is insufficient for most ML/DL work.
If you're not sure what you get, here is a little debug function I scraped together (it only works when the notebook's hardware accelerator is set to GPU):
```python
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import os

import GPUtil as GPU
import humanize
import psutil

def printm():
    process = psutil.Process(os.getpid())
    # XXX: only one GPU on Colab and isn't guaranteed
    # Re-query on every call: GPUtil returns a snapshot, so a GPU object
    # fetched once at import time would keep reporting stale numbers.
    gpu = GPU.getGPUs()[0]
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
          " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
        gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))

printm()
```
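To rule out GPUtil itself as the source of the numbers, the same three values can be read straight from `nvidia-smi` using its `--query-gpu` and `--format=csv,noheader,nounits` options. This is a minimal sketch, assuming `nvidia-smi` is on the PATH (which the `ln -sf` line above arranges on Colab); the helper names `parse_nvidia_smi_memory` and `query_gpu_memory` are hypothetical, and the values are in MiB as `nvidia-smi` reports them.

```python
import subprocess

def parse_nvidia_smi_memory(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.free,memory.used,memory.total
    --format=csv,noheader,nounits` output into one dict per GPU (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        free, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"free": free, "used": used, "total": total})
    return gpus

def query_gpu_memory():
    """Run nvidia-smi and return fresh per-GPU readings (needs a GPU runtime)."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.free,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_nvidia_smi_memory(out)

# Offline example: a line shaped like the reading reported in this question.
print(parse_nvidia_smi_memory("566, 10873, 11439\n"))
```

On a GPU runtime, calling `query_gpu_memory()` should return a list with one entry per visible GPU; if its numbers agree with `printm()`, the shortfall is real rather than a GPUtil artifact.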
Executing it in a Jupyter notebook before running any other code gives me:
```
Gen RAM Free: 11.6 GB | Proc size: 666.0 MB
GPU RAM Free: 566MB | Used: 10873MB | Util 95% | Total 11439MB
```
The lucky users who get access to the full card will see:
```
Gen RAM Free: 11.6 GB | Proc size: 666.0 MB
GPU RAM Free: 11439MB | Used: 0MB | Util 0% | Total 11439MB
```
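As a sanity check on the arithmetic, the two readings above can be tested for internal consistency. This sketch assumes GPUtil's `memoryUtil` is used/total (the printed 95% matches 10873/11439); `gpu_memory_summary` is a hypothetical helper, and the inputs are just the MB figures copied from the outputs.

```python
def gpu_memory_summary(free_mb, used_mb, total_mb):
    """Return (used fraction, free fraction, whether free + used == total)."""
    return used_mb / total_mb, free_mb / total_mb, free_mb + used_mb == total_mb

# Figures copied from the two printm() outputs above.
for name, reading in {"partial": (566, 10873, 11439),
                      "full": (11439, 0, 11439)}.items():
    used, free, consistent = gpu_memory_summary(*reading)
    print(f"{name}: used {used:.0%}, free {free:.1%}, free+used==total: {consistent}")
```

For the partial case, free and used add up exactly to the total, and the free share is about 4.9%, roughly 1/20th of the card.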
Do you see any flaw in my calculation of the GPU RAM availability, borrowed from GPUtil?
Can you confirm that you get similar results if you run this code in a Google Colab notebook?
If my calculations are correct, is there any way to get more of that GPU RAM on the free box?
Update: I'm not sure why some of us get 1/20th of what other users get. For example, the person who helped me debug this is in India, and he gets the whole thing!
Note: please don't send any more suggestions on how to kill potentially stuck, runaway, or parallel notebooks that might be consuming part of the GPU. No matter how you slice it, if you are in the same boat as I am and run the debug code, you'll see that you still get a total of 5% of the GPU RAM (still true as of this update).