
python - CUDA_ERROR_OUT_OF_MEMORY: out of memory (NOT DURING TRAINING)

I have been using TensorFlow 2.3.0 for quite some time with CUDA 10.1 and cuDNN 7.6.5 on Windows 10.

Driver API (nvidia-smi):
Thu Jan  7 15:50:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.09       Driver Version: 461.09       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P8     8W /  N/A |     92MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Runtime API (nvcc -V):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
GPU: NVIDIA GeForce GTX 1060 with Max-Q Design 
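
As a sanity check of the environment from inside Python (a minimal sketch; it assumes nothing beyond a standard TensorFlow 2.3 install), TensorFlow itself can report whether it was built against CUDA and whether it detects the GPU:

import tensorflow as tf

print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
# TF 2.3+ exposes the CUDA/cuDNN versions the wheel was compiled against.
print("Build info:", tf.sysconfig.get_build_info())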

I have been able to train TensorFlow models and run inference just fine. For the past few days, however, I have been getting "CUDA_ERROR_OUT_OF_MEMORY: out of memory" just from running inference on models that I could run inference on before. The code that runs inference has not changed either. Could some other process now be filling the CUDA memory? I have already tried removing CUDA and cuDNN and reinstalling them.
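
One mitigation that is often suggested for this error (a minimal sketch, assuming the model is loaded through the usual TF 2.x APIs; it is not taken from my inference code) is to enable memory growth before anything touches the GPU, so TensorFlow allocates memory on demand instead of reserving most of the 6 GiB up front:

import tensorflow as tf

# Must run before the first op allocates GPU memory, i.e. before the model is loaded.
for gpu in tf.config.list_physical_devices("GPU"):
    # Allocate GPU memory on demand rather than grabbing nearly all of it at start-up.
    tf.config.experimental.set_memory_growth(gpu, True)

If inference still fails with memory growth enabled, that points more towards something else holding the memory (or a CUDA/driver mismatch) than towards the inference code itself.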

Here are the logs of the error when I run inference:

I also ran cuda-memcheck to check whether there were any leaks.
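
For context, the leak check can be reproduced by wrapping the inference entry point with cuda-memcheck (run_inference.py below is just a placeholder for the actual script):

import subprocess, sys

# Equivalent shell command: cuda-memcheck --leak-check full python run_inference.py
subprocess.run(
    ["cuda-memcheck", "--leak-check", "full", sys.executable, "run_inference.py"],
    check=False,
)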

Here are the logs of cuda-memcheck --leak-check full:

Any help is much appreciated!

Question from: https://stackoverflow.com/questions/65661261/cuda-error-out-of-memory-out-of-memory-not-during-training


1 Reply

Waiting for answers.
