Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
610 views
in Technique[技术] by (71.8m points)

tensorflow on GPU: no known devices, despite cuda's deviceQuery returning a "PASS" result

Note : this question was initially asked on github, but it was asked to be here instead

I'm having trouble running tensorflow on gpu, and it does not seems to be the usual cuda's configuration problem, because everything seems to indicate cuda is properly setup.

The main symptom: when running tensorflow, my gpu is not detected (the code being run, and its output).

What differs from usual issues is that cuda seems properly installed and running ./deviceQuery from cuda samples is successful (output).

I have two graphical cards:

  • an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
  • a GTX 1060 that I want to dedicate to tensorflow

I use:

I've tried:

  • adding /usr/local/cuda/bin/ to $PATH
  • forcing gpu placement in tensorflow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when it failed, for good measure)
  • whitelisting the gpu I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card did cause problems
  • running the script with sudo (because why not)

Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case it's useful.

At this point, I feel like I have followed all the breadcrumbs and have no idea what I could try else. I'm not even sure if I'm contemplating a bug or a configuration problem. Any advice about how to debug this would be greatly appreciated. Thanks!

Update: with the help of Yaroslav on github, I gathered more debugging info by raising log level, but it doesn't seem to say much about the device selection : https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb

Update 2: Using theano detects gpu correctly, but interestingly it complains about cuDNN being too recent, then fallback to cpu (code ran, output). Maybe that could be the problem with tensorflow as well?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.

If you run the following commands, you should be able to use the GPU in subsequent runs:

$ pip uninstall tensorflow
$ pip install tensorflow-gpu

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...