I'm writing a C++ wrapper around the TensorFlow 1.2 C API (for inference purposes, if it matters).
Since my application is multi-process and multi-threaded, with resources allocated explicitly, I would like to limit TensorFlow to a single thread.
Currently, when I run a simple inference test with batch processing, TensorFlow uses all CPU cores.
I have tried limiting the number of threads for a new session using a mixture of the C and C++ APIs, as follows (forgive the partial snippet; I hope it makes sense):
#include <string>
#include "tensorflow/c/c_api.h"                  // C API: TF_SetConfig, TF_NewSession, ...
#include "tensorflow/core/protobuf/config.pb.h"  // C++ proto: tensorflow::ConfigProto

// Build a ConfigProto that limits both thread pools to a single thread.
tensorflow::ConfigProto conf;
conf.set_intra_op_parallelism_threads(1);
conf.set_inter_op_parallelism_threads(1);
conf.add_session_inter_op_thread_pool()->set_num_threads(1);

// Serialize the proto and hand it to the C API session options.
std::string str;
conf.SerializeToString(&str);
TF_SetConfig(m_session_opts, (void*)str.c_str(), str.size(), m_status);
m_session = TF_NewSession(m_graph, m_session_opts, m_status);
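For reference, a quick sanity check against the status object from the snippet above (just a sketch; needs <cstdio>) would be to verify that TF_SetConfig actually accepted the serialized proto:

// Confirm the serialized ConfigProto was accepted before creating the session.
if (TF_GetCode(m_status) != TF_OK) {
    fprintf(stderr, "TF_SetConfig failed: %s\n", TF_Message(m_status));
}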
But it doesn't seem to make any difference: all cores are still fully utilized.
Am I using the C API correctly?
(My current workaround is to recompile TensorFlow with the number of threads hard-coded to 1, which will probably work, but it's obviously not the best approach...)
-- Update --
I also tried adding this to the ConfigProto:
conf.set_use_per_session_threads(true);
without success. Multiple cores are still used...
I also tried running with high log verbosity and got this output (showing only what I think is relevant):
tensorflow/core/common_runtime/local_device.cc:40] Local device intraop parallelism threads: 8
tensorflow/core/common_runtime/session_factory.cc:75] SessionFactory type DIRECT_SESSION accepts target:
tensorflow/core/common_runtime/direct_session.cc:95] Direct session inter op parallelism threads for pool 0: 1
The "parallelism threads: 8" message shows up as soon as I instantiate a new graph using TF_NewGraph(). I didn't find a way to specify options prior to this graph allocation though...
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…