0 votes
371 views
in Technique by (71.8m points)

optimization - CatBoost Machine Learning hyperparameters: why not always use `thread_count = -1`?

With respect specifically to CatBoost:

  1. Under what scenarios might one want to use fewer than the maximum number of threads one's CPU supports? I cannot find an answer to this.
  2. Is there a fixed cost/overhead associated with each core utilized? I.e., is more always better for all data set types/sizes?

Do the answers to the questions above generalize to all machine learning algorithms?

Question from: https://stackoverflow.com/questions/65932060/catboost-machine-learning-hyperparameters-why-not-always-use-thread-count-1


1 Reply

0 votes
by (71.8m points)

I think that most of the reasons for changing thread_count are not CatBoost-specific; other libraries such as scikit-learn offer the same option. Reasons for not running with all CPUs are:

  • Debugging: if something goes wrong, it can be handy to run with a single thread, which keeps the process simpler to reason about.
  • You want other processes on your machine to keep some CPU power, especially if you share a server for in-memory data analysis with a team of data scientists. Your colleagues won't be happy if you take all the resources.
  • Your job is so small that it simply does not need all the resources.
  • You parallelize at another level: for example, you try different hyperparameters using cross-validation. Then it makes more sense to dedicate one CPU to training each model than to train one model with all CPUs and only then move on to the next model with all CPUs (see the cross-validation sketch further below).
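To make the shared-machine case concrete, here is a minimal sketch of capping CatBoost's CPU usage instead of passing thread_count=-1. The dataset and the particular thread_count value are illustrative, not from the original question:

```python
# Minimal sketch: leave CPU headroom for other users/processes on a shared box.
# The synthetic dataset and thread_count=4 are illustrative choices.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)

# thread_count=-1 would grab every available core; a small fixed value
# keeps the rest of the machine responsive for colleagues.
model = CatBoostClassifier(iterations=200, thread_count=4, verbose=False)
model.fit(X, y)
```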

I hope this answers question 1. It generalizes to other in-memory ML libraries such as scikit-learn.
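As an illustration of the "parallelize at another level" bullet above, the sketch below runs several single-threaded CatBoost trainings side by side with joblib instead of one all-core training at a time. The parameter grid, dataset, and n_jobs value are hypothetical:

```python
# Minimal sketch: outer-level parallelism over hyperparameters,
# each model restricted to one thread via thread_count=1.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from joblib import Parallel, delayed

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
depths = [4, 6, 8, 10]  # illustrative hyperparameter grid

def score_depth(depth):
    model = CatBoostClassifier(
        iterations=100, depth=depth, thread_count=1, verbose=False
    )
    return cross_val_score(model, X, y, cv=3).mean()

# Four single-threaded trainings run concurrently, one per worker.
scores = Parallel(n_jobs=4)(delayed(score_depth)(d) for d in depths)
print(dict(zip(depths, scores)))
```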

Regarding question 2, I'm not sure. CatBoost does its parallelisation somewhere in its C++ code and exposes it to the Python package via Cython. I assume this introduces some overhead (parallelism always does), but probably not much. You could find out by timing some experiments.
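A minimal sketch of such a timing experiment, assuming an illustrative synthetic dataset and thread counts:

```python
# Minimal sketch: compare wall-clock training time for different thread_count values.
import time
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50_000, n_features=50, random_state=1)

for threads in (1, 2, 4, -1):  # -1 means "use all available cores"
    model = CatBoostClassifier(iterations=200, thread_count=threads, verbose=False)
    start = time.perf_counter()
    model.fit(X, y)
    print(f"thread_count={threads}: {time.perf_counter() - start:.1f}s")
```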

