I think most of the reasons for changing thread_count are not CatBoost-specific. Other libraries like sklearn offer the same kind of setting (n_jobs). Reasons for not running on all CPUs are:
- Debugging: if there is a problem, it can be handy to run with only one thread, which makes the process simpler to follow.
- You want other processes on your machine to have CPU power, especially if you work on a server for in-memory data analysis that is shared by a team of data scientists. Your colleagues won't be happy if you take all the resources.
- Your job is so small that it simply does not need all the resources.
- You parallelize in another way: for example, you try different hyperparameters using cross-validation. Then it can make sense to dedicate one CPU to training each model, rather than training one model with all CPUs and then moving on to train the next model with all CPUs.
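To make the last point concrete, here is a minimal sketch of how you might split a CPU budget between parallel model fits and threads per model. The helper name plan_parallelism is my own invention, not part of CatBoost or sklearn, and the rule it applies (as many workers as possible, leftover cores shared evenly) is just one reasonable assumption.

```python
import os


def plan_parallelism(n_candidates, n_cpus=None):
    """Split a CPU budget between parallel fits and threads per fit.

    Hypothetical helper for illustration only.
    Returns (n_workers, threads_per_model).
    """
    if n_cpus is None:
        # os.cpu_count() can return None, so fall back to 1.
        n_cpus = os.cpu_count() or 1
    # Never start more workers than there are candidates or CPUs.
    n_workers = max(1, min(n_candidates, n_cpus))
    # Share the CPUs evenly among the workers (integer division).
    threads_per_model = max(1, n_cpus // n_workers)
    return n_workers, threads_per_model


# 20 hyperparameter candidates on an 8-core machine:
n_workers, threads_per_model = plan_parallelism(n_candidates=20, n_cpus=8)
print(n_workers, threads_per_model)  # 8 1

# The two numbers would then go to the outer and inner level, e.g.
#   CatBoostClassifier(thread_count=threads_per_model)
#   GridSearchCV(model, param_grid, n_jobs=n_workers)
```

With many candidates this ends up at one thread per model, which is the situation described in the bullet above. With only a few candidates (say 3 on 8 cores) it gives each model 2 threads instead.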
I hope this answers question 1. The same reasoning applies to other in-memory ML libraries like sklearn.
Regarding question 2, I'm not sure. CatBoost does the parallelisation somewhere in its C++ code and exposes it to the Python package via Cython. I assume it introduces some overhead (since coordinating parallel work always costs something), but it's probably not too much. You could find out by timing some experiments.
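A timing experiment could look roughly like the sketch below. The function time_thread_counts and the dummy workload are placeholders I made up so the snippet runs without CatBoost installed; to measure the real thing you would swap in your own training call, as shown in the comment.

```python
import time
from statistics import median


def time_thread_counts(train_fn, thread_counts, repeats=3):
    """Return {thread_count: median wall-clock seconds} for train_fn(thread_count)."""
    results = {}
    for n in thread_counts:
        durations = []
        for _ in range(repeats):
            start = time.perf_counter()
            train_fn(n)
            durations.append(time.perf_counter() - start)
        # The median is less sensitive to a single slow run than the mean.
        results[n] = median(durations)
    return results


def dummy_train(thread_count):
    # Stand-in workload (ignores thread_count). Replace with something like:
    #   CatBoostClassifier(thread_count=thread_count, verbose=0).fit(X, y)
    sum(i * i for i in range(50_000))


timings = time_thread_counts(dummy_train, [1, 2, 4])
for n, seconds in timings.items():
    print(f"thread_count={n}: {seconds:.4f}s")
```

If the time with 4 threads is clearly more than a quarter of the time with 1 thread, the difference gives you a rough idea of the overhead on your data and hardware.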