I've been looking for ways to easily multithread some of my simple analysis code since I had noticed numpy it was only using one core, despite the fact that it is supposed to be multithreaded.
Who says it's supposed to be multithreaded?
numpy
is primarily designed to be as fast as possible on a single core, and to be as parallelizable as possible if you need to do so. But you still have to parallelize it.
In particular, you can operate on independent sub-objects at the same time, and slow operations release the GIL when possible—although "when possible" may not be nearly enough. Also, numpy
objects are designed to be shared or passed between processes as easily as possible, to facilitate using multiprocessing
.
There are some specialized methods that are automatically parallelized, but most of the core methods are not. In particular, dot
is implemented on top of BLAS when possible, and BLAS is automatically parallelized on most platforms, but mean
is implemented in plain C code.
See Parallel Programming with numpy and scipy for details.
So, how do you know which methods are parallelized and which aren't? And, of those which aren't, how do you know which ones can be nicely manually-threaded and which need multiprocessing?
There's no good answer to that. You can make educated guesses (X seems like it's probably implemented on top of ATLAS, and my copy of ATLAS is implicitly threaded), or you can read the source.
But usually, the best thing to do is try it and test. If the code is using 100% of one core and 0% of the others, add manual threading. If it's now using 100% of one core and 10% of the others and barely running faster, change the multithreading to multiprocessing. (Fortunately, Python makes this pretty easy, especially if you use the Executor classes from concurrent.futures
or the Pool classes from multiprocessing
. But you still often need to put some thought into it, and test the relative costs of sharing vs. passing if you have large arrays.)
Also, as kwatford points out, just because some method doesn't seem to be implicitly parallel doesn't mean it won't be parallel in the next version of numpy, or the next version of BLAS, or on a different platform, or even on a machine with slightly different stuff installed on it. So, be prepared to re-test. And do something like my_mean = numpy.mean
and then use my_mean
everywhere, so you can just change one line to my_mean = pool_threaded_mean
.