I think you want to use threads here rather than forking off new processes. Threads are a bad fit in some cases, but this isn't one of them. Also, I think you want to use concurrent.futures instead of using threads (or processes) directly.
For example, let's say you have 10 URLs, and you're currently processing them one at a time, like this:
results = map(tester, urls)
But now, you want to send them 2 at a time. Just change it to this:
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = pool.map(tester, urls)
If you want to try 4 at a time instead of 2, just change the max_workers. In fact, you should probably experiment with different values to see what works best for your program.
If you want to do something a little fancier, see the documentation—the main ThreadPoolExecutor Example is almost exactly what you're looking for.
Unfortunately, in Python 2.7, this module isn't part of the standard library, so you will have to install the backport from PyPI.
If you have pip installed, this should be as simple as:
pip install futures
… or maybe sudo pip install futures, on Unix.
And if you don't have pip, go get it first (follow the link above).
The main reason you sometimes want to use processes instead of threads is that you've got heavy CPU-bound computation, and you want to take advantage of multiple CPU cores. In Python (CPython, at least), the global interpreter lock means threads can't effectively use all your cores. So, if the Task Manager/Activity Monitor/whatever shows that your program is using 100% CPU on one core while the others are all near 0%, processes are the answer. With futures, all you have to do is change ThreadPoolExecutor to ProcessPoolExecutor.
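To make the swap concrete, here's a minimal sketch where the executor class is just a parameter. The count_primes workload and the run_all helper are invented for illustration; the point is only that nothing else changes between the two runs. The __main__ guard matters for processes, because on some platforms the child processes re-import your script:

```python
import concurrent.futures

def count_primes(limit):
    # Deliberately naive CPU-bound work: count the primes below limit.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_all(executor_class, limits):
    # Same code path for threads and processes; only the class differs.
    with executor_class(max_workers=2) as pool:
        return list(pool.map(count_primes, limits))

if __name__ == "__main__":
    limits = [20000, 20000, 20000, 20000]
    print(run_all(concurrent.futures.ThreadPoolExecutor, limits))
    print(run_all(concurrent.futures.ProcessPoolExecutor, limits))
```

Note that with processes the function and its arguments have to be picklable, so module-level functions like this one are fine, but lambdas are not.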
Meanwhile, sometimes you need more than just "give me a magic pool of workers to run my tasks". Sometimes you want to run a handful of very long jobs instead of a bunch of little ones, or load-balance the jobs yourself, or pass data between jobs, or whatever. For that, you want to use multiprocessing or threading instead of futures.
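For a feel of what managing the workers yourself involves, here's a rough sketch using threading with a pair of queues to hand jobs out and pass results back. The worker function, the squaring "job", and run_jobs are all hypothetical; it's one common pattern, not the only way to do it:

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7

def worker(jobs, results):
    # Pull jobs until the None sentinel arrives, then exit.
    while True:
        item = jobs.get()
        if item is None:
            break
        # Placeholder "work": square the number.
        results.put((item, item * item))

def run_jobs(items, num_workers=2):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        jobs.put(item)
    # One sentinel per worker so every thread shuts down.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    # All workers are done, so draining the results queue is safe here.
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)

print(run_jobs([3, 1, 2]))
```

This is noticeably more code than the pool version, which is why it's only worth it when you actually need that control.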
Very rarely, even that is too high-level, and you want to directly tell Python to create a new child process or thread. For that, you go all the way down to os.fork (on Unix only) or thread.