If you use pool.map_async, you can pull this information out of the MapResult instance that gets returned. For example:
import multiprocessing
import time


def worker(i):
    time.sleep(i)
    return i


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map_async(worker, range(15))
    # Poll the (private) _number_left counter until all tasks finish.
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()
Output:
num left: 15
num left: 14
num left: 13
num left: 12
num left: 11
num left: 10
num left: 9
num left: 9
num left: 8
num left: 8
num left: 7
num left: 7
num left: 6
num left: 6
num left: 6
num left: 5
num left: 5
num left: 5
num left: 4
num left: 4
num left: 4
num left: 3
num left: 3
num left: 3
num left: 2
num left: 2
num left: 2
num left: 2
num left: 1
num left: 1
num left: 1
num left: 1
multiprocessing internally breaks the iterable you pass to map into chunks and passes each chunk to the child processes, so the _number_left attribute really tracks the number of chunks remaining, not the individual elements in the iterable. Keep that in mind if you see odd-looking numbers when you use large iterables. Chunking improves IPC performance, but if an accurate tally of completed items is more important to you than the added performance, you can pass the chunksize=1 keyword argument to map_async to make _number_left accurate. (The chunksize usually only makes a noticeable performance difference for very large iterables; try it yourself to see whether it really matters for your use case.)
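Here's a minimal sketch of the same example with chunksize=1 (using the same worker as above), so _number_left decrements one item at a time:

import multiprocessing
import time


def worker(i):
    time.sleep(i)
    return i


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    # chunksize=1 sends one item per chunk, so _number_left now
    # counts individual items rather than chunks.
    result = pool.map_async(worker, range(15), chunksize=1)
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()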
As you mentioned in the comments, because pool.map is blocking, you can't really get this unless you start a background thread that does the polling while the main thread blocks in the map call, and I'm not sure there's any benefit to doing that over the approach above.
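If you do want something that feels like a blocking map call, one possible sketch (just an assumption about how you might wire it up; the poll_progress helper is purely for illustration) is to keep map_async under the hood, poll from a daemon thread, and block on get() in the main thread:

import multiprocessing
import threading
import time


def worker(i):
    time.sleep(i)
    return i


def poll_progress(result):
    # Runs in a background thread; polls the private counter until done.
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map_async(worker, range(15))
    threading.Thread(target=poll_progress, args=(result,), daemon=True).start()
    real_result = result.get()  # blocks here, much like pool.map would
    pool.close()
    pool.join()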
The other thing to keep in mind is that you're using an internal attribute of MapResult, so it's possible that this could break in future versions of Python.
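If you'd rather avoid the private attribute entirely, one public-API alternative (my substitution, not part of the approach above) is to count completed results yourself with pool.imap_unordered, which yields each result as soon as a worker finishes:

import multiprocessing
import time


def worker(i):
    time.sleep(i)
    return i


if __name__ == "__main__":
    total = 15
    with multiprocessing.Pool() as pool:
        # imap_unordered yields results as they complete (default chunksize=1),
        # so the tally of remaining items is exact and uses only public APIs.
        for done, value in enumerate(pool.imap_unordered(worker, range(total)), 1):
            print("num left: {}".format(total - done))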