python - When to call .join() on a process?

Question

Welcome To Ask or Share your Answers For Others

python - When to call .join() on a process?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - When to call .join() on a process?

I am reading various tutorials on the multiprocessing module in Python, and am having trouble understanding why/when to call process.join(). For example, I stumbled across this example:

nums = range(100000)
nprocs = 4

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outdict = {}
    for n in nums:
        outdict[n] = factorize_naive(n)
    out_q.put(outdict)

# Each process will get 'chunksize' nums and a queue to put his out
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []

for i in range(nprocs):
    p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)],
                  out_q))
    procs.append(p)
    p.start()

# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
    resultdict.update(out_q.get())

# Wait for all worker processes to finish
for p in procs:
    p.join()

print resultdict

From what I understand, process.join() will block the calling process until the process whose join method was called has completed execution. I also believe that the child processes which have been started in the above code example complete execution upon completing the target function, that is, after they have pushed their results to the out_q. Lastly, I believe that out_q.get() blocks the calling process until there are results to be pulled. Thus, if you consider the code:

resultdict = {}
for i in range(nprocs):
    resultdict.update(out_q.get())

# Wait for all worker processes to finish
for p in procs:
    p.join()

the main process is blocked by the out_q.get() calls until every single worker process has finished pushing its results to the queue. Thus, by the time the main process exits the for loop, each child process should have completed execution, correct?

If that is the case, is there any reason for calling the p.join() methods at this point? Haven't all worker processes already finished, so how does that cause the main process to "wait for all worker processes to finish?" I ask mainly because I have seen this in multiple different examples, and I am curious if I have failed to understand something.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:45:58+0000

At the point just before you call join, all workers have put their results into their queues, but they did not necessarily return, and their processes may not yet have terminated. They may or may not have done so, depending on timing.

Calling join makes sure that all processes are given the time to properly terminate.

Categories

python - When to call .join() on a process?

python - When to call .join() on a process?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags