Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
245 views
in Technique[技术] by (71.8m points)

python - How to spawn parallel child processes on a multi-processor system?

I have a Python script that I want to use as a controller to another Python script. I have a server with 64 processors, so want to spawn up to 64 child processes of this second Python script. The child script is called:

$ python create_graphs.py --name=NAME

where NAME is something like XYZ, ABC, NYU etc.

In my parent controller script I retrieve the name variable from a list:

my_list = [ 'XYZ', 'ABC', 'NYU' ]

So my question is, what is the best way to spawn off these processes as children? I want to limit the number of children to 64 at a time, so need to track the status (if the child process has finished or not) so I can efficiently keep the whole generation running.

I looked into using the subprocess package, but rejected it because it only spawns one child at a time. I finally found the multiprocessor package, but I admit to being overwhelmed by the whole threads vs. subprocesses documentation.

Right now, my script uses subprocess.call to only spawn one child at a time and looks like this:

#!/path/to/python
import subprocess, multiprocessing, Queue
from multiprocessing import Process

my_list = [ 'XYZ', 'ABC', 'NYU' ]

if __name__ == '__main__':
    processors = multiprocessing.cpu_count()

    for i in range(len(my_list)):
        if( i < processors ):
             cmd = ["python", "/path/to/create_graphs.py", "--name="+ my_list[i]]
             child = subprocess.call( cmd, shell=False )

I really want it to spawn up 64 children at a time. In other stackoverflow questions I saw people using Queue, but it seems like that creates a performance hit?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

What you are looking for is the process pool class in multiprocessing.

import multiprocessing
import subprocess

def work(cmd):
    return subprocess.call(cmd, shell=False)

if __name__ == '__main__':
    count = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=count)
    print pool.map(work, ['ls'] * count)

And here is a calculation example to make it easier to understand. The following will divide 10000 tasks on N processes where N is the cpu count. Note that I'm passing None as the number of processes. This will cause the Pool class to use cpu_count for the number of processes (reference)

import multiprocessing
import subprocess

def calculate(value):
    return value * 10

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    tasks = range(10000)
    results = []
    r = pool.map_async(calculate, tasks, callback=results.append)
    r.wait() # Wait on the results
    print results

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...