python, subprocess: launch new process when one (in a group) has terminated

I have n files to analyze separately and independently of each other with the same Python script analysis.py. In a wrapper script, wrapper.py, I loop over those files and call analysis.py as a separate process with subprocess.Popen:

import shlex
import subprocess

for a_file in all_files:
    command = "python analysis.py %s" % a_file
    analysis_process = subprocess.Popen(
        shlex.split(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # wait() makes this sequential: the next file only starts once
    # the current analysis.py process has finished
    analysis_process.wait()

Now, I would like to use all k CPU cores of my machine in order to speed up the whole analysis. Is there a way to always have k-1 running processes as long as there are files left to analyze?
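
For reference, one way to get this behaviour while keeping the subprocess-based wrapper is to drive the calls from a thread pool, so that at most k-1 analysis.py processes run at any time. This is only a minimal sketch, assuming Python 3.5+ (for subprocess.run) and the all_files list from above:

import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

def run_analysis(a_file):
    # each thread blocks on its own child process, so the pool size
    # caps how many analysis.py processes run concurrently
    command = "python analysis.py %s" % a_file
    return subprocess.run(
        shlex.split(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

with ThreadPoolExecutor(max_workers=max(1, cpu_count() - 1)) as executor:
    completed = list(executor.map(run_analysis, all_files))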


1 Reply


Here is an outline of how to use multiprocessing.Pool, which exists exactly for this kind of task:

from multiprocessing import Pool, cpu_count

# ...
all_files = ["file%d" % i for i in range(5)]


def process_file(file_name):
    # process file
    return "finished file %s" % file_name

pool = Pool(cpu_count())

# this is a blocking call - when it's done, all files have been processed
results = pool.map(process_file, all_files)

# no more tasks can go in the pool
pool.close()

# wait for all workers to complete their task (though we used a blocking call...)
pool.join()


# ['finished file file0', 'finished file file1',  ... , 'finished file file4']
print(results)
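
On Python 3.3+ the pool also supports the context-manager protocol, and the if __name__ == "__main__" guard is needed on platforms that use the spawn start method (e.g. Windows). A minimal sketch of the same idea in that style:

from multiprocessing import Pool, cpu_count


def process_file(file_name):
    # placeholder for the real per-file work
    return "finished file %s" % file_name


if __name__ == "__main__":
    all_files = ["file%d" % i for i in range(5)]

    # __exit__ calls terminate(), but map() has already returned by then,
    # so every file is processed before the pool goes away
    with Pool(cpu_count()) as pool:
        results = pool.map(process_file, all_files)

    print(results)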

Adding Joel's comment, which mentions a common pitfall:

Make sure that the function you pass to pool.map() contains only objects that are defined at the module level. Python multiprocessing uses pickle to pass objects between processes, and pickle has issues with things like functions defined in a nested scope.

See the pickle docs for what can and cannot be pickled.
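
As a small illustration of that pitfall: a function defined inside another function (or a lambda) cannot be pickled, so handing it to pool.map() raises an error, while the module-level version works:

from multiprocessing import Pool


def works(x):
    # module-level: pickled by reference, so worker processes can find it
    return x * 2


def main():
    def broken(x):  # defined in a nested scope, not picklable
        return x * 2

    with Pool(2) as pool:
        print(pool.map(works, [1, 2, 3]))   # OK: [2, 4, 6]
        # pool.map(broken, [1, 2, 3])       # raises "Can't pickle local object"


if __name__ == "__main__":
    main()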

