I am using multiprocessing.Pool() to speed up an "embarrassingly parallel" loop. I actually have a nested loop, and I am using multiprocessing.Pool to speed up the inner loop. For example, without parallelizing the loop, my code would be as follows:
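outer_array = random_array1  # placeholder for my real data
inner_array = random_array2  # placeholder for my real data
output = empty_array         # placeholder, shape (len(inner_array), len(outer_array))
for i in outer_array:
    for j in inner_array:
        output[j][i] = full_func(j, i)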
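A self-contained serial baseline can be handy for checking any parallel version against. This is a minimal sketch, assuming a hypothetical full_func(j, arg) that returns j * 10 + arg and plain Python lists in place of random_array1, random_array2 and empty_array. It indexes output by position with enumerate, because output[j][i] only works when the values of i and j are themselves valid indices.

```python
def full_func(j, arg):
    # Hypothetical stand-in for the question's full_func(j, i).
    return j * 10 + arg


def serial_fill(outer_array, inner_array):
    # output[row][col] plays the role of output[j][i] in the question, but is
    # indexed by position (via enumerate) rather than by the values of i and j.
    output = [[None] * len(outer_array) for _ in inner_array]
    for col, i in enumerate(outer_array):
        for row, j in enumerate(inner_array):
            output[row][col] = full_func(j, i)
    return output
```

For example, serial_fill([1, 2], [3, 4, 5]) gives one row per value of j and one column per value of i.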
With parallelization:
import multiprocessing
from functools import partial
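outer_array = random_array1  # placeholder for my real data
inner_array = random_array2  # placeholder for my real data
output = empty_array         # placeholder, assumed to be a 2-D NumPy array
for i in outer_array:
    partial_func = partial(full_func, arg=i)
    pool = multiprocessing.Pool()
    # output[:, i] is column i; output[:][i] would select row i instead
    output[:, i] = pool.map(partial_func, inner_array)
    pool.close()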
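The pool-per-iteration version can be sketched as a self-contained script. Assumptions: full_func is a hypothetical module-level stand-in (the pool pickles the function to send it to the workers, so a lambda or nested function would not work), the arrays are plain lists, and processes=2 is an arbitrary choice. Each pass through the outer loop starts and then tears down a fresh set of worker processes. The with block calls terminate() on exit, which is safe here because map() has already returned all of its results.

```python
import multiprocessing
from functools import partial


def full_func(j, arg):
    # Hypothetical stand-in for the question's full_func(j, i); defined at
    # module level so it can be pickled and sent to the worker processes.
    return j * 10 + arg


def fill_with_pool_per_iteration(outer_array, inner_array, processes=2):
    # output[row][col] corresponds to output[j][i] in the question.
    output = [[None] * len(outer_array) for _ in inner_array]
    for col, i in enumerate(outer_array):
        partial_func = partial(full_func, arg=i)
        # A new pool, and therefore new worker processes, for every i.
        with multiprocessing.Pool(processes) as pool:
            results = pool.map(partial_func, inner_array)
        # map() preserves input order, so results[row] belongs to inner_array[row].
        for row, value in enumerate(results):
            output[row][col] = value
    return output
```

On platforms that use the spawn start method (Windows, and macOS by default), a call such as fill_with_pool_per_iteration([1, 2], [3, 4, 5]) should sit under an if __name__ == "__main__": guard.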
My main question is whether this is correct, with multiprocessing.Pool() created inside the loop, or whether I should instead create the pool once, outside the loop:
pool = multiprocessing.Pool()
for i in outer_array:
    partial_func = partial(full_func, arg=i)
    output[:, i] = pool.map(partial_func, inner_array)  # column i of a 2-D NumPy array
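A sketch of the pool-created-once variant, under the same assumptions (hypothetical module-level full_func, plain lists, processes=2). The worker processes are started once and reused for every value of i, so the process start-up cost is paid once rather than once per outer iteration.

```python
import multiprocessing
from functools import partial


def full_func(j, arg):
    # Hypothetical stand-in for the question's full_func(j, i); defined at
    # module level so it can be pickled and sent to the worker processes.
    return j * 10 + arg


def fill_with_shared_pool(outer_array, inner_array, processes=2):
    # output[row][col] corresponds to output[j][i] in the question.
    output = [[None] * len(outer_array) for _ in inner_array]
    # One pool, created before the loop and reused for every value of i.
    with multiprocessing.Pool(processes) as pool:
        for col, i in enumerate(outer_array):
            partial_func = partial(full_func, arg=i)
            results = pool.map(partial_func, inner_array)
            # map() preserves input order, so results[row] belongs to inner_array[row].
            for row, value in enumerate(results):
                output[row][col] = value
    # Leaving the with block calls pool.terminate(); every map() call has
    # already returned, so no work is lost.
    return output
```

If output is a 2-D NumPy array instead of a list of lists, the inner write loop can be replaced by output[:, col] = results.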
Also, I am not sure whether I should include the line pool.close() at the end of the loop in the second example above; what would be the benefits of doing so?
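On the pool.close() question, a minimal sketch using the same hypothetical full_func. close() tells the pool that no more tasks will be submitted, and join() waits for the worker processes to exit, so with the pool created outside the loop the pair would go after the loop, once. Calling close() inside the loop would make the next pool.map() raise ValueError, which map_after_close_fails() (a hypothetical helper) demonstrates. The multiprocessing documentation also warns that a pool which is neither used as a context manager nor closed or terminated manually can leave the process hanging at finalization.

```python
import multiprocessing
from functools import partial


def full_func(j, arg):
    # Hypothetical stand-in for the question's full_func(j, i).
    return j * 10 + arg


def fill_then_close(outer_array, inner_array, processes=2):
    output = [[None] * len(outer_array) for _ in inner_array]
    pool = multiprocessing.Pool(processes)
    for col, i in enumerate(outer_array):
        results = pool.map(partial(full_func, arg=i), inner_array)
        for row, value in enumerate(results):
            output[row][col] = value
    # Once, after the loop: close() refuses any further tasks and
    # join() blocks until every worker process has exited.
    pool.close()
    pool.join()
    return output


def map_after_close_fails():
    # Shows why close() cannot go inside the loop when the pool is
    # created outside it: a closed pool rejects new map() calls.
    pool = multiprocessing.Pool(1)
    pool.close()
    try:
        pool.map(abs, [1, -2])
    except ValueError:
        return True
    finally:
        pool.join()
    return False
```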
Thanks!