I am using multiprocessing.Pool() to parallelize some heavy computations. The target function returns a lot of data (a huge list), and I'm running out of RAM. Without multiprocessing, I would simply turn the target function into a generator, yielding the resulting elements one after another as they are computed.
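To make the single-process case concrete, a minimal sketch of what that generator version would look like (the string payload is just the dummy value from my example below):

```python
def target_gen(arg):
    # Generator version: elements are produced lazily, one at a time,
    # so the full million-element list never lives in memory at once.
    for i in range(1000000):  # xrange in Python 2
        yield 'dvsdbdfbngd'
```

Iterating over `target_gen(arg)` keeps memory flat, which is exactly the behavior I'd like to get from the Pool workers.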
I understand that multiprocessing does not support generators: it waits for the entire output of a task and returns it at once, with no yielding. Is there a way to make the Pool workers yield data as soon as it becomes available, without constructing the entire result list in RAM?
Simple example:
def target_fnc(arg):
    result = []
    for i in xrange(1000000):
        result.append('dvsdbdfbngd')  # <== would like to just use yield!
    return result
from multiprocessing import Pool

def process_args(some_args):
    pool = Pool(16)
    for result in pool.imap_unordered(target_fnc, some_args):
        for element in result:
            yield element
This is Python 2.7.
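One workaround I've been considering, sketched below under assumptions of my own (the names `CHUNK_SIZE`, `make_chunk`, and `process_args_chunked` are made up, and the chunk size is an arbitrary tuning knob): split each task into fixed-size sub-tasks, so each Pool task returns only one small chunk and at most one chunk per in-flight result sits in RAM. It doesn't give true per-element yielding, but it bounds memory.

```python
from multiprocessing import Pool

CHUNK_SIZE = 10000  # hypothetical tuning knob: elements per sub-task

def make_chunk(task):
    # Each sub-task builds and returns only one chunk of the result,
    # not the whole million-element list.
    arg, start, stop = task
    return ['dvsdbdfbngd' for i in range(start, stop)]  # xrange in Python 2

def process_args_chunked(some_args, total=1000000):
    pool = Pool(16)
    # One (arg, start, stop) sub-task per chunk of each original argument.
    tasks = [(arg, lo, min(lo + CHUNK_SIZE, total))
             for arg in some_args
             for lo in range(0, total, CHUNK_SIZE)]
    for chunk in pool.imap_unordered(make_chunk, tasks):
        for element in chunk:
            yield element
```

This trades latency for memory: peak RAM is roughly CHUNK_SIZE elements per outstanding result instead of the full list, but I'd still prefer a genuine streaming solution if one exists.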