When starting a new Python process, the multiprocessing module uses the spawn method on Windows (this behavior can also be triggered on Linux by using mp.set_start_method('spawn')).
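For example, a minimal sketch forcing spawn on Linux (the function name hello is made up for illustration):

import multiprocessing as mp

def hello():
    print('child says hello')

if __name__ == '__main__':
    mp.set_start_method('spawn')  # force spawn instead of the Linux default, fork
    p = mp.Process(target=hello)
    p.start()
    p.join()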
Command-line arguments are passed to the interpreter in the new process, so the communication with the parent process can be established, for example:
python -c "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=11)" --multiprocessing-fork
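If you want to see the exact command line multiprocessing would build, you can call the helper multiprocessing.spawn.get_command_line yourself (it is non-public, so treat this as an implementation detail that may change between versions):

from multiprocessing import spawn

# prints something like:
# ['/usr/bin/python3', '-c', 'from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=11)', '--multiprocessing-fork']
print(spawn.get_command_line(tracker_fd=5, pipe_handle=11))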
The problem with embedded Cython modules (or with frozen modules in general, i.e. those created with cx_Freeze, py2exe and similar) is that passing command line arguments to them corresponds more to

python my_script.py <arguments>

i.e. the command line isn't automatically processed by the interpreter, but needs to be handled in the script.
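A tiny sketch of what that means in practice: in a frozen/embedded executable nothing interprets a -c payload for us, the raw arguments simply land in sys.argv and the script has to dispatch on them itself (the printouts are only for illustration):

import sys

if __name__ == '__main__':
    if len(sys.argv) > 1:
        # in a frozen executable these would have to be handled here by hand
        print('got arguments:', sys.argv[1:])
    else:
        print('normal start')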
multiprocessing provides a function called multiprocessing.freeze_support(), which handles the command line arguments correctly and which can be used as shown in Bastian's answer:
import sys
import multiprocessing

if __name__ == '__main__':
    # needed for Cython, as it doesn't set the `frozen`-attribute
    setattr(sys, 'frozen', True)
    # parse command line options and execute them if needed
    multiprocessing.freeze_support()
However, this solution only works for Windows, as can be seen in the code:
def freeze_support(self):
    '''Check whether this is a fake forked process in a frozen executable.
    If so then run code specified by commandline and exit.
    '''
    if sys.platform == 'win32' and getattr(sys, 'frozen', False):
        from .spawn import freeze_support
        freeze_support()
There is a bug report, "multiprocessing freeze_support needed outside win32", which may or may not be fixed soon.
As explained in the above bug report, it is not enough to set the frozen attribute to True and to call freeze_support directly from multiprocessing.spawn, because then the semaphore tracker isn't handled correctly.
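To make the failure mode concrete, here is a sketch of that insufficient approach (assuming a Python version before 3.8, where the tracker lives in multiprocessing.semaphore_tracker): multiprocessing.spawn.freeze_support only reacts to the --multiprocessing-fork command line, so the tracker process, which is started without that flag, is never serviced:

import sys
from multiprocessing.spawn import freeze_support

if __name__ == '__main__':
    setattr(sys, 'frozen', True)
    # handles the '--multiprocessing-fork' workers, but NOT the semaphore
    # tracker process, whose command line has no '--multiprocessing-fork'
    freeze_support()
    ...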
There are two options as I see it: either patch your installation with the yet-unreleased patch from the above bug report, or use the do-it-yourself approach presented below.
Here is an earlier version of this answer, which is more "experimental" but offers more insights/details and proposes a solution in a somewhat do-it-yourself style.
I'm on Linux, so I use mp.set_start_method('spawn') to simulate the behavior of Windows.

What happens in spawn mode? Let's add some sleeps, so we can investigate the processes:
# bomb.py
import multiprocessing as mp
import sys
import time

def worker():
    time.sleep(50)
    print('Worker')
    return

if __name__ == '__main__':
    print("Starting...")
    time.sleep(20)
    mp.set_start_method('spawn')  ## use spawn!
    jobs = []
    for i in range(5):
        p = mp.Process(target=worker)
        jobs.append(p)
        p.start()
By using pgrep python we can see that at first there is only one Python process, and then 7(!) different pids. We can see the command-line arguments via cat /proc/<pid>/cmdline. Five of the new processes have the command line

-c "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=11)" --multiprocessing-fork

and one has:

-c "from multiprocessing.semaphore_tracker import main;main(4)"

That means the parent process starts 6 new Python interpreter instances, and every newly started interpreter executes code sent from the parent via the command line options; the information is shared via pipes. One of these 6 Python instances is a tracker, which observes the whole thing.
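If you prefer to do the inspection from Python rather than the shell, here is a small Linux-only sketch (it reads /proc, and cmdline is a hypothetical helper name):

import os

def cmdline(pid):
    # arguments in /proc/<pid>/cmdline are separated by NUL bytes
    with open('/proc/%d/cmdline' % pid, 'rb') as f:
        return f.read().split(b'\0')

print(cmdline(os.getpid()))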
Ok, what happens if it is cythonized and embedded? The same as with normal Python; the only difference is that the bomb executable is started instead of python. But unlike the Python interpreter, it doesn't execute/isn't aware of the command line arguments, so the main function runs over and over again.
There is an easy fix: let the bomb executable start the Python interpreter:

...
if __name__ == '__main__':
    mp.set_executable(<PATH TO PYTHON>)
    ...
Now the bomb is no longer a multiprocessing bomb!
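For completeness, a runnable sketch of this variant; the interpreter path is an assumption, adjust it for your system:

import multiprocessing as mp
import time

def worker():
    time.sleep(5)
    print('Worker')

if __name__ == '__main__':
    mp.set_start_method('spawn')
    mp.set_executable('/usr/bin/python3')  # assumed path, use your real interpreter
    jobs = []
    for i in range(5):
        p = mp.Process(target=worker)
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()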
However, the goal is probably not to have a Python interpreter lying around, so we need to make our program aware of the possible command lines:
import re
......
if __name__ == '__main__':
    if len(sys.argv) == 3:    # should start in semaphore_tracker mode
        nr = list(map(int, re.findall(r'\d+', sys.argv[2])))
        sys.argv[1] = '--multiprocessing-fork'   # this canary is needed for the multiprocessing module to work
        from multiprocessing.semaphore_tracker import main
        main(nr[0])
    elif len(sys.argv) > 3:   # should start in slave mode
        fd, pipe = map(int, re.findall(r'\d+', sys.argv[2]))
        print("I'm a slave!, fd=%d, pipe=%d" % (fd, pipe))
        sys.argv[1] = '--multiprocessing-fork'   # this canary is needed for the multiprocessing module to work
        from multiprocessing.spawn import spawn_main
        spawn_main(tracker_fd=fd, pipe_handle=pipe)
    else:                     # main mode
        print("Starting...")
        mp.set_start_method('spawn')
        jobs = []
        for i in range(5):
            p = mp.Process(target=worker)
            jobs.append(p)
            p.start()
Now, our bomb doesn't need a stand-alone Python interpreter and stops after the workers are done. Please note the following:

- The way it is decided in which mode bomb should be started is not very error-safe, but I hope you get the gist (a slightly more defensive check is sketched after these notes).
- --multiprocessing-fork is just a canary; it doesn't do anything, it only must be there, see here.
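As promised in the first note, one more defensive way to decide the mode is to look at the contents of the arguments instead of only their count; detect_mode is a hypothetical helper, only meant to show the idea:

import sys

def detect_mode(argv):
    # the spawned children are started as '<exe> -c "<payload>" ...',
    # so we can inspect the payload instead of just counting arguments
    if len(argv) >= 3 and argv[1] == '-c':
        if 'semaphore_tracker' in argv[2]:
            return 'tracker'
        if 'spawn_main' in argv[2]:
            return 'slave'
    return 'main'

print(detect_mode(sys.argv))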
NB: The changed code can also be used with python, because after executing "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=11)" --multiprocessing-fork, Python changes sys.argv, so the code no longer sees the original command line and len(sys.argv) is 1.