There's quite a bit to unpack here, but it all boils down to how Python starts a new process and executes the function you want.
On *nix systems, the default way to create a new process is fork. This is great because it uses copy-on-write to give the new child process access to a copy of the parent's working memory. It is fast and efficient, but it comes with a significant drawback if you're also using multithreading: not everything actually gets copied, and some things can get copied in an invalid state (threads, locks, file handles, etc.). This can cause quite a number of problems if not handled correctly, so Python can use spawn instead (also, Windows doesn't have fork at all and must use spawn).
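You can inspect and choose the start method yourself; a minimal sketch using only multiprocessing's documented module-level helpers:

```python
import multiprocessing as mp

# Start methods available on this platform.  Linux typically offers
# ['fork', 'spawn', 'forkserver']; Windows offers only ['spawn'].
print(mp.get_all_start_methods())

# The current default: "fork" on most Unixes, "spawn" on Windows
# (and on recent macOS versions).
print(mp.get_start_method())

# A context pins a start method for just the code that uses it,
# without changing the global default via mp.set_start_method().
spawn_ctx = mp.get_context("spawn")
```

Objects created through spawn_ctx (spawn_ctx.Process, spawn_ctx.Queue, and so on) will use spawn regardless of the platform default.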
Spawn starts a brand-new interpreter from scratch and does not copy the parent's memory in any way. Some mechanism is still needed to give the child access to the functions and data defined before it was created, and Python does this by having the new process import the ".py" file it was created from. This is problematic in interactive mode because there isn't really a ".py" file to import, and it's the primary source of "multiprocessing doesn't like interactive" problems. Putting your multiprocessing code into a library which you then import and execute does work interactively, because the library can be imported from a ".py" file. This is also why we use the if __name__ == "__main__": line, to separate out any code you don't want re-executed in the child when that import occurs. If you were to spawn a new process without it, the child could recursively keep spawning children (though there's technically a built-in guard for that specific case, iirc).
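As a concrete sketch of the guard (the worker function and its values here are made up for illustration):

```python
import multiprocessing as mp

def worker(q, x):
    # Runs in the child process; sends its result back over a queue.
    q.put(x * x)

if __name__ == "__main__":
    # Everything below is skipped when a spawned child re-imports this
    # file (the child sees __name__ == "__mp_main__"), so the import
    # itself never starts extra processes.
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q, 6))
    p.start()
    print(q.get(timeout=10))  # 36
    p.join()
```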
Then, with either start method, the parent communicates with the child over a pipe (using pickle to exchange Python objects), telling it what function to call and what the arguments are. This is why arguments must be picklable. Some things can't be pickled, which is another common source of errors in multiprocessing.
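You can see the pickling constraint without starting any processes at all: pickle stores a top-level function by name, which is exactly what lets (or stops) the child from finding it. A small illustrative sketch:

```python
import pickle

def double(x):
    # A top-level function pickles as a reference ("module.double"),
    # which the other side resolves by importing the module.
    return x * 2

restored = pickle.loads(pickle.dumps(double))
print(restored(21))  # 42

# A lambda has no importable name, so pickling it fails -- the same
# failure you hit passing a lambda to Pool.map under spawn.
try:
    pickle.dumps(lambda x: x * 2)
except Exception as exc:
    print("not picklable:", exc)
```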
Finally, on another note: the IPython interpreter (the default Spyder shell) doesn't always collect stdout or stderr from child processes when using spawn, meaning print statements won't be shown. The vanilla (python.exe) interpreter handles this better.
In your specific case:
- Jupyter Lab is running in interactive mode; the child process will have been created but hit an error something like "can't import get_content from __main__". The error doesn't get displayed correctly because it didn't happen in the main process, and Jupyter doesn't handle stderr from the child correctly.
- Spyder is using IPython by default, which is not relaying the print statements from the child to the parent. Here you can switch to the "external system console" in the "run" dialog, but then you must also do something to keep the console window open long enough to read the output (i.e. prevent the process from exiting).
- Google Colab runs your code on a Google server running Linux rather than locally on your Windows machine, so "fork" is the start method and the particular issue of not having a ".py" file to import from doesn't arise.
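To make the Jupyter/interactive case work, the fix described above is to move the worker into a real ".py" module. A minimal sketch (the module name workers.py and the function get_content are made up for this example; normally you'd just save the file with your editor rather than writing it from code):

```python
import pathlib
import sys
import multiprocessing as mp

# Write the worker out as an importable module.  In practice you would
# create this file in your editor instead.
pathlib.Path("workers.py").write_text(
    "def get_content(x):\n"
    "    return x * 2\n"
)
sys.path.insert(0, str(pathlib.Path.cwd()))  # make sure it's importable

import workers

if __name__ == "__main__":
    # Even under spawn, the child can "import workers" from the .py
    # file, so workers.get_content pickles and resolves cleanly.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(workers.get_content, [1, 2, 3]))  # [2, 4, 6]
```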