Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Printed output not displayed when using joblib in jupyter notebook

So I am using joblib to parallelize some code and I noticed that I couldn't print things when using it inside a jupyter notebook.

I tried running the same example in IPython and it worked perfectly.

Here is a minimal (not) working example to write in a jupyter notebook cell

from joblib import Parallel, delayed
Parallel(n_jobs=8)(delayed(print)(i) for i in range(10))

So I am getting the output as [None, None, None, None, None, None, None, None, None, None] but nothing is printed.

What I expect to see (print order could be random in reality):

0
1
2
3
4
5
6
7
8
9
[None, None, None, None, None, None, None, None, None, None]

Note:

You can see the prints in the logs of the notebook process. But I would like the prints to happen in the notebook, not the logs of the notebook process.

EDIT

I have opened a GitHub issue, but it has received minimal attention so far.

1 Reply

I think this is caused in part by the way Parallel spawns the child workers and by how Jupyter Notebook handles IO for those workers. When started without specifying a value for backend, Parallel defaults to loky, which uses a pooling strategy that directly uses a fork-exec model to create the subprocesses.

If you start Notebook from a terminal using

$ jupyter-notebook

the regular stderr and stdout streams appear to remain attached to that terminal, while the notebook session will start in a new browser window. Running the posted code snippet in the notebook does produce the expected output, but it seems to go to stdout and ends up in the terminal (as hinted in the Note in the question). This further supports the suspicion that this behavior is caused by the interaction between loky and notebook, and the way the standard IO streams are handled by notebook for child processes.
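The routing described above can be reproduced outside a notebook with the standard library alone. This is a minimal sketch, assuming the kernel captures output by replacing Python's sys.stdout object; the io.StringIO buffer is only a stand-in for the notebook cell, and run_child_and_relay is a hypothetical helper, not part of joblib or Jupyter. A child process writes to the inherited OS-level stdout and bypasses the replacement, while output that the parent captures and re-prints does reach it.

```python
import io
import subprocess
import sys
from contextlib import redirect_stdout


def run_child_and_relay(message):
    """Run a child Python process and re-print its stdout in the parent."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", message],
        capture_output=True,  # collect the child's stdout through a pipe
        text=True,
        check=True,
    )
    # This print goes through the parent's current sys.stdout object
    print(result.stdout, end="")
    return result.stdout


# Stand-in for the notebook's replaced sys.stdout (an assumption for illustration)
fake_cell_output = io.StringIO()
with redirect_stdout(fake_cell_output):
    # The child inherits the OS-level stdout, not the Python-level replacement,
    # so this text shows up in the terminal rather than in the buffer
    subprocess.run([sys.executable, "-c", "print('lost')"], check=True)
    # Output captured by the parent and printed again does reach the buffer
    run_child_and_relay("relayed")

captured = fake_cell_output.getvalue()
```

After this runs, captured holds only the relayed line; the text printed directly by the first child never reaches the buffer.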

This led me to this discussion on GitHub (active within the past 2 weeks as of this posting) where the authors of notebook appear to be aware of this, but it would seem that there is no obvious and quick fix for the issue at the moment.

If you don't mind switching the backend that Parallel uses to spawn children, you can do so like this:

from joblib import Parallel, delayed
Parallel(n_jobs=8, backend='multiprocessing')(delayed(print)(i) for i in range(10))

With the multiprocessing backend, things work as expected. The threading backend looks to work fine too. This may not be the solution you were hoping for, but hopefully it is sufficient while the notebook authors work on finding a proper solution.
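Another way to sidestep the question of where worker output ends up is to have the workers return their text and print it in the parent process. This is a sketch under assumptions, not something tested in the environments listed in this answer: describe is a hypothetical worker function, and the threading backend is used here only because it is one of the backends mentioned above.

```python
from joblib import Parallel, delayed


def describe(i):
    # Hypothetical worker: build the text instead of printing it
    return f"processing item {i}"


# Parallel returns results in the order of the input iterable
messages = Parallel(n_jobs=4, backend="threading")(
    delayed(describe)(i) for i in range(10)
)

# Printing happens in the parent (kernel) process, so it goes to the cell output
for message in messages:
    print(message)
```

The trade-off is that nothing is shown until all the workers have finished, so this suits short status messages better than live progress output.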

I'll cross-post this to GitHub in case anyone there cares to add to this answer (I don't want to misstate anyone's intent or put words in people's mouths!).


Test Environment:
MacOS - Mojave (10.14)
Python - 3.7.3
pip3 - 19.3.1

Tested in 2 configurations. Confirmed to produce the expected output when using both multiprocessing and threading for the backend parameter. Packages were installed using pip3.

Setup 1:

ipykernel                               5.1.1
ipython                                 7.5.0
jupyter                                 1.0.0
jupyter-client                          5.2.4
jupyter-console                         6.0.0
jupyter-core                            4.4.0
notebook                                5.7.8

Setup 2:

ipykernel                               5.1.4
ipython                                 7.12.0
jupyter                                 1.0.0
jupyter-client                          5.3.4
jupyter-console                         6.1.0
jupyter-core                            4.6.2
notebook                                6.0.3

I also was successful using the same versions as 'Setup 2' but with the notebook package version downgraded to 6.0.2.

Note:

This approach works inconsistently on Windows. Different combinations of software versions yield different results. Doing the most intuitive thing, upgrading everything to the latest version, does not guarantee that it will work.

