In short: I cannot reproduce your problem. If you are on Windows, you should guard your main code with if __name__ == '__main__': (see the documentation of joblib.Parallel). The only problem I see is a lot of data-copying overhead, but your numbers seem too extreme to be caused by that alone.
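As a minimal sketch of the main guard mentioned above (the helper name parallel_roots is made up for illustration and is not part of joblib), the parallel call lives in a function and is only triggered from the guarded block:

```python
from math import sqrt

from joblib import Parallel, delayed


def parallel_roots(n_jobs, count=4):
    """Compute sqrt(i**2) for i in range(count) using n_jobs workers."""
    return Parallel(n_jobs=n_jobs)(delayed(sqrt)(i ** 2) for i in range(count))


if __name__ == '__main__':
    # Only the process started as a script runs this block; a worker that
    # re-imports the module (as happens on Windows) skips it.
    print(parallel_roots(n_jobs=2))
```

Without the guard, a platform that starts workers by re-importing the script could try to launch the parallel loop again in every worker.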
In detail, here are my timings with your code. On my i7 3770k (4 cores, 8 threads) I get the following results for different values of n_jobs:
For-loop: Finished in 33.8521318436 sec
n_jobs=1: Finished in 33.5527760983 sec
n_jobs=2: Finished in 18.9543449879 sec
n_jobs=3: Finished in 13.4856410027 sec
n_jobs=4: Finished in 15.0832719803 sec
n_jobs=5: Finished in 14.7227740288 sec
n_jobs=6: Finished in 15.6106669903 sec
So there is a gain in using multiple processes. However, although I have four cores, the gain already saturates at three processes. So I guess the execution time is actually limited by memory access rather than processor time.
You should notice that the arguments for each single loop entry are copied to the process executing it. This means you copy a for each element in b. That is inefficient. So instead access the global a. (On platforms where Parallel forks the process, such as Linux, all global variables are copied to the newly spawned processes, so a is accessible.) This gives me the following code (with timing and the main guard that the joblib documentation recommends):

    import numpy as np
    from matplotlib.path import Path
    from joblib import Parallel, delayed
    import time
    import sys

    ## Check if one line segment contains another.
    def check_paths(path):
        # a is a module-level global inherited by the forked workers
        for other_path in a:
            res = 'no cross'
            chck = Path(other_path)
            if chck.contains_path(path) == 1:
                res = 'cross'
                break
        return res

    if __name__ == '__main__':
        ## Create pairs of points for line segments
        # list() so that a can be iterated repeatedly on Python 3 as well
        a = list(zip(np.random.rand(5000, 2), np.random.rand(5000, 2)))
        b = list(zip(np.random.rand(300, 2), np.random.rand(300, 2)))
        now = time.time()
        if len(sys.argv) >= 2:
            res = Parallel(n_jobs=int(sys.argv[1]))(delayed(check_paths)(Path(points)) for points in b)
        else:
            res = [check_paths(Path(points)) for points in b]
        print("Finished in %s sec" % (time.time() - now))

Timing results:
n_jobs=1: Finished in 34.2845709324 sec
n_jobs=2: Finished in 16.6254048347 sec
n_jobs=3: Finished in 11.219119072 sec
n_jobs=4: Finished in 8.61683392525 sec
n_jobs=5: Finished in 8.51907801628 sec
n_jobs=6: Finished in 8.21842098236 sec
n_jobs=7: Finished in 8.21816396713 sec
n_jobs=8: Finished in 7.81841087341 sec
The saturation point has now moved to n_jobs=4, which is the value to be expected on a machine with four physical cores.
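To get a rough feel for the copying overhead discussed above, the following sketch compares how many bytes would be pickled if a travels with every task versus only the element of b. It is an estimate under assumptions: plain tuples stand in for the NumPy arrays, the helper names random_segments and payload_sizes are invented for this example, and joblib's real serialization and batching will differ in detail.

```python
import pickle
import random


def random_segments(n, seed=0):
    """Return n random line segments as ((x1, y1), (x2, y2)) tuples."""
    rng = random.Random(seed)
    return [((rng.random(), rng.random()), (rng.random(), rng.random()))
            for _ in range(n)]


def payload_sizes(a, b):
    """Return (bytes pickled if a is sent with every task,
    bytes pickled if only the element of b is sent)."""
    with_a = sum(len(pickle.dumps((a, item))) for item in b)
    without_a = sum(len(pickle.dumps(item)) for item in b)
    return with_a, without_a


a = random_segments(5000)
b = random_segments(300)
with_a, without_a = payload_sizes(a, b)
print("a passed with every task: %d bytes" % with_a)
print("only the element of b:    %d bytes" % without_a)
```

The first number grows with len(a) * len(b), the second only with len(b), which is why reading a as a global pays off.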
check_paths does several redundant calculations that can easily be eliminated. First, the line Path(...) is executed for every element of a in every call. Precalculate that. Second, the string res='no cross' is written in each loop turn, although it may only change once (followed by a break and return). Move the line in front of the loop. Then the code looks like this:

    import numpy as np
    from matplotlib.path import Path
    from joblib import Parallel, delayed
    import time
    import sys

    ## Check if one line segment contains another.
    def check_paths(path):
        # a now holds precomputed Path objects
        res = 'no cross'
        for other_path in a:
            if other_path.contains_path(path) == 1:
                res = 'cross'
                break
        return res

    if __name__ == '__main__':
        ## Create pairs of points for line segments
        a = zip(np.random.rand(5000, 2), np.random.rand(5000, 2))
        # Build each Path once instead of once per call to check_paths
        a = [Path(x) for x in a]
        b = list(zip(np.random.rand(300, 2), np.random.rand(300, 2)))
        now = time.time()
        if len(sys.argv) >= 2:
            res = Parallel(n_jobs=int(sys.argv[1]))(delayed(check_paths)(Path(points)) for points in b)
        else:
            res = [check_paths(Path(points)) for points in b]
        print("Finished in %s sec" % (time.time() - now))

with timings:
n_jobs=1: Finished in 5.33742594719 sec
n_jobs=2: Finished in 2.70858597755 sec
n_jobs=3: Finished in 1.80810618401 sec
n_jobs=4: Finished in 1.40814709663 sec
n_jobs=5: Finished in 1.50854086876 sec
n_jobs=6: Finished in 1.50901818275 sec
n_jobs=7: Finished in 1.51030707359 sec
n_jobs=8: Finished in 1.51062297821 sec
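To compare the three variants, the speedup at n_jobs=4 can be computed as the n_jobs=1 time divided by the n_jobs=4 time. The small script below does this with the timings listed above, rounded to three decimals; the dictionary labels and the helper speedup are just names chosen for this sketch.

```python
# Timings in seconds, rounded from the measurements listed above.
timings = {
    "a passed as argument": {1: 33.553, 4: 15.083},
    "global a": {1: 34.285, 4: 8.617},
    "global a, precomputed Path": {1: 5.337, 4: 1.408},
}


def speedup(run, n_jobs):
    """Return the single-job time divided by the time with n_jobs workers."""
    return run[1] / run[n_jobs]


for label, run in timings.items():
    s = speedup(run, 4)
    # Efficiency is the speedup relative to the ideal factor of 4.
    print("%-28s speedup %.2fx, efficiency %.0f%%" % (label, s, 100 * s / 4))
```

Both variants that read a as a global scale nearly linearly up to four processes, while the original version stays well below that.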
A side note on your code, although I haven't really followed its purpose as this was unrelated to your question: contains_path will only return True if this path completely contains the given path (see documentation). Therefore your function will basically always return no cross given the random input.
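If the intent was to detect segments that cross each other, which is only my guess at the purpose, matplotlib's Path also offers intersects_path. A minimal sketch follows; the wrapper segments_cross and the sample coordinates are made up for illustration.

```python
from matplotlib.path import Path

# Two segments that cross at (0.5, 0.5) and one that lies far from both.
crossing_a = Path([(0.0, 0.0), (1.0, 1.0)])
crossing_b = Path([(0.0, 1.0), (1.0, 0.0)])
far_away = Path([(2.0, 0.0), (3.0, 0.5)])


def segments_cross(p, q):
    """Return True if the outlines of the two paths intersect."""
    # filled=False compares only the line segments, not enclosed areas.
    return bool(p.intersects_path(q, filled=False))


print(segments_cross(crossing_a, crossing_b))
print(segments_cross(crossing_a, far_away))
# For comparison, the containment test used in the question:
print(bool(crossing_a.contains_path(crossing_b)))
```

Swapping this into check_paths would change what the function reports, so treat it as a suggestion to verify against what you actually need.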