Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Why isn't broadcasting with numpy faster than a nested loop?

I have a calculation in my code that gets carried out thousands of times, and I wanted to see if I could make it faster, as it currently uses two nested loops. I assumed that broadcasting would make it several times faster.

I've shown the two options below, which thankfully give the same results.

import numpy as np

n = 1000
x = np.random.random([n, 3])
y = np.random.random([n, 3])
func_weight = np.random.random(n)


result = np.zeros([n, 9])
result_2 = np.zeros([n, 9])

# existing
for a in range(3):
    for b in range(3):
        result[:, 3*a + b] = x[:, a] * y[:, b] * func_weight

# broadcasting - assumed this would be faster
for a in range(3):
    result_2[:, 3*a:3*(a+1)] = np.expand_dims(x[:, a], axis=-1) * y * np.expand_dims(func_weight, axis=-1)

Timings

n=100
nested loops: 24.7 μs ± 362 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
broadcasting: 70.3 μs ± 1.22 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

n=1000
nested loops: 50.5 μs ± 913 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
broadcasting: 148 μs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

n=10000
nested loops: 327 μs ± 7.99 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
broadcasting: 864 μs ± 5.57 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In my testing, broadcasting is always slower, so I'm a little confused about what is happening. My guess is that the expand_dims calls I needed to align the shapes in the second solution are what hurts performance. Is that correct? As the array size grows, the ratio barely changes: the nested loop is always about 3 times quicker.

Is there a more optimal third solution that I haven't considered?



1 Reply

In [126]: %%timeit
     ...: result = np.zeros([n,9])
     ...: for a in range(3):
     ...:     for b in range(3):
     ...:         result[:, 3*a + b] = x[:, a] * y[:, b] * func_weight
141 μs ± 255 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [128]: %%timeit
     ...: result_2 = np.zeros([n,9])
     ...: for a in range(3):
     ...:    result_2[:, 3*a:3*(a+1)] = np.expand_dims(x[:, a], axis=-1) * y * np.expand_dims(func_weight, axis=-1)
202 μs ± 10.8 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A fully broadcasted version:

In [130]: %%timeit
     ...: result_3 = (x[:,:,None]*y[:,None,:]*func_weight[:,None,None]).reshape(n,9)
88.8 μs ± 73.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
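
To see why the fully broadcasted expression reproduces the nested-loop result, it can help to check the shapes and compare the outputs directly. The following is a minimal sketch: the seeded generator, the intermediate name outer, and the einsum spelling (result_4) are my own additions for illustration and are not part of the timings above.

```python
import numpy as np

# Seeded generator so the check is repeatable (an assumption; the question uses np.random.random).
rng = np.random.default_rng(0)
n = 1000
x = rng.random((n, 3))
y = rng.random((n, 3))
func_weight = rng.random(n)

# Reference: the nested-loop version from the question.
result = np.zeros((n, 9))
for a in range(3):
    for b in range(3):
        result[:, 3 * a + b] = x[:, a] * y[:, b] * func_weight

# Fully broadcast: (n, 3, 1) * (n, 1, 3) * (n, 1, 1) -> (n, 3, 3).
outer = x[:, :, None] * y[:, None, :] * func_weight[:, None, None]
# A row-major reshape places element (a, b) at column 3*a + b.
result_3 = outer.reshape(n, 9)

# Alternative spelling with einsum (not timed here; shown only as another option).
result_4 = np.einsum("na,nb,n->nab", x, y, func_weight).reshape(n, 9)

print(outer.shape)
print(np.allclose(result, result_3), np.allclose(result, result_4))
```

Whether the einsum form is faster than plain broadcasting depends on the array sizes and the NumPy build, so it is worth timing on your own data before adopting it.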

Replacing the expand_dims with np.newaxis/None expansion:

In [131]: %%timeit
     ...: result_2 = np.zeros([n,9])
     ...: for a in range(3):
     ...:    result_2[:, 3*a:3*(a+1)] = x[:, a,None] * y * func_weight[:,None]
132 μs ± 315 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So yes, expand_dims is a bit slow. I think that is because it tries to be general purpose, and because it adds an extra layer of function calls.
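
One way to check the per-call overhead on your own machine is to time the two spellings in isolation. This is a rough sketch using the standard timeit module; the array size and the call count n_calls are arbitrary choices of mine, and the absolute numbers will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

func_weight = np.random.random(1000)

n_calls = 20_000  # arbitrary; large enough to average out timer noise

# Both statements only create a view, so this measures call overhead, not data movement.
t_expand = timeit.timeit(lambda: np.expand_dims(func_weight, axis=-1), number=n_calls)
t_none = timeit.timeit(lambda: func_weight[:, None], number=n_calls)

print(f"expand_dims: {t_expand / n_calls * 1e6:.2f} us per call")
print(f"[:, None]:   {t_none / n_calls * 1e6:.2f} us per call")
```

Because the arrays in the question are small, a fixed cost of this kind per call can make up a noticeable share of each loop iteration.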

expand_dims is just a.reshape(shape), but it takes a bit of time to translate your axis parameter into the shape tuple. As an experienced user, I find the None syntax clearer (and faster): visually, it stands out as a dimension-adding action.
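
As a small illustration of that equivalence, the sketch below builds the same column-shaped view four ways. The variable names a, b, c and d are placeholders I chose; the array is a short arange so the values are easy to inspect.

```python
import numpy as np

func_weight = np.arange(5.0)

a = np.expand_dims(func_weight, axis=-1)  # function call that works out the new shape
b = func_weight[:, None]                  # indexing with None
c = func_weight[:, np.newaxis]            # np.newaxis is an alias for None
d = func_weight.reshape(-1, 1)            # explicit reshape

print(a.shape, b.shape, c.shape, d.shape)
print(np.newaxis is None)
# Each result is a view on the original data, not a copy.
print(all(np.shares_memory(v, func_weight) for v in (a, b, c, d)))
```

Since all four produce the same view, the choice between them comes down to readability and call overhead rather than to the result.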

