I found this very intriguing and decided to time it myself. But instead of just checking 10x10 arrays, I tested a lot of different array sizes with NumPy 1.16.2:
This clearly shows that normal addition is faster for small arrays and that the in-place operation only becomes faster for moderately large arrays. There is also a weird bump around 100000 elements that I cannot explain (it's close to the page size on my computer, so maybe a different allocation scheme is used there).
Allocating a temporary array is expected to be slower because:
- One has to allocate that memory
- One has to iterate over 3 arrays to perform the operation instead of 2.
Especially the first point (allocating the memory) is probably not accounted for in the benchmark (neither with %timeit nor with simple_benchmark.run). That's because requesting the same amount of memory over and over again is probably very well optimized, which would make the addition with an extra array seem a bit faster than it actually is.
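To make the difference between the two forms concrete, here is a minimal sketch (the variable names are just for illustration) showing that `a1 + a2` produces a new array while `a1 += a2` writes into the existing buffer:

```python
import numpy as np

a2 = np.ones((3, 3))

# Normal addition: the result lives in a freshly allocated array,
# and the name is simply rebound to it.
a1 = np.ones((3, 3))
before = a1
a1 = a1 + a2
new_array_created = a1 is not before
still_shares_memory = np.shares_memory(a1, before)

# In-place addition: the existing buffer is reused, no result array is allocated.
b1 = np.ones((3, 3))
alias = b1
b1 += a2
same_object = b1 is alias

print(new_array_created, still_shares_memory, same_object)
```

For arrays, `b1 += a2` behaves like `np.add(b1, a2, out=b1)`, so only two arrays are involved instead of three.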
Another point to mention here is that in-place addition probably has a higher constant factor. If you're doing an in-place addition you have to do more checks before you can perform the operation, for example for overlapping inputs.
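As a rough illustration of why such checks are needed, consider an in-place addition where the output and one input are overlapping views of the same buffer. This is only a sketch of the situation, not of NumPy's internal logic. As far as I know, NumPy (since 1.13) detects the overlap and gives the same result as if the input had been copied first:

```python
import numpy as np

a = np.arange(5)      # [0, 1, 2, 3, 4]
dst = a[1:]           # view on elements 1..4
src = a[:-1]          # view on elements 0..3, overlaps with dst

overlapping = np.shares_memory(dst, src)

# The output (dst) overlaps the input (src). NumPy has to notice this
# and avoid reading values it has already overwritten.
dst += src

print(overlapping, a)
```

Without that check, a naive element-by-element loop would read already updated values and produce a running sum instead.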
As a more general piece of advice: micro-benchmarks can be helpful, but they are not always accurate. You should also benchmark the code that calls the operation to make more educated statements about the actual performance of your code. Often such micro-benchmarks hit some highly optimized cases (for example repeatedly allocating the same amount of memory and releasing it again) that wouldn't happen (so often) when the code is actually used.
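A minimal sketch of what timing the calling code could look like, using the standard library's `timeit`. The two `accumulate_*` functions are hypothetical stand-ins for real calling code, and the sizes and repeat counts are arbitrary:

```python
import timeit
import numpy as np

def accumulate_out_of_place(chunks):
    # Hypothetical caller: sums many arrays, allocating a new result each step.
    total = np.zeros_like(chunks[0])
    for chunk in chunks:
        total = total + chunk
    return total

def accumulate_in_place(chunks):
    # Same caller, but reusing one result buffer.
    total = np.zeros_like(chunks[0])
    for chunk in chunks:
        total += chunk
    return total

chunks = [np.random.random([50, 50]) for _ in range(20)]

t_out = timeit.timeit(lambda: accumulate_out_of_place(chunks), number=200)
t_in = timeit.timeit(lambda: accumulate_in_place(chunks), number=200)
print("out-of-place: {:.4f}s, in-place: {:.4f}s".format(t_out, t_in))
```

The numbers will depend on the machine, the NumPy version and the array sizes, so this only shows the approach, not an expected outcome.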
Here is also the code I used for the graph, using my library simple_benchmark:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np

b = BenchmarkBuilder()

@b.add_function()
def func1(a1, a2):
    # Normal addition: allocates a new array for the result.
    a1 = a1 + a2

@b.add_function()
def func2(a1, a2):
    # In-place addition: writes the result into a1.
    a1 += a2

@b.add_arguments('array size')
def argument_provider():
    for exp in range(3, 28):
        dim_size = int(1.4**exp)
        a1 = np.random.random([dim_size, dim_size])
        a2 = np.random.random([dim_size, dim_size])
        # Report the total number of elements as the array size.
        yield dim_size ** 2, MultiArgument([a1, a2])

r = b.run()
r.plot()