First off, there's been a lot of past discussion about this on the numpy list. For example, see:
http://numpy-discussion.10968.n7.nabble.com/poor-performance-of-sum-with-sub-machine-word-integer-types-td41.html
http://numpy-discussion.10968.n7.nabble.com/odd-performance-of-sum-td3332.html
Some of boils down to the fact that einsum
is new, and is presumably trying to be better about cache alignment and other memory access issues, while many of the older numpy functions focus on a easily portable implementation over a heavily optimized one. I'm just speculating, there, though.
However, some of what you're doing isn't quite an "apples-to-apples" comparison.
In addition to what @Jamie already said, sum
uses a more appropriate accumulator for arrays
For example, sum
is more careful about checking the type of the input and using an appropriate accumulator. For example, consider the following:
In [1]: x = 255 * np.ones(100, dtype=np.uint8)
In [2]: x
Out[2]:
array([255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255], dtype=uint8)
Note that the sum
is correct:
In [3]: x.sum()
Out[3]: 25500
While einsum
will give the wrong result:
In [4]: np.einsum('i->', x)
Out[4]: 156
But if we use a less limited dtype
, we'll still get the result you'd expect:
In [5]: y = 255 * np.ones(100)
In [6]: np.einsum('i->', y)
Out[6]: 25500.0