How about broadcasted exponentiation?
%timeit (x ** np.arange(N)[:, None]).T
43 μs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Sanity check -
np.all((x ** np.arange(N)[:, None]).T == np.vander(x, N, increasing=True))
True
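Note that the trailing .T is essentially free: transposing in NumPy returns a strided view rather than a copy, so the only real work is the exponentiation itself. A minimal check (the array contents here are arbitrary toy values):

import numpy as np

x = np.array([1, 2, 3])
N = 4
v = (x ** np.arange(N)[:, None]).T

print(v.base is not None)        # True: v is a view of the intermediate result
print(v.flags['F_CONTIGUOUS'])   # True: a transposed C-ordered array is F-ordered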
The caveat here is that this speedup is possible only if your input array x has a dtype of int. As @Warren Weckesser pointed out in a comment, the broadcasted exponentiation slows down for floating point arrays.
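If you want to verify the dtype effect yourself, a sketch along these lines works; the array size and N here are assumptions, since the original benchmark setup isn't shown above:

import numpy as np
from timeit import timeit

N = 4
x_int = np.random.randint(0, 100, size=1000)   # integer dtype
x_float = x_int.astype(np.float64)             # same values, float dtype

t_int = timeit(lambda: (x_int ** np.arange(N)[:, None]).T, number=10_000)
t_float = timeit(lambda: (x_float ** np.arange(N)[:, None]).T, number=10_000)
print(f"int:   {t_int:.3f} s per 10,000 loops")
print(f"float: {t_float:.3f} s per 10,000 loops")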
As for why np.vander is slow, take a look at the source code -
def vander(x, N=None, increasing=False):
    x = asarray(x)
    if x.ndim != 1:
        raise ValueError("x must be a one-dimensional array or sequence.")
    if N is None:
        N = len(x)
    v = empty((len(x), N), dtype=promote_types(x.dtype, int))
    # Fill the reversed view so the powers can always be built left to right
    tmp = v[:, ::-1] if not increasing else v
    if N > 0:
        tmp[:, 0] = 1
    if N > 1:
        # Seed columns 1..N-1 with x, then take running products along each row
        tmp[:, 1:] = x[:, None]
        multiply.accumulate(tmp[:, 1:], out=tmp[:, 1:], axis=1)
    return v
The function has to cater to many more use cases than just yours, so it uses a more generalized method of computation that is reliable but slower (I'm specifically pointing to multiply.accumulate).
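To make the multiply.accumulate step concrete, here is the same trick in isolation (toy values, not the library code itself): each row is seeded as [1, x, x, ..., x], and a running product along the row turns it into [1, x, x**2, ..., x**(N-1)].

import numpy as np

x = np.array([2, 3, 5])
N = 4
tmp = np.empty((len(x), N), dtype=int)
tmp[:, 0] = 1                 # first column is x**0
tmp[:, 1:] = x[:, None]       # remaining columns all start as x
np.multiply.accumulate(tmp[:, 1:], out=tmp[:, 1:], axis=1)
print(tmp)
# [[  1   2   4   8]
#  [  1   3   9  27]
#  [  1   5  25 125]]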
As a matter of interest, I found another way of computing the Vandermonde matrix, ending up with this:
%timeit x[:, None] ** np.arange(N)
150 μs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
It does the same thing, but is much slower (roughly 3.5x, going by the timings above). The answer lies in the fact that the operations are still broadcast, but in an orientation that is less efficient here.
On the flip side, for float arrays, this actually ends up performing the best.
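For completeness, a sketch of that float comparison (the array size and N are again assumptions, so the exact numbers will vary):

import numpy as np
from timeit import timeit

N = 4
x = np.random.rand(1000)   # float64

t_transposed = timeit(lambda: (x ** np.arange(N)[:, None]).T, number=10_000)
t_direct = timeit(lambda: x[:, None] ** np.arange(N), number=10_000)
print(f"(x ** np.arange(N)[:, None]).T : {t_transposed:.3f} s")
print(f"x[:, None] ** np.arange(N)     : {t_direct:.3f} s")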