This type of question has been beaten to death on SO, but I'll try to illustrate the issues with your framework:
In [1]: import numpy as np
   ...: a = np.arange(15).reshape([3,5])
   ...: b = np.arange(30, step=2).reshape([3,5])
   ...:
In [2]: def f(x,y):
   ...:     return np.dot(x,y)
zipped comprehension
The list comprehension approach applies `f` to the 3 rows of `a` and `b`. That is, it iterates on the 2 arrays as though they were lists. At each call, your function gets 2 1d arrays. `dot` can accept other shapes, but for the moment we'll pretend it only works with a pair of 1d arrays:
In [3]: np.array([f(x,y) for x,y in zip(a,b)])
Out[3]: array([  60,  510, 1460])
In [4]: np.dot(a[0],b[0])
Out[4]: 60
vectorize/frompyfunc
`np.vectorize` iterates over the inputs (with broadcasting, which can be handy) and gives the function scalar values. I'll illustrate with `frompyfunc`, which returns an object dtype array (and is used by `vectorize`):
In [5]: vf = np.frompyfunc(f, 2,1)
In [6]: vf(a,b)
Out[6]:
array([[0, 2, 8, 18, 32],
       [50, 72, 98, 128, 162],
       [200, 242, 288, 338, 392]], dtype=object)
So the result is a (3,5) array; incidentally, summing across columns gets the desired result:
In [9]: vf(a,b).sum(axis=1)
Out[9]: array([60, 510, 1460], dtype=object)
`np.vectorize` does not make any speed promises.
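Newer numpy does let you give `vectorize` a `signature`, so the function gets whole rows instead of scalars. It's still a Python-level loop under the hood, so don't expect speed from it either; a minimal sketch, continuing the session above:

In [10]: vf2 = np.vectorize(f, signature='(n),(n)->()')   # f sees 1d rows, not scalars
In [11]: vf2(a, b)
Out[11]: array([  60,  510, 1460])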
apply_along_axis
I don't know how you tried to use `apply_along_axis`. It only takes one array. After a lot of setup it ends up doing (for a 2d array like `a`):
for i in range(3):
    idx = (i, slice(None))
    outarr[idx] = asanyarray(func1d(arr[idx], *args, **kwargs))
For 3d and larger it makes iteration over the 'other' axes simpler; for 2d it is overkill. In any case it does not speed up the calculations. It is still iteration.
(`apply_along_axis` takes `arr` and `*args`. It iterates on `arr`, but uses `*args` whole.)
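If you really want `apply_along_axis` here, one workaround is to glue the two arrays together so each row carries both operands, and split inside a wrapper function. A sketch (the 5 is this example's row length, hard-coded):

In [12]: def g(row):
    ...:     return f(row[:5], row[5:])   # first half is a's row, second half is b's
    ...:
In [13]: np.apply_along_axis(g, 1, np.concatenate([a, b], axis=1))
Out[13]: array([  60,  510, 1460])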
indexing
`np.dot(a[np.arange(3)], b[np.arange(3)])` is the same as `np.dot(a, b)`.
`dot` is a matrix product: (3,5) works with (5,3) to produce a (3,3). It handles 1d as a special case (see the docs): (3,) with (3,) produces a scalar.
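That points at a wasteful but fully-compiled alternative for this problem: compute the whole matrix product against the transpose and keep just the diagonal. It calculates three times the values you actually need:

In [14]: np.dot(a, b.T)          # (3,5) with (5,3) -> (3,3)
Out[14]:
array([[  60,  160,  260],
       [ 160,  510,  860],
       [ 260,  860, 1460]])
In [15]: np.dot(a, b.T).diagonal()   # the row-wise dots sit on the diagonal
Out[15]: array([  60,  510, 1460])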
iteration
For a truly generic `f(x,y)`, your only alternative to the zipped list comprehension is an index loop like this:
In [18]: c = np.zeros(a.shape[0])
In [19]: for i in range(a.shape[0]):
    ...:     c[i] = f(a[i,:], b[i,:])
In [20]: c
Out[20]: array([  60.,  510., 1460.])
Speed will be similar. (That loop can be moved to compiled code with cython, but I don't think you are ready to dive in that deep.)
As noted in a comment, if the arrays are (N,M), and N is small compared to M, this iteration is not costly. That is, a few loops over a big task are ok. They may even be faster if they simplify large array memory management.
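If you want to check the "similar speed" claim for your own shapes, here's a standalone sketch using the stdlib `timeit` (I won't quote numbers, since they depend on the machine and array sizes):

import numpy as np
from timeit import timeit

a = np.arange(15).reshape(3, 5)
b = np.arange(30, step=2).reshape(3, 5)

def f(x, y):
    return np.dot(x, y)

def comprehension():
    return np.array([f(x, y) for x, y in zip(a, b)])

def index_loop():
    c = np.zeros(a.shape[0])
    for i in range(a.shape[0]):
        c[i] = f(a[i, :], b[i, :])
    return c

# Both iterate in Python; expect times within a small factor of each other.
print(timeit(comprehension, number=10000))
print(timeit(index_loop, number=10000))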
best
The ideal solution is to rewrite the generic function so it works with 2d arrays, using numpy's compiled functions.
In the matrix multiplication case, `einsum` has implemented a generalized form of 'sum-of-products' in compiled code:
In [22]: np.einsum('ij,ij->i',a,b)
Out[22]: array([  60,  510, 1460])
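The same 'sum-of-products' can also be spelled as an elementwise multiply plus a reduction; unlike `einsum` it builds a temporary (3,5) product array, but it reads well:

In [23]: (a*b).sum(axis=1)   # multiply elementwise, then sum each row
Out[23]: array([  60,  510, 1460])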
`matmul` also generalizes the product, but works best with 3d arrays:
In [25]: a[:,None,:]@b[:,:,None]   # needs reshape
Out[25]:
array([[[  60]],

       [[ 510]],

       [[1460]]])
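And one way to do the reshape that comment refers to is to index away the two size-1 axes:

In [26]: (a[:,None,:]@b[:,:,None])[:,0,0]   # (3,1,1) -> (3,)
Out[26]: array([  60,  510, 1460])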