res=np.zeros((3,4,4))
res[:] = np.kron(a,b)
Should work, broadcasting the kron
to all 3 planes.
kron
is specialized rearrangement of the outer product of a
and b
. A (2,2,2,2) rearranged to (4,4). I worked out the details in another post:
Why is numpy's kron so fast?
Your (3,4,4) could be obtained from a (3,2,2,2,2), but it's not standard, so there isn't an out-of-the-box function. You could try adapting my answer.
In [246]: a = np.array([[1,2],
...: [3,4]])
...: b = np.array([[5,6],
...: [7,8]])
In [249]: np.kron(a,b)
Out[249]:
array([[ 5, 6, 10, 12],
[ 7, 8, 14, 16],
[15, 18, 20, 24],
[21, 24, 28, 32]])
As I showed before, kron
can be produced by applying a transpose and shape to an outer
product. We can use einsum
for both the outer and transpose:
In [253]: np.einsum('ij,kl->ikjl',a,b) # ikjl instead ijkl
Out[253]:
array([[[[ 5, 6],
[10, 12]],
[[ 7, 8],
[14, 16]]],
[[[15, 18],
[20, 24]],
[[21, 24],
[28, 32]]]])
In [254]: np.einsum('ij,kl->ikjl',a,b).reshape(4,4)
Out[254]:
array([[ 5, 6, 10, 12],
[ 7, 8, 14, 16],
[15, 18, 20, 24],
[21, 24, 28, 32]])
Generalizing that to arrays (3,2,2) shape, we can add an extra 'batch' dimension:
In [255]: c = np.stack([a,a,a])
...: d = np.stack([b,b,b])
In [256]: c
Out[256]:
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]])
In [257]: c.shape
Out[257]: (3, 2, 2)
In [258]: np.einsum('aij,akl->aikjl',c,d).reshape(3,4,4)
Out[258]:
array([[[ 5, 6, 10, 12],
[ 7, 8, 14, 16],
[15, 18, 20, 24],
[21, 24, 28, 32]],
[[ 5, 6, 10, 12],
[ 7, 8, 14, 16],
[15, 18, 20, 24],
[21, 24, 28, 32]],
[[ 5, 6, 10, 12],
[ 7, 8, 14, 16],
[15, 18, 20, 24],
[21, 24, 28, 32]]])
But if we know that c
and d
are just replications of a
and b
, the broadcasted solution is fastester
In [260]: res = np.zeros((3,4,4),int)
In [261]: res[:] = np.kron(a,b)
or even better (a view of the kron witout copies):
np.broadcast_to(np.kron(a,b),(3,4,4))
Some timings:
In [280]: timeit np.einsum('aij,akl->aikjl',c,d).reshape(3,4,4)
10.2 μs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [281]: timeit res=np.zeros((3,4,4),int);res[:] = np.kron(a,b)
47.5 μs ± 1.66 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [282]: timeit np.broadcast_to(np.kron(a,b),(3,4,4))
57.6 μs ± 1.76 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [283]: timeit np.array(list(np.kron(x, y) for x, y in zip(c, d)))
143 μs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I'm bit surprised how much faster the einsum
is. Also that braodcast_to
isn't faster, though that's largely the fault of kron
(which my previous answer showed was on the slow side).