Since my np.dot is accelerated by OpenBLAS and OpenMPI, I am wondering whether there is a way to write the double sum
for i in range(N):
    for j in range(N):
        B[k,l] += A[i,j,k,l] * X[i,j]
as an inner product. At the moment I am using
B = np.einsum("ijkl,ij->kl", A, X)
but unfortunately it is quite slow and only uses one processor.
Any ideas?
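For reference, here is a small sketch (with illustrative sizes of my own choosing, kept small so the explicit loop stays fast) checking that the double sum and the einsum formulation agree:

```python
import numpy as np

# small illustrative sizes; the real arrays are much larger
Ni, Nk = 5, 4
A = np.random.random((Ni, Ni, Nk, Nk))
X = np.random.random((Ni, Ni))

# explicit double sum from the question: the (k, l) axes ride along as a 2-D slice
B_loop = np.zeros((Nk, Nk))
for i in range(Ni):
    for j in range(Ni):
        B_loop += A[i, j] * X[i, j]

# einsum formulation
B_es = np.einsum("ijkl,ij->kl", A, X)

assert np.allclose(B_loop, B_es)
```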
Edit:
I benchmarked the answers given so far with a simple example; they all seem to be in the same order of magnitude:
A = np.random.random([200,200,100,100])
X = np.random.random([200,200])
def B1():
    return np.einsum("ijkl,ij->kl", A, X)

def B2():
    return np.tensordot(A, X, [[0, 1], [0, 1]])

def B3():
    shp = A.shape
    return np.dot(X.ravel(), A.reshape(shp[0]*shp[1], shp[2]*shp[3])).reshape(shp[2], shp[3])
%timeit B1()
%timeit B2()
%timeit B3()
1 loops, best of 3: 300 ms per loop
10 loops, best of 3: 149 ms per loop
10 loops, best of 3: 150 ms per loop
Concluding from these results, I would choose np.einsum, since its syntax is still the most readable and the improvement with the other two is only a factor of about 2x. I guess the next step would be to externalize the code into C or Fortran.