Look at the Python code for the scipy sparse product. Notice that it calls the compiled code in 2 passes.
It looks like the mkl code does the same thing
https://software.intel.com/en-us/node/468640
If request=1, the routine computes only values of the array ic of length m + 1, the memory for this array must be allocated beforehand. On exit the value ic(m+1) - 1 is the actual number of the elements in the arrays c and jc.
If request=2, the routine has been called previously with the parameter request=1, the output arrays jc and c are allocated in the calling program and they are of the length ic(m+1) - 1 at least.
You first allocated ic
based on the number of rows of C
(you know that from the inputs), and call the mkl code with request=1
.
For request=2
you have to allocate c
and jc
arrays, based on the size in ic(m+1) - 1
. This is not the same as the number of nnz
in the input arrays.
You are using request1 = c_int(0)
, which requires that the c
arrays be the correct size - which you don't know without actually performing the dot
(or a request 1
).
==================
File: /usr/lib/python3/dist-packages/scipy/sparse/compressed.py
Definition: A._mul_sparse_matrix(self, other)
pass 1 allocates indptr
(note size), and passes the pointers (data doesn't matter at this pass)
indptr = np.empty(major_axis + 1, dtype=idx_dtype)
fn = getattr(_sparsetools, self.format + '_matmat_pass1')
fn(M, N,
np.asarray(self.indptr, dtype=idx_dtype),
np.asarray(self.indices, dtype=idx_dtype),
np.asarray(other.indptr, dtype=idx_dtype),
np.asarray(other.indices, dtype=idx_dtype),
indptr)
nnz = indptr[-1]
pass 2 allocates indptr
(different size), and based on nnz
indices
and data
.
indptr = np.asarray(indptr, dtype=idx_dtype)
indices = np.empty(nnz, dtype=idx_dtype)
data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
fn = getattr(_sparsetools, self.format + '_matmat_pass2')
fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
np.asarray(self.indices, dtype=idx_dtype),
self.data,
np.asarray(other.indptr, dtype=idx_dtype),
np.asarray(other.indices, dtype=idx_dtype),
other.data,
indptr, indices, data)
Last make a new array using these arrays.
return self.__class__((data,indices,indptr),shape=(M,N))
The mkl
library should be used in the same way.
===================
https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h
has c code for csr_matmat_pass1
and csr_matmat_pass2
====================
In case it helps, here's a pure Python implementation of these passes. A literal translation without any attempt to take advantage of any array operations.
def pass1(A, B):
nrow,ncol=A.shape
Aptr=A.indptr
Bptr=B.indptr
Cp=np.zeros(nrow+1,int)
mask=np.zeros(ncol,int)-1
nnz=0
for i in range(nrow):
row_nnz=0
for jj in range(Aptr[i],Aptr[i+1]):
j=A.indices[jj]
for kk in range(Bptr[j],Bptr[j+1]):
k=B.indices[kk]
if mask[k]!=i:
mask[k]=i
row_nnz += 1
nnz += row_nnz
Cp[i+1]=nnz
return Cp
def pass2(A,B,Cnnz):
nrow,ncol=A.shape
Ap,Aj,Ax=A.indptr,A.indices,A.data
Bp,Bj,Bx=B.indptr,B.indices,B.data
next = np.zeros(ncol,int)-1
sums = np.zeros(ncol,A.dtype)
Cp=np.zeros(nrow+1,int)
Cj=np.zeros(Cnnz,int)
Cx=np.zeros(Cnnz,A.dtype)
nnz = 0
for i in range(nrow):
head = -2
length = 0
for jj in range(Ap[i], Ap[i+1]):
j, v = Aj[jj], Ax[jj]
for kk in range(Bp[j], Bp[j+1]):
k = Bj[kk]
sums[k] += v*Bx[kk]
if (next[k]==-1):
next[k], head=head, k
length += 1
print(i,sums, next)
for _ in range(length):
if sums[head] !=0:
Cj[nnz], Cx[nnz] = head, sums[head]
nnz += 1
temp = head
head = next[head]
next[temp], sums[temp] = -1, 0
Cp[i+1] = nnz
return Cp, Cj, Cx
def pass0(A,B):
Cp = pass1(A,B)
nnz = Cp[-1]
Cp,Cj,Cx=pass2(A,B,nnz)
N,M=A.shape[0], B.shape[1]
C=sparse.csr_matrix((Cx, Cj, Cp), shape=(N,M))
return C
Both A
and B
have to be csr
. It using a transpose it needs conversion, e.g. B = A.T.tocsr()
.