I recently started using numpy memmap to back an array in my project, since I have a 3-dimensional tensor with a total of about 133 billion values for one graph of the dataset I am using as an example.
I am trying to calculate the heat kernel signature of a graph with 5748 nodes (the 21st graph of the DD dataset). My code to calculate the projectors (where I use memmap) is:
```python
import numpy as np
from pathlib import Path

Path('D:/hks_temp').mkdir(parents=True, exist_ok=True)
for l, ll in enumerate(L):
    pl = np.zeros((n, n))
    for k in ll:
        pl += np.outer(evecs[:, k], evecs[:, k])
    fp = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='w+', shape=(n, n))
    fp[:] = pl[:]
    fp.flush()
```
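As a side note, the inner loop of outer products over one group of eigenvectors is mathematically the same as a single matrix product, which is usually much faster. A toy check with random stand-in data (names `evecs`, `ll`, and the sizes here are illustrative, not the real ones):

```python
import numpy as np

n = 6
evecs = np.random.rand(n, n)  # stand-in for the eigenvector matrix
ll = [1, 3]                   # stand-in index group from L

# Accumulating outer products, as in the loop above:
pl_loop = np.zeros((n, n))
for k in ll:
    pl_loop += np.outer(evecs[:, k], evecs[:, k])

# Equivalent single product: V @ V.T with V the selected columns.
V = evecs[:, ll]
pl_mat = V @ V.T

print(np.allclose(pl_loop, pl_mat))  # True
```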
Inside each `X_hks.npy` file there is an n-by-n ndarray (5748 × 5748 in this example).
Then I want all these computed arrays to form the 3-dimensional tensor, so I "link" them (I don't know if that's the right term) in this way:
```python
P = np.array([None] * len(L))  # len(L) = 4043
for l in range(len(L)):
    P[l] = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='r', shape=(n, n))
```
`P` is used later only inside a loop, to compute `H = np.einsum('ijk,i->jk', P, np.exp(-unique_eval * t))`.
However, that raises an error: `ValueError: einstein sum subscripts string contains too many subscripts for operand 0`. Since the method works for smaller graphs that don't require memmap, my thought was that `P` isn't structured the way numpy expects and I must rearrange the data, maybe with a reshape. So I tried `P.reshape(len(L), n, n)`, but it doesn't work either, giving `ValueError: cannot reshape array of size 4043 into shape (4043,5748,5748)`. How can I make it work?
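For context, the error can be reproduced at toy sizes: an object array holding 2-D arrays is still a 1-D array as far as numpy is concerned, so `'ijk'` names more axes than operand 0 has. A minimal sketch (sizes and values are stand-ins):

```python
import numpy as np

n, m = 3, 4
P = np.array([None] * m)          # 1-D object array, not a 3-D tensor
for l in range(m):
    P[l] = np.full((n, n), float(l), dtype='float32')

print(P.shape, P.ndim)            # (4,) 1

w = np.exp(-np.arange(m, dtype='float32'))
try:
    np.einsum('ijk,i->jk', P, w)  # 3 subscripts for a 1-axis operand
except ValueError as e:
    print(e)

# np.stack copies the blocks into one real 3-D array, after which the
# contraction works (but this materializes the whole tensor in RAM):
T = np.stack(P)
print(T.shape)                    # (4, 3, 3)
H = np.einsum('ijk,i->jk', T, w)
print(H.shape)                    # (3, 3)
```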
I already found this question, but it doesn't fit this case. I think I can't store everything inside one big object, since the memmap files total 497 GB (126 MB each). If I can do it, please tell me.
If it is impossible, I will reduce the use case; however, I am quite interested in making it work for all the possibilities.
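One thing worth noting: a memmap lives on disk, so a single 3-D memmap of shape `(len(L), n, n)` would not need the 497 GB in RAM. A sketch at toy sizes (len(L) = 4043 and n = 5748 shrunk to 4 and 5, and a temp directory standing in for `D:/hks_temp`):

```python
import numpy as np
import os, tempfile

num_layers, n = 4, 5
path = os.path.join(tempfile.mkdtemp(), 'all_hks.dat')

# One disk-backed array holds the whole tensor; write one slice at a time.
big = np.memmap(path, dtype='float32', mode='w+', shape=(num_layers, n, n))
for l in range(num_layers):
    big[l] = np.random.rand(n, n).astype('float32')  # real code: projector pl
big.flush()

# Re-opened read-only with the same shape, it is a true 3-D operand:
P = np.memmap(path, dtype='float32', mode='r', shape=(num_layers, n, n))
w = np.exp(-np.arange(num_layers, dtype='float32'))
H = np.einsum('ijk,i->jk', P, w)
print(H.shape)  # (5, 5)
```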
question from:
https://stackoverflow.com/questions/66050844/how-to-use-a-ndarray-of-stored-ndarrays-with-memmap-as-a-big-ndarray-tensor