I am trying to perform a large linear-algebra computation to transform a generic covariance matrix KK_l_obs
(shape (NL, NL)
)into a map of covariance matrices in a reduced space Kmap_PC
(shape (q, q, X, Y)
).
Information about how to construct Kmap_PC
for each spatial location is held in other arrays a
, I0
, and k_l_th
. The first two have shapes (X, Y)
, and the third (nl, nl)
. The transformation between the observed and reduced space is handed by eingenvectors E
(shape (q, nl)
). Note that NL
> nl
.
A spatial element of Kmap_PC
is calculated as:
Kmap_PC[..., X, Y] = E.dot(
KK_l_obs[I0[X, Y]: I0[X, Y] + nl,
I0[X, Y]: I0[X, Y] + nl] / a_map[X, Y] +
k_l_th).dot(E.T)
The bit inside the first dot product could theoretically be computed straight using np.einsum
, but would take up hundreds of GB of memory. What I am doing now is looping through the spatial indices of Kmap_PC
, which is pretty slow. I could also distribute the calculation using MPI (which could probably give a speedup of 3-4x, since I have 16 cores available).
I'm wondering:
(a) if I can do the computation more efficiently--perhaps explicitly breaking it down into groups of spatial elements; and
(b) if I can improve the memory overhead for those calculations.
Code snippet
import numpy as np
np.random.seed(1)
X = 10
Y = 10
NL = 3000
nl = 1000
q = 7
a_map = 5. * np.random.rand(X, Y)
E = np.random.randn(q, nl)
# construct constant component
m1_ = .05 * np.random.rand(nl, nl)
k_l_th = m1_.dot(m1_)
# construct variable component
m2_ = np.random.rand(NL, NL)
KK_l_obs = m2_.dot(m2_.T)
# where to start in big cov
I0 = np.random.randint(0, NL - nl, (X, Y))
# the slow way
def looping():
K_PC = np.empty((q, q, X, Y))
inds = np.ndindex((X, Y))
for si in inds:
I0_ = I0[si[0], si[1]]
K_PC[..., si[0], si[1]] = E.dot(
KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl] / a_map[si[0], si[1]] + k_l_th).dot(E.T)
return K_PC
def veccalc():
nl_ = np.arange(nl)[..., None, None]
I, J = np.meshgrid(nl_, nl_)
K_s = KK_l_obs[I0[..., None, None] + J, I0[..., None, None] + I]
K_s = K_s / a_map[..., None, None] + k_l_th[None, None, ...]
print(K_s.nbytes)
K_PC = E @ K_s @ E.T
K_PC = np.moveaxis(K_PC, [0, 1], [-2, -1])
return K_PC
See Question&Answers more detail:
os