I've got a large matrix stored as a scipy.sparse.csc_matrix and want to subtract a column vector from each one of the columns in the large matrix. This is a pretty common task when you're doing things like normalization/standardization, but I can't seem to find the proper way to do this efficiently.
Here's an example to demonstrate:
# mat is a 3x3 matrix
mat = scipy.sparse.csc_matrix([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
#vec is a 3x1 matrix (or a column vector)
vec = scipy.sparse.csc_matrix([1,2,3]).T
"""
I want to subtract `vec` from each of the columns in `mat` yielding...
[[0, 1, 2],
[0, 1, 2],
[0, 1, 2]]
"""
One way to accomplish what I want is to hstack vec
to itself 3 times, yielding a 3x3 matrix where each column is vec
and then subtract that from mat
. But again, I'm looking for a way to do this efficiently, and the hstacked matrix takes a long time to create. I'm sure there's some magical way to do this with slicing and broadcasting, but it eludes me.
Thanks!
EDIT: Removed the 'in-place' constraint, because sparsity structure would be constantly changing in an in-place assignment scenario.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…