I have an array of values arr with shape (N,) and an array of coordinates coords with shape (2,N), holding row indices first and column indices second. I want to represent this in an (M,M) array grid such that grid takes the value 0 at coordinates that are not in coords, and at each coordinate that is included it stores the sum of all values in arr that share that coordinate. So if M=3, arr = np.arange(4)+1, and coords = np.array([[0,0,1,2],[0,0,2,2]]), then grid should be:
array([[3., 0., 0.],
       [0., 0., 3.],
       [0., 0., 4.]])
The reason this is nontrivial is that I need to repeat this step many times, and both the values in arr and the coordinates can change each time. Ideally I am looking for a vectorized solution. I suspect that I might be able to use np.where somehow, but it's not immediately obvious how.
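For concreteness, here is a minimal sketch of two vectorized ways to do this accumulation in plain NumPy. The function names accumulate_to_grid and accumulate_to_grid_bincount are hypothetical and chosen only for illustration; they are not necessarily the methods timed further down. The sketch assumes coords has shape (2,N), as in the example above.

```python
import numpy as np

def accumulate_to_grid(coords, arr, M):
    """Sum arr into an (M, M) grid at the (row, col) pairs in coords.

    Assumes coords has shape (2, N): row indices first, column indices second.
    """
    grid = np.zeros((M, M), dtype=float)
    # np.add.at does unbuffered in-place addition, so repeated
    # coordinates accumulate instead of overwriting each other.
    np.add.at(grid, (coords[0], coords[1]), arr)
    return grid

def accumulate_to_grid_bincount(coords, arr, M):
    """Same result, computed by summing weights over flattened indices."""
    # Map each (row, col) pair to a single linear index into the grid.
    flat = np.ravel_multi_index((coords[0], coords[1]), (M, M))
    # np.bincount sums the weights that share the same linear index.
    return np.bincount(flat, weights=arr, minlength=M * M).reshape(M, M)

M = 3
arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])

grid = accumulate_to_grid(coords, arr, M)
grid_bc = accumulate_to_grid_bincount(coords, arr, M)
print(grid)
```

Note that plain fancy-index assignment such as grid[coords[0], coords[1]] += arr would not work here, because repeated coordinates are only applied once; that is why the sketch uses np.add.at or np.bincount.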
Timing the solutions
I have timed the solutions posted so far, and it appears that the accumulator method is slightly faster than the sparse matrix method, with the second accumulation method being the slowest for the reasons explained in the comments:
%timeit for x in range(100): accumulate_arr(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): accumulate_arr_v2(np.random.randint(100,size=(2,10000)),np.random.normal(0,1,10000))
%timeit for x in range(100): sparse.coo_matrix((np.random.normal(0,1,10000),np.random.randint(100,size=(2,10000))),(100,100)).A
47.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
103 ms ± 255 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
48.2 ms ± 36 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
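To show what the sparse matrix line in the timing does, here is a small sketch that applies it to the example from the question. It assumes SciPy is installed and uses toarray() to densify; duplicate coordinates in a COO matrix are summed on conversion to a dense array, which is what produces the accumulation.

```python
import numpy as np
from scipy import sparse

M = 3
arr = np.arange(4) + 1
coords = np.array([[0, 0, 1, 2], [0, 0, 2, 2]])

# Build a COO matrix from (data, (rows, cols)); duplicates are allowed.
sp = sparse.coo_matrix(
    (arr.astype(float), (coords[0], coords[1])), shape=(M, M)
)

# Converting to a dense array adds up entries that share a coordinate.
dense = sp.toarray()
print(dense)
```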