Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Avoiding np.where when assigning to numpy array

I would like the following (or something similar) to work, without using np.where:

>>> import numpy as np
>>> A = np.arange(0,10)
>>> ind = np.logical_and(A>4, A%2)
>>> k = np.array([0,1,0],dtype=bool)
>>> A[ind][k] = np.pi # Doesn't actually assign to A

That is, I want k to be an additional boolean mask on the values of ind that are true.

I know that I can use np.where(ind)[0][k], but this is more expensive than logical indexing.

Is there a way to reference A[ind] that will refer to the base memory of A?


1 Reply


From the oft-referenced numpy indexing page:

.... A single boolean index array is practically identical to x[obj.nonzero()] .... However, it is faster when obj.shape == x.shape.

Called with a single argument, np.where(cond) is equivalent to np.nonzero(cond).
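A quick check of that equivalence, as a minimal sketch (the array values are made up for illustration):

```python
import numpy as np

x = np.arange(10, 20)
mask = x % 3 == 0          # boolean mask with the same shape as x

# Single-argument np.where returns the same tuple of index arrays as np.nonzero.
idx_where = np.where(mask)
idx_nonzero = np.nonzero(mask)
same_indices = np.array_equal(idx_where[0], idx_nonzero[0])

# Indexing with the mask and with the integer indices selects the same elements.
same_values = np.array_equal(x[mask], x[idx_nonzero])
```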

But let's do some simple timing:

In [239]: x = np.arange(10000)
In [240]: y = (x%2).astype(bool)
In [241]: x[y].shape
Out[241]: (5000,)
In [242]: idx = np.nonzero(y)
In [243]: x[idx].shape
Out[243]: (5000,)
In [244]: timeit x[y].shape
89.9 μs ± 726 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [245]: timeit x[idx].shape
13.3 μs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [246]: timeit x[np.nonzero(y)].shape
34.2 μs ± 893 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So in these timings integer-array indexing is faster than boolean indexing, even when the cost of an explicit np.nonzero (or np.where) call is included.
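Outside IPython, the same comparison can be reproduced with the standard timeit module. This is a minimal sketch; the absolute numbers, and possibly the ordering, depend on the NumPy version and the hardware. The repeat count n is an arbitrary choice.

```python
import timeit
import numpy as np

x = np.arange(10000)
y = (x % 2).astype(bool)   # boolean mask
idx = np.nonzero(y)        # precomputed integer indices

# Time each expression over the same number of calls (seconds per call).
n = 1000
timings = {
    "x[y]": timeit.timeit(lambda: x[y], number=n) / n,
    "x[idx]": timeit.timeit(lambda: x[idx], number=n) / n,
    "x[np.nonzero(y)]": timeit.timeit(lambda: x[np.nonzero(y)], number=n) / n,
}
for label, seconds in timings.items():
    print(f"{label:18s} {seconds * 1e6:8.2f} us per call")
```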


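A[ind][k] = ... does not work because boolean (advanced) indexing returns a copy, not a view: A[ind] creates a temporary array, and the assignment modifies that temporary instead of A.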
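The copy-versus-view difference can be checked directly with np.shares_memory. A small sketch, using a slice as the contrasting case (the slice is only there for comparison and is not part of the question):

```python
import numpy as np

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2)

sliced = A[5:]      # basic slicing: a view onto A's memory
masked = A[ind]     # boolean (advanced) indexing: a new array

slice_shares = np.shares_memory(A, sliced)
mask_shares = np.shares_memory(A, masked)

# Writing to the copy leaves A alone; writing through the view changes A.
masked[1] = 12
unchanged = A[7] == 107
sliced[2] = 12
changed = A[7] == 12
```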

In [251]: A = np.arange(100,110)
In [252]: ind = np.logical_and(A>104, A%2)
In [253]: ind
Out[253]: 
array([False, False, False, False, False,  True, False,  True, False,
        True])
In [254]: k = np.array([0,1,0], dtype=bool)
In [255]: A[ind]
Out[255]: array([105, 107, 109])
In [256]: A[ind][k]
Out[256]: array([107])
In [257]: A[ind][k] = 12
In [258]: A
Out[258]: array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

But using k to select from the indices returned by np.where(ind) works, because the assignment is then a single indexing operation on A:

In [262]: A[np.where(ind)[0][k]]=12
In [263]: A
Out[263]: array([100, 101, 102, 103, 104, 105, 106,  12, 108, 109])
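If this pattern comes up often, it could be wrapped in a small helper. The function name assign_submask is hypothetical (it is not a NumPy function), and the sketch assumes 1-D arrays:

```python
import numpy as np

def assign_submask(arr, mask, submask, value):
    """Assign value to the elements of arr picked by mask and then by submask.

    Hypothetical helper for 1-D arrays; submask needs one entry per
    True element of mask.
    """
    positions = np.flatnonzero(mask)   # integer positions where mask is True
    if submask.shape != positions.shape:
        raise ValueError("submask needs one entry per True element of mask")
    arr[positions[submask]] = value    # one indexing step on arr, so it writes in place

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2)
k = np.array([0, 1, 0], dtype=bool)
assign_submask(A, ind, k, 12)
```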

Timings for a fetch rather than a set:

In [264]: timeit A[np.where(ind)[0][k]]
1.94 μs ± 75.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [265]: timeit A[ind][k]
1.34 μs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

So yes, the double masking is a bit faster for a fetch in this case, but that is irrelevant for assignment, where it does not work. Don't sweat the small time improvements.

A purely boolean alternative: copy ind, then overwrite its True entries with k, so that the result is a single mask over A:

In [345]: ind1=ind.copy()
In [346]: ind1[ind] = k
In [348]: A[ind1]=3
In [349]: A
Out[349]: array([100, 101, 102, 103, 104, 105, 106,   3, 108, 109])

In this small example the timing is basically the same as for A[np.where(ind)[0][k]] = 12.
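The mask-refinement step can also be written as a function that leaves ind untouched. The name refine_mask is hypothetical; this is a sketch of the same idea, not a NumPy API:

```python
import numpy as np

def refine_mask(mask, submask):
    """Return a copy of mask whose True entries are replaced, in order, by submask.

    Hypothetical helper; submask needs one entry per True element of mask.
    """
    refined = mask.copy()
    refined[mask] = submask    # only the True slots are overwritten
    return refined

A = np.arange(100, 110)
ind = np.logical_and(A > 104, A % 2)
k = np.array([0, 1, 0], dtype=bool)

ind1 = refine_mask(ind, k)
A[ind1] = 3                    # a single boolean index, so A itself is modified
```

Because the result is an ordinary boolean mask with the same shape as A, it can be reused for both fetching and assigning.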

