python - what is the most efficient way to find the position of the first np.nan value?

Posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

Consider the array a:

a = np.array([3, 3, np.nan, 3, 3, np.nan])

I could do

np.isnan(a).argmax()

But this tests every element for np.nan just to find the first one.
Is there a more efficient way?
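
For reference, here is a minimal sketch of the approach above, run on the same array. It also illustrates a caveat not mentioned in the question: when the array contains no NaN at all, argmax on the all-False mask returns 0. The names mask, first, no_nan and ambiguous are illustrative only.

```python
import numpy as np

a = np.array([3, 3, np.nan, 3, 3, np.nan])

# np.isnan builds a full boolean mask before argmax runs,
# so every element is tested even though the first NaN is at index 2.
mask = np.isnan(a)
first = int(mask.argmax())

# Caveat: argmax on an all-False mask returns 0, so an array without
# any NaN looks the same as one whose first element is NaN.
no_nan = np.array([1.0, 2.0, 3.0])
ambiguous = int(np.isnan(no_nan).argmax())

print(first, ambiguous)
```

If the input may be NaN-free, the result needs a separate check such as mask.any() before it can be trusted.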

I've been trying to figure out whether I can pass a parameter to np.argpartition so that np.nan gets sorted first rather than last.

EDIT regarding [dup].
There are several reasons this question is different.

That question and its answers addressed equality of values. This one is about isnan.
Those answers all suffer from the same issue my answer faces. Note that I provided a perfectly valid answer but highlighted its inefficiency. I'm looking to fix that inefficiency.

EDIT regarding second [dup].

It still addresses equality, and the question and answers are old and quite possibly outdated.

1 Reply

深蓝 · Answer 1 · 2021-10-17T01:15:10+0000

It might also be worth looking into numba.jit. Without it, the vectorized version will likely beat a straightforward pure-Python search in most scenarios, but once the code is compiled, the ordinary search takes the lead, at least in my testing:

In [63]: a = np.array([np.nan if i % 10000 == 9999 else 3 for i in range(100000)])

In [70]: %paste
import numba

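def naive(a):
    # Plain Python loop: stops at the first NaN (returns None if there is none)
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

def short(a):
    # Vectorized: builds a full boolean mask, then takes the first True
    return np.isnan(a).argmax()

@numba.jit
def naive_jit(a):
    # Same loop as naive, compiled by numba
    for i in range(len(a)):
        if np.isnan(a[i]):
            return i

@numba.jit
def short_jit(a):
    # Same expression as short, compiled by numba
    return np.isnan(a).argmax()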
## -- End pasted text --

In [71]: %timeit naive(a)
100 loops, best of 3: 7.22 ms per loop

In [72]: %timeit short(a)
The slowest run took 4.59 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 37.7 μs per loop

In [73]: %timeit naive_jit(a)
The slowest run took 6821.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.79 μs per loop

In [74]: %timeit short_jit(a)
The slowest run took 395.51 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 μs per loop
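
As a hedged variation on naive_jit, the sketch below returns an explicit sentinel when no NaN is present instead of falling off the end of the loop. The function name first_nan_index and the -1 sentinel are assumptions made for illustration, not part of the benchmark above, and the try/except fallback only exists so the example still runs (uncompiled) where numba is not installed.

```python
import math
import numpy as np

try:
    from numba import njit  # optional: compiles the loop to machine code
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs (slowly) without numba.
        return func

@njit
def first_nan_index(a):
    # Scan a 1-D float array and stop at the first NaN.
    for i in range(a.shape[0]):
        if math.isnan(a[i]):
            return i
    return -1  # hypothetical sentinel: no NaN found

a = np.array([3.0, 3.0, np.nan, 3.0, 3.0, np.nan])
idx = first_nan_index(a)
idx_none = first_nan_index(np.array([1.0, 2.0]))
print(idx, idx_none)
```

With numba available, the first call includes compilation time, which is consistent with the "slowest run took ... times longer" warnings in the timings above.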

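Edit: As pointed out by @hpaulj in their answer, numpy actually ships with an optimized, short-circuiting search: on a float array, argmax treats np.nan as the maximum and can stop at the first one it encounters. Its performance is comparable to that of the JIT-compiled search above: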

In [26]: %paste
def plain(a):
    return a.argmax()

@numba.jit
def plain_jit(a):
    return a.argmax()
## -- End pasted text --

In [35]: %timeit naive(a)
100 loops, best of 3: 7.13 ms per loop

In [36]: %timeit plain(a)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.04 μs per loop

In [37]: %timeit naive_jit(a)
100000 loops, best of 3: 6.91 μs per loop

In [38]: %timeit plain_jit(a)
10000 loops, best of 3: 125 μs per loop
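
One thing the timings do not show: a.argmax() only gives the position of the first NaN when a NaN is actually present; otherwise it returns the position of the ordinary maximum. A minimal sketch of a guard, assuming a 1-D float array (the wrapper name first_nan_via_argmax and the -1 sentinel are hypothetical):

```python
import numpy as np

def first_nan_via_argmax(a):
    # Index of the first NaN in a 1-D float array, or -1 if there is none.
    if a.size == 0:
        return -1  # argmax raises ValueError on an empty array
    idx = int(a.argmax())  # a NaN, if present, is reported as the maximum
    # Without this check, a NaN-free array would yield the index of its
    # largest value instead of signalling "not found".
    return idx if np.isnan(a[idx]) else -1

a = np.array([3.0, 3.0, np.nan, 3.0, 3.0, np.nan])
print(first_nan_via_argmax(a))
print(first_nan_via_argmax(np.array([1.0, 5.0, 2.0])))
```

The extra np.isnan check touches a single element, so it should not change the performance picture in any meaningful way.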
