Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
435 views
in Technique[技术] by (71.8m points)

python - count how many elements in a numpy array are within delta of every other element

consider the array x and delta variable d

np.random.seed([3,1415])
x = np.random.randint(100, size=10)
d = 10

For each element in x, I want to count how many other elements in each are within delta d distance away.

So x looks like

print(x)

[11 98 74 90 15 55 13 11 13 26]

The results should be

[5 2 1 2 5 1 5 5 5 1]

what I've tried
Strategy:

  • Use broadcasting to take the outer difference
  • Absolute value of outer difference
  • sum how many exceed threshold

(np.abs(x[:, None] - x) <= d).sum(-1)

[5 2 1 2 5 1 5 5 5 1]

This works great. However, it doesn't scale. That outer difference is O(n^2) time. How can I get the same solution that doesn't scale with quadratic time?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Listed in this post are two more variants based on the searchsorted strategy from OP's answer post.

def pir3(a,d):  # Short & less efficient
    sidx = a.argsort()
    p1 = a.searchsorted(a+d,'right',sorter=sidx)
    p2 = a.searchsorted(a-d,sorter=sidx)
    return p1 - p2

def pir4(a, d):   # Long & more efficient
    s = a.argsort()

    y = np.empty(s.size,dtype=np.int64)
    y[s] = np.arange(s.size)

    a_ = a[s]
    return (
        a_.searchsorted(a_ + d, 'right')
        - a_.searchsorted(a_ - d)
    )[y]

The more efficient approach derives the efficient idea to get s.argsort() from this post.

Runtime test -

In [155]: # Inputs
     ...: a = np.random.randint(0,1000000,(10000))
     ...: d = 10


In [156]: %timeit pir2(a,d) #@ piRSquared's post solution
     ...: %timeit pir3(a,d)
     ...: %timeit pir4(a,d)
     ...: 
100 loops, best of 3: 2.43 ms per loop
100 loops, best of 3: 4.44 ms per loop
1000 loops, best of 3: 1.66 ms per loop

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...