python - Return array of counts for each feature of input

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have an array of integer labels and I would like to determine how many of each label is present and store those values in an array of the same size as the input. This can be accomplished with the following loop:

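import numpy

def counter(labels):
    # Float array with the same shape as the input
    sizes = numpy.zeros(labels.shape)
    for num in numpy.unique(labels):
        mask = labels == num
        # Write the label's total count into every position holding that label
        sizes[mask] = numpy.count_nonzero(mask)
    return sizes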

with input:

array = numpy.array([
       [0, 1, 2, 3],
       [0, 1, 1, 3],
       [3, 1, 3, 1]])

counter() returns:

array([[ 2.,  5.,  1.,  4.],
       [ 2.,  5.,  5.,  4.],
       [ 4.,  5.,  4.,  5.]])

However, for large arrays with many unique labels (60,000 in my case), this takes a considerable amount of time. This is the first step in a complex algorithm, and I can't afford to spend more than about 30 seconds on it. Is there an existing function that does this? If not, how can I speed up the loop?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:31+0000

Approach #1

Here's one using np.unique -

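_, tags, count = np.unique(labels, return_counts=True, return_inverse=True)
# reshape so the result matches a multi-dimensional input on any NumPy version
sizes = count[tags].reshape(labels.shape)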
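
As a small illustration of Approach #1, the sketch below runs it on the sample input from the question and names the intermediate arrays. The variable name uniq is introduced here for clarity and is not part of the original answer.

```python
import numpy as np

# Sample input from the question
labels = np.array([[0, 1, 2, 3],
                   [0, 1, 1, 3],
                   [3, 1, 3, 1]])

# uniq  : sorted distinct labels
# tags  : for each element, the index of its label within uniq
# count : number of occurrences of each entry of uniq
uniq, tags, count = np.unique(labels, return_inverse=True, return_counts=True)

# Look up each element's count, then restore the input shape
sizes = count[tags].reshape(labels.shape)
print(sizes)
```

Note that the result has an integer dtype, whereas the loop in the question fills a float array; the values are the same.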

Approach #2

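If the labels are non-negative integers, np.bincount gives a simpler and more efficient way. np.bincount only accepts 1-D input, so flatten the labels for counting and index the result with the original array -

sizes = np.bincount(labels.ravel())[labels]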
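
If the labels may be negative, one possible workaround is to shift them so the smallest label maps to bin 0 before calling np.bincount. The helper below is a hypothetical sketch, not part of the original answer; the name counts_per_label is made up here, and it assumes a non-empty integer array.

```python
import numpy as np

def counts_per_label(labels):
    """Hypothetical helper: for each element, the count of its own label.

    Assumes `labels` is a non-empty integer array of any shape.
    """
    labels = np.asarray(labels)
    flat = labels.ravel()
    # Shift so the smallest label becomes 0; bincount rejects negative values
    shifted = flat - flat.min()
    # Count every shifted label, look the counts up per element, restore shape
    return np.bincount(shifted)[shifted].reshape(labels.shape)

example = np.array([[-2, 0, 0],
                    [5, -2, 0]])
print(counts_per_label(example))
```

np.bincount allocates one bin for every integer between the smallest and largest label, so for labels that are sparse over a very wide range the np.unique approach is likely the safer choice.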

Runtime test

The setup uses non-negative labels drawn from 60,000 possible values. Two such sets, of lengths 100,000 and 1,000,000, are timed.

Set #1 :

In [192]: np.random.seed(0)
     ...: labels = np.random.randint(0,60000,(100000))

In [193]: %%timeit
     ...: sizes = np.zeros(labels.shape)
     ...: for num in np.unique(labels):
     ...:     mask = labels == num
     ...:     sizes[mask] = np.count_nonzero(mask)
1 loop, best of 3: 2.32 s per loop

In [194]: %timeit np.bincount(labels)[labels]
1000 loops, best of 3: 376 μs per loop

In [195]: 2320/0.376 # Speedup figure
Out[195]: 6170.212765957447

Set #2 :

In [196]: np.random.seed(0)
     ...: labels = np.random.randint(0,60000,(1000000))

In [197]: %%timeit
     ...: sizes = np.zeros(labels.shape)
     ...: for num in np.unique(labels):
     ...:     mask = labels == num
     ...:     sizes[mask] = np.count_nonzero(mask)
1 loop, best of 3: 43.6 s per loop

In [198]: %timeit np.bincount(labels)[labels]
100 loops, best of 3: 5.15 ms per loop

In [199]: 43600/5.15 # Speedup figure
Out[199]: 8466.019417475727
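
To repeat this kind of comparison outside IPython, a rough sketch using the standard timeit module could look like the following. The function names counter_loop and counter_bincount are chosen here for illustration, and the input is deliberately much smaller than the sets above so that the loop finishes quickly; the timings it prints will differ from those reported in the answer.

```python
import timeit
import numpy as np

def counter_loop(labels):
    # Original loop from the question
    sizes = np.zeros(labels.shape)
    for num in np.unique(labels):
        mask = labels == num
        sizes[mask] = np.count_nonzero(mask)
    return sizes

def counter_bincount(labels):
    # Approach #2; requires non-negative integer labels
    return np.bincount(labels.ravel())[labels]

# Assumed test data: 10,000 labels drawn from 600 possible values
rng = np.random.default_rng(0)
labels = rng.integers(0, 600, size=10_000)

# Check that both versions agree before timing them
assert np.array_equal(counter_loop(labels), counter_bincount(labels))

t_loop = timeit.timeit(lambda: counter_loop(labels), number=3) / 3
t_fast = timeit.timeit(lambda: counter_bincount(labels), number=3) / 3
print(f"loop: {t_loop:.6f} s, bincount: {t_fast:.6f} s")
```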
