Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


performance - Fastest way to find the indices of each group of duplicate values in an array in Python

I want to find the indices of each group of duplicate values, for example:

s = [2,6,2,88,6,...]

The result should give the indices in the original s, e.g. [[0,2],[1,4],..], although another representation is also fine.

I have looked at many solutions, and the fastest way I found to get the duplicated values is:

s = np.sort(a, axis=None)
s[:-1][s[1:] == s[:-1]]

But after sorting, the indices no longer correspond to positions in the original s.

In my case the list has about 200 million values, and I want the fastest way to do this. I store the values in an array because I want to use a GPU to speed it up.

Question from: https://stackoverflow.com/questions/65878826/determining-index-each-group-duplicate-values-in-an-array-in-python-with-the-fas


1 Reply


Using a hash structure such as a dict helps.

For example:

import numpy as np
from collections import defaultdict

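a = np.array([2, 4, 2, 88, 15, 4])
table = defaultdict(list)
# Map each value to the list of positions where it occurs.
# tolist() yields plain Python ints, so the dict keys print as 2, 4, ...
for ind, num in enumerate(a.tolist()):
    table[num].append(ind)
print(dict(table))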

Outputs:

{2: [0, 2], 4: [1, 5], 88: [3], 15: [4]}

To show only the duplicated elements, ordered from smallest to largest value:

for k, v in sorted(table.items()):
    if len(v) > 1:
        print(k, ":", v)

Outputs:

2 : [0, 2]
4 : [1, 5]
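
If you need the list-of-lists form from the question instead of printed lines, the same dict approach can be wrapped in a small helper. This is only a sketch: the function name duplicate_index_groups is made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+), so groups come out in order of first appearance.

```python
from collections import defaultdict

def duplicate_index_groups(values):
    """Return index lists for every value that occurs more than once."""
    table = defaultdict(list)
    for ind, num in enumerate(values):
        table[num].append(ind)
    # Keep only values seen at least twice; order follows first appearance.
    return [idx for idx in table.values() if len(idx) > 1]

print(duplicate_index_groups([2, 6, 2, 88, 6]))  # [[0, 2], [1, 4]]
```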

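Building the table takes a single pass over the array, so its cost grows linearly with the array length. The sorted() step only sorts the distinct values, so its cost depends on how many different values the array contains.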
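
For an array as large as the one in the question, a Python-level loop may be slow, so here is a vectorized sketch that stays in NumPy. It is not the method above but a different technique: it sorts with np.argsort instead of np.sort, which keeps the original positions that a plain sort loses. The function name duplicate_index_groups_np is hypothetical, and I have not timed it on 200 million values.

```python
import numpy as np

def duplicate_index_groups_np(a):
    """Return a list of index arrays, one per value occurring more than once."""
    a = np.asarray(a).ravel()
    # A stable sort keeps equal values in their original order,
    # so the indices inside each group come out ascending.
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    # Positions in the sorted array where a new run of equal values starts.
    starts = np.flatnonzero(sorted_a[1:] != sorted_a[:-1]) + 1
    groups = np.split(order, starts)
    return [g for g in groups if g.size > 1]

a = np.array([2, 4, 2, 88, 15, 4])
print([g.tolist() for g in duplicate_index_groups_np(a)])  # [[0, 2], [1, 5]]
```

The groups are ordered by value here, not by first appearance. The final np.split and list comprehension still run per distinct value in Python, so with very many distinct values that part can dominate.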

