Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to remove duplicate rows from numpy arrays and keep their original order

I have a list of numpy arrays and want to remove the duplicate rows from each array while keeping the order of my (already sorted) data. This is my list of arrays with duplicates:

import numpy as np

dup_arr=[np.array([[0., 10., 10.],
                   [0., 2., 30.],
                   [0., 3., 5.],
                   [0., 3., 5.],
                   [0., 3., 40.]]),
         np.array([[0., -1., -4.],
                   [0., -2., -3.],
                   [0., -3., -5.],
                   [0., -3., -6.],
                   [0., -3., -6.]])]

I tried to do it using the following code:

clean_arr=[]
for i in dup_arr:
    new_array = [tuple(row) for row in i]
    uniques = np.unique(new_array, axis=0)
    clean_arr.append(uniques)

The problem with this method is that np.unique returns the unique rows in sorted order, so it changes the order of my data. I do not want to sort them again, because that is a tough task for my real data. I want the following result:

clean_arr=[np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.]])]

But that code reorders the rows. I also tried the following for loop, but it was not successful either, because I cannot make the inner loop run to the end of each array in my list: it stops before reaching the last row.

clean_arr=[]
for arrays in dup_arr:
    for rows in range (len(arrays)-1):
        if np.all(arrays [rows]== arrays [rows+1]):
            continue
        else:
            dat= arrays [rows]
            clean_arr.append(dat)
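The loop above has two problems: the last row is never appended (range(len(arrays)-1) stops before it and nothing handles it afterwards), and every kept row goes into one flat list instead of one result per array. A minimal sketch of a fix, assuming (as in the sample data) that duplicate rows are always adjacent, keeps the first row plus every row that differs from the row directly above it. It replaces the Python loop with a vectorized comparison, and `drop_consecutive_duplicates` is a hypothetical helper name. It does not remove duplicates that are not next to each other.

```python
import numpy as np

def drop_consecutive_duplicates(a):
    """Hypothetical helper: remove rows equal to the row directly above them.

    Only adjacent duplicates are removed; the first row is always kept.
    """
    if len(a) == 0:
        return a
    # True where a row differs from its predecessor in at least one column
    changed = np.any(a[1:] != a[:-1], axis=1)
    # Always keep row 0, then every row that differs from the previous one
    keep = np.concatenate(([True], changed))
    return a[keep]

dup_arr = [np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.],
                     [0., -3., -6.]])]

# One cleaned array per input array, so the list structure is preserved
clean_arr = [drop_consecutive_duplicates(a) for a in dup_arr]
```

This only works because the data is already sorted; for arbitrary duplicates, the np.unique approach in the reply below is needed.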

Thanks in advance for any help.



1 Reply


You can use np.unique with axis=0 and return_index=True. To keep the rows in their original order, sort the returned indices and use them to index the array:

[i[np.sort(np.unique(i, axis=0, return_index=True)[1])] for i in dup_arr]
[array([[ 0., 10., 10.],
        [ 0.,  2., 30.],
        [ 0.,  3.,  5.],
        [ 0.,  3., 40.]]),
 array([[ 0., -1., -4.],
        [ 0., -2., -3.],
        [ 0., -3., -5.],
        [ 0., -3., -6.]])]
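  1. np.unique(i, axis=0, return_index=True)[1] returns the index of the first occurrence of each unique row. These indices follow the sorted order of the unique rows, not their position in the array.
  2. np.sort() sorts these indices, which puts the rows back in their original order.
  3. [f(i) for i in dup_arr] applies the two steps above to each array in dup_arr.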
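To make the steps concrete, here is a sketch that prints the intermediate values for the first array of dup_arr. The names `uniq`, `first_idx` and `order` are only for illustration.

```python
import numpy as np

a = np.array([[0., 10., 10.],
              [0., 2., 30.],
              [0., 3., 5.],
              [0., 3., 5.],
              [0., 3., 40.]])

# Step 1: unique rows (sorted lexicographically) plus the index of the
# first occurrence of each unique row in `a`
uniq, first_idx = np.unique(a, axis=0, return_index=True)
print(uniq)       # rows in sorted order: [0,2,30], [0,3,5], [0,3,40], [0,10,10]
print(first_idx)  # [1 2 4 0]

# Step 2: sorting the indices restores the original row order
order = np.sort(first_idx)
print(order)      # [0 1 2 4]

# One copy of each row, in the order it first appeared
print(a[order])
```

Row 3 is dropped because it repeats row 2, and sorting the indices moves [0, 10, 10] back to the front.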

NOTE: You will NOT be able to fully vectorize this operation (for example, by calling np.stack on the results), because a different number of duplicates may be removed from each matrix. The cleaned arrays would then have unequal lengths along axis 0, which a single numpy array cannot hold.
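A small illustration of the note, using a hypothetical input in which the two arrays lose a different number of rows: the cleaned arrays end up with different shapes, so they have to stay in a Python list. With the question's own data both results happen to have 4 rows, so np.stack would work there. `dedupe_keep_order` is a hypothetical name for the same technique as in the answer.

```python
import numpy as np

def dedupe_keep_order(a):
    # Same technique as the answer: first-occurrence indices, re-sorted
    idx = np.unique(a, axis=0, return_index=True)[1]
    return a[np.sort(idx)]

# Hypothetical input: the first array has one duplicate row, the second has two
arrays = [np.array([[1., 2.], [1., 2.], [3., 4.]]),
          np.array([[5., 6.], [5., 6.], [5., 6.]])]

cleaned = [dedupe_keep_order(a) for a in arrays]
print([c.shape for c in cleaned])  # [(2, 2), (1, 2)]

try:
    np.stack(cleaned)
except ValueError as err:
    # np.stack requires every input array to have the same shape
    print("cannot stack:", err)
```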


The same steps, written as a function:

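import numpy as np

def f(a):
    # index of the first occurrence of each unique row (in sorted-row order)
    indexes = np.unique(a, axis=0, return_index=True)[1]
    # sort the indices to restore the original row order
    return a[np.sort(indexes)]

[f(i) for i in dup_arr]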
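If you prefer not to rely on np.unique, a different technique (not the one this answer uses) is to turn each row into a tuple and use it as a dict key. Since Python 3.7, dicts keep insertion order, so dict.fromkeys keeps the first occurrence of each row in its original position. A minimal sketch, where `dedupe_rows_dict` is a hypothetical helper name; rows are compared with ordinary == on their values.

```python
import numpy as np

def dedupe_rows_dict(a):
    """Hypothetical helper: drop duplicate rows of a 2-D array, keeping the
    first occurrence of each row in its original position."""
    # dict keys are unique and (Python 3.7+) keep insertion order
    unique_rows = dict.fromkeys(map(tuple, a))
    # rebuild a 2-D array with the same dtype and number of columns
    return np.array(list(unique_rows), dtype=a.dtype).reshape(-1, a.shape[1])

dup_arr = [np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.],
                     [0., -3., -6.]])]

clean_arr = [dedupe_rows_dict(i) for i in dup_arr]
```

This loops over rows in Python, so for very large arrays the np.unique version is likely faster; for moderate sizes either is fine.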


...