Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Determine Which Duplicate Images to Remove using Python Dictionary

I've written a script that identifies duplicate and near-duplicate images based on some criteria. The results are placed in a dictionary where each key represents an image and each value is the list of that image's duplicates. For example, image0 has duplicate images 1-5. Now I'm trying to build a list of candidates to delete from this dictionary. I'd like to keep the first image that appears in the dictionary (image0), delete images 1-5, and then skip keys 1-5 because those images have already been marked for removal. How would I do this? Or is there a better way to go about identifying candidates for deletion?

Example Dictionary:

{0: [1, 2, 3, 4, 5],
 1: [0, 2, 3, 4, 5],
 2: [0, 1, 3, 4, 5],
 3: [0, 1, 2, 4, 5],
 4: [0, 1, 2, 3, 5],
 5: [0, 1, 2, 3, 4],
 6: [7, 8, 9, 10, 11],
 7: [6, 8, 9, 10, 11],
 8: [6, 7, 9, 10, 11],
 9: [6, 7, 8, 10, 11],
 10: [6, 7, 8, 9, 11],
 11: [6, 7, 8, 9, 10],
 12: [13, 14, 15, 16, 17],
 13: [12, 14, 15, 16, 17],
 14: [12, 13, 15, 16, 17],
 15: [12, 13, 14, 16, 17],
 16: [12, 13, 14, 15, 17],
 17: [12, 13, 14, 15, 16],
 18: [19, 20, 21, 22, 23],
 19: [18, 20, 21, 22, 23],
 20: [18, 19, 21, 22, 23],
 21: [18, 19, 20, 22, 23],
 22: [18, 19, 20, 21, 23],
 23: [18, 19, 20, 21, 22],
 24: [25, 26, 27, 28, 29],
 25: [24, 26, 27, 28, 29],
 26: [24, 25, 27, 28, 29],
 27: [24, 25, 26, 28, 29],
 28: [24, 25, 26, 27, 29],
 29: [24, 25, 26, 27, 28],
 30: [31, 32, 33, 34, 35],
 31: [30, 32, 33, 34, 35],
 32: [30, 31, 33, 34, 35],
 33: [30, 31, 32, 34, 35],
 34: [30, 31, 32, 33, 35],
 35: [30, 31, 32, 33, 34],
 36: [37, 38, 39],
 37: [36, 38, 39],
 38: [36, 37, 39],
 39: [36, 37, 38],
 40: [41, 42, 43],
 41: [40, 42, 43],
 42: [40, 41, 43],
 43: [40, 41, 42],
 44: [45, 46],
 45: [44, 46],
 46: [44, 45]}


1 Reply


The logic of what you want to do is straightforward. Keep a set of delete candidates and iterate over the keys in the dict. For each key, check whether it is already in the set. If it is, skip it, because it has already been marked for deletion. If it isn't, the value for that key is the list of images that duplicate it, so add all of those to the set of delete candidates.

If you really want a list as the result, at the end you can convert the set to a list.

Here's the code to do that:

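# `data` is the dictionary of duplicates from the question.
dups = set()

for i in data:
    # Skip keys that are already marked for deletion.
    if i not in dups:
        dups.update(data[i])

print(sorted(dups))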

Result:

[1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 37, 38, 39, 41, 42, 43, 45, 46]
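If you also want the list of images that are kept, the same loop can record both. The following is only a sketch: the function name `split_keep_delete` and the small `sample` dictionary are made up for illustration, and it assumes Python 3.7+, where a dict is iterated in insertion order, so "the first image that appears" is well defined. It also guards against marking an image for deletion after it has already been kept, which could otherwise happen if the near-duplicate lists are not perfectly symmetric.

```python
def split_keep_delete(duplicates):
    """Split a {image: [duplicate images]} mapping into (keep, delete) lists.

    Hypothetical helper: the first image seen in each duplicate group is
    kept, and the images listed as its duplicates are marked for deletion.
    """
    keep = []        # images to keep, in dictionary order
    kept = set()     # same images, for fast membership tests
    delete = set()   # images marked for deletion

    for image, dupes in duplicates.items():
        if image in delete:
            continue  # already marked as a duplicate of an earlier image
        keep.append(image)
        kept.add(image)
        # Never mark an image that we have already decided to keep.
        delete.update(d for d in dupes if d not in kept)

    return keep, sorted(delete)


# Made-up sample in the same shape as the example dictionary.
sample = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1],
    3: [4],
    4: [3],
    5: [],
}

keep, delete = split_keep_delete(sample)
print(keep)    # [0, 3, 5]
print(delete)  # [1, 2, 4]
```

An image with an empty duplicate list (5 in the sample) is simply kept, and no image can end up in both lists.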
