Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
259 views
in Technique[技术] by (71.8m points)

Compare two images the python/linux way

Trying to solve a problem of preventing duplicate images to be uploaded.

I have two JPGs. Looking at them I can see that they are in fact identical. But for some reason they have different file size (one is pulled from a backup, the other is another upload) and so they have a different md5 checksum.

How can I efficiently and confidently compare two images in the same sense as a human would be able to see that they are clearly identical?

Example: http://static.peterbe.com/a.jpg and http://static.peterbe.com/b.jpg

Update

I wrote this script:

import math, operator
from PIL import Image
def compare(file1, file2):
    image1 = Image.open(file1)
    image2 = Image.open(file2)
    h1 = image1.histogram()
    h2 = image2.histogram()
    rms = math.sqrt(reduce(operator.add,
                           map(lambda a,b: (a-b)**2, h1, h2))/len(h1))
    return rms

if __name__=='__main__':
    import sys
    file1, file2 = sys.argv[1:]
    print compare(file1, file2)

Then I downloaded the two visually identical images and ran the script. Output:

58.9830484122

Can anybody tell me what a suitable cutoff should be?

Update II

The difference between a.jpg and b.jpg is that the second one has been saved with PIL:

b=Image.open('a.jpg')
b.save(open('b.jpg','wb'))

This apparently applies some very very light quality modifications. I've now solved my problem by applying the same PIL save to the file being uploaded without doing anything with it and it now works!

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is a OSS project that uses WebDriver to take screen shots and then compares the images to see if there are any issues (http://code.google.com/p/fighting-layout-bugs/)). It does it by openning the file into a stream and then comparing every bit.

You may be able to do something similar with PIL.

EDIT:

After more research I found

h1 = Image.open("image1").histogram()
h2 = Image.open("image2").histogram()

rms = math.sqrt(reduce(operator.add,
    map(lambda a,b: (a-b)**2, h1, h2))/len(h1))

on http://snipplr.com/view/757/compare-two-pil-images-in-python/ and http://effbot.org/zone/pil-comparing-images.htm


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...