Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
344 views
in Technique[技术] by (71.8m points)

spam prevention - What is the best way to programmatically detect porn images?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is actually reasonably easy. You can programatically detect skin tones - and porn images tend to have a lot of skin. This will create false positives but if this is a problem you can pass images so detected through actual moderation. This not only greatly reduces the the work for moderators but also gives you lots of free porn. It's win-win.

#!python    
import os, glob
from PIL import Image

def get_skin_ratio(im):
    im = im.crop((int(im.size[0]*0.2), int(im.size[1]*0.2), im.size[0]-int(im.size[0]*0.2), im.size[1]-int(im.size[1]*0.2)))
    skin = sum([count for count, rgb in im.getcolors(im.size[0]*im.size[1]) if rgb[0]>60 and rgb[1]<(rgb[0]*0.85) and rgb[2]<(rgb[0]*0.7) and rgb[1]>(rgb[0]*0.4) and rgb[2]>(rgb[0]*0.2)])
    return float(skin)/float(im.size[0]*im.size[1])

for image_dir in ('porn','clean'):
    for image_file in glob.glob(os.path.join(image_dir,"*.jpg")):
        skin_percent = get_skin_ratio(Image.open(image_file)) * 100
        if skin_percent>30:
            print "PORN {0} has {1:.0f}% skin".format(image_file, skin_percent)
        else:
            print "CLEAN {0} has {1:.0f}% skin".format(image_file, skin_percent)

This code measures skin tones in the center of the image. I've tested on 20 relatively tame "porn" images and 20 completely innocent images. It flags 100% of the "porn" and 4 out of the 20 of the clean images. That's a pretty high false positive rate but the script aims to be fairly cautious and could be further tuned. It works on light, dark and Asian skin tones.

It's main weaknesses with false positives are brown objects like sand and wood and of course it doesn't know the difference between "naughty" and "nice" flesh (like face shots).

Weakness with false negatives would be images without much exposed flesh (like leather bondage), painted or tattooed skin, B&W images, etc.

source code and sample images


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...