The problem is hard to answer in general because people often have different requirements for what counts as an image match. Some people might want to search for an image that might have a different size or orientation than the template image they provide, in which case a scale- or rotation-invariant approach is needed. There are various option such as looking for similar textures, features or shapes, but I will focus on approaches that only looks for pixels of a similar colour that are in the exact same positions as the template image. This seems most suited to your example which seems to fall in the category of template matching.
Possible approaches
In this case the problem is closely related to the signal processing concepts of cross-correlation and convolution, which is often implemented using an FFT as it is very fast (its in the name!). This is what was used in the approach you linked to, and the FFTW library could be of use when attempting such an implementation as it has wrappers for Java. Using cross-correlation works quite well, as seen in this question, as well as the famous waldo question.
Another option is not to use all the pixels for comparison, but rather only the features that are easier to find and more likely to be unique. This would require a feature descriptor like SIFT, SURF or one of many others. You would need to find all the features in both images and then look for features that have similar positions to those in the template image. With this approach I suggest you use JavaCV.
The random guessing approach you mentioned should work fast when it is possible, but unfortunately it isn't generally applicable as it will only be useful with certain image combinations that produce a close match near the correct location.
Unless you use an external library, the simplest method in Java would be what I would call a brute-force approach, although it is a bit slow. The brute-force approach simply involves searching the whole image for the sub-region that best matches the image you are looking for. I'll explain this approach further. First you need to define how to determine the similarity between two equally sized images. This can be done by summing the differences between the pixel's colours which requires a definition for the difference between RGB values.
Colour similarity
One way of determining the difference between two RGB values is to use the euclidean distance:
sqrt( (r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2 )
There are different colour spaces than RGB that can be used, but since your sub-image is most likely near identical (instead of just visually similar), this should work fine. If you have an ARGB colour space and you don't want semi-transparent pixels to influence your results as much, you can use:
a1 * a2 * sqrt( (r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2 )
which will give a smaller value if the colours have transparency (assuming a1
and a2
are between 0 and 1). I would suggest that you use transparency instead of white areas and to use the PNG file format since it doesn't use lossy compression that subtly distorts the colours in the image.
Comparing images
To compare equally-sized images you can sum the difference between their individual pixels. This sum is then a measure of the difference and you can search for the region in the image with the lowest difference measure. It becomes harder if you don't even know whether the image contains the sub-image, but this would be indicated by the best match having a high difference measure. If you want, you could also normalize the difference measure to lie between 0 and 1 by dividing it by the size of the sub-image and the maximum possible RGB difference (sqrt(3) with the euclidean distance and RGB values from 0 to 1). Zero would then be an identical match and anything close to one would be as different as possible.
Brute-force implementation
Here's a simple implementation that uses the brute-force approach to search the image. With your example images it found the location at (139,55) to be the top-left location of the region with the best match (which looks correct). It took about 10 to 15 seconds to run on my PC and the normalized difference measure of the location was around 0.57.
/**
* Finds the a region in one image that best matches another, smaller, image.
*/
public static int[] findSubimage(BufferedImage im1, BufferedImage im2){
int w1 = im1.getWidth(); int h1 = im1.getHeight();
int w2 = im2.getWidth(); int h2 = im2.getHeight();
assert(w2 <= w1 && h2 <= h1);
// will keep track of best position found
int bestX = 0; int bestY = 0; double lowestDiff = Double.POSITIVE_INFINITY;
// brute-force search through whole image (slow...)
for(int x = 0;x < w1-w2;x++){
for(int y = 0;y < h1-h2;y++){
double comp = compareImages(im1.getSubimage(x,y,w2,h2),im2);
if(comp < lowestDiff){
bestX = x; bestY = y; lowestDiff = comp;
}
}
}
// output similarity measure from 0 to 1, with 0 being identical
System.out.println(lowestDiff);
// return best location
return new int[]{bestX,bestY};
}
/**
* Determines how different two identically sized regions are.
*/
public static double compareImages(BufferedImage im1, BufferedImage im2){
assert(im1.getHeight() == im2.getHeight() && im1.getWidth() == im2.getWidth());
double variation = 0.0;
for(int x = 0;x < im1.getWidth();x++){
for(int y = 0;y < im1.getHeight();y++){
variation += compareARGB(im1.getRGB(x,y),im2.getRGB(x,y))/Math.sqrt(3);
}
}
return variation/(im1.getWidth()*im1.getHeight());
}
/**
* Calculates the difference between two ARGB colours (BufferedImage.TYPE_INT_ARGB).
*/
public static double compareARGB(int rgb1, int rgb2){
double r1 = ((rgb1 >> 16) & 0xFF)/255.0; double r2 = ((rgb2 >> 16) & 0xFF)/255.0;
double g1 = ((rgb1 >> 8) & 0xFF)/255.0; double g2 = ((rgb2 >> 8) & 0xFF)/255.0;
double b1 = (rgb1 & 0xFF)/255.0; double b2 = (rgb2 & 0xFF)/255.0;
double a1 = ((rgb1 >> 24) & 0xFF)/255.0; double a2 = ((rgb2 >> 24) & 0xFF)/255.0;
// if there is transparency, the alpha values will make difference smaller
return a1*a2*Math.sqrt((r1-r2)*(r1-r2) + (g1-g2)*(g1-g2) + (b1-b2)*(b1-b2));
}
I haven't looked, but maybe one of these Java image processing libraries could also be of some use:
If speed is really important I think the best approach would be an implementation using cross-correlation or feature descriptors that uses an external library.