(Why am I doing this? See explanation below)
Consider two sets of points, A
and B
as shown below
It might not look like it, but set A
is "hidden" within set B
. It can not be easily seen because points in B
are scaled, rotated, and translated in (x, y)
with respect to A
. Even worse, some points that are present in A
are missing in B
, and B
contains many points that are not in A
.
I need to find the appropriate scaling, rotation, and translation that must be applied to the B
set in order to match it with set A
. In the case shown above, the correct values are:
scale = 0.14, rot_angle = 0.0, x_transl = 35.0, y_transl = 2.0
which produces the (good enough) match
(in red, only the matched B
points are shown; these are located in the sector 1000<x<2000, y~2000
in the first figure to the right). Given the many degrees of freedom (DoF: scaling + rotation + 2D translation) I am aware of the possibility of a miss-match, but the coordinates of the points are not random (although they may look like it) so the probability of this happening is very small.
The code I wrote (see below) uses brute force to loop through all possible DoF values taken from pre-defined ranges for each one. The core of the code is based on minimizing the distance of each point in A
to any point in B
The code works (it actually generated the solution mentioned above), but since the number of solutions (i.e., the combinations of accepted values for each DoF) scales with larger ranges, it can become unacceptably slow pretty quick (also it eats up all the RAM in my system)
How can I improve the performance of the code? I'm open to any solution including numpy
and/or scipy
. Perhaps something like Basing-Hopping to search for the best match (or a relatively close one) instead of the brute force method I currently employ?
import numpy as np
from scipy.spatial import distance
import math
def scalePoints(B_center, delta_x, delta_y, scale):
"""
Scales xy points.
http://codereview.stackexchange.com/q/159183/35351
"""
x_scale = B_center[0] - scale * delta_x
y_scale = B_center[1] - scale * delta_y
return x_scale, y_scale
def rotatePoints(center, x, y, angle):
"""
Rotates points in 'xy' around 'center'. Angle is in degrees.
Rotation is counter-clockwise
http://stackoverflow.com/a/20024348/1391441
"""
angle = math.radians(angle)
xy_rot = x - center[0], y - center[1]
xy_rot = (xy_rot[0] * math.cos(angle) - xy_rot[1] * math.sin(angle),
xy_rot[0] * math.sin(angle) + xy_rot[1] * math.cos(angle))
xy_rot = xy_rot[0] + center[0], xy_rot[1] + center[1]
return xy_rot
def distancePoints(set_A, x_transl, y_transl):
"""
Find the sum of the minimum distance of points in set_A to points in set_B.
"""
d = distance.cdist(set_A, zip(*[x_transl, y_transl]), 'euclidean')
# Sum of all minimal distances.
d_sum = np.sum(np.min(d, axis=1))
return d_sum
def match_frames(
set_A, B_center, delta_x, delta_y, tol, sc_min, sc_max, sc_step,
ang_min, ang_max, ang_step, xmin, xmax, xstep, ymin, ymax, ystep):
"""
Process all possible solutions in the defined ranges.
"""
N = len(set_A)
# Ranges
sc_range = np.arange(sc_min, sc_max, sc_step)
ang_range = np.arange(ang_min, ang_max, ang_step)
x_range = np.arange(xmin, xmax, xstep)
y_range = np.arange(ymin, ymax, ystep)
print("Total solutions: {:.2e}".format(
np.prod([len(_) for _ in [sc_range, ang_range, x_range, y_range]])))
d_sum, params_all = [], []
for scale in sc_range:
# Scaled points.
x_scale, y_scale = scalePoints(B_center, delta_x, delta_y, scale)
for ang in ang_range:
# Rotated points.
xy_rot = rotatePoints(B_center, x_scale, y_scale, ang)
# x translation
for x_tr in x_range:
x_transl = xy_rot[0] + x_tr
# y translation
for y_tr in y_range:
y_transl = xy_rot[1] + y_tr
# Find minimum distance sum.
d_sum.append(distancePoints(set_A, x_transl, y_transl))
# Store solutions.
params_all.append([scale, ang, x_tr, y_tr])
# Condition to break out if given tolerance for match
# is achieved.
if d_sum[-1] <= tol * N:
print("Match found:", scale, ang, x_tr, y_tr)
i_min = d_sum.index(min(d_sum))
return i_min, params_all
# Print best solution found so far.
i_min = d_sum.index(min(d_sum))
print("d_sum_min = {:.2f}".format(d_sum[i_min]))
return i_min, params_all
# Data.
set_A = [[2015.81, 1981.665], [1967.31, 1960.865], [1962.91, 1951.365],
[1964.91, 1994.565], [1894.41, 1957.065]]
set_B = [
[2689.28, 3507.04, 2895.67, 1051.3, 1929.49, 1035.97, 752.44, 130.62,
620.06, 2769.06, 1580.77, 281.76, 224.54, 3848.3],
[2061.19, 3700.27, 2131.2, 1837.3, 2017.52, 80.96, 3524.61, 3821.22,
3711.53, 1812.12, 1868.33, 3865.77, 3273.77, 2100.71]]
# This is necessary to apply the scaling.
x, y = np.asarray(set_B)
# Center of B points, defined as the center of the minimal rectangle that
# contains all points.
B_center = [(min(x) + max(x)) * .5, (min(y) + max(y)) * .5]
# Difference between the center coordinates and the xy points.
delta_x, delta_y = B_center[0] - x, B_center[1] - y
# Tolerance in pixels for match.
tol = 1.
# Ranges for each DoF.
sc_min, sc_max, sc_step = .01, .2, .01
ang_min, ang_max, ang_step = -30., 30., 1.
xmin, xmax, xstep = -150., 150., 1.
ymin, ymax, ystep = -150., 150., 1.
# Find proper scaling + rotation + translation for set_B.
i_min, params_all = match_frames(
set_A, B_center, delta_x, delta_y, tol, sc_min, sc_max, sc_step,
ang_min, ang_max, ang_step, xmin, xmax, xstep, ymin, ymax, ystep)
# Best match found
print(params_all[i_min])
Why am I doing this?
When an astronomer observes a star field, it also needs to observe what is called a "standard field of stars". This is needed to be able to transform the "instrumental magnitudes" (a logarithmic measure of the brightness) of the observed stars to a common universal scale, since these magnitudes depend on the optical system used (telescope + CCD array). In the example shown here, the standard field can be seen below to the left, and the observed one to the right.
Notice that the points in the set A
(used above) are the marked stars in the standard field, and the set B
are those stars detected in the observed field (marked in red above)
The process of identifying those stars in the observed field that correspond to the stars marked in the standard field is done by-eye, even today. This is due to the complexity of the task.
In the observed image above, there's quite a bit of scaling, but no rotation and little translation. This constitutes a rather favorable scenario; it could be much worse. I'm trying to develop a simple algorithm to avoid having to manually identify stars in the observed field as stars in the standard field, one by one.
Solution proposed by litepresence
This is a script I made following the answer by litepresence.
import itertools
import numpy as np
import matplotlib.pyplot as plt
def getTriangles(set_X, X_combs):
"""
Inefficient way of obtaining the lengths of each triangle's side.
Normalized so that the minimum length is 1.
"""
triang = []
for p0, p1, p2 in X_combs:
d1 = np.sqrt((set_X[p0][0] - set_X[p1][0]) ** 2 +
(set_X[p0][1] - set_X[p1][1]) ** 2)
d2 = np.sqrt((set_X[p0][0] - set_X[p2][0]) ** 2 +
(set_X[p0][1] - set_X[p2][1]) ** 2)
d3 = np.sqrt((set_X[p1][0] - set_X[p2][0]) ** 2 +
(set_X[p1][1] - set_X[p2][1]) ** 2)
d_min = min(d1, d2, d3)
d_unsort = [d1 / d_min, d2 / d_min, d3 / d_min]
triang.append(sorted(d_unsort))
return triang
def sumTriangles(A_triang, B_triang):
"""
For each normalized triangle in A, compare with each normalized triangle
in B. find the differences between their sides, sum their absolute values,
and select the two triangles with the smallest sum of absolute differences.
"""
tr_sum, tr_idx = [], []
for i, A_tr in enumerate(A_triang):
for j, B_tr in enumerate(B_triang):
# Absolute value of lengths differences.
tr_diff = abs(np.array(A_tr) - np.array(B_tr))
# Sum the differences
tr_sum.append(sum(tr_diff))
tr_idx.append([i, j])
# Index of the triangles in A and B with the smallest sum of absolute
# length differences.
tr_idx_min = tr_idx[tr_sum.index(min(tr_sum))]
A_idx, B_idx = tr_idx_min[0], tr_idx_min[1]
print("Smallest difference: {}".format(min(tr_sum)))
return A_idx, B_idx
# Data.
set_A = [[2015.81, 1981.665], [1967.31, 1960.865], [1962.91, 1951.365],
[1964.91, 1994.565], [1894.41, 1957.065]]
set_B = [
[2689.28, 3507.04, 2895.67, 1051.3, 1929.49, 1035.97, 752.44, 130.62,
620.06, 2769.06, 1580.77, 281.76, 224.54, 3848.3],
[2061.19, 3700.27, 2131.2, 1837.3, 2017.52, 80.96, 3524.61, 3821.22,
3711.53, 1812.12, 1868.33, 3865.77, 3273.77, 2100.71]]
set_B = zip(*set_B)
# All possible triangles.
A_combs = list(itertools.combinations(range(len(set_A)), 3))
B_combs = list(itertools.combinations(range(len(set_B)), 3))
# Obtain normalized triangles.
A_triang, B_triang = getTriangles(set_A, A_combs), getTriangles(set_B, B_combs)
# Index of the A and B triangles with the smallest difference.
A_idx, B_idx = sumTriangles(A_triang, B_triang)
# Indexes of points in A and B of the best match triangles.
A_idx_pts, B_idx_pts = A_combs[A_idx], B_combs[B_idx]
print 'triangle A %s matches triangle B %s' % (A_idx_pts, B_idx_pts)
# Matched points in A and B.
print "A:", [set_A[_] for _ in A_idx_pts]
print "B:", [set_B[_] for _ in B_idx_pts]
# Plot
A_pts = zip(*[set_A[_] for _ in A_idx_pts])
B_pts = zip(*[set_B[_] for _ in B_idx_pts])
plt.scatter(*A_pts, s=10, c='k')
plt.scatter(*B_pts, s=10, c='r')
plt.show()
The method is almost instantaneous and produces the correct match:
Smallest difference: 0.0314154749597
triangle A (0, 1, 4) matches triangle B (3, 4, 10)
A: [[2015.81, 1981.665], [1967.31, 1960.865], [1894.41, 1957.065]]
B: [(1051.3, 1837.3), (1929.49, 2017.52), (1580.77, 1868.33)]
<a href="https://i.s