Edit: example with working code and explanation
The best way to visualise what happens when applying the haversine distance is to imagine that all great-circle distances are measured on a small ping-pong sphere: a unit sphere with radius 1.
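To make the "unit sphere" idea concrete, here is a minimal sketch of the haversine formula written out by hand, compared against scikit-learn's `sklearn.metrics.pairwise.haversine_distances`, which uses the same metric as `BallTree(metric='haversine')`. The helper name `unit_sphere_haversine` is hypothetical and only for illustration; the two coordinates are Nesconset and The Metropolitan Museum of Art from the data used further down. Both results are angles in radians, i.e. distances on a sphere of radius 1, and only become miles once multiplied by the Earth's radius.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances


def unit_sphere_haversine(lat1, lon1, lat2, lon2):
    """Hypothetical helper: great-circle distance on a sphere of radius 1.

    All inputs are in radians; the result is an angle in radians.
    """
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a))


# Nesconset and The Metropolitan Museum of Art as (lat, long), converted to radians
nesconset = np.radians([40.84, -73.15])
met = np.radians([40.778965, -73.962311])

# Hand-written formula vs. scikit-learn: both give a distance on the unit sphere
d_unit = unit_sphere_haversine(nesconset[0], nesconset[1], met[0], met[1])
d_sklearn = haversine_distances([nesconset], [met])[0, 0]
print(d_unit, d_sklearn)

# Scaling the unit-sphere distance by the Earth's radius (in miles) gives miles
print(d_unit * 3958.8)
```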
If you want to apply query_radius() on larger spheres, like the Earth, you need to convert the Earth distances in km/miles back to the unit ping-pong sphere. Say you want 100 miles: you need to divide by the Earth's radius in miles. The distances returned by query_radius() then need to be transformed back to miles/km by multiplying by that same radius.
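The two conversions can be captured in a pair of small helpers. This is a sketch; the function names are hypothetical, 3958.8 is the mean Earth radius in miles used later in this answer, and 6371.0 km is an assumed mean radius for the km case. Whatever unit the radius is in, the returned distances come back in that same unit.

```python
EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles, as used in this answer
EARTH_RADIUS_KM = 6371.0     # assumption: mean Earth radius in km


def to_unit_sphere(distance, earth_radius):
    """Hypothetical helper: Earth distance -> distance on the unit sphere (radians)."""
    return distance / earth_radius


def from_unit_sphere(distance, earth_radius):
    """Hypothetical helper: unit-sphere distance (radians) -> Earth distance."""
    return distance * earth_radius


r = to_unit_sphere(100, EARTH_RADIUS_MILES)           # 100 miles as a unit-sphere radius
print(r)
print(from_unit_sphere(r, EARTH_RADIUS_MILES))         # back to 100 miles

# The same search radius expressed in km gives (almost) the same angle
r_km = to_unit_sphere(160.934, EARTH_RADIUS_KM)        # 100 miles is about 160.934 km
print(r_km)
```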
Say we have the following town and museum data in pandas:
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree
towns = pd.DataFrame({
"name" : ["Merry Hill", "Spring Valley", "Nesconset"],
"lat" : [36.01, 41.32, 40.84],
"long" : [-76.7, -89.20, -73.15]
})
museum = pd.DataFrame({
"name" : ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
"lat" : [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
"long" : [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531]
})
Then we need to extract the lat/long pairs as NumPy arrays with
places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values
Now we can convert the coordinates to radians (the haversine metric expects [lat, long] pairs in radians) and create the ball tree with
places_radians = np.radians(places_gps)
museum_radians = np.radians(museum_gps)
tree = BallTree(museum_radians, leaf_size=15, metric='haversine')
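The same tree also supports `BallTree.query`, which returns the k nearest museums for each town without needing a radius. The sketch below (a self-contained variation, not part of the original approach) asks for the single nearest museum per town; the returned distances are again on the unit sphere, so they are multiplied by the Earth's radius to get miles. The `nearest` table and its column names are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento",
             "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las",
             "National Air and Space Museum, Washington", "The Metropolitan Museum of Art",
             "Museum of the American Military Family & Learning Center"],
    "lat": [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long": [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531],
})

# Build the tree on [lat, long] in radians, as the haversine metric requires
tree = BallTree(np.radians(museum[["lat", "long"]].to_numpy()), leaf_size=15, metric="haversine")

# k=1: the single nearest museum for every town; dist and ind have shape (n_towns, 1)
dist, ind = tree.query(np.radians(towns[["lat", "long"]].to_numpy()), k=1)

nearest = pd.DataFrame({
    "town": towns["name"],
    "nearest_museum": museum["name"].iloc[ind[:, 0]].to_numpy(),
    "miles": dist[:, 0] * EARTH_RADIUS_MILES,  # unit sphere -> miles
})
print(nearest)
```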
Again, imagine this ball tree lives on a sphere the size of a ping-pong ball (radius 1). To use its distances for larger or smaller spheres we need to multiply or divide by that sphere's radius.
Say I want all museums within 100 miles:
distance_in_miles = 100
earth_radius_in_miles = 3958.8
radius = distance_in_miles / earth_radius_in_miles
Now I can apply query_radius(), remembering that the returned distances need to be converted back to miles. The distances here are great-circle distances on the unit sphere, our ping-pong ball.
is_within, distances = tree.query_radius(places_radians, r=radius, count_only=False, return_distance=True)
So we convert them back to miles:
distances_in_miles = distances * earth_radius_in_miles
Let's check the output, distances_in_miles:
array([array([], dtype=float64), array([], dtype=float64),
array([42.68960475])], dtype=object)
This means that 'Nesconset' is within 100 miles of 'The Metropolitan Museum of Art', at a distance of around 42.689 miles. Notice that indeed only the last array (Nesconset) contains a distance, and with the help of is_within we find that the index of the museum within range is 5, which is museum.name[5], 'The Metropolitan Museum of Art'.
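Instead of reading the object arrays by eye, the indices from is_within and the distances can be zipped into a flat table of (town, museum, miles) pairs. This is a self-contained sketch reusing the data above; the `pairs` table and its column names are illustrative choices, not something `query_radius()` produces itself.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento",
             "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las",
             "National Air and Space Museum, Washington", "The Metropolitan Museum of Art",
             "Museum of the American Military Family & Learning Center"],
    "lat": [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long": [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531],
})

tree = BallTree(np.radians(museum[["lat", "long"]].to_numpy()), leaf_size=15, metric="haversine")
radius = 100 / EARTH_RADIUS_MILES  # 100 miles on the unit sphere
is_within, distances = tree.query_radius(
    np.radians(towns[["lat", "long"]].to_numpy()), r=radius, return_distance=True
)

# One row per (town, museum) match; towns with no match contribute no rows
rows = []
for town_idx, (museum_idxs, dists) in enumerate(zip(is_within, distances)):
    for museum_idx, d in zip(museum_idxs, dists):
        rows.append({
            "town": towns["name"].iloc[town_idx],
            "museum": museum["name"].iloc[museum_idx],
            "miles": d * EARTH_RADIUS_MILES,  # unit sphere -> miles
        })
pairs = pd.DataFrame(rows, columns=["town", "museum", "miles"])
print(pairs)
```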
Depending on how you check, it won't be exactly 42.689 miles, but a quick check with Google Maps confirms it is in that range. The Earth is not a perfect sphere (it is slightly flattened at the poles), so the spherical haversine distance always carries some error.
As in my original post, errors are easily made: forgetting to apply the conversion factor (the Earth's radius), swapping lat/long values, or mixing up km and meters.
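A quick way to see how much these mistakes matter is to compute the Nesconset to Metropolitan Museum distance three ways with `sklearn.metrics.pairwise.haversine_distances`: correctly, with lat/long swapped, and with degrees passed in where radians are expected. This is a sketch; the `miles` helper is hypothetical. Only the first result should land near the 42.689 miles found above.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_MILES = 3958.8
nesconset = [40.84, -73.15]       # (lat, long) in degrees
met = [40.778965, -73.962311]     # (lat, long) in degrees


def miles(a, b):
    """Hypothetical helper: haversine distance between two points, scaled to miles."""
    return haversine_distances([a], [b])[0, 0] * EARTH_RADIUS_MILES


correct = miles(np.radians(nesconset), np.radians(met))
swapped = miles(np.radians(nesconset[::-1]), np.radians(met[::-1]))  # (long, lat) by mistake
no_radians = miles(nesconset, met)                                    # forgot np.radians()
print(correct, swapped, no_radians)
```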