Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How can I specify the radius in query_radius (sklearn BallTree) with radians and km?

I'm working with latitude and longitude data. I've used BallTree because the dataset has many rows (32000). I build the tree with the haversine distance:

model_BTree = BallTree(np.array(points_sec_rad), metric='haversine')

and I have converted latitude and longitude to radians. How do I apply query_radius (with radius max_dist_rad) to the points I want to locate? I've used 0.150 meters as the radius, but I'm not sure whether I should convert it to radians.

ind_BTree, dist_BTree = model_BTree.query_radius(np.array(points_loc_rad), r=max_dist_rad, return_distance=True, sort_results=True)

Also, how can I limit the number of neighbors inside the radius? Thank you.



1 Reply


Edit: example with working code and explanation.

The best way to visualise what happens when you apply the haversine distance is to imagine that all great-circle distances are measured on a small ping-pong sphere of radius 1. The distances scikit-learn returns are therefore angles in radians on this unit sphere.
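To check the unit-sphere picture, here is a small sketch that implements the standard haversine formula by hand and compares it with scikit-learn's `haversine_distances`, which, like the BallTree haversine metric, expects rows of `[lat, long]` in radians. The coordinates are the Nesconset and Metropolitan Museum of Art values used later in this answer; the helper name `haversine_unit_sphere` is made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances


def haversine_unit_sphere(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius 1; inputs and output in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a))


# [lat, long] pairs in degrees, converted to radians
nesconset = np.radians([40.84, -73.15])
met_museum = np.radians([40.778965, -73.962311])

manual = haversine_unit_sphere(nesconset[0], nesconset[1], met_museum[0], met_museum[1])
sklearn_value = haversine_distances([nesconset], [met_museum])[0, 0]

print(manual, sklearn_value)   # the same angle on the unit sphere
print(manual * 3958.8)         # scaled to Earth, in miles (about 42.69)
```

Both values are plain angles; only the final multiplication by an Earth radius turns them into miles.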

If you want to apply query_radius() on a larger sphere, such as the Earth, you need to convert Earth distances in km or miles back to the unit ping-pong sphere. Say you want 100 miles: divide by the Earth's radius in miles. The distances returned by query_radius() must then be converted back to miles or km by multiplying by the same radius.
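The two conversions can be wrapped in small helpers. This is a minimal sketch: the mean Earth radius of 6371.0088 km is an assumption (6371 km is also common), 3958.8 miles is the value used in this answer, and the helper names are hypothetical. The 150 m radius below is only an illustrative value, converted to km before dividing, since the radius and the Earth radius must be in the same unit.

```python
EARTH_RADIUS_KM = 6371.0088   # assumed mean Earth radius
EARTH_RADIUS_MILES = 3958.8   # value used in this answer


def km_to_radians(km):
    """Earth distance in km -> angle on the unit sphere (pass this as r=)."""
    return km / EARTH_RADIUS_KM


def radians_to_km(rad):
    """Angle on the unit sphere (as returned by query_radius) -> km."""
    return rad * EARTH_RADIUS_KM


def miles_to_radians(miles):
    return miles / EARTH_RADIUS_MILES


def radians_to_miles(rad):
    return rad * EARTH_RADIUS_MILES


# Illustrative 150 m radius: metres -> km -> radians
max_dist_rad = km_to_radians(150 / 1000)
radius_100_miles = miles_to_radians(100)

print(max_dist_rad)       # about 2.35e-05
print(radius_100_miles)   # about 0.02526
```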

Say we have the following town and museum data in pandas:

import pandas as pd
import numpy as np

from sklearn.neighbors import BallTree
towns = pd.DataFrame({
    "name" : ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat" : [36.01, 41.32, 40.84],
    "long" : [-76.7, -89.20, -73.15]
})

museum = pd.DataFrame({
    "name" : ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento", "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las", "National Air and Space Museum, Washington", "The Metropolitan Museum of Art", "Museum of the American Military Family & Learning Center"],
    "lat" : [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long" : [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531]
})

Then we extract the lat/long pairs as NumPy arrays:

places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values
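The column order matters here: scikit-learn's haversine metric treats the first column as latitude and the second as longitude. A cheap guard against swapped columns is sketched below; the function name `check_lat_long` is hypothetical, and the check only catches a swap when some longitude exceeds 90 degrees in magnitude (true for the museum data, not for the towns), so it is a sanity check rather than a proof.

```python
import numpy as np


def check_lat_long(coords_deg):
    """Raise ValueError if an (n, 2) array does not look like [lat, long] in degrees."""
    coords_deg = np.asarray(coords_deg, dtype=float)
    if coords_deg.ndim != 2 or coords_deg.shape[1] != 2:
        raise ValueError("expected an (n, 2) array of [lat, long] pairs")
    lat, lon = coords_deg[:, 0], coords_deg[:, 1]
    if np.any(np.abs(lat) > 90):
        raise ValueError("latitude outside [-90, 90]: are lat/long swapped?")
    if np.any(np.abs(lon) > 180):
        raise ValueError("longitude outside [-180, 180]")
    return coords_deg


check_lat_long([[40.84, -73.15]])           # passes
# check_lat_long([[-117.165161, 33.743511]])  # would raise: columns swapped
```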

Now we convert the coordinates to radians and create the ball tree:

places_radians = np.radians(places_gps)
museum_radians = np.radians(museum_gps)

tree = BallTree(museum_radians, leaf_size=15, metric='haversine')

Again, imagine this tree lives on a sphere the size of a ping-pong ball, with radius 1. To work with distances on a larger or smaller sphere, we divide by that sphere's radius going in and multiply by it coming out.

Say I want all museums within 100 miles:

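distance_in_miles = 100
earth_radius_in_miles = 3958.8  # mean Earth radius in miles

# convert the search distance to an angle (radians) on the unit sphere
radius = distance_in_miles / earth_radius_in_miles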

Now I can apply query_radius(). Remember that the returned distances are great-circle distances on the unit sphere, our ping-pong ball, so they need to be converted back to miles.

is_within, distances = tree.query_radius(places_radians, r=radius, count_only=False, return_distance=True) 

So we multiply by the Earth's radius:

distances_in_miles = distances * earth_radius_in_miles

Checking the output, distances_in_miles is:

array([array([], dtype=float64), array([], dtype=float64),
       array([42.68960475])], dtype=object)

This means that 'Nesconset' is less than 100 miles from 'The Metropolitan Museum of Art', at a distance of about 42.689 miles. Note that only the last array (Nesconset) contains a distance; the matching entry in is_within gives the museum index 5, i.e. museum.name[5], 'The Metropolitan Museum of Art'.
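Reading the ragged is_within/distances arrays by hand gets tedious with more points. A minimal sketch that flattens them into one row per (town, museum) match, reusing the data from this answer; the `matches` table layout is just one possible choice.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8

towns = pd.DataFrame({
    "name": ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat": [36.01, 41.32, 40.84],
    "long": [-76.7, -89.20, -73.15],
})
museum = pd.DataFrame({
    "name": ["Motte Historical Car Museum, Menifee", "Crocker Art Museum, Sacramento",
             "World Chess Hall Of Fame, St.Louis", "National Atomic Testing Museum, Las",
             "National Air and Space Museum, Washington", "The Metropolitan Museum of Art",
             "Museum of the American Military Family & Learning Center"],
    "lat": [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long": [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531],
})

tree = BallTree(np.radians(museum[["lat", "long"]].values), metric="haversine")
is_within, distances = tree.query_radius(
    np.radians(towns[["lat", "long"]].values),
    r=100 / EARTH_RADIUS_MILES,   # 100 miles as an angle on the unit sphere
    return_distance=True,
)

# One row per (town, museum) pair found inside the radius
rows = []
for town_idx, (museum_idxs, dists) in enumerate(zip(is_within, distances)):
    for museum_idx, dist in zip(museum_idxs, dists):
        rows.append({
            "town": towns["name"].iloc[town_idx],
            "museum": museum["name"].iloc[museum_idx],
            "miles": dist * EARTH_RADIUS_MILES,   # back from radians to miles
        })
matches = pd.DataFrame(rows, columns=["town", "museum", "miles"])
print(matches)
```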

Depending on how you check, it won't be exactly 42.689 miles, but a quick check with Google Maps confirms it is in that range. The Earth is not a perfect sphere, so a spherical (haversine) distance always carries some error.

As in my original post, errors are easy to make: forgetting to apply the conversion factor, swapping lat/long values, or mixing up km and meters.
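The question also asks how to limit the number of neighbors inside the radius. One approach, sketched here under stated assumptions, is to call query_radius() with `sort_results=True` (which requires `return_distance=True`) and keep only the first k results per point. The helper `query_radius_top_k`, the 6371.0088 km Earth radius and the test points are all hypothetical.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def query_radius_top_k(tree, points_rad, radius_km, k):
    """Return at most k neighbors per point within radius_km, closest first (distances in km)."""
    ind, dist = tree.query_radius(
        points_rad,
        r=radius_km / EARTH_RADIUS_KM,  # km -> radians on the unit sphere
        return_distance=True,
        sort_results=True,              # ascending distance, so slicing keeps the nearest
    )
    ind_k = [i[:k] for i in ind]
    dist_k_km = [d[:k] * EARTH_RADIUS_KM for d in dist]
    return ind_k, dist_k_km


# Hypothetical data: 10 points spaced 0.001 degrees of latitude (about 111 m) apart
lat_deg = 40.0 + 0.001 * np.arange(10)
points_deg = np.column_stack([lat_deg, np.full(10, -73.0)])
tree = BallTree(np.radians(points_deg), metric="haversine")

# 5 points lie within 0.5 km of the query point; keep only the nearest 3
ind, dist_km = query_radius_top_k(tree, np.radians([[40.0, -73.0]]), radius_km=0.5, k=3)
print(ind[0], dist_km[0])
```

An alternative is `tree.query(points, k=k)`, which always returns exactly k neighbors regardless of distance, followed by discarding those whose distance exceeds the radius.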



...