Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
252 views
in Technique[技术] by (71.8m points)

python 3.x - Speed up getting distance between two lat and lon

I have two DataFrame containing Lat and Lon. I want to find distance from one (Lat, Lon) pair to ALL (Lat, Lon) from another DataFrame and get the minimum. The package that I am using geopy. The code is as follows:

from geopy import distance
import numpy as np

distanceMiles = []
count = 0
for id1, row1 in df1.iterrows():
    target = (row1["LAT"], row1["LON"])
    count = count + 1
    print(count)
    for id2, row2 in df2.iterrows():
        point = (row2["LAT"], row2["LON"])
        distanceMiles.append(distance.distance(target, point).miles)

    closestPoint = np.argmin(distanceMiles)
    distanceMiles = []

The problem is that df1 has 168K rows and df2 has 1200 rows. How do I make it faster?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Leaving this here in case anyone needs it in the future:

If you need only the minimum distance, then you don't have to bruteforce all the pairs. There are some data structures that can help you solve this in O(n*log(n)) time complexity, which is way faster than the bruteforce method.

For example, you can use a generalized KNearestNeighbors (with k=1) algorithm to do exactly that, given that you pay attention to your points being on a sphere, not a plane. See this SO answer for an example implementation using sklearn.

There seems to be a few libraries to solve this too, like sknni and GriSPy.

Here's also another question that talks a bit about the theory.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...