Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
337 views
in Technique[技术] by (71.8m points)

r - Closest point to a path

I have two sets of points, called path and centers. For each point in path, I would like an efficient method for finding the ID of the closest point in centers. I would like to do this in R. Below is a simple reproducible example.

set.seed(1)
n <- 10000
x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002))
y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002))

path <- data.frame(cbind(x=x, y=y))

centers <- expand.grid(x=seq(0, 500,by=0.5) + rnorm(1001), 
                       y=seq(0, 500, by=0.2) + rnorm(2501))

centers$id <- seq(nrow(centers))

x and y are coordinates. I would like to add a column to the path data.frame that has the id of the closest center for the given x and y co-ordinate. I then want to get all of the unique ids.

My solution at the moment does work, but is very slow when the scale of the problem increases. I would like something much more efficient.

path$closest.id <- sapply(seq(nrow(path)), function(z){
   tmp <- ((centers$x - path[z, 'x'])^2) + ((centers$y - path[z, 'y'])^2)
   as.numeric(centers[tmp == min(tmp), 'id'])
})

output <- unique(path$closest.id)

Any help on speeding this up would be greatly appreciated.

I think data.table might help, but ideally what I am looking for is an algorithm that is perhaps a bit smarter in terms of the search, i.e. instead of calculating the distances to each center and then only selecting the minimum one... to get the id...

I am also happy to use Rcpp/Rcpp11 as well if that would help improve performance.

My minimum acceptable time to perform this kind of calculation out would be 10 seconds, but obviously faster would be better.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can do this with nn2 from the RANN package. On my system, this computes the nearest center to each of your path points in under 2 seconds.

library(RANN)
system.time(closest <- nn2(centers[, 1:2], path, 1))

#   user  system elapsed 
#   1.41    0.14    1.55 



sapply(closest, head)

#      nn.idx   nn.dists
# [1,] 247451 0.20334929
# [2,] 250454 0.12326323
# [3,] 250454 0.28540127
# [4,] 253457 0.05178687
# [5,] 253457 0.13324137
# [6,] 253457 0.09009626

Here's another example with 2.5 million candidate points that all fall within the extent of the path points (in your example, the centers have a much larger x and y range than do the path points). It's a little slower in this case.

set.seed(1)
centers2 <- cbind(runif(2.5e6, min(x), max(x)), runif(2.5e6, min(y), max(y)))
system.time(closest2 <- nn2(centers2, path, 1))

#   user  system elapsed 
#   2.96    0.11    3.07 

sapply(closest2, head)

#       nn.idx    nn.dists
# [1,]  730127 0.025803703
# [2,]  375514 0.025999069
# [3,] 2443707 0.047259283
# [4,]   62780 0.022747930
# [5,] 1431847 0.002482623
# [6,] 2199405 0.028815865

This can be compared to the output using sp::spDistsN1 (which is much slower for this problem):

library(sp)
apply(head(path), 1, function(x) which.min(spDistsN1(centers, x)))

#       1       2       3       4       5       6 
#  730127  375514 2443707   62780 1431847 2199405 

Adding the point id to the path data.frame and reducing to unique values is trivial:

path$closest.id <- closest$nn.idx
output <- unique(path$closest.id)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...