Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Trying to parallelize the osmnx.get_nearest_node(..) function in several ways (multiprocessing, joblib), but none works

I have an osmnx graph and about 6 million query coordinates for which I need to find the nearest nodes in the graph. I want to parallelize the task, and I tried both joblib and the method described for shortest paths in the osmnx-examples routing notebook [1]. However, in both attempts only 1 of the 15 available CPUs is used.

I don't know much about multiprocessing and just tried to imitate the shortest-path solution. Am I doing something wrong, or is the method not applicable to the osmnx.get_nearest_node(..) function? Are there any other suggestions to speed up the process?

import pandas as pd
import osmnx as ox
ox.config(use_cache=True, log_console=True)

G = ox.graph_from_bbox(38.289, 36.898, -122.704, -121.214,
                       network_type='drive',
                       retain_all=False,
                       simplify=True)

origins = [(37.775, -122.216),(37.458, -121.913),(37.558, -122.258), (37.791, -122.413),
            (37.775, -122.219),(37.773, -121.952),(37.773, -121.926),
            (37.332, -122.003),(37.462, -122.228),(37.701, -122.089),
            (37.696, -122.143),(37.931, -122.323),(37.558, -122.273),
            (37.357, -121.902),(37.462, -122.228),(37.791, -122.416),
            (37.922, -122.368),(37.551, -122.291),(37.701, -122.088),
            (37.802, -122.267),(38.015, -122.015),(37.701, -122.088),
            (37.503, -122.267), (37.791, -122.416), (37.551, -122.301),
            (37.405, -122.061), (37.228, -121.877), (37.326, -121.814), 
            (37.292, -122.032), (37.722, -122.15), (37.966, -122.507),
            (37.773, -121.989), (37.294, -121.895), (37.881, -122.127),
            (37.872, -122.14), (37.551, -122.307), (37.404, -121.976),
            (37.775, -122.209), (37.791, -122.413), (37.228, -121.878)]

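import multiprocessing as mp
import numpy as np  # needed for the np.nan fallback below


def nn(G, origin):
    # Return the nearest node ID, or NaN if the lookup fails.
    try:
        return ox.get_nearest_node(G, origin)
    except Exception:
        return np.nan


# Every (G, origin) pair is pickled and sent to a worker process.
params = ((G, orig) for orig in origins)
pool = mp.Pool(15)
sma = pool.starmap_async(nn, params)

routes = sma.get()
pool.close()
pool.join()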
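One common way to avoid shipping the graph with every task is a pool initializer that hands the data to each worker once, combined with chunking the query points. The following is only a minimal sketch of that pattern, not a tested fix for OSMnx: the names init_worker, nearest_for_chunk, chunked and parallel_nearest are hypothetical, the node table is a plain dict standing in for G, and the distance is a simple planar one rather than a great-circle distance.

```python
import math
import multiprocessing as mp

NODES = None  # per-process global, filled in once by init_worker


def init_worker(nodes):
    """Run once in each worker: keep the node table in a process-level global."""
    global NODES
    NODES = nodes


def nearest_for_chunk(points):
    """Return the nearest node ID for each (lat, lng) point in one chunk.

    Uses a brute-force planar distance as a stand-in for the real lookup.
    """
    result = []
    for lat, lng in points:
        nearest = min(
            NODES,
            key=lambda n: math.hypot(NODES[n][0] - lat, NODES[n][1] - lng),
        )
        result.append(nearest)
    return result


def chunked(seq, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_nearest(nodes, points, processes=4, chunk_size=10_000):
    """Send the node table to each worker once, then map over chunks of points."""
    with mp.Pool(processes, initializer=init_worker, initargs=(nodes,)) as pool:
        results = pool.map(nearest_for_chunk, chunked(points, chunk_size))
    # Flatten the per-chunk lists back into one list, in input order.
    return [node for chunk in results for node in chunk]
```

In a script, parallel_nearest(nodes, points) would be called under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter.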

Another interesting detail: when I timed it, I got the following result:

CPU times: user 3min 58s, sys: 8.91 s, total: 4min 6s
Wall time: 4min 15s

When I use a simple for loop with the same graph and points, it's much faster:

for origin in origins:
    r = ox.get_nearest_node(G, origin)

CPU times: user 10.8 s, sys: 39.9 ms, total: 10.8 s
Wall time: 10.8 s

[1]: https://github.com/gboeing/osmnx-examples/blob/v0.16.0/notebooks/02-routing-speed-time.ipynb
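One plausible contributor to this gap, which the question does not confirm, is serialization: when G is part of each task's arguments, multiprocessing has to pickle it for every task (or chunk of tasks) it sends to a worker. The sketch below only illustrates the size difference using a hypothetical stand-in dict (fake_graph) instead of a real OSMnx graph, so the numbers are indicative at best.

```python
import pickle


def payload_bytes(obj):
    """Return the number of bytes pickle needs to serialize obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical stand-in for a large graph: 200,000 node IDs mapped to (lat, lng).
fake_graph = {i: (37.0 + i * 1e-6, -122.0 - i * 1e-6) for i in range(200_000)}
origin = (37.775, -122.216)

# What a (G, origin) task carries versus a task that only carries the point.
with_graph = payload_bytes((fake_graph, origin))
without_graph = payload_bytes(origin)

print(f"task with graph:    {with_graph:,} bytes")
print(f"task without graph: {without_graph:,} bytes")
```

If the per-task payload dominates the actual lookup time, adding processes can make the whole run slower than a plain loop.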

Question from: https://stackoverflow.com/questions/65894491/trying-to-parallelize-osmnx-get-nearest-node-function-with-several-ways-st

Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss gazes back…

1 Reply


I just noticed that osmnx.get_nearest_nodes() takes only seconds for the 6 million queries, so I decided to use that function instead.
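A minimal sketch of what the batch call might look like, assuming OSMnx 0.16, where get_nearest_nodes(G, X, Y, method=None) takes X as longitudes and Y as latitudes (the reverse of the (lat, lng) tuples used in the question). The helper names split_lat_lng and nearest_nodes_batch are hypothetical. The answer does not say which method value was used; as far as I know, 'kdtree' and 'balltree' are the accepted non-default values in that version, and later OSMnx releases provide ox.distance.nearest_nodes instead.

```python
def split_lat_lng(points):
    """Split (lat, lng) tuples into X (longitudes) and Y (latitudes) lists."""
    if not points:
        return [], []
    lats, lngs = zip(*points)
    return list(lngs), list(lats)


def nearest_nodes_batch(G, points, method=None):
    """Look up the nearest node for every point with one batch call.

    Assumes the OSMnx 0.16 signature get_nearest_nodes(G, X, Y, method).
    """
    import osmnx as ox  # imported here so split_lat_lng works without OSMnx

    X, Y = split_lat_lng(points)
    return ox.get_nearest_nodes(G, X, Y, method=method)
```

With the question's data this would be called as nearest_nodes_batch(G, origins), optionally passing a method value.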

