Let's consider what your query is doing:
MATCH (p1:SearchableNode {name: "Ishaan"}),
(p2:SearchableNode {name: "Garima"}),
path = (p1)-[:NAVIGATE_TO*]-(p2)
RETURN path
If you run this query in the console with EXPLAIN
in front of it, the DB will give you its plan for how it will answer. When I did this, the query compiler warned me:
If a part of a query contains multiple disconnected patterns, this
will build a cartesian product between all those parts. This may
produce a large amount of data and slow down query processing. While
occasionally intended, it may often be possible to reformulate the
query that avoids the use of this cross product, perhaps by adding a
relationship between the different parts or by using OPTIONAL MATCH
You have two issues going on with your query - first, you're assigning p1
and p2
independent of one another, possibly creating this cartesian product. The second issue is that because all of your links in your graph go both ways and you're asking for an undirected connection you're making the DB work twice as hard, because it could actually traverse what you're asking for either way. To make matters worse, because all of the links go both ways, you have many cycles in your graph, so as cypher explores the paths that it can take, many paths it will try will loop back around to where it started. This means that the query engine will spend a lot of time chasing its own tail.
You can probably immediately improve the query by doing this:
MATCH p=shortestPath((p1:SearchableNode {name:"Ishaan"})-[:NAVIGATE_TO*]->(p2:SearchableNode {name:"Garima"}))
RETURN p;
Two modifications here - p1 and p2 are bound to each other immediately, you don't separately match them. Second, notice the [:NAVIGATE_TO*]->
part, with that last arrow ->
; we're matching the relationship ONE WAY ONLY. Since you have so many reflexive links in your graph, either way would work fine, but either way you choose you cut the work the DB has to do in half. :)
This may still perform not so great, because traversing that graph is still going to have a lot of cycles, which will send the DB chasing its tail trying to find the best path. In your modeling choice here, you usually shouldn't have relationships going both ways unless you need separate properties on each relationship. A relationship can be traversed in both directions, so it doesn't make sense to have two (one in each direction) unless the information that relationship is capturing is semantically different.
Often you'll find with query performance that you can do better by reformulating the query and thinking about it, but there's major interplay between graph modeling and overall performance. With the graph set up with so many bi-directional links, there will only be so much you can do to optimize path-finding.