Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
239 views
in Technique[技术] by (71.8m points)

azure cosmosdb gremlinapi - How to Perform Once off Side Effect at the Beginning of a Repeat in Gremlin?

I need to run a Gremlin query which travels from the leaves (vertices with no outgoing edges) and edgeless vertices of a graph down the graph collecting the starting vertices as well as incoming vertices (1 level at a time) up to a certain limit. This limit must not be exceeded so if the next level of incomer vertices would cause the count to exceed the limit then we do not collect those vertices and return what we have. Here is what I have at the moment:

g.V().or(__.not(outE()),__.not(bothE())).limit(700)
.store('a')
.repeat(__.sideEffect(select('b').store('a')).in().as('b'))
.until(union(cap('a').unfold().count(),select('b').count()).sum().is(gt(700)))
.cap('a').unfold()

The problem is that the sideEffect step inside of the repeat step is executed once for every vertex in the stream. I want it to be executed only one time regardless of how many vertices are in the stream. How do I accomplish this?

question from:https://stackoverflow.com/questions/65935820/how-to-perform-once-off-side-effect-at-the-beginning-of-a-repeat-in-gremlin

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'm not sure if this approach works on CosmosDB, but it may well be the only approach you can realistically take with Gremlin that doesn't involve multiple Gremlin requests and extra processing. I use this sample graph for demonstration:

g = TinkerGraph.open().traversal()
g.addV().property(id,'A').as('a').
           addV().property(id,'B').as('b').
           addV().property(id,'C').as('c').
           addV().property(id,'D').as('d').
           addV().property(id,'E').as('e').
           addV().property(id,'F').as('f').
           addE('next').from('a').to('b').
           addE('next').from('b').to('c').
           addE('next').from('b').to('d').
           addE('next').from('c').to('e').
           addE('next').from('d').to('e').
           addE('next').from('e').to('f').iterate()

The approach involves use of group() where you essentially form a new grouping of traversed objects each time you loop through the repeat():

gremlin> g.V('A').
......1>   group('m').by(constant(-1)).
......2>   repeat(out().group('m').by(loops())).
......3>   cap('m')
==>[-1:[v[A]],0:[v[B]],1:[v[C],v[D]],2:[v[E],v[E]],3:[v[F],v[F]]]

That gives you the structure of the data you want to process, now you just need to make sure you terminate the repeat() as early as possible:

gremlin> g.V('A').
......1>   group('m').by(constant(-1)).
......2>   until(cap('m').select(values).unfold().count(local).sum().is(gte(3))).
......3>     repeat(out().group('m').by(loops())).
......4>   cap('m')
==>[-1:[v[A]],0:[v[B]],1:[v[C],v[D]]]

In the above example, we look at "m" in the until() and do a count of all the vertices collected so far. When it exceeds our max ,in this case "3", we quit. When we quit, we can see that we may or may not have collected more than we needed. In this example we did, so we need to throw that away. You technically need all but the last grouping to satisfy your limit, but unfortunately "all but last" is not easy with Gremlin. I ended up with this approach which basically grabs the last item to throw away and then uses it as a filter against the result. Note that we get two results because traversing to that next level would exceed our limit of "3" results total:

gremlin> g.V('A').
......1>   group('m').by(constant(-1)).
......2>   until(cap('m').select(values).unfold().count(local).sum().is(gt(3))).
......3>     repeat(out().group('m').by(loops())).
......4>   cap('m').
......5>   select(values).as('v').
......6>   tail(local).as('e').
......7>   select('v').unfold().
......8>   where(P.neq('e')).
......9>   unfold()
==>v[A]
==>v[B]

Note that when we bump from a limit of "3" to "4" the result changes as traversing to the next level will add 2 more to the total but will not exceed 4 total.

gremlin> g.V('A').
......1>   group('m').by(constant(-1)).
......2>   until(cap('m').select(values).unfold().count(local).sum().is(gt(4))).
......3>     repeat(out().group('m').by(loops())).
......4>   cap('m').
......5>   select(values).as('v').
......6>   tail(local).as('e').
......7>   select('v').unfold().
......8>   where(P.neq('e')).
......9>   unfold()
==>v[A]
==>v[B]
==>v[C]
==>v[D]

This next example notes that we don't take duplicates into account with this as it's not clear from your use case what's expected (or if this will even work) but hopefully this provides enough structure for you to at least form the traversal you're looking for:

gremlin> g.V('A').
......1>   group('m').by(constant(-1)).
......2>   until(cap('m').select(values).unfold().count(local).sum().is(gt(5))).
......3>     repeat(out().group('m').by(loops())).
......4>   cap('m').
......5>   select(values).as('v').
......6>   tail(local).as('e').
......7>   select('v').unfold().
......8>   where(P.neq('e')).
......9>   unfold()
==>v[A]
==>v[B]
==>v[C]
==>v[D]

Thanks to Kelvin Lawrence for suggesting the general approach I've taken in this answer.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...