The most performant way to execute a multi-part transaction is to encapsulate the transaction in a Gremlin script and execute it as a single request.
Here's an example of how to do it -- it's from an example app I worked up last year for the Neo4j Heroku Challenge.
The project is called Lightbulb: https://github.com/espeed/lightbulb
The README describes what it does...
What is Lightbulb?
Lightbulb is a Git-powered, Neo4j-backed blog engine for Heroku
written in Python.
You get to write blog entries in Emacs (or your favorite text editor)
and use Git for version control, without giving up the features of a
dynamic app.
Write blog entries in ReStructuredText, and style them using your
website's templating system.
When you push to Heroku, the entry metadata will be automatically
saved to Neo4j, and the HTML fragment generated from the
ReStructuredText source file will be served off disk.
However, Neo4j quit offering Gremlin on their free/test Heroku Add On, so Lightbulb won't work for new Neo4j/Heroku users.
Within the next year -- before the TinkerPop book comes out -- TinkerPop will release a Rexster Heroku Add On with full Gremlin support so people can run their projects on Heroku as they work their way through the book.
But for right now, you don't need to concern yourself with running the app -- all the relevant code is contained within these two files -- the Lightbulb app's model file and its Gremlin script file:
https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py
https://github.com/espeed/lightbulb/blob/master/lightbulb/gremlin.groovy
model.py provides an example for building custom Bulbs models and a custom Bulbs Graph class.
gremlin.groovy contains a custom Gremlin script that the custom Entry model executes -- this Gremlin script encapsulates the entire multi-part transaction so that it can be executed as a single request.
Notice in the model.py file above that I customize EntryProxy by overriding the create() and update() methods, and I define a single save() method to handle both creates and updates.
To hook the custom EntryProxy into the Entry model, I simply override the Entry model's get_proxy_class method so that it returns the EntryProxy class instead of the default NodeProxy class.
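The hook can be sketched in isolation. The classes below are bare stand-ins written for illustration, not the real Bulbs or Lightbulb classes; build_proxy here is a hypothetical free function, and having create() and update() delegate to save() is an assumption about how the override might look:

```python
# Stand-in classes for illustration only; the real ones live in Bulbs
# and in Lightbulb's model.py.

class NodeProxy(object):
    """Stand-in for the default proxy with separate create/update methods."""

    def __init__(self, element_class):
        self.element_class = element_class

    def create(self, **data):
        return ("create", data)

    def update(self, _id, **data):
        return ("update", _id, data)


class EntryProxy(NodeProxy):
    """Custom proxy: creates and updates both go through save()."""

    def create(self, **data):
        return self.save(**data)

    def update(self, _id, **data):
        return self.save(**data)

    def save(self, **data):
        # In Lightbulb this is where the Gremlin script would be executed.
        return ("save", data)


class Node(object):
    """Stand-in for the base model class."""

    @classmethod
    def get_proxy_class(cls):
        return NodeProxy


class Entry(Node):
    @classmethod
    def get_proxy_class(cls):
        # Return the custom proxy instead of the default NodeProxy.
        return EntryProxy


def build_proxy(element_class):
    """Hypothetical helper: instantiate whichever proxy the model asks for."""
    return element_class.get_proxy_class()(element_class)


entries = build_proxy(Entry)
result = entries.save(title="Test")
```

Because the model itself names its proxy class, nothing that builds proxies needs to know that Entry is special.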
Everything else in the Entry model is designed around building up the data for the save_blog_entry Gremlin script (defined in the gremlin.groovy file above).
Notice in gremlin.groovy that the save_blog_entry() method is long and contains several closures. You could define each closure as an independent method and execute them with multiple Python calls, but then you'd have the overhead of making multiple server requests, and since the requests are separate, there would be no way to wrap them all in a transaction.
By using a single Gremlin script, you combine everything into a single transactional request. This is much faster, and it's transactional.
You can see how the entire script is executed in the final line of the Gremlin method:
return transaction(save_blog_entry);
Here I'm simply wrapping a transaction closure around all the commands in the internal save_blog_entry closure. Making a transaction closure keeps the code isolated and is much cleaner than embedding the transaction logic in the other closures.
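For readers who don't know Groovy, the same pattern can be sketched in Python. This is only an analogy under stated assumptions: FakeGraph and its commit() and rollback() methods are hypothetical stand-ins, not the Gremlin or Neo4j API, and the real transaction closure is the one in gremlin.groovy.

```python
class FakeGraph(object):
    """Hypothetical in-memory graph that buffers writes until commit()."""

    def __init__(self):
        self.committed = []
        self.pending = []

    def add(self, item):
        self.pending.append(item)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []


def transaction(graph, closure):
    """Run closure(); commit if it succeeds, roll back if it raises."""
    try:
        result = closure()
    except Exception:
        graph.rollback()
        raise
    graph.commit()
    return result


g = FakeGraph()


def save_blog_entry():
    # The individual steps know nothing about transactions.
    g.add("entry")
    g.add("tagged-edge")
    return "entry"


saved = transaction(g, save_blog_entry)


def failing_save():
    g.add("half-written entry")
    raise ValueError("something went wrong")


try:
    transaction(g, failing_save)
except ValueError:
    pass  # the partial write was rolled back, not committed
```

The commit and rollback logic lives in one place, so the inner steps stay free of transaction handling.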
Then if you look at the code in the internal save_blog_entry closure, it's just calling the other closures I defined above, using the params I passed in from Python when I called the script in the Entry model:
def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()
The params I pass in are built up in the model's custom _get_params() method:
def _get_params(self, _data, kwds):
    params = dict()
    # Get the property data, regardless of how it was entered
    data = build_data(_data, kwds)
    # Author
    author = data.pop('author')
    params['author_id'] = cache.get("username:%s" % author)
    # Topic Tags
    tags = (tag.strip() for tag in data.pop('tags').split(','))
    topic_bundles = []
    for topic_name in tags:
        #slug = slugify(topic_name)
        bundle = Topic(self._client).get_bundle(name=topic_name)
        topic_bundles.append(bundle)
    params['topic_bundles'] = topic_bundles
    # Entry
    # clean off any extra kwds that aren't defined as an Entry Property
    desired_keys = self.get_property_keys()
    data = extract(desired_keys, data)
    params['entry_bundle'] = self.get_bundle(data)
    return params
Here's what _get_params() is doing...
build_data(_data, kwds) is a function defined in bulbs.element:
https://github.com/espeed/bulbs/blob/master/bulbs/element.py#L959
It simply merges the args in case the user entered some as positional args and some as keyword args.
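A minimal sketch of that kind of merge is below. merge_data is a hypothetical name used for illustration, not the Bulbs function itself, and letting keyword args win on key collisions is an assumption of this sketch:

```python
def merge_data(_data, kwds):
    """Merge a positional dict and keyword args into one new dict.

    Illustrative only; see bulbs.element for the real build_data().
    """
    # Copy so the caller's dict is not mutated.
    data = {} if _data is None else dict(_data)
    data.update(kwds)
    return data


merged = merge_data({"title": "Test", "docid": "42"}, {"author": "espeed"})
only_kwds = merge_data(None, {"title": "Test"})
```

Either way of passing properties ends up in the same dict, so the rest of the method only has to deal with one input shape.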
The first param I handle in _get_params() is author, which is the author's username. I don't pass the username to the Gremlin script, though; I pass the author_id. The author_id is cached, so I use the username to look up the author_id and set that as a param, which I later pass to the Gremlin save_blog_entry script.
Then I create Topic model objects for each blog tag that was set, and I call get_bundle() on each and save them as a list of topic_bundles in params.
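The tag handling relies on the tags arriving as one comma-separated string. A small standalone sketch of the splitting step follows; parse_tags is a hypothetical helper name, and it returns a list rather than the generator used in _get_params():

```python
def parse_tags(tags):
    """Split a comma-separated tag string into a list of trimmed names."""
    return [tag.strip() for tag in tags.split(",")]


topic_names = parse_tags("gremlin, bulbs ,neo4j")
```

Each resulting name is what gets passed to get_bundle(name=topic_name) in the loop.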
The get_bundle() method is defined in bulbs.model:
https://github.com/espeed/bulbs/blob/master/bulbs/model.py#L363
It simply returns a tuple containing the data, index_name, and index keys for the model instance:
def get_bundle(self, _data=None, **kwds):
    """
    Returns a tuple containing the property data, index name, and index keys.

    :param _data: Data that was passed in via a dict.
    :type _data: dict

    :param kwds: Data that was passed in via name/value pairs.
    :type kwds: dict

    :rtype: tuple
    """
    self._set_property_defaults()
    self._set_keyword_attributes(_data, kwds)
    data = self._get_property_data()
    index_name = self.get_index_name(self._client.config)
    keys = self.get_index_keys()
    return data, index_name, keys
I added the get_bundle() method to Bulbs to provide a nice and tidy way of bundling params together so your Gremlin script doesn't get overrun with a ton of args in its signature.
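To make the shape concrete, here is a sketch of a bundle being built and unpacked. make_bundle and describe are hypothetical helpers, and the index name "entry" and the key list ["docid"] are made-up values for illustration, not what Lightbulb actually uses:

```python
def make_bundle(data, index_name, keys):
    """Hypothetical helper that mimics the shape get_bundle() returns."""
    return data, index_name, keys


entry_bundle = make_bundle({"title": "Test", "docid": "42"}, "entry", ["docid"])


def describe(bundle):
    # A consumer unpacks the three parts in the same order.
    data, index_name, keys = bundle
    indexed = dict((key, data[key]) for key in keys if key in data)
    return index_name, indexed


summary = describe(entry_bundle)
```

One bundle argument carries what would otherwise be three separate arguments per element.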
Finally, for Entry, I simply create an entry_bundle and store it as a param.
Notice that _get_params() returns a dict of three params: author_id, topic_bundles, and entry_bundle.
This params dict is passed directly to the Gremlin script:
def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()
    self._initialize(result)
And the Gremlin script has the same arg names as those passed in by params:
def save_blog_entry(entry_bundle, author_id, topic_bundles) {
    // Gremlin code omitted for brevity
}
The params are then simply used in the Gremlin script as needed -- nothing special going on.
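If you want a quick sanity check that the names line up, something like the following could work. script_arg_names is a hypothetical helper written for this example, not part of Bulbs, and it only handles simple untyped argument lists like the one above; the param values are placeholders:

```python
import re


def script_arg_names(signature):
    """Pull the argument names out of a Groovy-style 'def name(a, b)' line."""
    match = re.search(r"\(([^)]*)\)", signature)
    if match is None:
        return []
    return [arg.strip() for arg in match.group(1).split(",") if arg.strip()]


signature = "def save_blog_entry(entry_bundle, author_id, topic_bundles) {"
params = {"author_id": 1, "topic_bundles": [], "entry_bundle": ()}

expected = script_arg_names(signature)
missing = sorted(set(expected) - set(params))      # args the script wants but params lacks
unexpected = sorted(set(params) - set(expected))   # params the script does not declare
```

Both lists come back empty when the params dict and the script signature agree.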
So now that I've created my custom model and Gremlin script, I build a custom Graph object that encapsulates all the proxies and the respective models:
class Graph(Neo4jGraph):

    def __init__(self, config=None):
        super(Graph, self).__init__(config)

        # Node Proxies
        self.people = self.build_proxy(Person)
        self.entries = self.build_proxy(Entry)
        self.topics = self.build_proxy(Topic)

        # Relationship Proxies
        self.tagged = self.build_proxy(Tagged)
        self.author = self.build_proxy(Author)

        # Add our custom Gremlin-Groovy scripts
        scripts_file = get_file_path(__file__, "gremlin.groovy")
        self.scripts.update(scripts_file)
You can now import Graph directly from your app's model.py and instantiate the Graph object like normal.
>>> from lightbulb.model import Graph
>>> g = Graph()
>>> data = dict(author='espeed', tags='gremlin,bulbs', docid='42', title="Test")
>>> g.entries.save(data)   # execute transaction via Gremlin script
Does that help?