Install Gremlin for Neo4j 2.2

At the heart of the Exakat engine, we use Gremlin and Neo4j. Neo4j is the graph database, and Gremlin is the graph traversal language. Until today (August 20th, 2015), Neo4j 2.2 had no working Gremlin plug-in. That kept us on Neo4j 2.1.8 and meant we missed all the work going into 2.2 (and the upcoming 2.3). Thanks to @spmallette and @nachivpn, this now works fine.

Here is how to install Gremlin on Neo4j 2.2 and get it running.

We’ll focus on installing on OSX, though the steps should be easy to adapt to other Unix-like systems.

Install Java

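On OSX, make sure you have installed Oracle’s Java, not Apple’s. I used Java 8. Java 7 should work too, though it is not recommended. Get the JDK rather than only the JRE, because Maven needs a compiler to build the plug-in later on (the JAVA_HOME path below points to a JDK). Don’t confuse it with the Java plug-in used by your browsers. The Java 8 download page is http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html (this link is the JRE page; the JDK is in the same Java SE downloads section).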

Once it is installed, check the $JAVA_HOME environment variable. It should point to the version you plan to use. If it doesn’t, export it with the right path. In the Terminal:

echo $JAVA_HOME

# should print something like:
# /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home
# jdk1.8.0_60.jdk is the part of the path that selects the Java version

Validate with

java -version
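If you switch between several JDKs, a quick check before building can prevent a confusing Maven error later. The sketch below is a hypothetical helper (check_java_home is not part of any tool). It assumes the standard JDK layout with an executable $JAVA_HOME/bin/java and only accepts Java 8, the version used here. On OSX, `/usr/libexec/java_home -v 1.8` prints the path of the installed Java 8 JDK, which is a convenient way to set JAVA_HOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that JAVA_HOME (or the path given as $1)
# points at a Java 8 installation with an executable bin/java.
check_java_home() {
  local home="${1:-$JAVA_HOME}" version
  if [ -z "$home" ] || [ ! -x "$home/bin/java" ]; then
    echo "JAVA_HOME is unset or has no bin/java: '$home'" >&2
    return 1
  fi
  # `java -version` writes to stderr, e.g.: java version "1.8.0_60"
  version=$("$home/bin/java" -version 2>&1 | head -n 1)
  case "$version" in
    *'"1.8.'*) echo "OK: $version" ;;
    *) echo "Unexpected Java version: $version" >&2; return 1 ;;
  esac
}

# Usage on OSX: let the system locate the Java 8 JDK, then check it.
# export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
# check_java_home
```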

Install Neo4j 2.2.4

First, we need Neo4j. I use the Community edition. Download Neo4j 2.2.4 (current as of August 20th) from http://neo4j.com/download/other-releases/, then:

tar -xvf neo4j-community-2.2.4-unix.tar.gz
cd neo4j-community-2.2.4
./bin/neo4j start   # check that Neo4j works
./bin/neo4j stop    # avoid interference with the Gremlin install

# point NEO4J_HOME at this directory
export NEO4J_HOME="$(pwd)"

Install Gremlin for Neo4j

The current Gremlin plug-in is available on GitHub and is maintained by the fine team at Think Aurelius: https://github.com/thinkaurelius/neo4j-gremlin-plugin. Don’t confuse it with the neo4j-contrib plug-in. That is the original project, and it is no longer updated.

git clone https://github.com/thinkaurelius/neo4j-gremlin-plugin.git gremlin
cd gremlin

At this point, there are two options:

  • Option a) Install Gremlin-plugin with sonatype repository (the easy path)
  • Option b) Install Gremlin-plugin from repositories (the longer path)

Option a) Install Gremlin-plugin with sonatype repository

This is the comfortable solution, suggested by @spmallette. Edit pom.xml and add the following after the closing </contributors> tag:

<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>
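If you rebuild often, you can script this edit. The following is a minimal sketch with a few assumptions. First, pom.xml does not already contain a <repositories> section, since a second one would be invalid. Second, </project> sits on its own line. Third, Maven accepts the elements directly under <project> in any order. The hypothetical add_snapshot_repo function inserts the block above just before </project>. It does nothing if the Sonatype snapshot URL is already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the Sonatype snapshot repository to a pom.xml.
# Assumes no existing <repositories> section and </project> on its own line.
SNAPSHOT_URL='https://oss.sonatype.org/content/repositories/snapshots'

add_snapshot_repo() {
  local pom="${1:-pom.xml}" block
  # Idempotent: skip if the repository is already declared.
  if grep -qF "$SNAPSHOT_URL" "$pom"; then
    return 0
  fi
  block="<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>$SNAPSHOT_URL</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>"
  # Print the block right before the first </project> line, copy everything else.
  REPO_BLOCK="$block" awk '/<\/project>/ && !done { print ENVIRON["REPO_BLOCK"]; done = 1 } { print }' \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Usage, from the gremlin clone:
# add_snapshot_repo pom.xml && mvn clean package
```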

Save, then run in the Terminal:

mvn clean package

unzip target/neo4j-gremlin-plugin-*-server-plugin.zip -d $NEO4J_HOME/plugins/gremlin-plugin
$NEO4J_HOME/bin/neo4j restart
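Note that if NEO4J_HOME is empty, the unzip target quietly becomes /plugins/gremlin-plugin. The sketch below is a slightly more defensive version of these three commands. deploy_gremlin_plugin is a hypothetical name, and the sketch assumes it runs from the plug-in clone with NEO4J_HOME set as in the Neo4j section. It adds unzip’s -o flag so a rebuild overwrites the previous files without prompting. The same steps also close Option b.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the plug-in and unpack it into Neo4j.
# Run from the neo4j-gremlin-plugin clone; requires NEO4J_HOME.
deploy_gremlin_plugin() {
  # Refuse to run if NEO4J_HOME is unset or does not look like Neo4j.
  if [ -z "$NEO4J_HOME" ] || [ ! -x "$NEO4J_HOME/bin/neo4j" ]; then
    echo "NEO4J_HOME is not a Neo4j directory: '$NEO4J_HOME'" >&2
    return 1
  fi
  mvn clean package || return 1
  # Expand the glob into the positional parameters; expect exactly one zip.
  set -- target/neo4j-gremlin-plugin-*-server-plugin.zip
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "Expected exactly one plug-in zip in target/" >&2
    return 1
  fi
  # -o overwrites files from a previous install without prompting.
  unzip -o "$1" -d "$NEO4J_HOME/plugins/gremlin-plugin" || return 1
  "$NEO4J_HOME/bin/neo4j" restart
}
```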

That should be it. Jump to the Finalization section.

Option b) Install Gremlin-plugin from repositories

At the moment, the Gremlin plug-in depends on Gremlin 2.7.0-SNAPSHOT, which is not available on the regular Maven repositories, so you have to compile it yourself. Gremlin depends on three other projects, so you need to compile all of them. The four repositories are:

For each repository, run:

git clone <repo>
cd <repo>
mvn install
cd ..
rm -rf <repo>
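The steps above can be wrapped in a small function that takes the repository URLs as arguments. This is a sketch: install_from_git is a hypothetical name, and you pass it the four repository URLs yourself. Cloning into an explicit directory avoids having to guess the folder name git picks. The build runs in a subshell, so a failure does not leave you in the wrong folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone, `mvn install`, and clean up each repository
# given as an argument. `mvn install` puts the artifacts in the local Maven
# repository (~/.m2 by default), so the clone can be deleted afterwards.
install_from_git() {
  local url dir
  for url in "$@"; do
    dir=$(basename "$url" .git)   # e.g. .../foo.git -> foo
    git clone "$url" "$dir" || return 1
    (cd "$dir" && mvn install) || return 1
    rm -rf "$dir"
  done
}

# Usage (replace with the four repository URLs):
# install_from_git <repo-url-1> <repo-url-2> <repo-url-3> <repo-url-4>
```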

This installs the versions the plug-in needs into your local Maven repository. Then go back to the plug-in folder (the gremlin clone made earlier) and build it:

cd gremlin   # the neo4j-gremlin-plugin clone; adjust the path to where you are
mvn clean package

unzip target/neo4j-gremlin-plugin-*-server-plugin.zip -d $NEO4J_HOME/plugins/gremlin-plugin
$NEO4J_HOME/bin/neo4j restart

Now, move on to the Finalization section.

Finalization

Finally, we need to activate the Gremlin plug-in within Neo4j. Up to the 1.9.x versions, Neo4j shipped the Gremlin plug-in and activated it by default. That is no longer the case.

In the Neo4j folder, edit conf/neo4j-server.properties.

Around line 59, add the following configuration :

org.neo4j.server.thirdparty_jaxrs_classes=com.thinkaurelius.neo4j.plugins=/tp

Then, restart Neo4j.

./bin/neo4j restart
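Editing the file by hand is fine for a single install. For repeated installs, a hypothetical enable_gremlin_plugin helper can append the line only when it is missing. This sketch appends at the end of the file rather than around line 59, assuming that position does not matter in a properties file. It also assumes there is no other thirdparty_jaxrs_classes entry. If you already mount other extensions there, edit the existing value by hand (to my understanding, entries are comma-separated).

```shell
#!/usr/bin/env bash
# Hypothetical helper: register the Think Aurelius plug-in under /tp in
# neo4j-server.properties, unless the line is already there.
GREMLIN_LINE='org.neo4j.server.thirdparty_jaxrs_classes=com.thinkaurelius.neo4j.plugins=/tp'

enable_gremlin_plugin() {
  local conf="${1:-$NEO4J_HOME/conf/neo4j-server.properties}"
  # -x: whole-line match, -F: no regex interpretation.
  if grep -qxF "$GREMLIN_LINE" "$conf"; then
    return 0
  fi
  # Refuse to add a second, conflicting entry.
  if grep -q '^org\.neo4j\.server\.thirdparty_jaxrs_classes=' "$conf"; then
    echo "thirdparty_jaxrs_classes already set in $conf; edit it by hand" >&2
    return 1
  fi
  # Make sure the file ends with a newline before appending.
  if [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '%s\n' "$GREMLIN_LINE" >> "$conf"
}

# Usage:
# enable_gremlin_plugin && "$NEO4J_HOME/bin/neo4j" restart
```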

The simplest way to check that everything works is to run this from the command line:

curl -s -G --data-urlencode 'script="Hello World!"' \
http://localhost:7474/tp/gremlin/execute

This should return the following :

{
  "results": [
    "Hello World!"
  ],
  "success": true
}
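For scripted installs, the same check can be wrapped in a function that fails when the response does not report success. This is a sketch that assumes the default localhost:7474 address and the /tp mount configured above. gremlin_response_ok and gremlin_smoke_test are hypothetical names. Matching the text "success": true is a shortcut rather than real JSON parsing, and a JSON tool such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Returns 0 if the JSON on stdin contains "success": true (text match only).
gremlin_response_ok() {
  grep -q '"success"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical smoke test against a running server (default URL assumed).
gremlin_smoke_test() {
  local url="${1:-http://localhost:7474/tp/gremlin/execute}"
  curl -s -G --data-urlencode 'script="Hello World!"' "$url" | gremlin_response_ok
}

# Usage:
# gremlin_smoke_test && echo "Gremlin plug-in is up"
```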

Two main differences between the neo4j-contrib Gremlin plug-in and Think Aurelius’s

First, the Gremlin URL now lives under /tp (the mount point configured above, as in /tp/gremlin/execute) instead of /db/data/ext/GremlinPlugin/graphdb/execute_script. This may break older libraries (neo4jPHP, for example) that hardcode the path to the Gremlin plug-in.

Second, Think Aurelius’s plug-in uses GET rather than POST to talk to the REST API. The query is still sent as parameters (param=value), so short queries work the same way.

GET has its own limitations (or at least its implementations do). Once the query grows past roughly 20 KB, it becomes too long and the server returns an error. POST doesn’t have this problem. If you are sending data to load into Neo4j, you can hit that limit quite fast.
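Before sending a large query with GET, you can estimate its length once it is URL-encoded. The sketch below assumes that curl’s --data-urlencode escapes every byte except letters, digits and - . _ ~, and that each escaped byte becomes three characters (%XX). It uses the roughly 20 KB limit observed above as a default. The real threshold depends on the server, so treat the result as a rough guide. encoded_length and fits_in_get are hypothetical names.

```shell
#!/usr/bin/env bash
# Length of stdin once percent-encoded: unreserved bytes (A-Z a-z 0-9 - . _ ~)
# count 1, every other byte becomes %XX and counts 3.
encoded_length() {
  od -An -v -tu1 | awk '{
    for (i = 1; i <= NF; i++) {
      b = $i + 0
      if ((b >= 48 && b <= 57) || (b >= 65 && b <= 90) ||
          (b >= 97 && b <= 122) || b == 45 || b == 46 || b == 95 || b == 126)
        n += 1
      else
        n += 3
    }
  }
  END { print n + 0 }'
}

# Succeeds if "script=<encoded query>" stays under the limit (default 20000
# bytes, the approximate figure observed in practice, not a documented value).
fits_in_get() {
  local limit="${1:-20000}" len
  len=$(( $(encoded_length) + 7 ))   # 7 = length of "script="
  [ "$len" -le "$limit" ]
}

# Usage:
# fits_in_get < my_query.gremlin || echo "too long for GET, use a script file"
```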

To work around this, you can now create Gremlin scripts in a ‘scripts’ folder at the root of the Neo4j directory (you’ll need to create this folder yourself). This folder holds .gremlin files containing valid Gremlin code. Here is a simple example, taken from the docs:

def sayHello(def name) {
"Hello ${name}!"
}
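To try it, save the function as a .gremlin file in that folder. The snippet below assumes NEO4J_HOME is set as earlier and uses sayHello.gremlin as an example file name. write_say_hello is a hypothetical name. The quoted heredoc stops the shell from expanding ${name}, which is left for Groovy to interpolate.

```shell
#!/usr/bin/env bash
# Create <neo4j dir>/scripts if needed and write the example script into it.
write_say_hello() {
  local dir="${1:-$NEO4J_HOME}/scripts"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF': ${name} is written literally, for Groovy to interpolate.
  cat > "$dir/sayHello.gremlin" <<'EOF'
def sayHello(def name) {
  "Hello ${name}!"
}
EOF
}

# Usage:
# write_say_hello
```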

Once the script is written, load it with the ‘load’ parameter of the GET query. The parameter takes the file name (not the function name, unless they are the same). A script only needs to be loaded once per server lifetime. If you no longer need it, restart Neo4j to clear it from memory. I’m not aware of any limitations for these scripts, though I may run into some one day.
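The sketch below shows a call that loads the script and uses it in the same request. It relies on two assumptions drawn from the description above: load takes the file name without the .gremlin extension, and the loaded function can then be called from script. The exact format load accepts is not confirmed here, so check the plug-in’s README. gremlin_exec and the GREMLIN_URL variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a Gremlin query over GET, optionally loading a
# script from the scripts folder first. The format of `load` is an assumption.
gremlin_exec() {
  local query="$1" load="$2"
  local url="${GREMLIN_URL:-http://localhost:7474/tp/gremlin/execute}"
  if [ -n "$load" ]; then
    curl -s -G --data-urlencode "script=$query" --data-urlencode "load=$load" "$url"
  else
    curl -s -G --data-urlencode "script=$query" "$url"
  fi
}

# Usage: the first call loads sayHello.gremlin; later calls can omit the load.
# gremlin_exec 'sayHello("Gremlin")' sayHello
# gremlin_exec 'sayHello("again")'
```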

This scripting system will suit Exakat well, because we need to define some Gremlin steps that are reused often. They could easily be loaded into the server at startup and be ready for later use.