Big data analytics with Neo4j and Java, Part 1

Graph databases like Neo4j are ideal for modeling complex relationships, and they can traverse large, highly connected data sets very quickly


The relationshipVariable is optional, but you need it if you want to refer to the relationship in your RETURN statement (or in a WHERE clause). The arrows, ()-[]->(), denote the direction of the relationship. If you wanted to express that Linda is married to Steven, you could write the relationship in the other direction: ()<-[]-(). If you wanted to create a bidirectional relationship, showing that Linda and Steven are married to each other, you would need to create two separate relationships. Cypher requires a direction when you create a relationship, but you can query relationships either with or without a direction.

The following query finds all the people in this family who are married (note the lack of any direction in the query):

MATCH (p1:Person)-[:IS_MARRIED_TO]-(p2:Person) RETURN p1, p2

The result is shown in Figure 6.

Image credit: Steven Haines

Figure 6. Results from querying the IS_MARRIED_TO relationship
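Neo4j stores every relationship with a direction, and an undirected pattern matches it from either end. The plain-Java sketch below models that idea; it is not the Neo4j Java API, and the Edge record and the matchDirected and matchUndirected helpers are hypothetical names. It also assumes the marriage was created as (steven)-[:IS_MARRIED_TO]->(linda). The model shows a consequence that is easy to miss in the graph view: in tabular output, the undirected query returns each married couple twice, once in each orientation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of directed relationships; not the Neo4j API.
// Edge, matchDirected, and matchUndirected are hypothetical names.
class UndirectedMatch {

    // Every relationship has a start node, a type, and an end node.
    record Edge(String from, String type, String to) {}

    // Models MATCH (p1)-[:TYPE]->(p2): only the stored direction matches.
    static List<List<String>> matchDirected(List<Edge> edges, String type) {
        List<List<String>> rows = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type().equals(type)) {
                rows.add(List.of(e.from(), e.to()));
            }
        }
        return rows;
    }

    // Models MATCH (p1)-[:TYPE]-(p2): each matching relationship is
    // returned once from each end, so each couple appears twice.
    static List<List<String>> matchUndirected(List<Edge> edges, String type) {
        List<List<String>> rows = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type().equals(type)) {
                rows.add(List.of(e.from(), e.to()));
                rows.add(List.of(e.to(), e.from()));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumption: the marriage was stored as (steven)-[:IS_MARRIED_TO]->(linda).
        List<Edge> edges = List.of(new Edge("Steven", "IS_MARRIED_TO", "Linda"));
        System.out.println(matchDirected(edges, "IS_MARRIED_TO"));   // [[Steven, Linda]]
        System.out.println(matchUndirected(edges, "IS_MARRIED_TO")); // [[Steven, Linda], [Linda, Steven]]
    }
}
```

If you only want each couple once, one common approach is to keep the arrowhead in the query so that only the stored direction matches.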

Now let's create a few more relationships:

MATCH (michael:Person {name: "Michael"}), (rebecca:Person {name: "Rebecca"}) CREATE (michael)-[:IS_SIBLING]->(rebecca) RETURN michael, rebecca
MATCH (steven:Person {name: "Steven"}), (michael:Person {name: "Michael"}) CREATE (steven)-[:HAS_CHILD]->(michael) RETURN steven, michael
MATCH (steven:Person {name: "Steven"}), (rebecca:Person {name: "Rebecca"}) CREATE (steven)-[:HAS_CHILD]->(rebecca) RETURN steven, rebecca
MATCH (linda:Person {name: "Linda"}), (michael:Person {name: "Michael"}) CREATE (linda)-[:HAS_CHILD]->(michael) RETURN linda, michael
MATCH (linda:Person {name: "Linda"}), (rebecca:Person {name: "Rebecca"}) CREATE (linda)-[:HAS_CHILD]->(rebecca) RETURN linda, rebecca

We can now see all the people and their relationships with the following query (the Neo4j Browser automatically draws the relationships between the nodes a query returns):

MATCH (p:Person) RETURN p

The result is shown in Figure 7.


Figure 7. Results of querying all nodes with a Person label and their relationships

Traversing the social graph

To really explore the power of graph databases, we'll need to expand our social graph. To start, let's add some FRIEND relationships:

        MATCH (michael:Person {name: "Michael"}) CREATE (michael)-[:FRIEND]->(charlie:Person {name: "Charlie", age: 16}) RETURN michael, charlie
        MATCH (michael:Person {name: "Michael"}) CREATE (michael)-[:FRIEND]->(koby:Person {name: "Koby"}) RETURN michael, koby
        MATCH (michael:Person {name: "Michael"}) CREATE (michael)-[:FRIEND]->(grant:Person {name: "Grant"}) RETURN michael, grant
        MATCH (rebecca:Person {name: "Rebecca"}) CREATE (rebecca)-[:FRIEND]->(jordyn:Person {name: "Jordyn"}) RETURN rebecca, jordyn
        MATCH (rebecca:Person {name: "Rebecca"}) CREATE (rebecca)-[:FRIEND]->(katie:Person {name: "Katie"}) RETURN rebecca, katie

Notice that these statements create the friend nodes at the same time as the FRIEND relationships. For example, the "Charlie" Person node does not exist when the first statement runs; the statement creates a FRIEND relationship from the existing "Michael" Person node to a new Person node named "Charlie". You can pull up all Person nodes to verify that the new nodes were created, as shown in Figure 8.


Figure 8. Result of querying all nodes with a Person label and their relationships (with friends added)

We have a pretty good social graph started, so let's try writing a more involved query to find all the friends of my children:

MATCH (steven:Person {name:"Steven"})-[:HAS_CHILD]-(:Person)-[:FRIEND]-(friend:Person) RETURN friend

The results are shown in Figure 9.


Figure 9. Friends of all my children

In this query, we start with the Person node named "Steven", traverse all of its HAS_CHILD relationships to Person nodes, traverse each of those Person nodes' FRIEND relationships, and return the resulting friends. We could have specified a direction for each relationship, but omitting the arrowheads lets the query match relationships in either direction.
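As a rough illustration of what this traversal does, the following plain-Java sketch walks the same two hops over an in-memory list of the relationships created so far. It is not how Neo4j executes the query, nor its Java API; the Edge record and the neighbors, friendsOfChildren, and sampleGraph helpers are hypothetical, people are identified by name, and the direction of the IS_MARRIED_TO relationship is assumed. Because neighbors ignores direction, it mirrors the arrowhead-free pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the two-hop traversal; not the Neo4j API.
// Class and method names are hypothetical; people are identified by name.
class FriendsOfChildren {

    record Edge(String from, String type, String to) {}

    // Nodes reachable from `node` over relationships of `type`, in either
    // direction (like -[:TYPE]- with no arrowhead).
    static List<String> neighbors(List<Edge> edges, String node, String type) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.type().equals(type)) {
                continue;
            }
            if (e.from().equals(node)) {
                result.add(e.to());
            } else if (e.to().equals(node)) {
                result.add(e.from());
            }
        }
        return result;
    }

    // (parent)-[:HAS_CHILD]-(child)-[:FRIEND]-(friend) RETURN friend
    static List<String> friendsOfChildren(List<Edge> edges, String parent) {
        List<String> friends = new ArrayList<>();
        for (String child : neighbors(edges, parent, "HAS_CHILD")) {
            friends.addAll(neighbors(edges, child, "FRIEND"));
        }
        return friends;
    }

    // Relationships created so far (the IS_MARRIED_TO direction is assumed).
    static List<Edge> sampleGraph() {
        return List.of(
                new Edge("Steven", "IS_MARRIED_TO", "Linda"),
                new Edge("Michael", "IS_SIBLING", "Rebecca"),
                new Edge("Steven", "HAS_CHILD", "Michael"),
                new Edge("Steven", "HAS_CHILD", "Rebecca"),
                new Edge("Linda", "HAS_CHILD", "Michael"),
                new Edge("Linda", "HAS_CHILD", "Rebecca"),
                new Edge("Michael", "FRIEND", "Charlie"),
                new Edge("Michael", "FRIEND", "Koby"),
                new Edge("Michael", "FRIEND", "Grant"),
                new Edge("Rebecca", "FRIEND", "Jordyn"),
                new Edge("Rebecca", "FRIEND", "Katie"));
    }

    public static void main(String[] args) {
        // [Charlie, Koby, Grant, Jordyn, Katie]
        System.out.println(friendsOfChildren(sampleGraph(), "Steven"));
    }
}
```

Relationships of other types (IS_MARRIED_TO, IS_SIBLING) are skipped at each hop, just as the typed pattern in the Cypher query ignores them.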

Key/value pairs in the social graph

Relationships do more than connect two nodes: like nodes, they can hold their own key/value pairs (properties). For example, we might create Movie nodes, then create HAS_SEEN relationships between people and the movies they have seen. Each HAS_SEEN relationship could also carry a "rating" property. The following code creates a Movie with the title "Avengers" and then creates a HAS_SEEN relationship from Michael to that movie with a rating of 5.

        CREATE (movie:Movie {title:"Avengers"}) RETURN movie
        MATCH (michael:Person {name:"Michael"}), (avengers:Movie {title:"Avengers"}) CREATE (michael)-[:HAS_SEEN {rating:5}]->(avengers) RETURN michael, avengers

Figure 10 shows the results.


Figure 10. Creating a relationship with a rating property
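A relationship property behaves like a key/value map attached to the edge itself rather than to either node. The plain-Java sketch below assumes a hypothetical Edge record and rating helper (not the Neo4j API) and looks up the rating stored on Michael's HAS_SEEN relationship.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of a relationship that carries its own properties;
// not the Neo4j API. Edge and rating are hypothetical names.
class RelationshipProperties {

    // Type, endpoints, and a key/value map, like a Neo4j relationship.
    record Edge(String from, String type, String to, Map<String, Object> props) {}

    // Roughly: MATCH (:Person {name: $person})-[s:HAS_SEEN]->(:Movie {title: $movie}) RETURN s.rating
    static Optional<Integer> rating(List<Edge> edges, String person, String movie) {
        for (Edge e : edges) {
            if (e.type().equals("HAS_SEEN") && e.from().equals(person) && e.to().equals(movie)) {
                // The rating lives on the relationship, not on Michael or the movie.
                return e.props().get("rating") instanceof Integer r ? Optional.of(r) : Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge("Michael", "HAS_SEEN", "Avengers", Map.of("rating", 5)));
        System.out.println(rating(edges, "Michael", "Avengers")); // Optional[5]
        System.out.println(rating(edges, "Rebecca", "Avengers")); // Optional.empty
    }
}
```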

Graph analytics in Java

For our final example before getting into Java code, let's try a simple experiment with graph analytics. We'll record which movies my children's friends have seen, set the gender of each of my children, and then query for movies that my son, Michael, might like to see. The results are shown in Figure 11.

        CREATE (movie:Movie {title:"Batman"}) RETURN movie
        CREATE (movie:Movie {title:"Gone with the Wind"}) RETURN movie
        CREATE (movie:Movie {title:"Spongebob Square Pants"}) RETURN movie
        CREATE (movie:Movie {title:"Avengers 2"}) RETURN movie
        MATCH (charlie:Person {name:"Charlie"}), (movie:Movie {title:"Batman"}) CREATE (charlie)-[:HAS_SEEN {rating:4}]->(movie) RETURN charlie, movie
        MATCH (charlie:Person {name:"Charlie"}), (movie:Movie {title:"Gone with the Wind"}) CREATE (charlie)-[:HAS_SEEN {rating:0}]->(movie) RETURN charlie, movie
        MATCH (koby:Person {name:"Koby"}), (movie:Movie {title:"Batman"}) CREATE (koby)-[:HAS_SEEN {rating:4}]->(movie) RETURN koby, movie
        MATCH (koby:Person {name:"Koby"}), (movie:Movie {title:"Avengers 2"}) CREATE (koby)-[:HAS_SEEN {rating:5}]->(movie) RETURN koby, movie
        MATCH (grant:Person {name:"Grant"}), (movie:Movie {title:"Spongebob Square Pants"}) CREATE (grant)-[:HAS_SEEN {rating:1}]->(movie) RETURN grant, movie
        MATCH (jordyn:Person {name:"Jordyn"}), (movie:Movie {title:"Spongebob Square Pants"}) CREATE (jordyn)-[:HAS_SEEN {rating:5}]->(movie) RETURN jordyn, movie
        MATCH (michael:Person {name: "Michael"}) SET michael.gender = "male" RETURN michael
        MATCH (rebecca:Person {name: "Rebecca"}) SET rebecca.gender = "female" RETURN rebecca
        MATCH (steven:Person {name:"Steven"})-[:HAS_CHILD]-(child:Person)-[:FRIEND]-(friend:Person)-[hasSeen:HAS_SEEN]-(movie:Movie) WHERE child.gender = "male" AND hasSeen.rating > 3 RETURN DISTINCT movie.title

Figure 11. Movies that my son's friends have seen and rated higher than 3

The first four statements above create four movies. The next six statements create HAS_SEEN relationships, with different ratings, between my children's friends and the movies they've seen. The next two statements add a gender to each of my children by finding the Person node by name and then using SET (for example, SET michael.gender = "male"). In Cypher, the SET clause lets you change an existing property, add a new property, or remove a property by setting its value to null.
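The SET behavior described above can be modeled with an ordinary Java map of property names to values. This is a hedged sketch, not Neo4j's implementation or API; the SetSemantics class and its set helper are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of SET semantics on a node's properties; not the Neo4j API.
class SetSemantics {

    // Roughly: SET n.key = value
    // A non-null value adds the property or overwrites the existing one;
    // a null value removes the property.
    static void set(Map<String, Object> props, String key, Object value) {
        if (value == null) {
            props.remove(key);
        } else {
            props.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> michael = new HashMap<>(Map.of("name", "Michael"));
        set(michael, "gender", "male");                    // adds a new property
        System.out.println(michael.get("gender"));         // male
        set(michael, "gender", null);                      // removes the property
        System.out.println(michael.containsKey("gender")); // false
    }
}
```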

The final query takes a little work to understand. We start with the Person named "Steven", follow his HAS_CHILD relationships to his children's Person nodes, follow those nodes' FRIEND relationships to their friends' Person nodes, follow the friends' HAS_SEEN relationships to Movie nodes, and then apply a WHERE clause that checks both the gender of Steven's child and the value of the HAS_SEEN relationship's rating property.

Finally, because more than one friend has seen the same movie (Charlie and Koby both rated Batman 4), we return only DISTINCT movie titles. In this case we do not return the movie node but the movie's title property, which is why the output is presented as a table. Observant readers may notice that we could simplify the query a little by moving the gender check into the child node pattern, as follows:

MATCH (steven:Person {name:"Steven"})-[:HAS_CHILD]-(child:Person {gender:"male"})-[:FRIEND]-(friend:Person)-[hasSeen:HAS_SEEN]->(movie:Movie) WHERE hasSeen.rating > 3 RETURN DISTINCT movie.title
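To make the steps of the recommendation query concrete, here is a rough plain-Java model of the same traversal over an in-memory list of relationships. It is not the Neo4j Java API; the Edge record and the incident, other, recommend, and sampleGraph helpers are hypothetical, people and movies are identified by name or title, and the ratings mirror the Cypher statements above. A LinkedHashSet plays the role of DISTINCT, which is why Batman appears only once even though both Charlie and Koby rated it 4.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of the movie-recommendation query; not the Neo4j API.
// Class and method names are hypothetical; nodes are identified by name/title.
class MovieRecommendations {

    // rating is null for relationships that have no rating property.
    record Edge(String from, String type, String to, Integer rating) {}

    // Relationships of `type` touching `node` in either direction.
    static List<Edge> incident(List<Edge> edges, String node, String type) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type().equals(type) && (e.from().equals(node) || e.to().equals(node))) {
                result.add(e);
            }
        }
        return result;
    }

    // The node at the other end of `e`, seen from `node`.
    static String other(Edge e, String node) {
        return e.from().equals(node) ? e.to() : e.from();
    }

    static Set<String> recommend(List<Edge> edges, Map<String, String> gender,
                                 String parent, String childGender, int minRating) {
        Set<String> titles = new LinkedHashSet<>(); // DISTINCT, in first-seen order
        for (Edge hasChild : incident(edges, parent, "HAS_CHILD")) {
            String child = other(hasChild, parent);
            if (!childGender.equals(gender.get(child))) {
                continue; // child.gender = "male"
            }
            for (Edge friendship : incident(edges, child, "FRIEND")) {
                String friend = other(friendship, child);
                for (Edge hasSeen : incident(edges, friend, "HAS_SEEN")) {
                    if (hasSeen.rating() != null && hasSeen.rating() > minRating) {
                        titles.add(other(hasSeen, friend)); // RETURN DISTINCT movie.title
                    }
                }
            }
        }
        return titles;
    }

    static final Map<String, String> GENDER = Map.of("Michael", "male", "Rebecca", "female");

    static List<Edge> sampleGraph() {
        return List.of(
                new Edge("Steven", "HAS_CHILD", "Michael", null),
                new Edge("Steven", "HAS_CHILD", "Rebecca", null),
                new Edge("Michael", "FRIEND", "Charlie", null),
                new Edge("Michael", "FRIEND", "Koby", null),
                new Edge("Michael", "FRIEND", "Grant", null),
                new Edge("Rebecca", "FRIEND", "Jordyn", null),
                new Edge("Rebecca", "FRIEND", "Katie", null),
                new Edge("Michael", "HAS_SEEN", "Avengers", 5),
                new Edge("Charlie", "HAS_SEEN", "Batman", 4),
                new Edge("Charlie", "HAS_SEEN", "Gone with the Wind", 0),
                new Edge("Koby", "HAS_SEEN", "Batman", 4),
                new Edge("Koby", "HAS_SEEN", "Avengers 2", 5),
                new Edge("Grant", "HAS_SEEN", "Spongebob Square Pants", 1),
                new Edge("Jordyn", "HAS_SEEN", "Spongebob Square Pants", 5));
    }

    public static void main(String[] args) {
        // [Batman, Avengers 2]
        System.out.println(recommend(sampleGraph(), GENDER, "Steven", "male", 3));
    }
}
```

This sketch rescans the whole relationship list at every hop, which is fine for a toy graph. A native graph database such as Neo4j instead keeps each node's relationships with the node, so each hop can follow them directly.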

Conclusion to Part 1

Cypher is a different way of thinking about writing queries, and I encourage you to read through the official Cypher documentation to learn more. Once you have a handle on writing Cypher queries, the Java programming will be the easy part! We'll pick that up in Part 2 of this introduction to modeling data and relationships with Neo4j.
