What is Cypher?

Cypher is a graph query language that is used to query the Neo4j Database. Just like you use SQL to query a MySQL database, you would use Cypher to query the Neo4j Database.

A simple Cypher query can look something like this

Details

This content is only revealed when the user clicks the block title.

MATCH (m:Movie)
WHERE m.released > 2000
RETURN m LIMIT 5
Hint: You can click on the query above to populate it in the editor.

Expected Result: The above query will return all the movies that were released after the year 2000 limiting the result to 5 items.

5 movies

Try

  1. Write a query to retrieve all the movies released after the year 2005.

    Details
    MATCH (m:Movie)
    WHERE m.released > 2005
    RETURN m
  2. Write a query to return the count of movies released after the year 2005. (Hint: you can use the COUNT(m) function to return the count)

    Details
    MATCH (m:Movie)
    WHERE m.released > 2005
    RETURN count(m)

Nodes and Relationships

Nodes and Relationships are the basic building blocks of a graph database.

Nodes

Nodes represent entities. A node in graph database is similar to a row in a relational database. In the picture below we can see 2 kinds of nodes - Person and Movie. In writing a Cypher query, a node is enclosed between a

parenthesis — like (p:Person) where p is a variable and Person is the type of node it is referring to.

schema

Relationship

Two nodes can be connected with a relationship. In the above image ACTED_IN, REVIEWED, PRODUCED, WROTE and DIRECTED are all relationships connecting the corresponding types of nodes.

In writing a cypher query, relationships are enclosed in square brackets - like [w:WORKS_FOR] where w is a variable and WORKS_FOR is the type of relationship it is referring to.

Two nodes can be connected with more than one relationships.

MATCH (p:Person)-[d:DIRECTED]-(m:Movie)
WHERE m.released > 2010
RETURN p,d,m
Hint: You can click on the query above to populate it in the editor.

Expected Result: The above query will return all Person nodes who directed a movie that was released after 2010.

movies after 2010

Try

  1. Query to get all the people who acted in a movie that was released after 2010.

    Details
    MATCH (p:Person)-[r:ACTED_IN]-(m:Movie)
    WHERE m.released > 2010
    RETURN p,r,m

Labels

Labels is a name or identifier of a Node or a Relationship. In the image below Movie and Person are Node labels and ACTED_IN, REVIEWED, etc are Relationship types.

schema

In writing a Cypher query, Labels are prefixed with a colon - like :Person or :ACTED_IN. You can assign the node label to a variable by prefixing the syntax with the variable name. Like (p:Person) means p variable denoted Person labeled nodes.

Labels are used when you want to perform operations only on a specific types of Nodes. Like

MATCH (p:Person)
RETURN p
LIMIT 20

will return only Person Nodes (limiting to 20 items) while

MATCH (n)
RETURN n
LIMIT 20

will return all kinds of nodes (limiting to 20 items).

Properties

Properties are name-value pairs that are used to add attributes to nodes and relationships.

To return specific properties of a node you can write

MATCH (m:Movie)
RETURN m.title, m.released
movies properties

Expected Result - This will return Movie nodes but with only the title and released properties.

Try

  1. Write a query to get name and born properties of the Person node.

    Details
    MATCH (p:Person)
    RETURN p.name, p.born

Create a Node

CREATE clause can be used to create a new node or a relationship.

CREATE (p:Person {name: 'John Doe'})
RETURN p

The above statement will create a new Person node with property name having value John Doe.

Try

  1. Create a new Person node with a property name having the value of your name.

    Details
    CREATE (p:Person {name: '<Your Name>'})
    RETURN p

Finding Nodes with Match and Where Clause

Match clause is used to find nodes that match a particular pattern. This is the primary way of getting data from a Neo4j database.

In most cases, a Match is used along with certain conditions to narrow down the result.

MATCH (p:Person {name: 'Tom Hanks'})
RETURN p

This is one way of doing it. Although you can only do basic string match based filtering this way (without using WHERE clause).

Another way would be to use a WHERE clause which allows for more complex filtering including >, <, STARTS WITH, ENDS WITH, etc

MATCH (p:Person)
WHERE p.name = "Tom Hanks"
RETURN p

Both of the above queries will return the same results.

You can read more about Where clause and list of all filters here - https://neo4j.com/docs/cypher-manual/current/clauses/where/

Try

  1. Find the movie with title "Cloud Atlas"

    Details
    MATCH (m:Movie {title: "Cloud Atlas"})
    RETURN m
  2. Get all the movies that were released between 2010 and 2015.

    Details
    MATCH (m:Movie)
    WHERE m.released > 2010 AND m.released < 2015
    RETURN m

Merge Clause

The MERGE clause is used to either

  1. match the existing nodes and bind them or

  2. create new node(s) and bind them

It is a combination of MATCH and CREATE and additionally allows to specify additional actions if the data was matched or created.

MERGE (p:Person {name: 'John Doe'})
ON CREATE SET p.createdAt = timestamp()
ON MATCH SET p.lastLoggedInAt = timestamp()
RETURN p

The above statement will create the Person node if it does not exist. If the node already exists, then it will set the property lastLoggedInAt to the current timestamp. If the node did not exist and was newly created instead, then it will set the createdAt property to the current timestamp.

Try

  1. Write a query using Merge to create a movie node with title "Greyhound". If the node does not exist then set its released property to 2020 and lastUpdatedAt property to the current time stamp. If the node already exists, then only set lastUpdatedAt to the current time stamp. Return the movie node.

    Details
    MERGE (m:Movie {title: 'Greyhound'})
    ON CREATE SET m.released = "2020", m.lastUpdatedAt = timestamp()
    ON MATCH SET m.lastUpdatedAt = timestamp()
    RETURN m

Create a Relationship

A Relationship connects 2 nodes.

MATCH (p:Person), (m:Movie)
WHERE p.name = "Tom Hanks" AND m.title = "Cloud Atlas"

CREATE (p)-[w:WATCHED]->(m)
RETURN type(w)

The above statement will create a relationship :WATCHED between the existing Person and Movie nodes and return the type of relationship (i.e WATCHED).

Try

  1. Create a relationship :WATCHED between the node you created for yourself previously in step 6 and the movie Cloud Atlas and then return the type of created relationship

    Details
    MATCH (p:Person), (m:Movie)
    WHERE p.name = "<Your Name>" AND m.title = "Cloud Atlas"
    
    CREATE (p)-[w:WATCHED]->(m)
    RETURN type(w)

Relationship Types

In Neo4j, there can be 2 kinds of relationships - incoming and outgoing.

relationship types

In the above picture, the Tom Hanks node is said to have an outgoing relationship while the Cloud Atlas node is said to have an incoming relationship.

Relationships always have a direction. However, you only have to pay attention to the direction where it is useful.

To denote an outgoing or an incoming relationship in cypher, we use or .

Example -

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p,r,m

In the above query, Person has an outgoing relationship and movie has an incoming relationship.

Although, in the case of the movies dataset, the direction of the relationship is not that important and even without denoting the direction in the query, it will return the same result. So the query

MATCH (p:Person)-[r:ACTED_IN]-(m:Movie)
RETURN p,r,m

will return the same reuslt as the above one.

Try

  1. Write a query to find the nodes Person and Movie which are connected by REVIEWED relationship and is outgoing from the Person node and incoming to the Movie node.

    Details
    MATCH (p:Person)-[r:REVIEWED]-(m:Movie)
    RETURN p,r,m

Advanced Cypher queries

Let’s look at some questions that you can answer with Cypher queries.

  1. Finding who directed Cloud Atlas movie

    MATCH (m:Movie {title: 'Cloud Atlas'})<-[d:DIRECTED]-(p:Person)
    RETURN p.name
  2. Finding all people who have co-acted with Tom Hanks in any movie

    MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(p:Person)
    RETURN p.name
  3. Finding all people related to the movie Cloud Atlas in any way

    MATCH (p:Person)-[relatedTo]-(m:Movie {title: "Cloud Atlas"})
    RETURN p.name, type(relatedTo)

    In the above query, we only used the variable relatedTo which will try to find all the relationships between any Person node and the movie node "Cloud Atlas"

  4. Finding Movies and Actors that are 3 hops away from Kevin Bacon.

    MATCH (p:Person {name: 'Kevin Bacon'})-[*1..3]-(hollywood)
    RETURN DISTINCT p, hollywood

Note: in the above query, hollywood refers to any node in the database (in this case Person and Movie nodes)

Great Job!

Now you know the basics of writing Cypher queries. You are on your way to becoming a graphista! Congratulations.

Feel free to play around with the data by writing more Cypher queries. If you want to learn more about Cypher,

you can use one of the below resources

  1. Cypher Manual - detailed manual on Cypher syntax

  2. Online Training - Introduction to Neo4j - If you are new to Neo4j and like to learn through an online class, this is the best place to get started.