Neo4j Basics: Getting Started with Graph Databases

Introduction

Relational databases excel at structured data with well-defined schemas, but they are a poor fit for the highly interconnected data found in many modern applications. Social networks, recommendation systems, fraud detection, and knowledge graphs all depend on relationships that require join tables and multi-way joins to model and query in a relational schema.

Neo4j, a widely used graph database, addresses this problem by making relationships first-class citizens. In Neo4j, data is represented as nodes (entities) and relationships (connections between entities), both of which can hold properties. This representation mirrors how we think about connected data, which tends to produce more intuitive models and more direct queries.

In this article, we explore Neo4j fundamentals: the property graph model, the Cypher query language, data modeling best practices, and practical examples that will get you started with graph databases.

The Property Graph Model

Neo4j uses the property graph model, which consists of three core elements: nodes, relationships, and properties.

Nodes

Nodes represent entities in your domain. They are similar to rows in a relational database table, but without a fixed schema:

CREATE (p:Person {name: 'Alice', age: 30})

This creates a node with:

Label: Person
Properties: {name: 'Alice', age: 30}

A node can have multiple labels, allowing for flexible modeling:

CREATE (p:Person:Employee:Developer {name: 'Bob', salary: 120000})

Relationships

Relationships connect nodes and define how they relate to each other. Every relationship has a type and a direction:

CREATE (alice:Person {name: 'Alice'})-[:KNOWS {since: 2020}]->(bob:Person {name: 'Bob'})

This creates a KNOWS relationship from Alice to Bob with a since property. Key characteristics:

Relationships always have a direction (but can be queried in either direction)
Relationships must have a type
Relationships can have properties
Relationships cannot exist without connecting two nodes
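
To make these rules concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j is implemented, and the class and function names are hypothetical. It shows a relationship that cannot be built without a type and two endpoint nodes, and a stored direction that can still be ignored at query time.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: nodes with equal properties stay distinct
class Node:
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    start: Node  # direction is stored as start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.type:
            raise ValueError("a relationship must have a type")
        if not isinstance(self.start, Node) or not isinstance(self.end, Node):
            raise ValueError("a relationship must connect two nodes")


def neighbours(node, relationships, rel_type, directed=True):
    """Nodes reachable from `node` over one relationship of type `rel_type`."""
    found = []
    for rel in relationships:
        if rel.type != rel_type:
            continue
        if rel.start is node:
            found.append(rel.end)
        elif not directed and rel.end is node:
            found.append(rel.start)
    return found


alice = Node(frozenset({"Person"}), {"name": "Alice"})
bob = Node(frozenset({"Person"}), {"name": "Bob"})
knows = Relationship(alice, "KNOWS", bob, {"since": 2020})

outgoing_from_bob = neighbours(bob, [knows], "KNOWS")                    # follows direction
either_way_from_bob = neighbours(bob, [knows], "KNOWS", directed=False)  # ignores direction
```

Following the stored direction from Bob finds nothing, while ignoring direction finds Alice, which is the in-memory analogue of writing a pattern without an arrowhead.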

Properties

Both nodes and relationships can have properties, which are key-value pairs:

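// Node with properties
CREATE (movie:Movie {
    title: 'Inception',
    year: 2010,
    rating: 8.8,
    genres: ['Sci-Fi', 'Action']
})

// Relationship with properties (same statement, so movie is still bound;
// the Person node is created here so the relationship has a start node)
CREATE (person:Person {name: 'Leonardo DiCaprio'})-[:ACTED_IN {
    role: 'Cobb',
    salary: 1000000
}]->(movie)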

Properties support several data types: strings, integers, floats, booleans, temporal values (such as dates and durations), spatial points, and lists whose elements all share one of these types.

Installing Neo4j

Neo4j offers multiple installation options to suit different use cases.

Neo4j Desktop (Recommended for Development)

Neo4j Desktop provides a GUI for managing local Neo4j instances:

# Download from https://neo4j.com/download/
# Install and create a new database
# Set a password for the default neo4j user when creating the database

Docker Installation

For containerized environments:

docker run \
    --name neo4j \
    -p 7474:7474 \
    -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    -v neo4j_data:/data \
    neo4j:latest

Access Neo4j Browser at http://localhost:7474 and connect over the Bolt protocol at bolt://localhost:7687.

Installation on Linux

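For production or server deployments on Debian or Ubuntu:

# Add the Neo4j repository (apt-key is deprecated, so the key goes into a keyring file)
sudo mkdir -p /etc/apt/keyrings
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/neotechnology.gpg
echo 'deb [signed-by=/etc/apt/keyrings/neotechnology.gpg] https://debian.neo4j.com stable latest' | sudo tee /etc/apt/sources.list.d/neo4j.list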
sudo apt-get update

# Install Neo4j Community Edition
sudo apt-get install neo4j

# Start Neo4j
sudo systemctl start neo4j
sudo systemctl enable neo4j

Cypher Query Language

Cypher is Neo4j’s declarative query language, designed to be expressive and readable. It uses an ASCII-art-like syntax for patterns: nodes are written in parentheses, relationships in square brackets, and arrows show direction.

MATCH - Finding Patterns

The MATCH clause finds patterns in the graph:

// Find all Person nodes
MATCH (p:Person)
RETURN p

// Find nodes with specific properties
MATCH (p:Person {name: 'Alice'})
RETURN p

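// Find KNOWS relationships (a and b are unconstrained variables,
// so this matches every KNOWS relationship in the graph)
MATCH (a)-[r:KNOWS]->(b)
RETURN r

// Variable-length patterns: nodes 2 to 3 KNOWS hops away, in either direction
MATCH (person)-[:KNOWS*2..3]-(friend)
RETURN DISTINCT friend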
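
The `*2..3` quantifier is easy to misread, so the following Python sketch spells out one reading of it: walk 2 to 3 KNOWS relationships in either direction, never reusing a relationship within a single path, and collect the end nodes. The edge list and the `reachable` helper are hypothetical, and this is a simplified model of the semantics, not how Neo4j executes the query.

```python
def reachable(edges, start, min_hops, max_hops):
    """End nodes of undirected paths from `start` that use between
    min_hops and max_hops edges, never reusing an edge within one path."""
    results = set()

    def walk(node, used, hops):
        if hops >= min_hops:
            results.add(node)
        if hops == max_hops:
            return
        for index, (a, b) in enumerate(edges):
            if index in used:
                continue
            if a == node:
                walk(b, used | {index}, hops + 1)
            elif b == node:
                walk(a, used | {index}, hops + 1)

    walk(start, frozenset(), 0)
    return results


# Hypothetical chain of KNOWS relationships: Alice -> Bob -> Charlie -> Diana -> Eve
edges = [("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Diana"), ("Diana", "Eve")]

friends = reachable(edges, "Alice", 2, 3)
```

On this chain the result contains Charlie (2 hops) and Diana (3 hops), but neither Bob (1 hop) nor Eve (4 hops).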

CREATE - Creating Data

Create nodes and relationships:

// Create a simple node
CREATE (p:Person {name: 'Charlie'})

// Create nodes and relationships in one statement
CREATE (alice:Person {name: 'Alice'})-[:KNOWS]->(bob:Person {name: 'Bob'})

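// Create multiple relationships between existing nodes
// (match the nodes first; unbound variables in CREATE would create new, empty nodes)
MATCH (alice:Person {name: 'Alice'}),
      (bob:Person {name: 'Bob'}),
      (charlie:Person {name: 'Charlie'})
CREATE (alice)-[:KNOWS]->(bob),
       (bob)-[:KNOWS]->(charlie),
       (charlie)-[:KNOWS]->(alice)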

MERGE - Upsert Operations

MERGE creates nodes/relationships if they don’t exist, or matches existing ones:

// Create or match a node
MERGE (p:Person {name: 'Alice'})
RETURN p

// Create relationship if it doesn't exist
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
MERGE (a)-[r:KNOWS]->(b)
RETURN r
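
The match-or-create behaviour can be sketched in a few lines of Python. This is an illustrative model over a plain list of dictionaries, with a hypothetical `merge_node` helper, not the Neo4j driver API. It returns an existing node when the label and all given properties match, and creates a node only otherwise.

```python
def merge_node(nodes, label, **properties):
    """Return (node, created): an existing node with this label whose
    properties include all the given key-value pairs, else a new node."""
    for node in nodes:
        same_label = label in node["labels"]
        same_props = all(node["properties"].get(k) == v for k, v in properties.items())
        if same_label and same_props:
            return node, False
    node = {"labels": {label}, "properties": dict(properties)}
    nodes.append(node)
    return node, True


nodes = []
first, created_first = merge_node(nodes, "Person", name="Alice")
second, created_second = merge_node(nodes, "Person", name="Alice")        # matches, no duplicate
third, created_third = merge_node(nodes, "Person", name="Alice", age=30)  # no full match: new node
```

The last call mirrors a common surprise: MERGE matches on the entire pattern, so merging on `{name: 'Alice', age: 30}` does not reuse a node that only has `name: 'Alice'`. Merging on the identifying property alone and setting the remaining properties with ON CREATE SET avoids the duplicate.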

SET and REMOVE - Modifying Data

// Update node properties
MATCH (p:Person {name: 'Alice'})
SET p.age = 31
RETURN p

// Add a label
MATCH (p:Person {name: 'Alice'})
SET p:Student
RETURN p

// Remove properties
MATCH (p:Person {name: 'Alice'})
REMOVE p.age
RETURN p

DELETE - Removing Data

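// Delete a specific relationship
// (without the property filters, this would delete every KNOWS relationship)
MATCH (:Person {name: 'Alice'})-[r:KNOWS]->(:Person {name: 'Bob'})
DELETE r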

// Delete a node (and its relationships)
MATCH (p:Person {name: 'Bob'})
DETACH DELETE p

// Delete all nodes and relationships
MATCH (n)
DETACH DELETE n
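
The difference between DELETE and DETACH DELETE follows from the rule that a relationship cannot dangle: a node that still has relationships cannot be deleted on its own. A small Python model makes the distinction explicit. The `delete_node` helper and the set-based graph are hypothetical and only mimic the behaviour.

```python
def delete_node(nodes, edges, name, detach=False):
    """Remove `name` from `nodes`. Without detach, refuse if any edge
    still touches the node; with detach, remove those edges first."""
    attached = {edge for edge in edges if name in (edge[0], edge[2])}
    if attached and not detach:
        raise ValueError(f"cannot delete {name!r}: it still has relationships")
    edges.difference_update(attached)
    nodes.discard(name)


nodes = {"Alice", "Bob"}
edges = {("Alice", "KNOWS", "Bob")}

try:
    delete_node(nodes, edges, "Bob")  # plain delete is rejected
    plain_delete_failed = False
except ValueError:
    plain_delete_failed = True

delete_node(nodes, edges, "Bob", detach=True)  # removes the KNOWS edge, then Bob
```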

Querying with Filters

WHERE clauses filter results:

// Filter by property
MATCH (p:Person)
WHERE p.age > 25
RETURN p.name, p.age

// Filter by relationship property
MATCH (a)-[r:KNOWS]->(b)
WHERE r.since < 2021
RETURN a.name, b.name, r.since

// IN operator
MATCH (p:Person)
WHERE p.name IN ['Alice', 'Bob', 'Charlie']
RETURN p

// Property existence check
// (exists(p.email) was removed in Neo4j 5; use IS NOT NULL instead)
MATCH (p:Person)
WHERE p.email IS NOT NULL
RETURN p.name, p.email

Aggregation and Ordering

// Count nodes
MATCH (p:Person)
RETURN count(p)

// Group and aggregate
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN p.name, count(friend) AS friendCount
ORDER BY friendCount DESC

// Collect results
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN p.name, collect(friend.name) AS friends

Practical Examples

Let’s build a more complete example—a social network:

// Create users
CREATE 
  (alice:Person {name: 'Alice', age: 30}),
  (bob:Person {name: 'Bob', age: 28}),
  (charlie:Person {name: 'Charlie', age: 35}),
  (diana:Person {name: 'Diana', age: 27})

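// Create friendships (run as part of the same statement as the CREATE above,
// so that alice, bob, charlie, and diana are still bound to those nodes)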
CREATE 
  (alice)-[:KNOWS {since: 2020}]->(bob),
  (bob)-[:KNOWS {since: 2019}]->(charlie),
  (charlie)-[:KNOWS {since: 2021}]->(diana),
  (diana)-[:KNOWS {since: 2020}]->(alice),
  (alice)-[:KNOWS {since: 2018}]->(charlie)

// Query: Find Alice's friends of friends
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend)-[:KNOWS]->(friendOfFriend)
WHERE NOT (alice)-[:KNOWS]->(friendOfFriend)
RETURN DISTINCT friendOfFriend.name

// Query: Find the person with the most outgoing KNOWS relationships
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN p.name, count(friend) AS connections
ORDER BY connections DESC
LIMIT 1
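
As a sanity check on the two queries, the same logic can be traced in a few lines of Python over the sample relationships. This sketch only mirrors the queries' logic in memory, and the helper names are hypothetical.

```python
# The five KNOWS relationships from the example, as (from, to) pairs
knows = [
    ("Alice", "Bob"),
    ("Bob", "Charlie"),
    ("Charlie", "Diana"),
    ("Diana", "Alice"),
    ("Alice", "Charlie"),
]


def outgoing(person):
    """People that `person` points to with a KNOWS relationship."""
    return {to for frm, to in knows if frm == person}


# Friends of friends: two outgoing hops, minus Alice's direct friends
direct = outgoing("Alice")
friends_of_friends = {
    fof for friend in direct for fof in outgoing(friend) if fof not in direct
}

# Most connected, counting outgoing KNOWS relationships only
out_degree = {person: len(outgoing(person)) for person, _ in knows}
most_connected = max(out_degree, key=out_degree.get)
```

On this data the friends-of-friends logic yields only Diana: Charlie is reachable through Bob but is already a direct friend of Alice. Alice has the most outgoing KNOWS relationships, with two.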

Indexes and Constraints

Improve query performance with indexes, and enforce data integrity with constraints:

// Create index on property
CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name)

// Create composite index
CREATE INDEX person_age_name IF NOT EXISTS FOR (p:Person) ON (p.age, p.name)

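// Create uniqueness constraint (this also creates a backing index,
// so use a property that does not already have a plain index)
CREATE CONSTRAINT person_unique_email IF NOT EXISTS FOR (p:Person) REQUIRE p.email IS UNIQUE

// Create node key constraint (Enterprise Edition only)
CREATE CONSTRAINT person_ssn IF NOT EXISTS FOR (p:Person) REQUIRE p.ssn IS NODE KEY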

Data Import

Import data from various sources:

CSV Import with LOAD CSV

// Import from CSV (by default, file:/// URLs resolve against Neo4j's import directory)
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (p:Person {
    name: row.name,
    age: toInteger(row.age),
    email: row.email
})
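
LOAD CSV hands every field to Cypher as a string, which is why the query wraps `row.age` in `toInteger`. The same behaviour is easy to see with Python's `csv` module. The sample file content below is made up for illustration.

```python
import csv
import io

# Hypothetical contents of people.csv
sample = "name,age,email\nAlice,30,alice@example.com\nBob,28,bob@example.com\n"

people = []
for row in csv.DictReader(io.StringIO(sample)):
    # Every field arrives as a string, so numeric columns need explicit conversion
    people.append({"name": row["name"], "age": int(row["age"]), "email": row["email"]})
```

One difference to keep in mind: Python's `int` raises an error on input it cannot parse, whereas Cypher's `toInteger` returns null.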

JSON Import

// Import JSON data (requires the APOC plugin to be installed)
CALL apoc.load.json('https://api.example.com/data.json') YIELD value
CREATE (n:Person {name: value.name, age: value.age})

Graph Modeling Best Practices

Follow these principles for effective graph models:

Use nouns for nodes, verbs for relationships: Person node with KNOWS relationship
Name relationships consistently: Use uppercase types with underscores between words, like ACTED_IN, WORKED_AT
Model facts as relationships: Employment, ownership, transactions
Model attributes as properties: Values that describe a single node or relationship, such as a name or a date, rather than connecting two entities
Use labels for grouping: Person:Employee:Manager for type hierarchies

Conclusion

Neo4j provides a powerful way to model and query connected data. The property graph model naturally represents real-world relationships, and Cypher makes querying intuitive with its pattern-matching syntax. From social networks to fraud detection, Neo4j handles use cases where relationships matter most.

In the next article, we’ll explore Neo4j operations—installation in production environments, configuration tuning, backup strategies, and monitoring.