Introduction
Relational databases normalize data across tables and use joins at query time to reconstruct relationships. MongoDB, as a document database, embeds related data inside documents or references documents in other collections. Choosing the right relationship pattern determines query performance, write throughput, and scalability.
The three fundamental relationship types are:
- One-to-One (1-1): A customer has one profile. A user has one account settings document.
- One-to-Many (1-N): A customer has many invoices. A blog author has many posts.
- Many-to-Many (N-N): An invoice contains many products; each product appears in many invoices.
Beyond these, MongoDB schema design adds a special case called One-to-Zillions, where one entity relates to an unbounded number of related entities (for example, a sensor that emits millions of readings). In a relational database every child is a separate row, so high cardinality does not change the table design. In a document database, the 16 MB maximum document size forces different design decisions at scale.
This article walks through each relationship type with concrete MongoDB shell examples, explains when to embed vs reference, and provides a decision framework you can apply to your own schemas.
Core Concept: Embedding vs Referencing
MongoDB offers two strategies for representing relationships:
Embedding places related data inside the parent document as a sub-document or array. This keeps all related data in one place, so a single read retrieves everything you need. Embedding works well when the related data is small, rarely changes independently, and you always query it together with the parent.
Referencing stores only the _id (or another key) of the related document in the parent. You resolve the reference with a second query or a $lookup stage in an aggregation pipeline. Referencing works well when related data grows unbounded, changes independently, or when you need to query it in isolation.
The 16 MB document size limit is the hard constraint: if embedding would push a document past that limit, you must reference. But practical limits are typically far lower — a document with a 10 MB array of embedded sub-documents still causes slow reads and costly network transfers.
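As a rough illustration of that ceiling, the sketch below estimates how many embedded sub-documents fit before a document reaches 16 MB. The function name and the byte sizes passed in are assumptions for illustration; real BSON sizes depend on field names and value types.

```javascript
// Rough capacity estimate for an embedded array (illustrative only).
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

// avgItemBytes and parentBytes are assumed estimates, not measured values.
function maxEmbeddedItems(avgItemBytes, parentBytes = 0) {
  if (avgItemBytes <= 0) throw new RangeError("avgItemBytes must be positive");
  return Math.floor((MAX_BSON_BYTES - parentBytes) / avgItemBytes);
}

console.log(maxEmbeddedItems(200, 1000)); // 83881
```

A figure like this is only the hard ceiling; the practical limit discussed above is reached much earlier.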
One-to-One Relationships
A one-to-one relationship means each entity in collection A relates to exactly one entity in collection B, and vice versa. For example, each user has exactly one profile settings document.
Pattern 1: Embed as Sub-Document
The simplest approach stores the related fields directly inside the parent document:
db.users.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e1"),
  username: "jdoe",
  email: "[email protected]",
  profile: {
    displayName: "John Doe",
    avatar: "/avatars/jdoe.png",
    bio: "Software engineer and MongoDB enthusiast",
    joined: ISODate("2025-06-01")
  }
})
Every time you query the user, the profile comes along. This is the most efficient read pattern because it requires exactly one round trip.
Pattern 2: Embed Fields Directly
If the related data is just a handful of scalar fields, you can flatten them instead of nesting:
db.users.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e2"),
  username: "asmith",
  email: "[email protected]",
  displayName: "Alice Smith",
  avatar: "/avatars/asmith.png",
  bio: "DevOps engineer",
  joined: ISODate("2025-07-15")
})
Flattening avoids dot notation in queries: db.users.find({ displayName: "Alice Smith" }) instead of db.users.find({ "profile.displayName": "John Doe" }). Both are valid; choose the nested form when the related fields form a logical group that you might want to project or update as a unit.
Pattern 3: Reference to Another Collection
When the related data changes independently or you query it separately, use a reference:
db.users.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e3"),
  username: "bwilson",
  email: "[email protected]",
  profileId: ObjectId("65a1b2c3d4e5f6a7b8c9d0e4")
})
db.profiles.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e4"),
  userId: ObjectId("65a1b2c3d4e5f6a7b8c9d0e3"),
  displayName: "Bob Wilson",
  avatar: "/avatars/bwilson.png",
  bio: "Data engineer",
  joined: ISODate("2025-08-01"),
  theme: "dark",
  emailNotifications: true
})
To read the user with their profile, use $lookup:
db.users.aggregate([
  { $match: { username: "bwilson" } },
  {
    $lookup: {
      from: "profiles",
      localField: "profileId",
      foreignField: "_id",
      as: "profile"
    }
  },
  { $addFields: { profile: { $arrayElemAt: ["$profile", 0] } } }
])
Reference when the related data is large (e.g., a profile with an embedded avatar image), accessed independently, or updated on a different schedule than the parent. For a simple one-to-one, embedding is usually the better default.
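To make the $lookup pipeline above concrete, here is a minimal in-memory sketch of the same join in plain JavaScript. The function name is hypothetical and short strings stand in for ObjectId values; it mirrors the logic of the two stages, not how the server executes them.

```javascript
// In-memory sketch of $lookup followed by $arrayElemAt.
// Plain strings stand in for ObjectId values (an assumption for this sketch).
function lookupProfile(users, profiles) {
  return users.map((user) => {
    // $lookup: collect every profile whose _id equals the user's profileId.
    const matches = profiles.filter((p) => p._id === user.profileId);
    // $arrayElemAt with index 0: keep the first match
    // (undefined in this sketch when nothing matched).
    return { ...user, profile: matches[0] };
  });
}

const users = [{ _id: "u3", username: "bwilson", profileId: "p4" }];
const profiles = [{ _id: "p4", userId: "u3", displayName: "Bob Wilson", theme: "dark" }];
const joined = lookupProfile(users, profiles);
console.log(joined[0].profile.displayName); // "Bob Wilson"
```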
One-to-Many Relationships
One-to-many is the most common relationship. A blog author has many posts, a customer has many orders, a department has many employees.
Pattern 1: Embedded Array
Embed the related items as an array of sub-documents:
db.authors.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e5"),
  name: "Jane Doe",
  email: "[email protected]",
  posts: [
    { title: "Intro to MongoDB", slug: "intro-mongodb", published: ISODate("2026-01-10") },
    { title: "Aggregation Pipeline Deep Dive", slug: "agg-pipeline", published: ISODate("2026-02-14") },
    { title: "Indexing Strategies", slug: "indexing-strategies", published: ISODate("2026-03-01") }
  ]
})
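Querying is straightforward. Find an author and their most recent post ($slice: -1 returns the last array element, which is the newest post as long as posts are appended in chronological order):
db.authors.findOne(
  { name: "Jane Doe" },
  { posts: { $slice: -1 } }
)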
Add a new post to an existing author:
db.authors.updateOne(
  { name: "Jane Doe" },
  { $push: { posts: { title: "Schema Design", slug: "schema-design", published: ISODate("2026-04-01") } } }
)
Embedding works well when:
- The related documents are bounded (at most a few dozen).
- The related documents are always queried with the parent.
- Updates to related data happen through the parent.
- The total document size stays well under 16 MB.
Pattern 2: Child Reference (Parent Points to Children)
Instead of embedding every child, the parent stores an array of child IDs:
db.authors.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e6"),
  name: "Mark Lee",
  email: "[email protected]",
  postIds: [
    ObjectId("65a1b2c3d4e5f6a7b8c9d0e7"),
    ObjectId("65a1b2c3d4e5f6a7b8c9d0e8"),
    ObjectId("65a1b2c3d4e5f6a7b8c9d0e9")
  ]
})
db.posts.insertMany([
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e7"), title: "MongoDB Basics", slug: "mongodb-basics", authorId: ObjectId("65a1b2c3d4e5f6a7b8c9d0e6") },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e8"), title: "Replica Sets", slug: "replica-sets", authorId: ObjectId("65a1b2c3d4e5f6a7b8c9d0e6") },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0e9"), title: "Sharding", slug: "sharding", authorId: ObjectId("65a1b2c3d4e5f6a7b8c9d0e6") }
])
Retrieve an author with their posts:
db.authors.aggregate([
  { $match: { name: "Mark Lee" } },
  {
    $lookup: {
      from: "posts",
      localField: "postIds",
      foreignField: "_id",
      as: "posts"
    }
  }
])
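This pattern keeps the parent far smaller than embedding full posts: each reference is a 12-byte ObjectId, so the parent stays compact even when the author writes thousands of posts. The postIds array still grows with every post, so it is not suited to truly unbounded relationships. The trade-off is an extra lookup at read time.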
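To put a number on how large an array of references gets, the sketch below estimates its BSON size. It relies on the BSON encoding of arrays (a document keyed by "0", "1", "2", and so on); the function name is hypothetical.

```javascript
// Estimate the BSON size of an array holding n ObjectId values.
// BSON stores an array as a document whose keys are "0", "1", "2", ...
function objectIdArrayBytes(n) {
  let bytes = 4 + 1; // int32 length prefix + trailing null byte
  for (let i = 0; i < n; i++) {
    const keyBytes = String(i).length + 1; // index as a null-terminated string
    bytes += 1 + keyBytes + 12;            // type byte + key + 12-byte ObjectId
  }
  return bytes;
}

console.log(objectIdArrayBytes(3));   // 50
console.log(objectIdArrayBytes(500)); // 8395
```

By the same arithmetic, 5,000 IDs come to under 90 KB, so the array stays far below the document limit for thousands of children but not for millions.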
Pattern 3: Parent Reference (Child Points to Parent)
The most common referencing pattern for one-to-many stores the parent ID in each child document:
db.authors.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0ea"),
  name: "Sarah Chen",
  email: "[email protected]"
})
db.posts.insertMany([
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0eb"), title: "Data Modeling", slug: "data-modeling", authorId: ObjectId("65a1b2c3d4e5f6a7b8c9d0ea") },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0ec"), title: "Performance Tuning", slug: "performance-tuning", authorId: ObjectId("65a1b2c3d4e5f6a7b8c9d0ea") }
])
Find all posts by an author:
db.posts.find({ authorId: ObjectId("65a1b2c3d4e5f6a7b8c9d0ea") })
This is the most scalable one-to-many pattern. The parent document never grows, no matter how many children exist, and you can query children independently without touching the parent. Index authorId on the child collection so the query above does not scan every post. Use this pattern when the number of children is unbounded or when you frequently query children in isolation.
Why Not Always Embed: Document Size Considerations
Consider an e-commerce order that contains line items. You might be tempted to embed all items:
db.orders.insertOne({
  orderId: "ORD-2026-001",
  customerId: ObjectId("65a1b2c3d4e5f6a7b8c9d0ed"),
  items: [
    { sku: "MB-001", name: "Mouse", qty: 2, price: 29.99 },
    { sku: "KB-002", name: "Keyboard", qty: 1, price: 89.99 },
    { sku: "MN-003", name: "Monitor", qty: 1, price: 299.99 }
  ],
  total: 449.96,
  createdAt: ISODate("2026-04-01")
})
This works for typical orders of 1-50 items. But a B2B order with 10,000 line items would push the document past reasonable limits. Worse, every read of the order retrieves all 10,000 items even when you only need the total. In this case, promote line items to their own collection and reference the order.
db.orders.insertOne({
  orderId: "ORD-2026-002",
  customerId: ObjectId("65a1b2c3d4e5f6a7b8c9d0ee"),
  total: 15499.50,
  itemCount: 10000,
  createdAt: ISODate("2026-04-01")
})
db.lineItems.insertMany([
  { orderId: "ORD-2026-002", sku: "WH-001", qty: 500, unitPrice: 3.50 },
  // ... 9999 more items
])
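Once line items live in their own collection, the summary fields on the order (total and itemCount) have to be derived from them. The following is a minimal sketch of that derivation in plain JavaScript; the function name is hypothetical, and the sample rows reuse the three items from the first order with the unitPrice field name from lineItems.

```javascript
// Derive the order's summary fields from its line items.
// Integer cents avoid floating-point drift when summing prices.
function summarizeOrder(lineItems) {
  let totalCents = 0;
  for (const item of lineItems) {
    totalCents += item.qty * Math.round(item.unitPrice * 100);
  }
  return { itemCount: lineItems.length, total: totalCents / 100 };
}

const summary = summarizeOrder([
  { orderId: "ORD-2026-001", sku: "MB-001", qty: 2, unitPrice: 29.99 },
  { orderId: "ORD-2026-001", sku: "KB-002", qty: 1, unitPrice: 89.99 },
  { orderId: "ORD-2026-001", sku: "MN-003", qty: 1, unitPrice: 299.99 },
]);
console.log(summary); // { itemCount: 3, total: 449.96 }
```

Storing the result on the order lets a read of the order return the total without touching any line item.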
Many-to-Many Relationships
In a many-to-many relationship, documents on both sides can relate to multiple documents on the other side. A book can have multiple authors, and an author can write multiple books. An invoice can contain many products, and a product appears in many invoices.
Pattern 1: Two-Way Referencing
Store an array of IDs on both sides of the relationship:
db.books.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0ef"),
  title: "MongoDB: The Definitive Guide",
  isbn: "978-1491954461",
  authorIds: [
    ObjectId("65a1b2c3d4e5f6a7b8c9d0f0"),
    ObjectId("65a1b2c3d4e5f6a7b8c9d0f1")
  ]
})
db.authors.insertMany([
  {
    _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f0"),
    name: "Kristina Chodorow",
    bookIds: [ ObjectId("65a1b2c3d4e5f6a7b8c9d0ef") ]
  },
  {
    _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f1"),
    name: "Shannon Bradshaw",
    bookIds: [ ObjectId("65a1b2c3d4e5f6a7b8c9d0ef") ]
  }
])
Query books by a specific author:
db.books.find({ authorIds: ObjectId("65a1b2c3d4e5f6a7b8c9d0f0") })
Query authors of a specific book:
db.authors.find({ bookIds: ObjectId("65a1b2c3d4e5f6a7b8c9d0ef") })
Two-way referencing gives you fast lookups in both directions without $lookup. The trade-off is that every write (adding or removing a relationship) must update both collections.
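The double write can be sketched in plain JavaScript as follows. The helper names are hypothetical, strings stand in for ObjectId values, and the in-memory arrays stand in for the two collections.

```javascript
// Add the relationship on both sides, mirroring $addToSet semantics
// (no duplicates). Strings stand in for ObjectId values in this sketch.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

function linkAuthorAndBook(author, book) {
  addToSet(book.authorIds, author._id); // first write: books collection
  addToSet(author.bookIds, book._id);   // second write: authors collection
}

const book = { _id: "book-1", title: "MongoDB: The Definitive Guide", authorIds: [] };
const author = { _id: "author-1", name: "Kristina Chodorow", bookIds: [] };
linkAuthorAndBook(author, book);
linkAuthorAndBook(author, book); // repeating the call changes nothing
console.log(book.authorIds, author.bookIds); // [ 'author-1' ] [ 'book-1' ]
```

Against a real database the two writes target different collections, so they need a multi-document transaction if they must succeed or fail together.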
Pattern 2: One-Way Referencing
Store the relationship on only one side. For an e-commerce system, products rarely need to know which orders contain them, but orders must know which products they include:
db.orders.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f2"),
  customerId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f3"),
  products: [
    { productId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f4"), sku: "MB-001", name: "Mouse", qty: 1, price: 29.99 },
    { productId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f5"), sku: "KB-002", name: "Keyboard", qty: 1, price: 89.99 }
  ],
  total: 119.98
})
db.products.insertMany([
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f4"), sku: "MB-001", name: "Mouse", price: 29.99, stock: 150 },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f5"), sku: "KB-002", name: "Keyboard", price: 89.99, stock: 75 }
])
One-way referencing reduces write overhead. You only update the owning document. Use this when the relationship is inherently directional or when one side rarely needs to traverse the relationship.
Pattern 3: Embedding
For many-to-many relationships where both sides have a small, bounded set of related items, embed the data directly:
db.teachers.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f6"),
  name: "Dr. Patel",
  students: [
    { studentId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f7"), name: "Alice", grade: "A" },
    { studentId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f8"), name: "Bob", grade: "B+" }
  ]
})
db.students.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f7"),
  name: "Alice",
  teachers: [
    { teacherId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f6"), name: "Dr. Patel", subject: "Computer Science" }
  ]
})
Embedding for many-to-many works only when both arrays stay small (under a few dozen entries) and the data is read together. For a grade school classroom with 30 students and 6 teachers per student, this is acceptable. For an online course platform with millions of students, it is not.
One-to-Zillions Relationships
Gordon, a MongoDB instructor, coined “one-to-zillions” to describe relationships where one entity relates to an unbounded, massive number of related entities. Think server logs, IoT sensor readings, or clickstream events.
A server might generate billions of log entries. Embedding is impossible. Even storing an array of references in the parent would exceed the 16 MB limit. The only viable strategy is a parent reference — store the parent ID on each child document:
db.servers.insertOne({
  _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0f9"),
  hostname: "web-01.prod.example.com",
  ip: "10.0.1.10",
  region: "us-east-1"
})
db.logs.insertMany([
  { serverId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f9"), level: "INFO", message: "Service started", timestamp: ISODate("2026-04-01T00:00:00Z") },
  { serverId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f9"), level: "WARN", message: "High memory usage", timestamp: ISODate("2026-04-01T00:01:00Z") },
  // billions more...
])
Query recent logs for a server:
db.logs.find(
  { serverId: ObjectId("65a1b2c3d4e5f6a7b8c9d0f9") }
).sort({ timestamp: -1 }).limit(100)
The key insight: you must index on serverId and timestamp (compound index) for this query to perform at scale. Without indexes, scanning billions of log documents is catastrophic.
db.logs.createIndex({ serverId: 1, timestamp: -1 })
Decision Table: Embed vs Reference
| Criteria | Embed | Reference |
|---|---|---|
| Related data size | Small, bounded | Large or unbounded |
| Access pattern | Always queried with parent | Queried independently |
| Data growth | Predictable, limited | Unpredictable, potentially massive |
| Write frequency | Updated with parent | Updated independently |
| Data duplication tolerance | Acceptable | Minimized (normalized) |
| Atomicity requirement | Single-document writes (atomic by default) | Multi-document writes (atomic only inside a transaction) |
| 16 MB document limit | Must stay under | Not a concern |
| Read performance | One round trip | Multiple queries or $lookup |
Use embedding when you check more boxes on the left. Use referencing when you check more boxes on the right.
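The box-counting rule can be written down as a small helper. This is only a sketch: the criterion names are hypothetical labels for the table rows, ties fall back to referencing as an assumption of this example, and the 16 MB row is treated as an override because it is a hard limit.

```javascript
// Tally which column of the decision table each answer falls into.
// The criterion names are hypothetical labels for the table rows.
function suggestStrategy(answers) {
  let embed = 0;
  let reference = 0;
  for (const choice of Object.values(answers)) {
    if (choice === "embed") embed++;
    else if (choice === "reference") reference++;
  }
  // The 16 MB limit is a hard constraint, so it overrides the tally.
  if (answers.documentLimit === "reference") return "reference";
  // Assumption of this sketch: a tie falls back to referencing.
  return embed > reference ? "embed" : "reference";
}

console.log(suggestStrategy({
  size: "embed",
  accessPattern: "embed",
  growth: "embed",
  writeFrequency: "reference",
  documentLimit: "embed",
})); // "embed"
```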
Schema Validation for Relationships
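MongoDB schema validation enforces the shape of relationship fields at the database level. It checks structure and BSON types only; it cannot verify that a referenced document actually exists. Type checks are strict: in the validator below, qty must be stored as a 32-bit integer and price and total as doubles. Here is a validator for an orders collection that requires an array of items, each carrying a product reference: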
db.createCollection("orders", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "customerId", "items", "total" ],
      properties: {
        customerId: {
          bsonType: "objectId",
          description: "must be an ObjectId referencing the customers collection"
        },
        items: {
          bsonType: "array",
          minItems: 1,
          items: {
            bsonType: "object",
            required: [ "productId", "qty", "price" ],
            properties: {
              productId: { bsonType: "objectId" },
              qty: { bsonType: "int", minimum: 1 },
              price: { bsonType: "double", minimum: 0 }
            }
          }
        },
        total: { bsonType: "double", minimum: 0 }
      }
    }
  }
})
Advanced Join: $graphLookup
For hierarchical or graph-shaped relationships (like org charts or recommendation graphs), $graphLookup traverses recursive references:
db.employees.insertMany([
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0fa"), name: "CEO", reportsTo: null },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0fb"), name: "VP Eng", reportsTo: ObjectId("65a1b2c3d4e5f6a7b8c9d0fa") },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0fc"), name: "Eng Mgr", reportsTo: ObjectId("65a1b2c3d4e5f6a7b8c9d0fb") },
  { _id: ObjectId("65a1b2c3d4e5f6a7b8c9d0fd"), name: "Engineer", reportsTo: ObjectId("65a1b2c3d4e5f6a7b8c9d0fc") }
])
db.employees.aggregate([
  { $match: { name: "CEO" } },
  {
    $graphLookup: {
      from: "employees",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "reportsTo",
      as: "allReports",
      maxDepth: 10
    }
  }
])
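This returns the CEO document with an allReports array containing the three employees below the CEO: VP Eng, Eng Mgr, and Engineer. The CEO is not included in its own allReports array, and the order of the array is not guaranteed.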
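The traversal can be reproduced in memory to see which documents end up in allReports. This is a sketch of the logic, not of the server's implementation: short strings stand in for the ObjectId values, the function name is hypothetical, and the depth value corresponds to what the depthField option would record.

```javascript
// In-memory sketch of the $graphLookup traversal above.
// Short strings stand in for the ObjectId values.
const employees = [
  { _id: "ceo", name: "CEO", reportsTo: null },
  { _id: "vp", name: "VP Eng", reportsTo: "ceo" },
  { _id: "mgr", name: "Eng Mgr", reportsTo: "vp" },
  { _id: "eng", name: "Engineer", reportsTo: "mgr" },
];

function allReports(startId, docs, maxDepth = 10) {
  const found = [];
  const seen = new Set();
  let frontier = [startId];
  // Depth 0 is the first hop, so maxDepth: 10 allows up to 11 hops.
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const doc of docs) {
      // Follow connectFromField (_id) to connectToField (reportsTo).
      if (frontier.includes(doc.reportsTo) && !seen.has(doc._id)) {
        seen.add(doc._id);
        found.push({ ...doc, depth });
        next.push(doc._id);
      }
    }
    frontier = next;
  }
  return found;
}

const reports = allReports("ceo", employees);
console.log(reports.map((e) => e.name)); // [ 'VP Eng', 'Eng Mgr', 'Engineer' ]
```

The starting document is never part of the result, which is why four employees yield three reports.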
Practice Exercises
Exercise 1: One-to-Many Design
You are modeling a forum. Each user can write many posts. A post belongs to exactly one user. Posts are queried by author, and you also show recent posts on the user’s profile page. Users never have more than 500 posts. What pattern do you choose and why?
Solution: Store an array of post IDs in the user document and also store the userId on each post. The user document stays small: an ObjectId is 12 bytes, so 500 IDs is about 6 KB of raw ID data (a little more once BSON array overhead is counted). The profile page reads the ID list with one query (db.users.findOne({ _id: userId }, { postIds: 1 })) and then fetches the full posts with $lookup or a paginated query on the posts collection. The userId on each post enables direct queries like “show all posts by this user.”
Exercise 2: Many-to-Many Design
You are modeling a course platform. A student can enroll in many courses. A course can have thousands of students. The enrollment record stores the enrollment date, grade, and progress. Which pattern do you choose?
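Solution: Use a junction collection (enrollments) that references both student and course. Do NOT embed enrollments into either document: courses with thousands of embedded student records grow without bound and become slow to read and update, and querying “all courses for Alice” would require scanning the entire course collection unless you also index the embedded student IDs. The unique compound index below prevents duplicate enrollments and serves queries by student; add a separate index on courseId to list the students in a course.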
db.enrollments.createIndex({ studentId: 1, courseId: 1 }, { unique: true })
db.enrollments.insertOne({
  studentId: ObjectId("65a1b2c3d4e5f6a7b8c9d0fe"),
  courseId: ObjectId("65a1b2c3d4e5f6a7b8c9d0ff"),
  enrolledAt: ISODate("2026-04-01"),
  grade: null,
  progress: 0.35
})
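The behaviour of the junction collection can be sketched in memory. The class name and the string IDs are assumptions for illustration; the Set plays the role of the unique compound index.

```javascript
// In-memory sketch of a junction collection with a unique
// (studentId, courseId) pair. String IDs are illustrative.
class Enrollments {
  constructor() {
    this.docs = [];
    this.pairs = new Set(); // plays the role of the unique compound index
  }
  enroll(studentId, courseId, enrolledAt) {
    const key = `${studentId}:${courseId}`;
    if (this.pairs.has(key)) throw new Error(`duplicate enrollment: ${key}`);
    this.pairs.add(key);
    this.docs.push({ studentId, courseId, enrolledAt, grade: null, progress: 0 });
  }
  coursesFor(studentId) {
    return this.docs.filter((d) => d.studentId === studentId).map((d) => d.courseId);
  }
  studentsIn(courseId) {
    return this.docs.filter((d) => d.courseId === courseId).map((d) => d.studentId);
  }
}

const enrollments = new Enrollments();
enrollments.enroll("alice", "mongo-101", new Date("2026-04-01"));
enrollments.enroll("alice", "mongo-201", new Date("2026-04-01"));
enrollments.enroll("bob", "mongo-101", new Date("2026-04-02"));
console.log(enrollments.coursesFor("alice"));     // [ 'mongo-101', 'mongo-201' ]
console.log(enrollments.studentsIn("mongo-101")); // [ 'alice', 'bob' ]
```

Both directions of the relationship are answered from the same collection, and neither the student nor the course document grows.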
Exercise 3: One-to-Zillions
A weather station records temperature and humidity every second. It generates 86,400 readings per day (31.5M per year). You need to query: “show readings for station X on date Y sorted by timestamp.” What do you do?
Solution: Use a parent reference pattern. Store the stationId on each reading document. Create a compound index on { stationId: 1, timestamp: 1 }. For daily queries, consider time-based bucketing (one document per station per hour containing an array of 3,600 readings) to reduce document count, but verify the hourly bucket stays under 16 MB. At roughly 50 bytes per reading, 3,600 readings come to about 180 KB, well within the limit.
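The hourly bucketing idea can be sketched as a grouping step in plain JavaScript. The field names (stationId, timestamp, temp, humidity), the key format, and the sample values are all assumptions made up for illustration.

```javascript
// Group per-second readings into one bucket per station per hour.
// Field names and sample values are assumptions for this sketch.
function bucketKey(stationId, timestamp) {
  const hour = new Date(timestamp);
  hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour (UTC)
  return `${stationId}|${hour.toISOString()}`;
}

function bucketReadings(readings) {
  const buckets = new Map();
  for (const r of readings) {
    const key = bucketKey(r.stationId, r.timestamp);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(r);
  }
  return buckets;
}

const buckets = bucketReadings([
  { stationId: "X", timestamp: new Date("2026-04-01T10:00:05Z"), temp: 18.2, humidity: 0.61 },
  { stationId: "X", timestamp: new Date("2026-04-01T10:59:59Z"), temp: 18.4, humidity: 0.6 },
  { stationId: "X", timestamp: new Date("2026-04-01T11:00:00Z"), temp: 18.5, humidity: 0.6 },
]);
console.log([...buckets.keys()]);
// [ 'X|2026-04-01T10:00:00.000Z', 'X|2026-04-01T11:00:00.000Z' ]
```

With this layout a full day for one station is 24 bucket documents instead of 86,400 reading documents.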
Resources
- MongoDB University: M320 Data Modeling — Official course covering these patterns in depth.
- MongoDB Documentation: Data Modeling Introduction — Reference for schema design patterns.
- MongoDB Documentation: Schema Validation — JSON Schema for enforcing relationship structure.
- MongoDB Blog: Building with Patterns — Series of articles covering 12+ schema design patterns including the ones described here.
- M320 Chapter 2 Lab Solutions — GitHub repository with lab solutions from the official M320 course.