
M320: Chapter 4: Patterns Part 2

MongoDB Data Modeling: Patterns Part 2

Published: June 12, 2021 Updated: May 24, 2026 Larry Qu 9 min read

Computed Pattern

The computed pattern is used when you need to perform the same computations many times. Instead of recalculating values on every read, pre-compute and store them in the document.

When to Use the Computed Pattern

Use this pattern when:

  • The same calculation runs repeatedly on the same data
  • The source data changes infrequently while reads are frequent
  • The computation is expensive (multiple documents, complex formulas)
  • Read latency is critical and must be predictable

Common use cases include:

  • Math operations — Running totals, averages, percentages
  • Fan-out operations — Distributing computed results to related documents
  • Roll-up operations — Aggregating data across time periods or categories

Pattern Mechanics

Instead of computing on every read:

// Bad: compute on every read
const pipeline = [
  { $match: { productId: "prod123" } },
  { $group: { _id: null, total: { $sum: "$amount" }, count: { $sum: 1 } } },
];
const result = await db.collection("sales").aggregate(pipeline).next();

Store the pre-computed result in the document:

// Good: store the computed value
{
  _id: ObjectId("..."),
  productId: "prod123",
  productName: "Wireless Headphones",
  dailySales: {
    date: ISODate("2026-03-15"),
    totalAmount: 3199.60,
    orderCount: 42,
    averageOrderValue: 76.18
  }
}

Pre-computation Strategies

Strategy 1: Update on Write

Compute the value when the source data changes.

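// `client` is the connected MongoClient. In the Node.js driver, sessions are
// started from the client, and multi-document transactions require a replica
// set or sharded cluster.
async function recordSale(sale) {
  const session = client.startSession();
  session.startTransaction();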

  try {
    // Insert the sale record
    await db.collection("sales").insertOne(sale, { session });

    // Update the pre-computed daily summary
    await db.collection("daily_summaries").updateOne(
      {
        productId: sale.productId,
        "dailySales.date": {
          $eq: new Date(sale.createdAt.toISOString().split("T")[0]),
        },
      },
      {
        $inc: {
          "dailySales.totalAmount": sale.amount,
          "dailySales.orderCount": 1,
        },
      },
      { upsert: true, session },
    );

    await session.commitTransaction();
  } catch (err) {
    await session.abortTransaction();
    throw err;
  } finally {
    await session.endSession();
  }
}
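
The update above increments totalAmount and orderCount but never touches averageOrderValue. One option, sketched here under the assumption that the average is derived at read time from the two stored counters rather than stored itself, is a small helper. The name withAverageOrderValue is hypothetical.

```javascript
// Hypothetical helper: derive the average from the two stored counters.
function withAverageOrderValue(dailySales) {
  const { totalAmount, orderCount } = dailySales;
  // Guard against an empty summary so we never divide by zero.
  const average = orderCount > 0 ? totalAmount / orderCount : 0;
  return {
    ...dailySales,
    // Round to cents for display; the stored counters stay exact.
    averageOrderValue: Math.round(average * 100) / 100,
  };
}

const summary = withAverageOrderValue({ totalAmount: 3199.6, orderCount: 42 });
console.log(summary.averageOrderValue); // 76.18
```

Deriving the value this way keeps the write path to two $inc operations, at the cost of one division per read.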

Strategy 2: Scheduled Batch Computation

Periodically recompute values using the aggregation pipeline and $merge.

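// Hourly rollup: batch job run by a scheduler (e.g., cron, node-cron)
// Align the window to whole UTC hours. A sliding "last 60 minutes" window
// would replace a complete hourly summary with a partial one.
const HOUR_MS = 3600000;
const currentHourStart = new Date(Math.floor(Date.now() / HOUR_MS) * HOUR_MS);
const previousHourStart = new Date(currentHourStart.getTime() - HOUR_MS);

const pipeline = [
  {
    $match: {
      createdAt: {
        $gte: previousHourStart,
      },
    },
  },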
  {
    $group: {
      _id: {
        productId: "$productId",
        hour: { $dateToString: { format: "%Y-%m-%dT%H:00:00Z", date: "$createdAt" } },
      },
      totalAmount: { $sum: "$amount" },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: "$amount" },
    },
  },
  {
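    // $merge matches on _id by default. The compound _id (productId + hour)
    // already identifies each summary, so no extra unique index is required.
    $merge: {
      into: "hourly_summaries",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },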
  },
];

await db.collection("sales").aggregate(pipeline).toArray();
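
To make the grouping step concrete, here is an in-memory sketch of what the $group stage computes for each product and hour. It is an illustration only, not a replacement for the pipeline; the function name rollupByHour and the sample sales are assumptions.

```javascript
// In-memory equivalent of the $group stage, for illustration only.
function rollupByHour(sales) {
  const groups = new Map();
  for (const sale of sales) {
    // Same key as the pipeline: product plus the UTC hour of createdAt.
    const hour = sale.createdAt.toISOString().slice(0, 13) + ":00:00Z";
    const key = `${sale.productId}|${hour}`;
    const group = groups.get(key) ?? {
      _id: { productId: sale.productId, hour },
      totalAmount: 0,
      orderCount: 0,
    };
    group.totalAmount += sale.amount;
    group.orderCount += 1;
    groups.set(key, group);
  }
  // Add the average once per group, after all sales are counted.
  return [...groups.values()].map((g) => ({
    ...g,
    avgOrderValue: g.totalAmount / g.orderCount,
  }));
}

// Hypothetical sample data: two sales in the 10:00 hour, one in the 11:00 hour.
const sampleSales = [
  { productId: "prod123", amount: 50, createdAt: new Date("2026-03-15T10:05:00Z") },
  { productId: "prod123", amount: 70, createdAt: new Date("2026-03-15T10:40:00Z") },
  { productId: "prod123", amount: 30, createdAt: new Date("2026-03-15T11:02:00Z") },
];
const hourly = rollupByHour(sampleSales);
console.log(hourly);
```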

Aggregation Pipeline for Computation

The aggregation framework is the primary tool for computing values in MongoDB.

// Compute product category statistics and write them to category_stats.
// A pipeline that ends in $merge returns no documents, so the result is not assigned.
await db.collection("products").aggregate([
  {
    $group: {
      _id: "$category",
      productCount: { $sum: 1 },
      averagePrice: { $avg: "$price" },
      minPrice: { $min: "$price" },
      maxPrice: { $max: "$price" },
      totalStock: { $sum: "$inventory.stock" },
    },
  },
  {
    $addFields: {
      // Rough estimate: average price times total stock, not the sum of price * stock per product
      inventoryValue: { $multiply: ["$averagePrice", "$totalStock"] },
    },
  },
  {
    $sort: { productCount: -1 },
  },
  {
    $merge: {
      into: "category_stats",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
]).toArray();

Real-Time vs. Batch Computation

| Aspect | Real-Time | Batch |
| --- | --- | --- |
| Latency | Immediate | Delayed (minutes to hours) |
| Accuracy | Reflects every write | Eventually consistent (stale between runs) |
| System load | Higher per-operation cost | Lower overall cost |
| Implementation | Change streams, triggers | Scheduled jobs (cron) |
| Use case | Dashboards, alerts | Reports, analytics |

When to use real-time:

  • Live dashboards that must reflect the latest data
  • Alerting systems that trigger on thresholds
  • User-facing features like current balance or view count

When to use batch:

  • Historical analytics and trend reports
  • End-of-day reconciliation
  • Large-scale data processing where real-time is too expensive
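
For the real-time side, a change stream delivers one event per write, and insert events carry operationType and fullDocument fields. The sketch below shows how such an event could be folded into running totals. The function name applySaleEvent and the in-memory Map are assumptions; a production version would more likely update a summary document.

```javascript
// Apply one change event to an in-memory map of per-product totals.
function applySaleEvent(totals, event) {
  if (event.operationType !== "insert") return totals; // ignore updates, deletes, etc.
  const { productId, amount } = event.fullDocument;
  const current = totals.get(productId) ?? { totalAmount: 0, orderCount: 0 };
  totals.set(productId, {
    totalAmount: current.totalAmount + amount,
    orderCount: current.orderCount + 1,
  });
  return totals;
}

// With a live deployment (replica set required), the events would come from:
//   db.collection("sales").watch().on("change", (event) => applySaleEvent(totals, event));
// Here we feed hand-written events with the same shape instead.
const totals = new Map();
applySaleEvent(totals, { operationType: "insert", fullDocument: { productId: "prod123", amount: 20 } });
applySaleEvent(totals, { operationType: "insert", fullDocument: { productId: "prod123", amount: 35 } });
applySaleEvent(totals, { operationType: "delete", documentKey: { _id: 1 } });
console.log(totals.get("prod123"));
```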

Lab: Apply the Computed Pattern

// Result document after applying the computed pattern
{
  "_id": ObjectId("5c9414f25e6aff2b8870a2d0"),
  "zone": 13,
  "date": ISODate("2019-03-21T00:00:00.000Z"),
  "kW per day": {
    "consumption": 9756,
    "self-produced": 2059,
    "city-supplemented": 7700
  }
}

This document stores pre-computed daily energy metrics. Instead of summing thousands of hourly readings every time a dashboard loads, the application reads one document per zone per day. The computation runs once when the day ends or incrementally as data arrives.
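
A minimal sketch of the end-of-day computation follows. It assumes each hourly reading already carries the three metrics under the same keys as the daily document; the lab does not show the raw reading shape, so that shape and the function name rollupDay are hypothetical.

```javascript
// Sum hourly readings into one daily document per zone.
function rollupDay(zone, date, hourlyReadings) {
  const totals = { consumption: 0, "self-produced": 0, "city-supplemented": 0 };
  for (const reading of hourlyReadings) {
    for (const key of Object.keys(totals)) {
      totals[key] += reading[key];
    }
  }
  return { zone, date, "kW per day": totals };
}

// Hypothetical readings for two hours of one day.
const daily = rollupDay(13, new Date("2019-03-21T00:00:00.000Z"), [
  { consumption: 400, "self-produced": 90, "city-supplemented": 310 },
  { consumption: 410, "self-produced": 85, "city-supplemented": 325 },
]);
console.log(daily);
```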

Which one of the following scenarios is best suited for the computed pattern?

We need to calculate a value that is displayed 100 times a minute and is based on a field that updates once a minute. The computation is expensive (aggregating data across 10,000 sensors) and the result is read far more often than the source data changes. The computed pattern caches the result and only recomputes when the source field actually updates.

Bucket Pattern (Used in IoT)

The bucket pattern groups related data into time-based or count-based buckets. This is especially useful for IoT applications, time-series data, and event logging.

// Sensor data stored with bucket pattern — one document per hour
{
  _id: ObjectId("..."),
  sensorId: "sensor-temp-a1",
  startDate: ISODate("2026-03-15T10:00:00Z"),
  endDate: ISODate("2026-03-15T10:59:59Z"),
  readings: [
    { time: ISODate("2026-03-15T10:00:00Z"), value: 22.1 },
    { time: ISODate("2026-03-15T10:05:00Z"), value: 22.3 },
    { time: ISODate("2026-03-15T10:10:00Z"), value: 22.0 },
    // ... 12 readings per hour
  ],
  metadata: {
    unit: "celsius",
    location: "Server Room A",
  },
  stats: {
    min: 22.0,
    max: 22.7,
    avg: 22.3,
    count: 12,
  },
}

Benefits of the Bucket Pattern

  • Reduces the total number of documents by an order of magnitude
  • Single read retrieves many data points
  • Enables efficient time-range queries
  • Keeps document size under control by limiting bucket size

// Upsert each reading into its hourly bucket; a new bucket is created on demand
function getBucketKey(sensorId, timestamp) {
  const startOfHour = new Date(timestamp);
  startOfHour.setMinutes(0, 0, 0);
  return { sensorId, startDate: startOfHour };
}

async function insertSensorReading(sensorId, timestamp, value) {
  const bucketKey = getBucketKey(sensorId, timestamp);

  const result = await db.collection("sensor_data").updateOne(
    {
      ...bucketKey,
      "readings": { $not: { $size: 60 } }, // prevent overflow
    },
    {
      $push: { readings: { time: timestamp, value } },
      $setOnInsert: {
        sensorId,
        startDate: bucketKey.startDate,
        endDate: new Date(bucketKey.startDate.getTime() + 3599999),
      },
      $inc: { "stats.count": 1, "stats.sum": value }, // avg = stats.sum / stats.count at read time
      $min: { "stats.min": value },
      $max: { "stats.max": value },
    },
    { upsert: true },
  );

  return result;
}
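
Reading from buckets takes two steps: select the buckets whose hour overlaps the requested range, then keep only the readings inside it. The sketch below does this in memory; against MongoDB the first step would be a filter such as { sensorId, startDate: { $lte: to }, endDate: { $gte: from } }. The function name readingsInRange is an assumption.

```javascript
// Return the readings between `from` and `to` (inclusive) from a list of buckets.
function readingsInRange(buckets, from, to) {
  return buckets
    // Step 1: keep only buckets whose time span overlaps the requested range.
    .filter((b) => b.startDate <= to && b.endDate >= from)
    // Step 2: unpack the readings and trim the edges of the first and last bucket.
    .flatMap((b) => b.readings)
    .filter((r) => r.time >= from && r.time <= to);
}

const sampleBuckets = [
  {
    startDate: new Date("2026-03-15T10:00:00Z"),
    endDate: new Date("2026-03-15T10:59:59Z"),
    readings: [
      { time: new Date("2026-03-15T10:00:00Z"), value: 22.1 },
      { time: new Date("2026-03-15T10:05:00Z"), value: 22.3 },
      { time: new Date("2026-03-15T10:10:00Z"), value: 22.0 },
    ],
  },
  {
    startDate: new Date("2026-03-15T11:00:00Z"),
    endDate: new Date("2026-03-15T11:59:59Z"),
    readings: [{ time: new Date("2026-03-15T11:00:00Z"), value: 22.4 }],
  },
];

const inRange = readingsInRange(
  sampleBuckets,
  new Date("2026-03-15T10:05:00Z"),
  new Date("2026-03-15T11:00:00Z"),
);
console.log(inRange.map((r) => r.value));
```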

Which one of the following requirements in our system is the best candidate to use the bucket pattern?

Collecting temperature readings from 10,000 sensors every 5 seconds and displaying hourly trend charts. The bucket pattern groups these readings into hour-long documents, reducing document count from 7.2 million per hour to 10,000.
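
The figures in this answer can be checked with a few lines of arithmetic. The helper name documentCounts is hypothetical.

```javascript
// Compare document counts per hour with and without hourly buckets.
function documentCounts(sensorCount, intervalSeconds) {
  const readingsPerSensorPerHour = 3600 / intervalSeconds;
  return {
    unbucketed: sensorCount * readingsPerSensorPerHour, // one document per reading
    bucketed: sensorCount, // one document per sensor per hour
    readingsPerBucket: readingsPerSensorPerHour,
  };
}

const counts = documentCounts(10000, 5);
console.log(counts); // { unbucketed: 7200000, bucketed: 10000, readingsPerBucket: 720 }
```

Each hourly bucket then holds 720 readings, so the per-bucket cap used when inserting has to be at least that large for this workload.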

Schema Versioning Pattern

The schema versioning pattern avoids downtime during schema upgrades by allowing multiple document versions to coexist.

// Version 1 document
{
  _id: ObjectId("..."),
  schema_version: 1,
  name: "John Doe",
  email: "john.doe@example.com"
}

// Version 2 document: renames name to displayName and groups contact details
{
  _id: ObjectId("..."),
  schema_version: 2,
  displayName: "Jane Smith",
  contact: {
    email: "jane.smith@example.com",
    phone: "+1-555-0123"
  }
}

Application-Level Migration

function normalizeUserDocument(doc) {
  if (!doc) return null; // findOne returns null when nothing matches
  if (!doc.schema_version || doc.schema_version === 1) {
    // Move the version 1 fields into the version 2 shape instead of keeping both
    const { name, email, phone, ...rest } = doc;
    return {
      ...rest,
      displayName: name,
      contact: {
        email,
        phone: phone ?? null,
      },
      schema_version: 2,
    };
  }
  return doc;
}

async function getUser(userId) {
  const doc = await db.collection("users").findOne({ _id: userId });
  return normalizeUserDocument(doc);
}
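
Once more than two versions exist, a single if statement stops scaling. One common approach, sketched here as an assumption rather than as the course's method, is a table of one-step upgraders that are applied in sequence. The version 3 step and its preferences field are hypothetical and exist only to show chaining.

```javascript
// One upgrader per version; each moves a document exactly one version forward.
const upgraders = {
  // Version 1 -> 2: rename name to displayName and group contact details.
  1: (doc) => {
    const { name, email, phone, ...rest } = doc;
    return {
      ...rest,
      displayName: name,
      contact: { email, phone: phone ?? null },
      schema_version: 2,
    };
  },
  // Hypothetical version 2 -> 3 step, shown only to illustrate chaining.
  2: (doc) => ({ ...doc, preferences: {}, schema_version: 3 }),
};

function upgradeToLatest(doc, latest = 3) {
  // Documents written before versioning was introduced count as version 1.
  let current = { ...doc, schema_version: doc.schema_version ?? 1 };
  while (current.schema_version < latest) {
    const upgrade = upgraders[current.schema_version];
    if (!upgrade) throw new Error(`No upgrader for version ${current.schema_version}`);
    current = upgrade(current);
  }
  return current;
}

const upgraded = upgradeToLatest({ name: "John Doe", email: "john.doe@example.com" });
console.log(upgraded);
```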

Lazy Migration

async function migrateDocument(doc) {
  if (doc.schema_version >= 2) return doc;

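  const updated = {
    ...doc,
    displayName: doc.name,
    contact: { email: doc.email, phone: doc.phone ?? null },
    schema_version: 2,
  };

  // Remove the old top-level fields now stored under displayName and contact
  delete updated.name;
  delete updated.email;
  delete updated.phone;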

  await db.collection("users").replaceOne({ _id: doc._id }, updated);
  return updated;
}

async function findAndMigrate(query) {
  const cursor = db.collection("users").find(query);

  const results = [];
  for await (const doc of cursor) {
    results.push(await migrateDocument(doc));
  }

  return results;
}

Benefits of Schema Versioning

  • Zero-downtime schema evolution: old documents remain readable
  • Gradual migration: update documents as they are accessed
  • Rollback capability: older application versions keep working with new documents as long as a new version only adds fields
  • No lock-in: migrate at your own pace

Tree Patterns

MongoDB documents several tree modeling patterns for hierarchical data such as organization charts, book subjects, and product categories. This chapter covers four:

  • Parent References — Each node stores a reference to its parent
  • Child References — Each node stores references to its children
  • Array of Ancestors — Each node stores an array of all ancestor IDs
  • Materialized Paths — Each node stores a string path from root to node

Each pattern addresses a different trade-off between read and write efficiency:

// Parent References — each node stores its parent ID
{ _id: ObjectId("..."), name: "Engineering", parentId: ObjectId("...") }

// Child References — each node stores its children IDs
{ _id: ObjectId("..."), name: "Engineering", children: [ObjectId("c1"), ObjectId("c2")] }

// Array of Ancestors — each node stores all ancestor IDs
{ _id: ObjectId("..."), name: "Frontend Team", ancestors: [ObjectId("root"), ObjectId("eng")], parentId: ObjectId("eng") }

// Materialized Paths — each node stores a path string
{ _id: ObjectId("..."), name: "Frontend Team", path: ",engineering,software,frontend," }
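
With materialized paths, both directions reduce to string operations on the path. The sketch below assumes the path format shown above, where the path ends with the node's own segment. The helper names are hypothetical.

```javascript
// Escape regex metacharacters so path segments are matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Descendants of a node: every path that starts with the node's path and is longer.
function descendantsPattern(nodePath) {
  return new RegExp("^" + escapeRegex(nodePath) + ".");
}

// Ancestors of a node: every segment of its path except the last one.
function ancestorsOf(nodePath) {
  return nodePath.split(",").filter(Boolean).slice(0, -1);
}

const pattern = descendantsPattern(",engineering,");
console.log(pattern.test(",engineering,software,frontend,")); // true
console.log(ancestorsOf(",engineering,software,frontend,")); // [ 'engineering', 'software' ]
```

Because the pattern is anchored at the start of the string and case-sensitive, a query such as find({ path: pattern }) can use an index on path.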

Tree Pattern Comparison

| Pattern | Read Ancestors | Read Descendants | Write Cost | Implementation |
| --- | --- | --- | --- | --- |
| Parent References | Recursive query | Recursive query | Low | Simple |
| Child References | Complex | One query (direct children only) | High (large arrays) | Medium |
| Ancestor Array | One query | One query | Medium | Medium |
| Materialized Paths | Regex query | Regex query | Medium | Medium |

Ancestor array combined with parent reference provides a good balance for most tree use cases.
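
The ancestors array has to be derived from the parent links when a node is created or moved. A minimal sketch, assuming nodes are held in a Map keyed by ID, string IDs for readability, and no cycles in the data:

```javascript
// Walk the parent links upward and collect the ancestors, root first.
function buildAncestors(nodesById, nodeId) {
  const ancestors = [];
  let parentId = nodesById.get(nodeId)?.parentId ?? null;
  while (parentId !== null) {
    ancestors.unshift(parentId); // prepend so the root ends up first
    parentId = nodesById.get(parentId)?.parentId ?? null;
  }
  return ancestors;
}

// Hypothetical three-level organization chart.
const nodes = new Map([
  ["root", { name: "Company", parentId: null }],
  ["eng", { name: "Engineering", parentId: "root" }],
  ["frontend", { name: "Frontend Team", parentId: "eng" }],
]);

const frontendAncestors = buildAncestors(nodes, "frontend");
console.log(frontendAncestors); // [ 'root', 'eng' ]
```

With the array stored on each node, all descendants of a node come back from a single query on the ancestors field, and the parent reference still answers direct-parent lookups.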

Polymorphic Pattern

The polymorphic pattern handles documents with varying structures within the same collection. This is useful when different entities share common fields but have unique attributes.

// Collection with polymorphic documents
{
  _id: ObjectId("..."),
  type: "vehicle",
  make: "Tesla",
  model: "Model 3",
  year: 2026,
  // vehicle-specific fields
  doors: 4,
  range_km: 500
},
{
  _id: ObjectId("..."),
  type: "property",
  address: "123 Main St",
  city: "San Francisco",
  price: 1200000,
  // property-specific fields
  bedrooms: 3,
  bathrooms: 2,
  squareFeet: 1500
}

Single View Solution with Polymorphic Pattern

The polymorphic pattern enables a single view across different entity types, ideal for search systems, inventory management, and CRM consolidation.

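// Search across all entity types
// Escape regex metacharacters so user input is matched literally
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

async function searchAll(searchTerm) {
  const pattern = escapeRegex(searchTerm);
  return await db.collection("assets").find({
    $or: [
      { make: { $regex: pattern, $options: "i" } },
      { model: { $regex: pattern, $options: "i" } },
      { address: { $regex: pattern, $options: "i" } },
    ],
  }).toArray();
}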
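
After a mixed result set comes back, the application still has to handle each shape. A common way to do that, sketched here as an assumption, is to dispatch on the type field. The describers table and the describeAsset function are hypothetical names.

```javascript
// One formatter per document type, keyed by the value of the type field.
const describers = {
  vehicle: (doc) => `${doc.year} ${doc.make} ${doc.model}`,
  property: (doc) => `${doc.address}, ${doc.city}`,
};

function describeAsset(doc) {
  const describe = describers[doc.type];
  // Fall back gracefully when a new type appears before the code knows about it.
  return describe ? describe(doc) : `Unknown asset type: ${doc.type}`;
}

const vehicleLabel = describeAsset({ type: "vehicle", make: "Tesla", model: "Model 3", year: 2026 });
const propertyLabel = describeAsset({ type: "property", address: "123 Main St", city: "San Francisco" });
console.log(vehicleLabel); // 2026 Tesla Model 3
console.log(propertyLabel); // 123 Main St, San Francisco
```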

Which one of the following scenarios is best suited for the polymorphic pattern?

An organization acquired different companies over the years, serving the same markets with the same customers. There is a requirement to merge all systems into one. Each company has similar but not identical data structures. The polymorphic pattern allows all entities to coexist in a single collection while preserving their unique fields.

Other Patterns

Approximation Pattern

The approximation pattern avoids performing an expensive operation too often by using approximate values instead of exact calculations.

// Track page views with approximation
{
  _id: ObjectId("..."),
  pageId: "article-123",
  viewCount: 4200,
  // Only write to database every 10th view
  _pendingViews: 0
}

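// In application code: count views in memory and write every 10th view
const FLUSH_EVERY = 10;
const pendingViews = new Map(); // pageId -> views not yet written

async function recordPageView(pageId) {
  const pending = (pendingViews.get(pageId) || 0) + 1;
  if (pending < FLUSH_EVERY) {
    pendingViews.set(pageId, pending);
    return;
  }
  // Reset before awaiting so concurrent calls start a fresh batch
  pendingViews.set(pageId, 0);
  await db.collection("page_views").updateOne(
    { pageId },
    { $inc: { viewCount: pending } },
    { upsert: true },
  );
}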

Using a counter that aggregates multiple updates before writing to the database reduces write load dramatically. For a popular page with 10,000 views per minute, writing every 10 views reduces database writes by 90%.
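
A stateless variant of the same idea needs no in-memory counter: on each view, write with probability 1/10 and increment by 10 when you do. The expected increment per view is still 10 × 1/10 = 1, so the stored count is statistically close to the true count. The sketch below is illustrative; the function name viewIncrement is hypothetical, and the random source is injectable so the behavior can be checked deterministically.

```javascript
// Return `step` about once every `step` calls, otherwise 0.
function viewIncrement(step = 10, random = Math.random) {
  return random() < 1 / step ? step : 0;
}

// The caller only touches the database when the increment is non-zero, e.g.:
//   const inc = viewIncrement();
//   if (inc > 0) await db.collection("page_views").updateOne({ pageId }, { $inc: { viewCount: inc } });

console.log(viewIncrement(10, () => 0.05)); // 10
console.log(viewIncrement(10, () => 0.5)); // 0
```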

Outlier Pattern

The outlier pattern keeps the focus on the most frequent use cases while handling edge cases separately.

// Normal document — most users have a few favorite products
{
  _id: ObjectId("..."),
  username: "jdoe",
  favorites: ["prod123", "prod456", "prod789"]
}

// Outlier document: a power user with thousands of favorites
{
  _id: ObjectId("..."),
  username: "poweruser",
  favorites: ["prod001", "prod002", /* ... up to the embedding limit */],
  favorites_outlier: true, // flag: the remaining favorites live in a separate collection
  favorites_count: 3500
}

When a query asks about a user’s favorites, the application checks the favorites_outlier flag. For normal users, it returns the embedded array. For power users, it also queries a separate collection that holds the overflow, so very large lists are handled without affecting the common case.
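
The write path decides when a user becomes an outlier. A minimal sketch, assuming an embedding limit of 1,000 favorites and an overflow record written to a separate collection; the threshold, the function name addFavorite, and the overflow shape are all hypothetical.

```javascript
const FAVORITES_LIMIT = 1000; // assumed threshold, tune to your data

// Returns the updated user plus an overflow record to store elsewhere, if any.
function addFavorite(user, productId, limit = FAVORITES_LIMIT) {
  if (user.favorites.length < limit) {
    // Common case: keep embedding in the user document.
    return { user: { ...user, favorites: [...user.favorites, productId] }, overflow: null };
  }
  // Outlier case: leave the embedded array alone, set the flag, spill over.
  return {
    user: { ...user, favorites_outlier: true },
    overflow: { username: user.username, productId },
  };
}

// A limit of 2 keeps the example short.
const first = addFavorite({ username: "jdoe", favorites: ["prod123"] }, "prod456", 2);
const second = addFavorite(first.user, "prod789", 2);
console.log(first.user.favorites, second.user.favorites_outlier, second.overflow);
```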

Summary of Patterns

| Pattern | Problem Solved | Best For |
| --- | --- | --- |
| Computed | Repeated expensive calculations | Dashboards, stats, metrics |
| Bucket | Managing large time-series datasets | IoT, logging, analytics |
| Schema Versioning | Zero-downtime schema evolution | Live production systems |
| Tree (Ancestor Array) | Hierarchical data queries | Organization charts, categories |
| Polymorphic | Diverse entity types in one collection | Mergers, product catalogs |
| Approximation | Reducing expensive write operations | Page views, counters |
| Outlier | Handling edge cases without impacting the norm | Social media, user-generated data |
