Skip to main content

M121: Chapter 5: Miscellaneous Aggregation

The MongoDB Aggregation Framework (Chapter 5 of 6)

Published: June 12, 2021 Updated: May 24, 2026 Larry Qu 9 min read

The $redact Stage

$redact restricts the contents of documents based on information stored in the documents themselves. It is useful for implementing field-level security and conditional data access.

// Creating a variable to refer against
var userAccess = "Management";

// Comparing whether the value in the userAccess variable is in the array
// referenced by the $acl field path
db.employees.aggregate([
  {
    $redact: {
      $cond: [{ $in: [userAccess, "$acl"] }, "$$DESCEND", "$$PRUNE"],
    },
  },
]).pretty();

How $redact Works

$redact evaluates each document and its embedded documents recursively using one of three actions:

  • $$DESCEND — Keep the current level and continue evaluating embedded documents
  • $$PRUNE — Remove the current field or document entirely
  • $$KEEP — Keep the current field or document without further evaluation

The $out Stage

$out takes the documents returned by the aggregation pipeline and writes them to a specified collection. If the output collection already exists, $out replaces it atomically.

// Pivot data in the books collection to group titles by author
// then write the results to the authors collection
db.getSiblingDB("test").books.aggregate([
  { $group: { _id: "$author", books: { $push: "$title" } } },
  { $out: "authors" },
]);

Important $out Behaviors

  • Creates a new collection if it does not exist
  • Replaces the entire collection if it already exists
  • Drops indexes on the target collection — indexes must be recreated
  • Must be the last stage in the pipeline
  • Does not output results to the client
// Using $out with a different database
db.orders.aggregate([
  {
    $match: { status: "delivered", createdAt: { $gte: startOfMonth } },
  },
  {
    $group: {
      _id: "$productId",
      totalSales: { $sum: "$amount" },
      orderCount: { $sum: 1 },
    },
  },
  {
    $out: { db: "analytics", coll: "monthly_product_sales" },
  },
]);

$merge Overview

$merge writes the results of the aggregation pipeline to a specified collection. Unlike $out, $merge can merge results into an existing collection, update matching documents, and handle both matched and unmatched documents with fine-grained control.

The $merge operator must be the last stage in the pipeline.

$merge Syntax

{
  $merge: {
    into: <collection> -or- { db: <db>, coll: <collection> },
    on: <identifier field> -or- [<identifier field1>, ...], // Optional
    let: <variables>,                                        // Optional
    whenMatched: <replace|keepExisting|merge|fail|pipeline>, // Optional
    whenNotMatched: <insert|discard|fail>                    // Optional
  }
}

Practical $merge Example

db.getSiblingDB("zoo").salaries.aggregate([
  {
    $group: {
      _id: { fiscal_year: "$fiscal_year", dept: "$dept" },
      salaries: { $sum: "$salary" },
    },
  },
  {
    $merge: {
      into: { db: "reporting", coll: "budgets" },
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
]);

$merge Action Options

Option Description
replace Replace matching document with the aggregation result
keepExisting Keep the existing document; ignore the aggregation result
merge Merge fields from the aggregation result into the existing document
fail Throw an error if a matching document is found
pipeline Apply a custom update pipeline to matching documents
// Problem: Consider a $merge stage that:
// - outputs results to analytics collection
// - merges on the name field
// - updates existing docs with modified information
// - creates a new doc if one does not exist

{
  $merge: {
    into: "analytics",
    on: "name",
    whenMatched: "merge",
    whenNotMatched: "insert",
  },
}

Consider an aggregation pipeline using the new $merge stage that outputs to the employee_data collection. If we are not expecting to find any matching documents, whenMatched: "fail" raises an error when a match occurs, making the anomaly visible immediately.

Using $merge for Single View

A common pattern is using $merge to create a consolidated view from multiple source collections. This is useful for reporting, search systems, and data warehouses.

// Merge movie data with comments into a single view collection
db.movies.aggregate([
  {
    $lookup: {
      from: "comments",
      localField: "_id",
      foreignField: "movie_id",
      as: "comments",
    },
  },
  {
    $addFields: {
      commentCount: { $size: "$comments" },
      recentComments: { $slice: ["$comments", 5] },
    },
  },
  { $project: { comments: 0 } },
  { $out: "movies_with_comments" },
]);

Using $merge for Rollups

Rollups aggregate data over time windows and store the results for fast querying.

// Daily sales rollup
db.orders.aggregate([
  {
    $match: {
      createdAt: {
        $gte: new Date("2026-05-01"),
        $lt: new Date("2026-05-02"),
      },
    },
  },
  {
    $group: {
      _id: { productId: "$productId", date: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } } },
      totalRevenue: { $sum: "$total" },
      orderCount: { $sum: 1 },
      uniqueCustomers: { $addToSet: "$customerId" },
    },
  },
  {
    $merge: {
      into: "daily_sales_rollup",
      on: ["_id.productId", "_id.date"],
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
]);

Views

Views in MongoDB are virtual collections defined by an aggregation pipeline. They do not store data — they compute results on demand from source collections.

  • Horizontal slicing ($match): Reduces the number of documents returned
  • Vertical slicing ($project): Reduces the fields returned (changes document shape)

Creating a View

// Creating a view for bronze-tier banking customers
// The same pattern was used for silver and gold views
db.createView("bronze_banking", "customers", [
  {
    $match: { accountType: "bronze" },
  },
  {
    $project: {
      _id: 0,
      name: {
        $concat: [
          { $cond: [{ $eq: ["$gender", "female"] }, "Miss", "Mr."] },
          " ",
          "$name.first",
          " ",
          "$name.last",
        ],
      },
      phone: 1,
      email: 1,
      address: 1,
      account_ending: { $substr: ["$accountNumber", 7, -1] },
    },
  },
]);

// Get information about all collections and views
db.getCollectionInfos();

// Get details about views specifically
db.system.views.find();

Querying Views

Views support the same query operations as collections:

db.bronze_banking.find();
db.bronze_banking.findOne();
db.bronze_banking.countDocuments();
db.bronze_banking.distinct("account_ending");
db.bronze_banking.aggregate([{ $match: { phone: { $exists: true } } }]);

View Restrictions

  • View definitions are public — anyone who can list collections can see the pipeline
  • Avoid referring to sensitive fields within the view pipeline
  • Views are read-only — write operations on views throw an error
  • Views must abide by aggregation framework rules
  • Views cannot contain find() projection operators — use $project instead
  • Views are created on demand and reflect current data in the source collection

$facet Stage

The $facet stage enables multi-faceted aggregation — processing the same input documents through multiple pipelines in a single stage.

// Product catalog faceted search
db.products.aggregate([
  { $match: { category: "Electronics" } },
  {
    $facet: {
      topSellers: [
        { $sort: { salesCount: -1 } },
        { $limit: 5 },
        { $project: { name: 1, price: 1, salesCount: 1 } },
      ],
      priceBuckets: [
        {
          $bucket: {
            groupBy: "$price",
            boundaries: [0, 50, 100, 200, 500, 1000],
            default: "1000+",
            output: { count: { $sum: 1 }, products: { $push: "$name" } },
          },
        },
      ],
      brandCounts: [
        { $group: { _id: "$brand", count: { $sum: 1 } } },
        { $sort: { count: -1 } },
      ],
      stats: [
        {
          $group: {
            _id: null,
            avgPrice: { $avg: "$price" },
            minPrice: { $min: "$price" },
            maxPrice: { $max: "$price" },
            totalProducts: { $sum: 1 },
          },
        },
      ],
    },
  },
]);

The $facet stage is especially powerful for building search result pages that need to show products alongside faceted navigation filters.

$bucket and $bucketAuto

The $bucket stage categorizes documents into groups (buckets) based on a specified expression and boundary values.

$bucket for Histograms

// Manual bucket boundaries for age distribution
db.users.aggregate([
  {
    $bucket: {
      groupBy: "$age",
      boundaries: [0, 18, 25, 35, 45, 55, 65, 100],
      default: "Unknown",
      output: {
        count: { $sum: 1 },
        averageIncome: { $avg: "$income" },
        users: { $push: { name: { $concat: ["$firstName", " ", "$lastName"] } } },
      },
    },
  },
]);

$bucketAuto for Automatic Bucketing

$bucketAuto automatically determines bucket boundaries to distribute documents evenly.

// Automatically create 5 buckets of roughly equal size
db.orders.aggregate([
  {
    $bucketAuto: {
      groupBy: "$total",
      buckets: 5,
      output: {
        count: { $sum: 1 },
        avgTotal: { $avg: "$total" },
        minTotal: { $min: "$total" },
        maxTotal: { $max: "$total" },
      },
    },
  },
]);

Customize $bucketAuto with the granularity parameter using series such as R5, R10, R20, E6, E12, or POWERSOF2 for preferred number boundaries.

$graphLookup Stage

The $graphLookup stage performs recursive graph traversal to find relationships between documents. This is ideal for hierarchical data, social networks, and recommendation engines.

Employee Hierarchy

// Employees collection with manager references
{
  _id: 1,
  name: "Alice",
  role: "CEO",
  reportsTo: null
},
{
  _id: 2,
  name: "Bob",
  role: "CTO",
  reportsTo: 1
},
{
  _id: 3,
  name: "Charlie",
  role: "Engineer",
  reportsTo: 2
},
{
  _id: 4,
  name: "Diana",
  role: "Engineer",
  reportsTo: 2
}

// Find all direct and indirect reports of Alice (CEO)
db.employees.aggregate([
  { $match: { name: "Alice" } },
  {
    $graphLookup: {
      from: "employees",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "reportsTo",
      as: "allReports",
      maxDepth: 10,
      depthField: "level",
    },
  },
  {
    $project: {
      name: 1,
      role: 1,
      allReports: {
        $map: {
          input: "$allReports",
          as: "report",
          in: {
            name: "$$report.name",
            role: "$$report.role",
            level: "$$report.level",
          },
        },
      },
    },
  },
]);

$geoNear Stage

The $geoNear stage returns documents sorted by distance from a specified point. It requires a geospatial index.

Setting Up Geospatial Data

// Create a 2dsphere index
db.locations.createIndex({ coordinates: "2dsphere" });

// Sample documents
db.locations.insertMany([
  {
    name: "Central Park",
    type: "Park",
    coordinates: { type: "Point", coordinates: [-73.9654, 40.7829] },
    rating: 4.8,
  },
  {
    name: "Empire State Building",
    type: "Landmark",
    coordinates: { type: "Point", coordinates: [-73.9857, 40.7484] },
    rating: 4.5,
  },
  {
    name: "Brooklyn Bridge",
    type: "Landmark",
    coordinates: { type: "Point", coordinates: [-73.9969, 40.7061] },
    rating: 4.6,
  },
]);

Basic GeoNear Query

// Find the 10 closest locations to Times Square
db.locations.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [-73.9855, 40.7580] },
      distanceField: "distance",
      maxDistance: 5000,
      minDistance: 0,
      spherical: true,
      num: 10,
    },
  },
  {
    $project: {
      name: 1,
      type: 1,
      rating: 1,
      distance: { $round: ["$distance", 0] },
    },
  },
]);

Summary

  • $redact — Implements field-level security by conditionally including or excluding document content
  • $out — Writes pipeline results to a new collection, replacing it entirely
  • $merge — Writes pipeline results with fine-grained control over matching behavior
  • Views — Virtual read-only collections defined by aggregation pipelines for access control and data shaping
  • $facet — Processes the same input through multiple pipelines in a single stage, ideal for faceted search and dashboards
  • $bucket / $bucketAuto — Groups documents into histograms with manual or automatic boundaries
  • $graphLookup — Recursive graph traversal for hierarchies, social networks, and recommendation systems
  • $geoNear — Geospatial queries sorted by distance, requiring a 2dsphere index

Views contain no data themselves — they are created on demand and reflect the data in the source collection. Views are read-only; write operations to views will error. Views have some restrictions: they must abide by the rules of the aggregation framework and cannot contain find() projection operators. Horizontal slicing is performed with the $match stage, reducing the number of documents returned. Vertical slicing is performed with a $project or other shaping stage, modifying individual documents.

Comments

👍 Was this article helpful?