
MongoDB Internals: Storage Engine, WiredTiger, and Data Structures

Introduction

Understanding MongoDB’s internal architecture helps developers and administrators make better decisions about schema design, performance tuning, and troubleshooting. MongoDB’s architecture has evolved significantly, with the WiredTiger storage engine being the default since MongoDB 3.2.

This article explores the internals of MongoDB, covering the storage engine architecture, data structures, memory management, and query execution. This knowledge enables you to understand how MongoDB handles data and why certain operations behave as they do.

Storage Engine Architecture

MongoDB’s storage engine is responsible for managing how data is stored, accessed, and maintained on disk and in memory.

WiredTiger Overview

WiredTiger is MongoDB’s default storage engine, providing:

  • Document-level concurrency control
  • Compression (Snappy, Zstandard)
  • Checkpointing and recovery
  • Multi-version Concurrency Control (MVCC)
  • B-Tree storage (WiredTiger also implements LSM trees, though MongoDB uses B-Trees)

WiredTiger Architecture

┌────────────────────────────────────────────────────────────┐
│                    MongoDB Process                         │
├────────────────────────────────────────────────────────────┤
│  Query Layer                                               │
│  - Query Parser                                            │
│  - Query Optimizer                                         │
│  - Execution Engine                                        │
├────────────────────────────────────────────────────────────┤
│  WiredTiger API Layer                                      │
│  - WT_SESSION                                              │
│  - WT_CURSOR                                               │
│  - WT_TRANSACTION                                          │
├────────────────────────────────────────────────────────────┤
│  WiredTiger Engine                                         │
│  ┌──────────────┬──────────────┬───────────────────────┐   │
│  │  Cache       │  Checkpoint  │  Write Generation     │   │
│  │  Manager     │  Manager     │  Manager              │   │
│  └──────────────┴──────────────┴───────────────────────┘   │
├────────────────────────────────────────────────────────────┤
│  Storage (Disk)                                            │
│  - Data Files (.wt)                                        │
│  - Journal Files                                           │
│  - Index Files (.wt)                                       │
│  - Diagnostic Files                                        │
└────────────────────────────────────────────────────────────┘

Memory Architecture

WiredTiger uses a memory cache for frequently accessed data.

// Check memory usage
db.serverStatus().wiredTiger.cache

// Example output:
// {
//   "maximum bytes configured": 8589934592,  // 8GB
//   "bytes currently in the cache": 2147483648,
//   "percentage of cache used": "25.00%",
//   "tracked dirty bytes in the cache": 1073741824,
//   "percentage of dirty data": "12.50%"
// }

By default, the cache size is the larger of 256 MB and 50% of (RAM minus 1 GB), or it can be configured manually:

# mongod.conf
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8
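The default sizing rule above, "the larger of 256 MB and 50% of (RAM − 1 GB)", is easy to sanity-check with a small sketch (illustrative only, not WiredTiger's actual code):

```python
def default_wt_cache_bytes(ram_bytes: int) -> int:
    """Approximate WiredTiger's default cache size:
    the larger of 256 MB and 50% of (RAM - 1 GB)."""
    GiB = 1024 ** 3
    MiB = 1024 ** 2
    return max(256 * MiB, (ram_bytes - 1 * GiB) // 2)

# A 16 GiB host gets roughly 7.5 GiB of cache by default:
print(default_wt_cache_bytes(16 * 1024 ** 3) / 1024 ** 3)  # 7.5
```

Note the 256 MB floor: on very small hosts (e.g. 1 GiB of RAM), half of "RAM minus 1 GB" would be zero, so the floor applies.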

Data Files and Structures

MongoDB stores data in multiple file types, each serving a specific purpose.

File Organization

# Typical data directory contents
/data/db/
├── collection-0-123456789.wt    # Data files
├── collection-1-123456789.wt
├── index-0-123456789.wt         # Index files
├── index-1-123456789.wt
├── _mdb_catalog.wt              # Collection metadata
├── journal/
│   ├── WiredTigerLog.0          # Write-ahead log
│   ├── WiredTigerLog.1
│   └── WiredTigerPreplog.0
├── mongod.lock                  # Lock file
├── storage.bson                 # Storage options
└── WiredTiger                   # WiredTiger metadata

Record Storage

Each document in MongoDB is stored as a record within a collection. WiredTiger uses a record format that includes:

// Conceptual record structure
struct Record {
  RecordID id;           // Unique identifier
  uint32_t length;      // Record size
  uint32_t next;        // Next record (for variable-length)
  uint8_t flags;        // Record flags
  char data[];          // BSON document data
};
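The struct above is purely conceptual (it is not WiredTiger's actual on-disk layout), but the idea of a fixed header followed by a variable-length payload can be sketched with Python's `struct` module; the field names and sizes here mirror the conceptual record, not any real format:

```python
import struct

# Header layout mirroring the conceptual record above:
# record id (8 bytes), length (4), next (4), flags (1), then raw payload.
HEADER = struct.Struct("<QIIB")

def pack_record(record_id: int, flags: int, payload: bytes) -> bytes:
    return HEADER.pack(record_id, len(payload), 0, flags) + payload

def unpack_record(buf: bytes):
    record_id, length, _next, flags = HEADER.unpack_from(buf)
    data = buf[HEADER.size: HEADER.size + length]
    return record_id, flags, data

raw = pack_record(42, 0, b'{"item": "Laptop"}')
print(unpack_record(raw))  # (42, 0, b'{"item": "Laptop"}')
```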

Padding and Storage Efficiency

WiredTiger does not pad records in place the way the legacy MMAPv1 engine did; updated documents are rewritten within WiredTiger's pages and freed space is reused. The paddingFactor and numExtents fields that older tutorials mention belong to MMAPv1 and no longer appear in collection statistics.

// Check storage statistics
db.products.stats()

// Output includes:
// {
//   "storageSize": 16777216,   // bytes on disk (after compression)
//   "totalSize": 20971520,     // storageSize + total index size
//   "avgObjSize": 256          // average uncompressed document size
// }

B-Tree Indexes

MongoDB uses B-Tree indexes as the primary index structure, providing efficient lookups, range queries, and sorting.

B-Tree Structure

            [50]
          /      \
    [20,30]      [70,90]
    /  |  \      /   |   \
 [10] [25] [40] [60] [80] [100]

Index Data

// Create a compound index
db.products.createIndex({ category: 1, price: -1 })

// This creates a B-Tree whose keys are (category, price) pairs:
// - leading field: category (ascending)
// - second field: price (descending)

// Index key order (conceptual) -- categories ascend, prices descend:
// {
//   "Books": {
//     29: { _id: 3, ... },
//     15: { _id: 4, ... }
//   },
//   "Electronics": {
//     999: { _id: 1, ... },
//     499: { _id: 2, ... }
//   }
// }
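The key ordering of a { category: 1, price: -1 } index can be emulated outside MongoDB by sorting on (category ascending, price descending); this sketch uses made-up documents matching the conceptual example:

```python
docs = [
    {"_id": 1, "category": "Electronics", "price": 999},
    {"_id": 2, "category": "Electronics", "price": 499},
    {"_id": 3, "category": "Books", "price": 29},
    {"_id": 4, "category": "Books", "price": 15},
]

# Compound index key order: category ascending, then price descending.
index_order = sorted(docs, key=lambda d: (d["category"], -d["price"]))
print([d["_id"] for d in index_order])  # [3, 4, 1, 2]
```

This is why the index can serve both an equality match on category and a descending sort on price without a separate SORT stage.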

Index Types

// Single-field index
db.users.createIndex({ email: 1 })

// Compound index
db.products.createIndex({ category: 1, price: -1 })

// Multikey index (array fields)
db.products.createIndex({ tags: 1 })

// Unique index
db.users.createIndex({ email: 1 }, { unique: true })

// Sparse index
db.employees.createIndex({ ssn: 1 }, { sparse: true })

// Text index
db.articles.createIndex({ title: 'text', content: 'text' })

// Geospatial index
db.places.createIndex({ location: '2dsphere' })

// Hashed index (for sharding)
db.orders.createIndex({ order_id: 'hashed' })

Journaling and Durability

MongoDB uses write-ahead logging (journaling) to ensure durability of writes.

Journal Process

// Journal flow:
// 1. Application issues write
// 2. Write recorded in journal (WAL)
// 3. Write applied to cache
// 4. On checkpoint, data flushed to disk
// 5. Journal space recycled

Journal Configuration

# mongod.conf
storage:
  journal:
    commitIntervalMs: 100  # Group-commit interval (default 100ms)

Note that journaling cannot be disabled for WiredTiger since MongoDB 4.0, and the storage.journal.enabled option was removed entirely in MongoDB 6.1.

Write Concern with Journaling

// Wait for the journal on the primary (durable on that node)
db.orders.insertOne(
  { item: 'Laptop', quantity: 1 },
  { writeConcern: { w: 1, j: true } }
)

// Wait for majority acknowledgment with journal
db.orders.insertOne(
  { item: 'Mouse', quantity: 5 },
  { writeConcern: { w: 'majority', j: true } }
)

Recovery Process

On startup, MongoDB:

  1. Reads the last checkpoint
  2. Replays journal entries since checkpoint
  3. Recovers to consistent state
  4. Begins normal operations
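The checkpoint-plus-journal recovery described above amounts to: restore the last checkpoint, then replay only the journal entries newer than it. A minimal sketch (timestamps, keys, and values are invented for illustration):

```python
def recover(checkpoint_data: dict, checkpoint_ts: int, journal: list) -> dict:
    """Rebuild state from the last checkpoint plus newer journal entries."""
    state = dict(checkpoint_data)
    for ts, key, value in journal:
        if ts > checkpoint_ts:  # entries up to the checkpoint are already on disk
            state[key] = value
    return state

journal = [(1, "a", 1), (2, "b", 2), (3, "a", 10)]
# Checkpoint captured state as of ts=2, so only the ts=3 entry is replayed.
print(recover({"a": 1, "b": 2}, 2, journal))  # {'a': 10, 'b': 2}
```

Because replay starts from the checkpoint timestamp, the journal only ever needs to hold writes since the last checkpoint, which is why it can be recycled afterwards.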

Concurrency Control

WiredTiger uses optimistic concurrency control with document-level locking.

Lock Granularity

// Lock behavior:
// - Most operations use document-level locks
// - Some operations (index creation, compaction) use collection-level locks
// - Database-level operations use database locks
// - Cluster operations use cluster-level locks

// Check lock statistics
db.serverStatus().locks

// Output:
// {
//   "Collection": {
//     "acquireCount": { "r": 12345, "w": 6789, "W": 10 }
//   }
// }

MVCC (Multi-Version Concurrency Control)

WiredTiger maintains multiple versions of documents to support concurrent operations:

// Concurrent read/write scenario:
// 1. Transaction A starts, reads document version 1
// 2. Transaction B starts, reads document version 1
// 3. Transaction A updates, creates version 2
// 4. Transaction B attempts update, fails due to conflict
// 5. Transaction B retries with new version
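The optimistic check in steps 4-5 can be sketched as a compare-version-then-write operation; this toy `Document`/`WriteConflict` model is an illustration, not WiredTiger's implementation:

```python
class Document:
    def __init__(self, value):
        self.value = value
        self.version = 1

class WriteConflict(Exception):
    pass

def update(doc: Document, read_version: int, new_value):
    """Apply an update only if nothing was committed since we read."""
    if doc.version != read_version:
        raise WriteConflict("document changed since read; retry")
    doc.value = new_value
    doc.version += 1

doc = Document("v1")
seen_a = seen_b = doc.version    # both transactions read version 1
update(doc, seen_a, "from A")    # A commits; version becomes 2
try:
    update(doc, seen_b, "from B")            # B's write conflicts
except WriteConflict:
    update(doc, doc.version, "from B retry")  # B re-reads and retries

print(doc.value, doc.version)  # from B retry 3
```

MongoDB drivers perform this retry transparently for single-document writes; inside multi-document transactions the conflict surfaces as a TransientTransactionError that the application retries.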

Transaction Support

MongoDB 4.0+ supports multi-document ACID transactions on replica sets; MongoDB 4.2 extended them to sharded clusters.

const session = client.startSession();

try {
  await session.withTransaction(async () => {
    // Both operations succeed or both fail
    await accounts.updateOne(
      { _id: 'A' },
      { $inc: { balance: -100 } },
      { session }
    );
    
    await accounts.updateOne(
      { _id: 'B' },
      { $inc: { balance: 100 } },
      { session }
    );
  });
} finally {
  await session.endSession();
}

Query Execution

Understanding how MongoDB executes queries helps optimize performance.

Query Planning

// Use explain to see query plan
db.products.find({ category: 'Electronics', price: { $lt: 500 } })
  .sort({ price: 1 })
  .explain('allPlansExecution')

// Output includes:
// - queryPlanner (chosen plan)
// - executionStats (timing details)
// - serverInfo (MongoDB version)

Query Stages

// Example execution stages:
// 1. COLLSCAN - Collection scan
// 2. IXSCAN - Index scan
// 3. FETCH - Retrieve documents
// 4. PROJECTION - Apply projection
// 5. SORT - Sort results
// 6. LIMIT - Apply limit
// 7. SKIP - Apply skip
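The practical difference between COLLSCAN and IXSCAN is how many entries get examined. A sketch using a sorted (key, record id) list as a stand-in for a B-Tree, with invented documents:

```python
import bisect

docs = [{"_id": i, "price": p} for i, p in enumerate([500, 30, 999, 120, 75])]

# COLLSCAN: examine every document in the collection.
collscan = [d for d in docs if d["price"] < 100]

# IXSCAN: walk a sorted (key, record id) structure and stop early.
index = sorted((d["price"], d["_id"]) for d in docs)
cut = bisect.bisect_left(index, (100,))        # first key >= 100
ixscan = [docs[rid] for _, rid in index[:cut]]  # only 2 index keys touched

print(sorted(d["_id"] for d in collscan), sorted(d["_id"] for d in ixscan))
```

Both plans return the same documents; the index scan simply touches far fewer entries, which is what explain()'s totalKeysExamined and totalDocsExamined counters measure.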

Query Optimization

// Ensure queries use indexes
db.products.find({ category: 'Electronics' }).hint({ category: 1 })

// Covered queries (answered from the index alone)
db.products.find(
  { category: 'Electronics' },
  { _id: 0, category: 1, price: 1 }  // every projected field is in the index
).explain('executionStats')

// Output:
// {
//   "executionStats": {
//     "totalDocsExamined": 0,  // No documents fetched!
//     ...
//   }
// }

Compression

WiredTiger supports multiple compression algorithms.

Compression Types

// WiredTiger compression options:
// - Snappy (default; fast, modest ratio)
// - zlib (better ratio than Snappy, more CPU)
// - Zstandard (best ratio/CPU trade-off, MongoDB 4.2+)
// - No compression

// Configure at collection level
db.createCollection('products', {
  storageEngine: {
    wiredTiger: {
      configString: 'block_compressor=snappy'
    }
  }
})
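To get a feel for what block compression buys, here is a sketch using zlib from the Python standard library as a stand-in (Snappy and Zstandard bindings are not in the stdlib); the payload is an invented repetitive document, which is exactly the kind of data that compresses well:

```python
import zlib

# Repetitive BSON-like payloads (shared field names, similar values)
# compress well, which is why block compression pays off for documents.
doc = b'{"category": "Electronics", "price": 499}' * 100
compressed = zlib.compress(doc)

ratio = len(compressed) / len(doc)
print(f"{len(doc)} -> {len(compressed)} bytes (ratio {ratio:.2f})")
```

The exact ratio depends on the algorithm and data; the trade-off is always CPU time on page read/write versus bytes on disk and in the OS file cache.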

Compression Stats

// Check compression statistics
db.products.stats()

// There is no single "compressionRatio" field, but the ratio can be
// derived from the size fields the command returns:
// {
//   "avgObjSize": 256,
//   "size": 1638400,         // uncompressed data size
//   "storageSize": 1048576   // on-disk size; storageSize / size ~ 0.64
// }

Checkpointing

Checkpoints create consistent snapshots of the database.

Checkpoint Process

// Checkpoint behavior:
// 1. Occur every 60 seconds (default)
// 2. Create consistent snapshot
// 3. Copy checkpoint to data files
// 4. Truncate journal after successful checkpoint

// Check checkpoint statistics (exact field names vary by version)
db.serverStatus().wiredTiger.transaction

// Relevant counters include:
// - "transaction checkpoints" (total count)
// - "transaction checkpoint most recent time (msecs)"
// - "transaction checkpoint total time (msecs)"

Memory Management

WiredTiger manages memory through its cache system.

Cache Behavior

// Cache eviction:
// 1. Background eviction starts once the cache passes its eviction
//    target (80% full by default); near 95%, application threads
//    are drafted to help evict
// 2. Clean pages are evicted first (they can simply be dropped)
// 3. Dirty pages must be written out before they can be evicted

// Monitor cache activity
db.serverStatus().wiredTiger.cache

// Key metrics:
// - bytes currently in the cache
// - tracked dirty bytes in the cache
// - eviction calls
// - pages read into cache
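The clean-before-dirty preference can be sketched as a tiny eviction loop; this toy model (pages as a name-to-dirty-flag map, an invented `target` count) only illustrates the ordering, not WiredTiger's actual eviction algorithm:

```python
def evict(pages: dict, target: int) -> list:
    """Evict pages until at most `target` remain, clean pages first.
    `pages` maps page name -> True if dirty."""
    evicted = []
    # Clean pages can be dropped immediately; dirty pages would first
    # need writing out, so they are only evicted once clean ones run out.
    for name in sorted(pages, key=lambda n: pages[n]):  # clean (False) first
        if len(pages) - len(evicted) <= target:
            break
        evicted.append(name)
    for name in evicted:
        del pages[name]
    return evicted

cache = {"p1": False, "p2": True, "p3": False, "p4": True}
print(evict(cache, 2))  # ['p1', 'p3'] -- the clean pages go first
```

Evicting clean pages is cheap (drop and re-read later if needed); evicting dirty pages costs a disk write, which is why a steadily growing dirty percentage is a warning sign in the cache metrics above.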

Tuning Cache Size

# For systems with large RAM
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16      # Increase for large datasets
      journalCompressor: zstd

Replication Internals

Understanding replica set mechanics.

Oplog

The oplog (operations log) is a special capped collection that records all operations.

// View the most recent oplog entry
db.getSiblingDB('local').oplog.rs.find().sort({ ts: -1 }).limit(1)

// (use rs.printReplicationInfo() for the oplog's size and time window)

// Oplog entry structure:
// {
//   "ts": Timestamp(1700000000, 1),
//   "t": 123,
//   "h": 1234567890123456789,
//   "v": 2,
//   "op": "i",           // i=insert, u=update, d=delete
//   "ns": "myapp.orders",
//   "o": { ... }        // Document operation
// }
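A key property of oplog entries is idempotency: a secondary can safely re-apply a batch after a crash. This sketch applies simplified i/u/d entries (the "set" field here is a stand-in for the real $set update format) twice and ends in the same state:

```python
def apply_op(store: dict, entry: dict) -> None:
    """Apply one simplified oplog entry; designed to be idempotent."""
    op, ns, o = entry["op"], entry["ns"], entry["o"]
    coll = store.setdefault(ns, {})
    if op == "i":
        coll[o["_id"]] = dict(o)                 # re-insert just overwrites
    elif op == "u":
        coll.setdefault(o["_id"], {"_id": o["_id"]}).update(o["set"])
    elif op == "d":
        coll.pop(o["_id"], None)                 # re-delete is a no-op

store = {}
ops = [
    {"op": "i", "ns": "myapp.orders", "o": {"_id": 1, "item": "Laptop"}},
    {"op": "u", "ns": "myapp.orders", "o": {"_id": 1, "set": {"qty": 2}}},
]
for entry in ops + ops:          # applying the whole batch twice is safe
    apply_op(store, entry)
print(store["myapp.orders"][1])  # {'_id': 1, 'item': 'Laptop', 'qty': 2}
```

This is also why the oplog records the resulting state of an update rather than the original expression: "$inc: 1" replayed twice would double-count, while "set qty to 2" would not.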

Replication Process

// Replication flow:
// 1. Primary receives write operation
// 2. Operation written to primary's oplog
// 3. Secondaries poll primary for new oplog entries
// 4. Secondaries apply operations locally
// 5. Secondary acknowledges to primary

Heartbeat and Election

// Heartbeat interval: 2 seconds (default)
// Election timeout: 10 seconds
// Priority affects election likelihood

// Configure member priority
cfg = rs.conf()
cfg.members[0].priority = 2  // More likely to become primary
cfg.members[1].priority = 1
cfg.members[2].priority = 0  // Cannot become primary
rs.reconfig(cfg)

Conclusion

Understanding MongoDB’s internals helps you make informed decisions about schema design, indexing, and performance optimization. The WiredTiger storage engine provides document-level concurrency, compression, and robust durability through journaling.

Key takeaways:

  • B-Tree indexes provide efficient lookups and range queries
  • Journaling ensures durability through write-ahead logging
  • MVCC enables concurrent operations without blocking
  • Checkpoints create consistent recovery points
  • Memory caching significantly impacts performance

This knowledge forms the foundation for advanced MongoDB administration and optimization.

In the next article, we will explore MongoDB’s developments in 2025-2026, including new features and cloud capabilities.
