Introduction
Understanding MongoDB’s internal architecture helps developers and administrators make better decisions about schema design, performance tuning, and troubleshooting. MongoDB’s architecture has evolved significantly, with the WiredTiger storage engine being the default since MongoDB 3.2.
This article explores the internals of MongoDB, covering the storage engine architecture, data structures, memory management, and query execution. This knowledge enables you to understand how MongoDB handles data and why certain operations behave as they do.
Storage Engine Architecture
MongoDB’s storage engine is responsible for managing how data is stored, accessed, and maintained on disk and in memory.
WiredTiger Overview
WiredTiger is MongoDB’s default storage engine, providing:
- Document-level concurrency control
- Compression (Snappy, Zstandard)
- Checkpointing and recovery
- Multi-version Concurrency Control (MVCC)
- B-Tree storage for collections and indexes (WiredTiger also implements LSM trees, but MongoDB uses B-Trees)
WiredTiger Architecture
+--------------------------------------------------+
| MongoDB Process                                  |
+--------------------------------------------------+
| Query Layer                                      |
|   - Query Parser                                 |
|   - Query Optimizer                              |
|   - Execution Engine                             |
+--------------------------------------------------+
| WiredTiger API Layer                             |
|   - WT_SESSION                                   |
|   - WT_CURSOR                                    |
|   - WT_TRANSACTION                               |
+--------------------------------------------------+
| WiredTiger Engine                                |
|   +---------+------------+------------------+    |
|   | Cache   | Checkpoint | Write Generation |    |
|   | Manager | Manager    | Manager          |    |
|   +---------+------------+------------------+    |
+--------------------------------------------------+
| Storage (Disk)                                   |
|   - Data Files (.wt)                             |
|   - Journal Files (journal/WiredTigerLog.*)      |
|   - Index Files (.wt)                            |
|   - Diagnostic Files                             |
+--------------------------------------------------+
Memory Architecture
WiredTiger uses a memory cache for frequently accessed data.
// Check memory usage
db.serverStatus().wiredTiger.cache
// Example output (abridged):
// {
//   "maximum bytes configured": 8589934592,         // 8 GiB
//   "bytes currently in the cache": 2147483648,     // 25% of the maximum
//   "tracked dirty bytes in the cache": 1073741824  // 12.5% of the maximum
// }
// serverStatus reports raw byte counts; the percentages are derived from them.
By default, the cache size is the larger of 50% of (RAM minus 1 GB) and 256 MB. It can also be configured manually:
# mongod.conf
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8
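To make the default sizing rule concrete, here is a minimal sketch of the arithmetic. It assumes the documented rule (the larger of 50% of RAM minus 1 GiB, and 256 MiB); the helper name `defaultCacheBytes` is hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helper: estimates WiredTiger's default cache size from total RAM.
const GIB = 1024 ** 3;
const MIB = 1024 ** 2;

function defaultCacheBytes(totalRamBytes) {
  // Assumed rule: the larger of 50% of (RAM - 1 GiB) and 256 MiB.
  return Math.max(0.5 * (totalRamBytes - GIB), 256 * MIB);
}

console.log(defaultCacheBytes(16 * GIB) / GIB); // 7.5
console.log(defaultCacheBytes(1 * GIB) / MIB);  // 256
```

On a 16 GiB host the default works out to 7.5 GiB, so the `cacheSizeGB: 8` setting shown above is only a modest increase over what the server would choose by itself.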
Data Files and Structures
MongoDB stores data in multiple file types, each serving a specific purpose.
File Organization
# Typical data directory contents
/data/db/
|-- collection-0-123456789.wt    # Data files
|-- collection-1-123456789.wt
|-- index-0-123456789.wt         # Index files
|-- index-1-123456789.wt
|-- _mdb_catalog.wt              # Collection metadata
|-- journal/
|   |-- WiredTigerLog.0          # Write-ahead log
|   |-- WiredTigerLog.1
|   `-- WiredTigerPreplog.0
|-- mongod.lock                  # Lock file
|-- storage.bson                 # Storage options
`-- WiredTiger                   # WiredTiger metadata
Record Storage
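Each document in MongoDB is stored as a record within a collection. In WiredTiger, a record is a key-value pair in the collection's B-Tree: the key is the RecordId and the value is the BSON document. The structure below is a simplified illustration of the information involved, not WiredTiger's actual on-disk layout:
// Conceptual record structure
struct Record {
    RecordID id;      // Unique identifier
    uint32_t length;  // Record size
    uint32_t next;    // Next record (for variable-length)
    uint8_t  flags;   // Record flags
    char     data[];  // BSON document data
};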
Padding and Storage Efficiency
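Unlike the legacy MMAPv1 engine, WiredTiger does not pad records to leave room for growth. An update writes a new version of the document, and storage efficiency comes mainly from block compression and from reusing freed space inside the data files.
// Check storage statistics
db.products.stats()
// Output includes:
// {
//   "storageSize": 16777216,
//   "totalSize": 20971520,   // storageSize + totalIndexSize
//   "avgObjSize": 256
// }
// paddingFactor and numExtents were MMAPv1-only fields and are not reported by WiredTiger.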
B-Tree Indexes
MongoDB uses B-Tree indexes as the primary index structure, providing efficient lookups, range queries, and sorting.
B-Tree Structure
                  [50]
                /      \
         [20,30]        [70,90]
         /  |  \        /  |  \
     [10] [25] [40] [60] [80] [100]
Index Data
// Create a compound index
db.products.createIndex({ category: 1, price: -1 })
// This creates a B-Tree whose keys are ordered by:
// - category (ascending), then
// - price (descending) within each category
// Each key points to the location of a document (its RecordId), not to the document itself.
// Index entries in key order (conceptual):
//   ("Books", 29)        -> record for _id: 3
//   ("Books", 15)        -> record for _id: 4
//   ("Electronics", 999) -> record for _id: 1
//   ("Electronics", 499) -> record for _id: 2
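The key ordering of a compound index can be illustrated with a toy model. This sketch keeps index entries in a sorted array rather than a real B-Tree, and the record identifiers are made up; it only aims to show why a query on the leading field maps to one contiguous range of keys.

```javascript
// Toy model of the key order in a { category: 1, price: -1 } index.
// Entries are [category, price, recordId]; the recordId values are made up.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1; // category ascending
  return b[1] - a[1];                             // price descending
}

const entries = [
  ['Electronics', 499, 2],
  ['Books', 15, 4],
  ['Electronics', 999, 1],
  ['Books', 29, 3],
].sort(compareKeys);

// A query on the leading field touches one contiguous slice of the keys.
// (A real B-Tree finds the slice boundaries in logarithmic time; this
// linear search is only for illustration.)
function scanCategory(sorted, category) {
  const start = sorted.findIndex((e) => e[0] === category);
  if (start === -1) return [];
  let end = start;
  while (end < sorted.length && sorted[end][0] === category) end++;
  return sorted.slice(start, end);
}

console.log(entries.map((e) => `${e[0]}:${e[1]}`).join(' '));
// Books:29 Books:15 Electronics:999 Electronics:499
console.log(scanCategory(entries, 'Electronics').map((e) => e[2])); // [ 1, 2 ]
```

Because prices are already in descending order within each category, a query that filters on category and sorts by price can read the slice in order (or in reverse) without a separate sort step.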
Index Types
// Single-field index
db.users.createIndex({ email: 1 })
// Compound index
db.products.createIndex({ category: 1, price: -1 })
// Multikey index (array fields)
db.products.createIndex({ tags: 1 })
// Unique index
db.users.createIndex({ email: 1 }, { unique: true })
// Sparse index
db.employees.createIndex({ ssn: 1 }, { sparse: true })
// Text index
db.articles.createIndex({ title: 'text', content: 'text' })
// Geospatial index
db.places.createIndex({ location: '2dsphere' })
// Hashed index (for sharding)
db.orders.createIndex({ order_id: 'hashed' })
Journaling and Durability
MongoDB uses write-ahead logging (journaling) to ensure durability of writes.
Journal Process
// Journal flow:
// 1. Application issues write
// 2. Write recorded in journal (WAL)
// 3. Write applied to cache
// 4. On checkpoint, data flushed to disk
// 5. Journal space recycled
Journal Configuration
# mongod.conf
storage:
  journal:
    enabled: true          # Option removed in MongoDB 6.1, where journaling is always on
    commitIntervalMs: 100  # Flush the journal to disk every 100 ms (the default)
Write Concern with Journaling
// Wait until the write has reached the on-disk journal
db.orders.insertOne(
  { item: 'Laptop', quantity: 1 },
  { writeConcern: { j: true, wtimeout: 5000 } }
)
// Wait until a majority of members have journaled the write (most durable)
db.orders.insertOne(
  { item: 'Mouse', quantity: 5 },
  { writeConcern: { w: 'majority', j: true } }
)
Recovery Process
On startup, MongoDB:
- Reads the last checkpoint
- Replays journal entries since checkpoint
- Recovers to consistent state
- Begins normal operations
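The interplay between the journal, the cache, and checkpoints during recovery can be sketched with a toy in-memory model. This is not how WiredTiger is implemented; the class and method names are hypothetical, and the "durable" fields merely stand in for data that would survive a crash.

```javascript
// Toy model of write-ahead logging with checkpoints. Not WiredTiger's real code.
class ToyStore {
  constructor() {
    this.journal = [];    // durable: appended before the cache changes
    this.checkpoint = {}; // durable: last consistent snapshot on "disk"
    this.cache = {};      // volatile: lost on crash
  }
  write(key, value) {
    this.journal.push({ key, value }); // record the write in the journal
    this.cache[key] = value;           // apply the write to the cache
  }
  takeCheckpoint() {
    this.checkpoint = { ...this.cache }; // flush a consistent snapshot
    this.journal = [];                   // journal space can be recycled
  }
  crash() {
    this.cache = {}; // in-memory state disappears
  }
  recover() {
    this.cache = { ...this.checkpoint };         // read the last checkpoint
    for (const { key, value } of this.journal) { // replay later writes
      this.cache[key] = value;
    }
  }
}

const store = new ToyStore();
store.write('a', 1);
store.takeCheckpoint();
store.write('b', 2); // exists only in the journal and the cache
store.crash();
store.recover();
console.log(store.cache); // { a: 1, b: 2 }
```

The write to `b` was never part of a checkpoint, yet it survives the crash because replaying the journal on top of the last checkpoint reconstructs it.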
Concurrency Control
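WiredTiger uses optimistic concurrency control at the document level: writers to different documents in the same collection proceed in parallel, and when two operations modify the same document, one of them hits a write conflict and is retried.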
Lock Granularity
// Lock behavior:
// - Reads and writes take intent locks (r, w) at the global, database, and collection levels
// - Conflicts on individual documents are resolved inside WiredTiger rather than by these locks
// - Some operations (for example dropping a collection, or parts of an index build) take exclusive collection locks (W)
// - Certain administrative operations take exclusive database or global locks
// Check lock statistics
db.serverStatus().locks
// Output (abridged):
// {
//   "Collection": {
//     "acquireCount": { "r": 12345, "w": 6789, "W": 10 }
//   }
// }
MVCC (Multi-Version Concurrency Control)
WiredTiger maintains multiple versions of documents to support concurrent operations:
// Concurrent read/write scenario:
// 1. Transaction A starts, reads document version 1
// 2. Transaction B starts, reads document version 1
// 3. Transaction A updates, creates version 2
// 4. Transaction B attempts update, fails due to conflict
// 5. Transaction B retries with new version
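The conflict-and-retry scenario above can be modelled with a version counter. This is a deliberately small sketch of optimistic conflict detection only; it does not keep old versions around for readers the way real MVCC does, and `VersionedDoc` is a hypothetical name.

```javascript
// Toy optimistic concurrency: each document carries a version number.
// A write succeeds only if the version it read is still current.
class VersionedDoc {
  constructor(value) {
    this.value = value;
    this.version = 1;
  }
  read() {
    return { value: this.value, version: this.version };
  }
  tryUpdate(readVersion, newValue) {
    if (readVersion !== this.version) return false; // write conflict
    this.value = newValue;
    this.version += 1;
    return true;
  }
}

const doc = new VersionedDoc(100);
const a = doc.read(); // transaction A sees version 1
const b = doc.read(); // transaction B sees version 1

console.log(doc.tryUpdate(a.version, a.value - 10)); // true  (creates version 2)
console.log(doc.tryUpdate(b.version, b.value + 5));  // false (conflict)

const retry = doc.read(); // B re-reads version 2 and retries
console.log(doc.tryUpdate(retry.version, retry.value + 5)); // true
console.log(doc.read()); // { value: 95, version: 3 }
```

Nothing is locked while A and B are reading; the cost of a conflict is paid only by the writer that loses, which is why this approach works well when conflicts on the same document are rare.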
Transaction Support
MongoDB 4.0+ supports multi-document ACID transactions on replica sets; sharded clusters gained support in 4.2.
// Node.js driver; assumes `client` is a connected MongoClient and
// `accounts` is a collection obtained from it.
const session = client.startSession();
try {
  await session.withTransaction(async () => {
    // Both operations succeed or both fail
    await accounts.updateOne(
      { _id: 'A' },
      { $inc: { balance: -100 } },
      { session }
    );
    await accounts.updateOne(
      { _id: 'B' },
      { $inc: { balance: 100 } },
      { session }
    );
  });
} finally {
  await session.endSession();
}
Query Execution
Understanding how MongoDB executes queries helps optimize performance.
Query Planning
// Use explain to see query plan
db.products.find({ category: 'Electronics', price: { $lt: 500 } })
  .sort({ price: 1 })
  .explain('allPlansExecution')
// Output includes:
// - queryPlanner (chosen plan)
// - executionStats (timing details)
// - serverInfo (MongoDB version)
Query Stages
// Example execution stages:
// 1. COLLSCAN - Collection scan
// 2. IXSCAN - Index scan
// 3. FETCH - Retrieve documents
// 4. PROJECTION - Apply projection
// 5. SORT - Sort results
// 6. LIMIT - Apply limit
// 7. SKIP - Apply skip
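Stages form a tree in the explain output, with each stage consuming the output of its child. As a rough sketch, the helper below walks a `winningPlan` object and lists its stages from the outermost inwards. The sample plan is hand-written in the shape the classic execution engine reports; `collectStages` is a hypothetical name, and newer server versions may nest the plan differently.

```javascript
// Walks a winningPlan object from explain() output and lists its stages,
// outermost first. Child stages appear under inputStage or inputStages.
function collectStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// Hand-written sample shaped like a classic-engine winningPlan.
const samplePlan = {
  stage: 'SORT',
  inputStage: {
    stage: 'FETCH',
    inputStage: { stage: 'IXSCAN', indexName: 'category_1_price_-1' },
  },
};

console.log(collectStages(samplePlan)); // [ 'SORT', 'FETCH', 'IXSCAN' ]
```

Reading the list from right to left gives the order in which data flows: the index scan feeds the fetch, which feeds the sort. A COLLSCAN at the innermost position is usually the first thing to look for when a query is slow.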
Query Optimization
// Force a specific index with hint() (normally the planner chooses one)
db.products.find({ category: 'Electronics' }).hint({ category: 1, price: -1 })
// Covered query (index-only): every filtered and returned field is in the index
db.products.find(
  { category: 'Electronics' },
  { _id: 0, category: 1, price: 1 }
).explain('executionStats')
// Output (abridged):
// {
//   "queryPlanner": {
//     "winningPlan": { ... "indexName": "category_1_price_-1" ... }
//   },
//   "executionStats": {
//     "totalDocsExamined": 0 // No documents fetched
//   }
// }
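Whether a query can be covered depends on the index keys, the filter, and the projection together. The following sketch expresses that rule as a plain function. It is a simplification under stated assumptions: it handles only inclusion projections on top-level fields, ignores multikey and other special cases, and `canBeCovered` is a hypothetical helper rather than a MongoDB API.

```javascript
// Hypothetical check: can an index cover a query? Every filtered and every
// returned field must be an index key, and _id must be excluded from the
// projection unless it is part of the index. Inclusion projections only.
function canBeCovered(indexKeys, filterFields, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const returned = Object.keys(projection).filter((f) => projection[f]);
  // _id is returned by default unless the projection sets it to 0.
  if (projection._id !== 0 && !indexed.has('_id')) returned.push('_id');
  return [...filterFields, ...returned].every((f) => indexed.has(f));
}

const index = { category: 1, price: -1 };
console.log(canBeCovered(index, ['category'], { _id: 0, category: 1, price: 1 })); // true
console.log(canBeCovered(index, ['category'], { _id: 0, name: 1, price: 1 }));     // false
console.log(canBeCovered(index, ['category'], { category: 1, price: 1 }));         // false
```

The second call fails because `name` is not an index key, and the third fails because `_id` is returned by default. In both cases the server has to fetch the document, and `totalDocsExamined` would be greater than zero.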
Compression
WiredTiger supports multiple compression algorithms.
Compression Types
// WiredTiger block compression options:
// - snappy (default; fast, moderate compression)
// - zstd (better compression ratio, MongoDB 4.2+)
// - zlib (better compression than snappy, at a higher CPU cost)
// - none (no compression)
// Configure at collection level
db.createCollection('products', {
  storageEngine: {
    wiredTiger: {
      configString: 'block_compressor=snappy'
    }
  }
})
Compression Stats
// Check compression statistics
db.products.stats()
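// Output includes:
// {
//   "size": 1613194,         // Uncompressed data size in bytes
//   "avgObjSize": 256,
//   "storageSize": 1048576   // Bytes allocated on disk (after compression)
// }
// stats() has no compression ratio field; estimate it as
// storageSize / size, here about 0.65 (65% of the original size).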
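A compression ratio can be estimated from the two size fields that collection stats report. The helper below is a minimal sketch: `compressionRatio` is a hypothetical name, and the result is approximate because `storageSize` also counts free space inside the data file that has not yet been reused.

```javascript
// Derives an approximate compression ratio from collection stats.
// size = uncompressed data bytes, storageSize = bytes allocated on disk.
function compressionRatio(stats) {
  if (!stats.size) return null; // empty collection: ratio is undefined
  return stats.storageSize / stats.size;
}

// In mongosh this could be called as: compressionRatio(db.products.stats())
const sampleStats = { size: 2097152, storageSize: 1048576 };
console.log(compressionRatio(sampleStats)); // 0.5
```

A ratio close to or above 1 on a large collection suggests that the data compresses poorly or that the file contains a lot of reusable free space.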
Checkpointing
Checkpoints create consistent snapshots of the database.
Checkpoint Process
// Checkpoint behavior:
// 1. Occurs every 60 seconds (default)
// 2. Captures a consistent snapshot of the data
// 3. Writes the snapshot durably to the data files
// 4. Journal files older than the checkpoint are no longer needed and can be removed
// Check checkpoint status
db.serverStatus().wiredTiger.checkpoint
// Illustrative output; section and statistic names vary between MongoDB versions
// (older releases report checkpoint statistics under wiredTiger.transaction):
// {
//   "Application time checkpoint": 1700000000,
//   "Checkpoint generated at": 1700000060,
//   "Maximum pages referenced": 1000,
//   "Maximum pages skipped": 0,
//   "Pages written by checkpoint": 100
// }
Memory Management
WiredTiger manages memory through its cache system.
Cache Behavior
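// Cache eviction (default thresholds):
// 1. Background eviction starts when the cache is about 80% full or about 5% dirty
// 2. Clean pages are the cheapest to evict because they can simply be dropped
// 3. Dirty pages must be written to disk before they can be evicted
// 4. At about 95% full or 20% dirty, application threads help with eviction, which slows operations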
// Monitor cache activity
db.serverStatus().wiredTiger.cache
// Key metrics:
// - bytes currently in the cache
// - tracked dirty bytes in the cache
// - eviction calls
// - pages read into cache
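These raw counters are easier to interpret as percentages of the configured maximum. The sketch below derives them from a cache statistics object. The statistic names are the ones listed above; the 80% and 5% thresholds are assumed defaults for when background eviction begins, and `cacheSummary` is a hypothetical helper.

```javascript
// Summarises the wiredTiger.cache section of serverStatus().
// The default targets (80% used, 5% dirty) are assumptions; adjust as needed.
function cacheSummary(cache, targets = { used: 80, dirty: 5 }) {
  const max = cache['maximum bytes configured'];
  const usedPct = (100 * cache['bytes currently in the cache']) / max;
  const dirtyPct = (100 * cache['tracked dirty bytes in the cache']) / max;
  return {
    usedPct,
    dirtyPct,
    evictionLikely: usedPct >= targets.used || dirtyPct >= targets.dirty,
  };
}

// In mongosh this could be called as: cacheSummary(db.serverStatus().wiredTiger.cache)
const sampleCache = {
  'maximum bytes configured': 8589934592,
  'bytes currently in the cache': 2147483648,
  'tracked dirty bytes in the cache': 1073741824,
};
console.log(cacheSummary(sampleCache));
// { usedPct: 25, dirtyPct: 12.5, evictionLikely: true }
```

In this sample the cache is only a quarter full, but the share of dirty data is already above the assumed target, so eviction threads would be busy writing pages out.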
Tuning Cache Size
# For systems with large RAM
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16          # Increase for large datasets
      journalCompressor: zstd
Replication Internals
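Replica sets keep several copies of the data in sync. This section covers the mechanics behind that: the oplog, the replication flow, and elections.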
Oplog
The oplog (operations log) is a special capped collection that records all operations.
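// Inspect the most recent oplog entry
db.getSiblingDB('local').oplog.rs.find().sort({ ts: -1 }).limit(1)
// Check the oplog size and the time window it covers
rs.printReplicationInfo()
// Oplog entry structure (fields vary by version):
// {
//   "ts": Timestamp(1700000000, 1),
//   "t": 123,
//   "h": 1234567890123456789,   // Present in older versions only
//   "v": 2,
//   "op": "i",                  // i=insert, u=update, d=delete, c=command, n=no-op
//   "ns": "myapp.orders",
//   "o": { ... }                // Document operation
// }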
Replication Process
// Replication flow:
// 1. Primary receives write operation
// 2. Operation written to primary's oplog
// 3. Secondaries pull new oplog entries from their sync source (the primary or another secondary)
// 4. Secondaries apply operations locally
// 5. Secondary acknowledges to primary
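Step 4 of this flow, applying operations locally, can be sketched with a toy secondary that replays entries into an in-memory map. The entry format here is simplified: real update entries use a richer diff format, and the `$set` shape, the `o2` usage, and the sample data are assumptions made for illustration.

```javascript
// Toy secondary: applies simplified oplog entries to an in-memory Map.
// Real update entries use a richer diff format; $set here is a simplification.
function applyOplogEntry(collection, entry) {
  switch (entry.op) {
    case 'i': // insert: o holds the full document
      collection.set(entry.o._id, { ...entry.o });
      break;
    case 'u': { // update: o2 identifies the document, o describes the change
      const current = collection.get(entry.o2._id);
      if (current) collection.set(entry.o2._id, { ...current, ...entry.o.$set });
      break;
    }
    case 'd': // delete: o identifies the document
      collection.delete(entry.o._id);
      break;
    default:
      break; // commands and no-ops are ignored in this sketch
  }
}

const secondary = new Map();
const oplog = [
  { op: 'i', ns: 'myapp.orders', o: { _id: 1, item: 'Laptop', quantity: 1 } },
  { op: 'u', ns: 'myapp.orders', o2: { _id: 1 }, o: { $set: { quantity: 2 } } },
  { op: 'i', ns: 'myapp.orders', o: { _id: 2, item: 'Mouse', quantity: 5 } },
  { op: 'd', ns: 'myapp.orders', o: { _id: 2 } },
];
oplog.forEach((entry) => applyOplogEntry(secondary, entry));
console.log([...secondary.values()]);
// [ { _id: 1, item: 'Laptop', quantity: 2 } ]
```

Replaying the same entries in the same order on any member produces the same final state, which is what lets secondaries converge on the primary's data.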
Heartbeat and Election
// Heartbeat interval: 2 seconds (default)
// Election timeout: 10 seconds
// Priority affects election likelihood
// Configure member priority
cfg = rs.conf()
cfg.members[0].priority = 2 // More likely to become primary
cfg.members[1].priority = 1
cfg.members[2].priority = 0 // Cannot become primary
rs.reconfig(cfg)
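The effect of these priority settings can be checked with a small helper that lists which members are eligible to become primary. This is only a sketch: priority influences elections but does not decide them alone, and the host names and the `electableMembers` function are hypothetical.

```javascript
// Lists members that are eligible to become primary (priority > 0),
// highest priority first. filter() returns a new array, so cfg is not mutated.
function electableMembers(cfg) {
  return cfg.members
    .filter((m) => m.priority > 0)
    .sort((x, y) => y.priority - x.priority)
    .map((m) => m.host);
}

// Hand-written config with hypothetical host names.
// In mongosh this could be called as: electableMembers(rs.conf())
const sampleCfg = {
  members: [
    { _id: 0, host: 'db0.example.net:27017', priority: 2 },
    { _id: 1, host: 'db1.example.net:27017', priority: 1 },
    { _id: 2, host: 'db2.example.net:27017', priority: 0 },
  ],
};
console.log(electableMembers(sampleCfg));
// [ 'db0.example.net:27017', 'db1.example.net:27017' ]
```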
Conclusion
Understanding MongoDB’s internals helps you make informed decisions about schema design, indexing, and performance optimization. The WiredTiger storage engine provides document-level concurrency, compression, and robust durability through journaling.
Key takeaways:
- B-Tree indexes provide efficient lookups and range queries
- Journaling ensures durability through write-ahead logging
- MVCC lets readers and writers work concurrently without blocking each other
- Checkpoints create consistent recovery points
- Memory caching significantly impacts performance
This knowledge forms the foundation for advanced MongoDB administration and optimization.
In the next article, we will explore MongoDB’s developments in 2025-2026, including new features and cloud capabilities.