Introduction
MongoDB performance depends on three main factors: hardware configuration, index design, and query patterns. Understanding how MongoDB uses RAM, CPU, and storage — and how the WiredTiger storage engine works — is the foundation for building fast, scalable MongoDB applications.
This guide covers the key concepts from MongoDB University’s M201 Performance course.
Hardware Considerations
RAM: The Most Critical Resource
MongoDB’s WiredTiger storage engine uses RAM extensively. The more data that fits in RAM, the fewer disk reads are needed:
Operations that rely heavily on RAM:
- Aggregation pipeline stages (especially $sort, $group)
- Index traversal (indexes should fit in RAM)
- Write operations (write buffer before flushing to disk)
- Query engine (working set)
- Active connections
Rule of thumb: Your working set (frequently accessed data + indexes) should fit in RAM. If it doesn’t, MongoDB will constantly page data in and out of disk — a major performance killer.
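As a rough illustration of this rule of thumb, the sketch below compares a working-set estimate against the cache size. The sizes and the helper name are hypothetical; in a real deployment the inputs would come from db.stats() (dataSize, totalIndexSize) and your knowledge of which data is "hot".

```javascript
// Sketch: does the working set (hot data + indexes) fit in the WiredTiger cache?
function workingSetFits(dataBytes, indexBytes, cacheBytes) {
  return dataBytes + indexBytes <= cacheBytes;
}

const GB = 1024 ** 3;
const indexes = 2 * GB;                   // e.g. db.stats().totalIndexSize (hypothetical)
const hotData = 10 * GB;                  // frequently accessed subset of the data
const cache = 0.5 * (32 * GB - 1 * GB);   // default cache on a 32 GB machine

console.log(workingSetFits(hotData, indexes, cache)); // true: 12 GB fits in 15.5 GB
```

If this check fails, either add RAM, shrink indexes, or shard so each node's working set fits.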
// Check current memory usage
db.serverStatus().mem
// {
// bits: 64,
// resident: 1024, // MB of RAM currently used
// virtual: 2048, // MB of virtual memory
// supported: true
// }
// Check WiredTiger cache usage
db.serverStatus().wiredTiger.cache
CPU: Parallel Processing
Operations that rely on CPU:
- Page compression/decompression (WiredTiger compresses data by default)
- Data calculations in aggregation pipelines
- Map-Reduce operations (deprecated since MongoDB 5.0 in favor of the aggregation pipeline)
- Index builds
- Encryption/decryption
WiredTiger uses optimistic, multiversion concurrency control (MVCC): readers and writers do not block one another, so multiple operations can run simultaneously. More CPU cores allow more parallel operations.
// Check operation counters (a proxy for server load; not a direct CPU metric)
db.serverStatus().opcounters
// {
// insert: 1000,
// query: 50000,
// update: 5000,
// delete: 100,
// getmore: 200,
// command: 10000
// }
Storage: RAID and Disk Configuration
RAID recommendations:
- RAID 10 (striped mirrors): Best for MongoDB — combines performance of RAID 0 with redundancy of RAID 1
- RAID 5/6: Not recommended — write penalty is significant for MongoDB’s write patterns
- NVMe SSD: Dramatically better than spinning disks for random I/O
Avoid:
- RAID 5 (write penalty)
- Network-attached storage (NAS) with high latency
- Spinning disks for production workloads
WiredTiger Storage Engine
WiredTiger has been MongoDB’s default storage engine since version 3.2. Key characteristics:
Document-Level Concurrency
WiredTiger uses document-level locking (not collection-level or database-level). Multiple operations can modify different documents in the same collection simultaneously:
MongoDB 2.x (MMAPv1): Database-level lock
MongoDB 3.x+ (WiredTiger): Document-level lock
Result: Much better write concurrency
Compression
WiredTiger compresses data by default, typically reducing on-disk storage by 60-80%, depending on the data:
// Default compression settings
{
blockCompressor: "snappy", // fast compression for data
journalCompressor: "snappy",
indexPrefixCompression: true
}
// Change to zlib for better compression ratio (slower)
db.createCollection("logs", {
storageEngine: {
wiredTiger: {
configString: "block_compressor=zlib"
}
}
})
Cache Management
WiredTiger maintains an internal cache. By default its size is the larger of 50% of (RAM - 1 GB) or 256 MB:
// Check cache hit ratio
const stats = db.serverStatus().wiredTiger.cache;
const hitRatio = 1 - (stats["pages read into cache"] /
  stats["pages requested from the cache"]);
console.log(`Cache hit ratio: ${(hitRatio * 100).toFixed(1)}%`);
// Aim for > 95%
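The default sizing rule can be expressed as a one-liner. This is a sketch of the documented formula, not an API that MongoDB exposes:

```javascript
// Default WiredTiger cache size: the larger of 50% of (RAM - 1 GB) or 256 MB.
function defaultWiredTigerCacheMB(ramGB) {
  return Math.max(0.5 * (ramGB - 1) * 1024, 256);
}

console.log(defaultWiredTigerCacheMB(16)); // 7680 MB on a 16 GB machine
console.log(defaultWiredTigerCacheMB(1));  // 256 MB floor on a tiny instance
```

The cache size can be overridden with the storage.wiredTiger.engineConfig.cacheSizeGB setting, which mainly matters when several mongod processes share one host.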
Indexes in MongoDB
Indexes are the primary tool for query optimization. Without an index, MongoDB performs a collection scan — reading every document.
Creating Indexes
// Single field index
db.users.createIndex({ email: 1 }) // 1 = ascending, -1 = descending
// Compound index
db.orders.createIndex({ user_id: 1, created_at: -1 })
// Unique index
db.users.createIndex({ email: 1 }, { unique: true })
// Sparse index (only indexes documents where field exists)
db.users.createIndex({ phone: 1 }, { sparse: true })
// Partial index (only indexes documents matching a filter)
db.orders.createIndex(
{ user_id: 1 },
{ partialFilterExpression: { status: "active" } }
)
// TTL index (auto-delete documents after N seconds)
db.sessions.createIndex(
{ created_at: 1 },
{ expireAfterSeconds: 3600 } // delete after 1 hour
)
// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" })
Explain Plans
Use explain() to understand how MongoDB executes a query:
// Basic explain
db.orders.find({ user_id: 42 }).explain()
// With execution stats
db.orders.find({ user_id: 42 }).explain("executionStats")
// Key fields to check:
// winningPlan.stage: "COLLSCAN" (bad) vs "IXSCAN" (good)
// executionStats.totalDocsExamined: should be close to nReturned
// executionStats.executionTimeMillis: query time
// Example output
{
"winningPlan": {
"stage": "FETCH",
"inputStage": {
"stage": "IXSCAN", // using index
"indexName": "user_id_1"
}
},
"executionStats": {
"nReturned": 10,
"totalDocsExamined": 10, // examined == returned = efficient!
"executionTimeMillis": 1
}
}
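The rules of thumb above (IXSCAN present, examined close to returned) can be checked programmatically. The helper below is a sketch that walks a simplified explain document shaped like the example output; the threshold and function names are assumptions for illustration:

```javascript
// Sketch: flag inefficient queries from explain("executionStats") output.
function usesIndex(plan) {
  if (plan.stage === "IXSCAN") return true;
  return plan.inputStage ? usesIndex(plan.inputStage) : false;
}

function isEfficient(explain, maxExaminedRatio = 2) {
  const { nReturned, totalDocsExamined } = explain.executionStats;
  return usesIndex(explain.winningPlan) &&
         totalDocsExamined <= nReturned * maxExaminedRatio;
}

const sample = {
  winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "user_id_1" } },
  executionStats: { nReturned: 10, totalDocsExamined: 10, executionTimeMillis: 1 }
};
console.log(isEfficient(sample)); // true: index scan, examined == returned
```

A real explain document has more plan stages (e.g. SORT, SHARD_MERGE), so a production checker would need to handle those shapes as well.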
Index Selectivity
High-selectivity indexes (many distinct values) are more effective:
// High selectivity — good index candidate
db.users.createIndex({ email: 1 }) // every email is unique
// Low selectivity — poor index candidate
db.orders.createIndex({ status: 1 }) // only 3-4 distinct values
// For low-selectivity fields, use compound indexes
db.orders.createIndex({ status: 1, created_at: -1 })
// Now status filters, created_at provides ordering
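Selectivity can be estimated as distinct values divided by document count. The sketch below illustrates the idea on in-memory arrays; against a live collection you would use db.collection.distinct() and estimatedDocumentCount() instead:

```javascript
// Sketch: estimate field selectivity as (distinct values / total documents).
// Values close to 1 suggest a good single-field index candidate.
function selectivity(values) {
  return new Set(values).size / values.length;
}

const emails = ["a@x.com", "b@x.com", "c@x.com", "d@x.com"]; // all unique
const statuses = ["active", "active", "shipped", "active"];  // few distinct values

console.log(selectivity(emails));   // 1: high selectivity
console.log(selectivity(statuses)); // 0.5: low selectivity
```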
CRUD Optimization
Query Optimization
// Use projection to return only needed fields
db.users.find({ active: true }, { name: 1, email: 1, _id: 0 })
// Use limit() to avoid fetching unnecessary documents
db.logs.find({ level: "error" }).sort({ timestamp: -1 }).limit(100)
// Avoid $where (executes JavaScript, can't use indexes)
// Bad:
db.users.find({ $where: "this.age > 18" })
// Good:
db.users.find({ age: { $gt: 18 } })
// Use $exists carefully — it can't use sparse indexes efficiently
// Consider using null checks instead
db.users.find({ phone: { $ne: null } })
Write Optimization
// Bulk writes are much faster than individual writes
const bulk = db.products.initializeUnorderedBulkOp();
for (let i = 0; i < 10000; i++) {
bulk.insert({ sku: `SKU-${i}`, price: Math.random() * 100 });
}
bulk.execute();
// Use ordered: false for better parallelism when order doesn't matter
db.products.bulkWrite(operations, { ordered: false })
// Write concern trade-offs
// w: 1: fast, acknowledged by the primary only
// w: "majority": slower, safer, acknowledged by a majority of the
//   replica set (the default since MongoDB 5.0)
db.orders.insertOne(order, { writeConcern: { w: "majority", wtimeout: 5000 } })
Performance on Clusters
Replica Sets
// Read from secondaries to distribute read load
db.getMongo().setReadPref("secondaryPreferred")
// Or per-query
db.users.find({ active: true }).readPref("secondary")
// Read concern levels
db.orders.find().readConcern("majority") // reads committed data
Sharding
// Enable sharding on a database
sh.enableSharding("myapp")
// Shard a collection on a field
sh.shardCollection("myapp.orders", { user_id: "hashed" })
// Check shard distribution
db.orders.getShardDistribution()
Setting Up the Lab Environment
# Import sample data for practice
mongoimport \
--port 27017 \
--db m201 \
--drop \
--collection people \
people.json
mongoimport \
--port 27017 \
--db m201 \
--drop \
--collection restaurants \
restaurants.json
// Verify data loaded
use m201
db.people.countDocuments({ "email": { "$exists": true } })
db.restaurants.countDocuments()
Performance Checklist
- Working set (data + indexes) fits in RAM
- All query fields are indexed
- Compound indexes match query patterns (leftmost prefix rule)
- explain("executionStats") shows IXSCAN, not COLLSCAN
- totalDocsExamined ≈ nReturned (no over-scanning)
- Bulk writes used for large data loads
- Write concern matches durability requirements
- TTL indexes used for time-expiring data
- RAID 10 or NVMe SSD for storage
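The leftmost-prefix rule from the checklist can be sketched as a small predicate: equality filters can use a compound index only on a contiguous run of its leading keys. The index shape and query fields below are hypothetical:

```javascript
// Sketch: how many leading keys of a compound index can a query's
// equality filters use? 0 means the index's prefix is unusable.
function usablePrefixLength(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let n = 0;
  while (n < indexKeys.length && fields.has(indexKeys[n])) n++;
  return n;
}

const idx = ["status", "created_at", "user_id"];
console.log(usablePrefixLength(idx, ["status", "created_at"])); // 2
console.log(usablePrefixLength(idx, ["created_at"]));           // 0: skips the first key
```

This is why { status: 1, created_at: -1 } serves queries on status alone and on status + created_at, but not queries on created_at alone.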
Resources
- MongoDB University M201: MongoDB Performance
- MongoDB Indexing Strategies
- WiredTiger Storage Engine
- MongoDB explain() Output