⚡ Calmops

MongoDB Performance: Hardware, Indexes, and WiredTiger Storage Engine

Introduction

MongoDB performance depends on three main factors: hardware configuration, index design, and query patterns. Understanding how MongoDB uses RAM, CPU, and storage — and how the WiredTiger storage engine works — is the foundation for building fast, scalable MongoDB applications.

This guide covers the key concepts from MongoDB University’s M201 Performance course.

Hardware Considerations

RAM: The Most Critical Resource

MongoDB’s WiredTiger storage engine uses RAM extensively. The more data that fits in RAM, the fewer disk reads are needed:

Operations that rely heavily on RAM:

  • Aggregation pipeline stages (especially $sort, $group)
  • Index traversal (indexes should fit in RAM)
  • Write operations (write buffer before flushing to disk)
  • Query engine (working set)
  • Active connections

Rule of thumb: Your working set (frequently accessed data + indexes) should fit in RAM. If it doesn’t, MongoDB will constantly page data in and out of disk — a major performance killer.

// Check current memory usage
db.serverStatus().mem
// {
//   bits: 64,
//   resident: 1024,    // MB of RAM currently used
//   virtual: 2048,     // MB of virtual memory
//   supported: true
// }

// Check WiredTiger cache usage
db.serverStatus().wiredTiger.cache
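The rule of thumb above can be sketched as a back-of-envelope check. The sizes here are hypothetical; in practice you would take data and index sizes from db.stats() and the cache size from db.serverStatus().wiredTiger.cache:

```javascript
// Rough working-set check: do data + indexes fit in the WiredTiger cache?
// workingSetFits is a hypothetical helper; sizes below are made up.
function workingSetFits(dataSizeGB, indexSizeGB, ramGB) {
  // Default cache: the larger of 50% of (RAM - 1 GB) or 256 MB
  const cacheGB = Math.max(0.5 * (ramGB - 1), 0.25);
  return dataSizeGB + indexSizeGB <= cacheGB;
}

console.log(workingSetFits(10, 2, 32));  // true  (12 GB vs 15.5 GB cache)
console.log(workingSetFits(10, 2, 16));  // false (12 GB vs 7.5 GB cache)
```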

CPU: Parallel Processing

Operations that rely on CPU:

  • Page compression/decompression (WiredTiger compresses data by default)
  • Data calculations in aggregation pipelines
  • Map-Reduce operations
  • Index builds
  • Encryption/decryption

WiredTiger uses an optimistic, non-blocking concurrency model: multiple operations can run simultaneously without waiting on coarse-grained locks. More CPU cores = more parallel operations.

// Check operation counters (workload volume; a rough proxy for CPU demand,
// not a direct CPU metric)
db.serverStatus().opcounters
// {
//   insert: 1000,
//   query: 50000,
//   update: 5000,
//   delete: 100,
//   getmore: 200,
//   command: 10000
// }

Storage: RAID and Disk Configuration

RAID recommendations:

  • RAID 10 (striped mirrors): Best for MongoDB — combines performance of RAID 0 with redundancy of RAID 1
  • RAID 5/6: Not recommended — write penalty is significant for MongoDB’s write patterns
  • NVMe SSD: Dramatically better than spinning disks for random I/O

Avoid:

  • RAID 5 (write penalty)
  • Network-attached storage (NAS) with high latency
  • Spinning disks for production workloads
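The RAID 5/6 write penalty comes from parity maintenance: each small random write costs extra I/Os (read old data, read old parity, write both back). A rough sketch with the classic penalty factors and illustrative disk counts, not a benchmark:

```javascript
// Effective random-write IOPS = raw IOPS / write penalty.
// Classic penalties: RAID 0 = 1, RAID 1/10 = 2, RAID 5 = 4, RAID 6 = 6.
// Disk count and per-disk IOPS below are illustrative only.
function effectiveWriteIops(disks, iopsPerDisk, penalty) {
  return (disks * iopsPerDisk) / penalty;
}

const disks = 8, iopsPerDisk = 200;                      // e.g. 8 spinning disks
console.log(effectiveWriteIops(disks, iopsPerDisk, 2));  // RAID 10 -> 800
console.log(effectiveWriteIops(disks, iopsPerDisk, 4));  // RAID 5  -> 400
```

Same disks, half the usable write throughput under RAID 5 — which is why it hurts MongoDB's write-heavy patterns.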

WiredTiger Storage Engine

WiredTiger has been MongoDB’s default storage engine since MongoDB 3.2. Key characteristics:

Document-Level Concurrency

WiredTiger uses document-level locking (not collection-level or database-level). Multiple operations can modify different documents in the same collection simultaneously:

MongoDB 2.x (MMAPv1): Database-level lock
MongoDB 3.x+ (WiredTiger): Document-level lock

Result: Much better write concurrency

Compression

WiredTiger compresses data by default, typically shrinking on-disk size substantially (often in the 50-80% range, depending on how compressible the data is):

// Default compression settings (storage.wiredTiger.* options in mongod.conf)
{
  blockCompressor: "snappy",      // collectionConfig.blockCompressor (fast data compression)
  journalCompressor: "snappy",    // engineConfig.journalCompressor
  indexPrefixCompression: true    // indexConfig.prefixCompression
}

// Change to zlib for better compression ratio (slower)
db.createCollection("logs", {
  storageEngine: {
    wiredTiger: {
      configString: "block_compressor=zlib"
    }
  }
})
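To see what compression is buying you on an existing collection, compare `size` (uncompressed BSON) with `storageSize` (on disk) from db.collection.stats(). The numbers below are made up for illustration:

```javascript
// Compression ratio from collection stats: `size` is uncompressed BSON,
// `storageSize` is the compressed on-disk footprint. In mongosh these
// fields come from e.g. db.logs.stats(); the values here are hypothetical.
const stats = { size: 5_000_000_000, storageSize: 1_250_000_000 };

const ratio = stats.size / stats.storageSize;
const saved = (1 - stats.storageSize / stats.size) * 100;
console.log(`Compression ratio: ${ratio.toFixed(1)}x, space saved: ${saved.toFixed(0)}%`);
// Compression ratio: 4.0x, space saved: 75%
```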

Cache Management

WiredTiger maintains an internal cache (default: 50% of (RAM - 1 GB) or 256 MB, whichever is larger):

// Check cache hit ratio: "pages read into cache" counts misses that had to
// come from disk, so hit ratio = 1 - (pages read / pages requested)
const stats = db.serverStatus().wiredTiger.cache;
const missRatio = stats["pages read into cache"] /
                  stats["pages requested from the cache"];
print(`Cache hit ratio: ${((1 - missRatio) * 100).toFixed(1)}%`);
// Aim for > 95%

Indexes in MongoDB

Indexes are the primary tool for query optimization. Without an index, MongoDB performs a collection scan — reading every document.

Creating Indexes

// Single field index
db.users.createIndex({ email: 1 })  // 1 = ascending, -1 = descending

// Compound index
db.orders.createIndex({ user_id: 1, created_at: -1 })

// Unique index
db.users.createIndex({ email: 1 }, { unique: true })

// Sparse index (only indexes documents where field exists)
db.users.createIndex({ phone: 1 }, { sparse: true })

// Partial index (only indexes documents matching a filter)
db.orders.createIndex(
  { user_id: 1 },
  { partialFilterExpression: { status: "active" } }
)

// TTL index (auto-delete documents after N seconds)
db.sessions.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 3600 }  // delete after 1 hour
)

// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" })
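The compound-index example above follows the leftmost-prefix rule: the index can serve queries on any leading prefix of its keys. A simplified sketch of that rule (canUseIndex is a hypothetical helper, not a MongoDB API, and it ignores the partial index use the real planner can sometimes still do):

```javascript
// Leftmost-prefix rule: a compound index supports equality queries whose
// fields form a leading prefix of the index keys (query field order
// doesn't matter, only which keys are covered).
function canUseIndex(indexKeys, queryFields) {
  const prefix = indexKeys.slice(0, queryFields.length);
  return queryFields.every(f => prefix.includes(f));
}

const idx = ["user_id", "created_at", "status"];
console.log(canUseIndex(idx, ["user_id"]));               // true
console.log(canUseIndex(idx, ["user_id", "created_at"])); // true
console.log(canUseIndex(idx, ["created_at"]));            // false (skips leftmost key)
```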

Explain Plans

Use explain() to understand how MongoDB executes a query:

// Basic explain
db.orders.find({ user_id: 42 }).explain()

// With execution stats
db.orders.find({ user_id: 42 }).explain("executionStats")

// Key fields to check:
// winningPlan.stage: "COLLSCAN" (bad) vs "IXSCAN" (good)
// executionStats.totalDocsExamined: should be close to nReturned
// executionStats.executionTimeMillis: query time

// Example output
{
  "winningPlan": {
    "stage": "FETCH",
    "inputStage": {
      "stage": "IXSCAN",           // using index
      "indexName": "user_id_1"
    }
  },
  "executionStats": {
    "nReturned": 10,
    "totalDocsExamined": 10,       // examined == returned = efficient!
    "executionTimeMillis": 1
  }
}
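For monitoring scripts it can help to walk winningPlan mechanically rather than eyeball it. A minimal sketch (planStages is a hypothetical helper; it handles only the simple linear inputStage chain shown above, not branching plans such as $or):

```javascript
// Walk an explain() winningPlan and collect stage names, so a COLLSCAN
// can be flagged programmatically.
function planStages(plan) {
  const stages = [];
  for (let s = plan; s; s = s.inputStage) stages.push(s.stage);
  return stages;
}

// Sample plan shaped like the explain output above
const winningPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "user_id_1" }
};
console.log(planStages(winningPlan));                       // [ 'FETCH', 'IXSCAN' ]
console.log(planStages(winningPlan).includes("COLLSCAN"));  // false
```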

Index Selectivity

High-selectivity indexes (many distinct values) are more effective:

// High selectivity — good index candidate
db.users.createIndex({ email: 1 })  // every email is unique

// Low selectivity — poor index candidate
db.orders.createIndex({ status: 1 })  // only 3-4 distinct values

// For low-selectivity fields, use compound indexes
db.orders.createIndex({ status: 1, created_at: -1 })
// Now status filters, created_at provides ordering
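Selectivity can be estimated directly as distinct values over total documents; closer to 1 means the index narrows results more. Sample values below are made up:

```javascript
// Selectivity = distinct values / total documents
function selectivity(values) {
  return new Set(values).size / values.length;
}

console.log(selectivity(["a@x.com", "b@x.com", "c@x.com", "d@x.com"])); // 1   (every value unique)
console.log(selectivity(["active", "active", "done", "active"]));       // 0.5 (few distinct values)
```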

CRUD Optimization

Query Optimization

// Use projection to return only needed fields
db.users.find({ active: true }, { name: 1, email: 1, _id: 0 })

// Use limit() to avoid fetching unnecessary documents
db.logs.find({ level: "error" }).sort({ timestamp: -1 }).limit(100)

// Avoid $where (executes JavaScript, can't use indexes)
// Bad:
db.users.find({ $where: "this.age > 18" })
// Good:
db.users.find({ age: { $gt: 18 } })

// Use $exists carefully: { $exists: false } cannot use a sparse index at all,
// because documents missing the field aren't in the index. Note that
// { $ne: null } is also hard to index efficiently; prefer equality or range
// predicates where your schema allows it
db.users.find({ phone: { $ne: null } })

Write Optimization

// Bulk writes are much faster than individual writes
const bulk = db.products.initializeUnorderedBulkOp();
for (let i = 0; i < 10000; i++) {
  bulk.insert({ sku: `SKU-${i}`, price: Math.random() * 100 });
}
bulk.execute();

// Use ordered: false for better parallelism when order doesn't matter
db.products.bulkWrite(operations, { ordered: false })

// Write concern trade-offs
// w: 1 (default) — fast, acknowledged by primary
// w: "majority" — slower, safer, acknowledged by majority of replica set
db.orders.insertOne(order, { writeConcern: { w: "majority", wtimeout: 5000 } })
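When loading very large datasets, a common pattern is to split the operations into fixed-size chunks so no single bulk call gets unwieldy. A plain-JS sketch (the 1000-document batch size is a convention, not a MongoDB-mandated limit):

```javascript
// Split a large array of documents into fixed-size batches
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const docs = Array.from({ length: 2500 }, (_, i) => ({ sku: `SKU-${i}` }));
const batches = chunk(docs, 1000);
console.log(batches.map(b => b.length));  // [ 1000, 1000, 500 ]
// In mongosh: batches.forEach(b => db.products.insertMany(b, { ordered: false }))
```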

Performance on Clusters

Replica Sets

// Read from secondaries to distribute read load
db.getMongo().setReadPref("secondaryPreferred")

// Or per-query
db.users.find({ active: true }).readPref("secondary")

// Read concern levels
db.orders.find().readConcern("majority")  // reads committed data

Sharding

// Enable sharding on a database
sh.enableSharding("myapp")

// Shard a collection on a field
sh.shardCollection("myapp.orders", { user_id: "hashed" })

// Check shard distribution
db.orders.getShardDistribution()
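Why "hashed"? Hashing destroys the natural ordering of a monotonically increasing key like user_id, so consecutive inserts scatter across shards instead of all landing on the shard that owns the highest range. A toy illustration with a made-up multiplicative hash and 3 shards (MongoDB's actual hashed index uses an MD5-based hash, not this one):

```javascript
// Toy hash (Knuth multiplicative hashing), purely illustrative
function toyHash(n) {
  return (n * 2654435761) % 2 ** 32;
}

// Route 9000 sequential user_ids onto 3 shards by hashed key
const shards = [0, 0, 0];
for (let userId = 0; userId < 9000; userId++) {
  shards[toyHash(userId) % 3]++;
}
console.log(shards);  // roughly even across the 3 shards
```

With a ranged (non-hashed) shard key the same 9000 sequential IDs would all hit one shard, creating a hot spot.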

Setting Up the Lab Environment

# Import sample data for practice
mongoimport \
  --port 27017 \
  --db m201 \
  --drop \
  --collection people \
  people.json

mongoimport \
  --port 27017 \
  --db m201 \
  --drop \
  --collection restaurants \
  restaurants.json

// Verify data loaded (in mongosh; count() is deprecated in favor of countDocuments())
use m201
db.people.countDocuments({ email: { $exists: true } })
db.restaurants.countDocuments()

Performance Checklist

  • Working set (data + indexes) fits in RAM
  • All query fields are indexed
  • Compound indexes match query patterns (leftmost prefix rule)
  • explain("executionStats") shows IXSCAN, not COLLSCAN
  • totalDocsExamined ≈ nReturned (no over-scanning)
  • Bulk writes used for large data loads
  • Write concern matches durability requirements
  • TTL indexes used for time-expiring data
  • RAID 10 or NVMe SSD for storage
