⚡ Calmops

MongoDB Performance: Hardware, Indexes, and WiredTiger Storage Engine

Introduction

MongoDB performance depends on three main factors: hardware configuration, index design, and query patterns. Understanding how MongoDB uses RAM, CPU, and storage — and how the WiredTiger storage engine works — is the foundation for building fast, scalable MongoDB applications.

This guide covers the key concepts from MongoDB University’s M201 Performance course.

Hardware Considerations

RAM: The Most Critical Resource

MongoDB’s WiredTiger storage engine uses RAM extensively. The more data that fits in RAM, the fewer disk reads are needed:

Operations that rely heavily on RAM:

  • Aggregation pipeline stages (especially $sort, $group)
  • Index traversal (indexes should fit in RAM)
  • Write operations (write buffer before flushing to disk)
  • Query engine (working set)
  • Active connections

Rule of thumb: Your working set (frequently accessed data + indexes) should fit in RAM. If it doesn’t, MongoDB will constantly page data in and out of disk — a major performance killer.

// Check current memory usage
db.serverStatus().mem
// {
//   bits: 64,
//   resident: 1024,    // MB of RAM currently used
//   virtual: 2048,     // MB of virtual memory
//   supported: true
// }

// Check WiredTiger cache usage
db.serverStatus().wiredTiger.cache
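The rule of thumb above can be sketched as a back-of-envelope check. The sizes here are hypothetical; in practice you would take data and index sizes from db.stats() and the cache size from db.serverStatus().wiredTiger.cache:

```javascript
// Rough working-set check: do data + indexes fit in the WiredTiger cache?
// workingSetFits is a hypothetical helper; sizes below are made up.
function workingSetFits(dataSizeGB, indexSizeGB, ramGB) {
  // Default cache: the larger of 50% of (RAM - 1 GB) or 256 MB
  const cacheGB = Math.max(0.5 * (ramGB - 1), 0.25);
  return dataSizeGB + indexSizeGB <= cacheGB;
}

console.log(workingSetFits(10, 2, 32));  // true  (12 GB vs 15.5 GB cache)
console.log(workingSetFits(10, 2, 16));  // false (12 GB vs 7.5 GB cache)
```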

CPU: Parallel Processing

Operations that rely on CPU:

  • Page compression/decompression (WiredTiger compresses data by default)
  • Data calculations in aggregation pipelines
  • Map-Reduce operations
  • Index builds
  • Encryption/decryption

WiredTiger uses an optimistic, non-blocking concurrency model: multiple operations can run simultaneously without waiting on coarse-grained locks. More CPU cores = more parallel operations.

// Check operation counters (workload volume; a rough proxy for CPU demand,
// not a direct CPU metric)
db.serverStatus().opcounters
// {
//   insert: 1000,
//   query: 50000,
//   update: 5000,
//   delete: 100,
//   getmore: 200,
//   command: 10000
// }

Storage: RAID and Disk Configuration

RAID recommendations:

  • RAID 10 (striped mirrors): Best for MongoDB — combines performance of RAID 0 with redundancy of RAID 1
  • RAID 5/6: Not recommended — write penalty is significant for MongoDB’s write patterns
  • NVMe SSD: Dramatically better than spinning disks for random I/O

Avoid:

  • RAID 5 (write penalty)
  • Network-attached storage (NAS) with high latency
  • Spinning disks for production workloads
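The RAID 5/6 write penalty comes from parity maintenance: each small random write costs extra I/Os (read old data, read old parity, write both back). A rough sketch with the classic penalty factors and illustrative disk counts, not a benchmark:

```javascript
// Effective random-write IOPS = raw IOPS / write penalty.
// Classic penalties: RAID 0 = 1, RAID 1/10 = 2, RAID 5 = 4, RAID 6 = 6.
// Disk count and per-disk IOPS below are illustrative only.
function effectiveWriteIops(disks, iopsPerDisk, penalty) {
  return (disks * iopsPerDisk) / penalty;
}

const disks = 8, iopsPerDisk = 200;                      // e.g. 8 spinning disks
console.log(effectiveWriteIops(disks, iopsPerDisk, 2));  // RAID 10 -> 800
console.log(effectiveWriteIops(disks, iopsPerDisk, 4));  // RAID 5  -> 400
```

Same disks, half the usable write throughput under RAID 5 — which is why it hurts MongoDB's write-heavy patterns.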

WiredTiger Storage Engine

WiredTiger has been MongoDB’s default storage engine since MongoDB 3.2. Key characteristics:

Document-Level Concurrency

WiredTiger uses document-level locking (not collection-level or database-level). Multiple operations can modify different documents in the same collection simultaneously:

MongoDB 2.x (MMAPv1): Database-level lock
MongoDB 3.x+ (WiredTiger): Document-level lock

Result: Much better write concurrency

Compression

WiredTiger compresses data by default, typically shrinking on-disk size substantially (often in the 50-80% range, depending on how compressible the data is):

// Default compression settings (storage.wiredTiger.* options in mongod.conf)
{
  blockCompressor: "snappy",      // collectionConfig.blockCompressor (fast data compression)
  journalCompressor: "snappy",    // engineConfig.journalCompressor
  indexPrefixCompression: true    // indexConfig.prefixCompression
}

// Change to zlib for better compression ratio (slower)
db.createCollection("logs", {
  storageEngine: {
    wiredTiger: {
      configString: "block_compressor=zlib"
    }
  }
})
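To see what compression is buying you on an existing collection, compare `size` (uncompressed BSON) with `storageSize` (on disk) from db.collection.stats(). The numbers below are made up for illustration:

```javascript
// Compression ratio from collection stats: `size` is uncompressed BSON,
// `storageSize` is the compressed on-disk footprint. In mongosh these
// fields come from e.g. db.logs.stats(); the values here are hypothetical.
const stats = { size: 5_000_000_000, storageSize: 1_250_000_000 };

const ratio = stats.size / stats.storageSize;
const saved = (1 - stats.storageSize / stats.size) * 100;
console.log(`Compression ratio: ${ratio.toFixed(1)}x, space saved: ${saved.toFixed(0)}%`);
// Compression ratio: 4.0x, space saved: 75%
```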

Cache Management

WiredTiger maintains an internal cache (default: 50% of (RAM - 1 GB) or 256 MB, whichever is larger):

// Check cache hit ratio: "pages read into cache" counts misses that had to
// come from disk, so hit ratio = 1 - (pages read / pages requested)
const stats = db.serverStatus().wiredTiger.cache;
const missRatio = stats["pages read into cache"] /
                  stats["pages requested from the cache"];
print(`Cache hit ratio: ${((1 - missRatio) * 100).toFixed(1)}%`);
// Aim for > 95%

Indexes in MongoDB

Indexes are the primary tool for query optimization. Without an index, MongoDB performs a collection scan — reading every document.

Creating Indexes

// Single field index
db.users.createIndex({ email: 1 })  // 1 = ascending, -1 = descending

// Compound index
db.orders.createIndex({ user_id: 1, created_at: -1 })

// Unique index
db.users.createIndex({ email: 1 }, { unique: true })

// Sparse index (only indexes documents where field exists)
db.users.createIndex({ phone: 1 }, { sparse: true })

// Partial index (only indexes documents matching a filter)
db.orders.createIndex(
  { user_id: 1 },
  { partialFilterExpression: { status: "active" } }
)

// TTL index (auto-delete documents after N seconds)
db.sessions.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 3600 }  // delete after 1 hour
)

// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" })
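The compound-index example above follows the leftmost-prefix rule: the index can serve queries on any leading prefix of its keys. A simplified sketch of that rule (canUseIndex is a hypothetical helper, not a MongoDB API, and it ignores the partial index use the real planner can sometimes still do):

```javascript
// Leftmost-prefix rule: a compound index supports equality queries whose
// fields form a leading prefix of the index keys (query field order
// doesn't matter, only which keys are covered).
function canUseIndex(indexKeys, queryFields) {
  const prefix = indexKeys.slice(0, queryFields.length);
  return queryFields.every(f => prefix.includes(f));
}

const idx = ["user_id", "created_at", "status"];
console.log(canUseIndex(idx, ["user_id"]));               // true
console.log(canUseIndex(idx, ["user_id", "created_at"])); // true
console.log(canUseIndex(idx, ["created_at"]));            // false (skips leftmost key)
```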

Explain Plans

Use explain() to understand how MongoDB executes a query:

// Basic explain
db.orders.find({ user_id: 42 }).explain()

// With execution stats
db.orders.find({ user_id: 42 }).explain("executionStats")

// Key fields to check:
// winningPlan.stage: "COLLSCAN" (bad) vs "IXSCAN" (good)
// executionStats.totalDocsExamined: should be close to nReturned
// executionStats.executionTimeMillis: query time

// Example output
{
  "winningPlan": {
    "stage": "FETCH",
    "inputStage": {
      "stage": "IXSCAN",           // using index
      "indexName": "user_id_1"
    }
  },
  "executionStats": {
    "nReturned": 10,
    "totalDocsExamined": 10,       // examined == returned = efficient!
    "executionTimeMillis": 1
  }
}
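For monitoring scripts it can help to walk winningPlan mechanically rather than eyeball it. A minimal sketch (planStages is a hypothetical helper; it handles only the simple linear inputStage chain shown above, not branching plans such as $or):

```javascript
// Walk an explain() winningPlan and collect stage names, so a COLLSCAN
// can be flagged programmatically.
function planStages(plan) {
  const stages = [];
  for (let s = plan; s; s = s.inputStage) stages.push(s.stage);
  return stages;
}

// Sample plan shaped like the explain output above
const winningPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "user_id_1" }
};
console.log(planStages(winningPlan));                       // [ 'FETCH', 'IXSCAN' ]
console.log(planStages(winningPlan).includes("COLLSCAN"));  // false
```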

Index Selectivity

High-selectivity indexes (many distinct values) are more effective:

// High selectivity — good index candidate
db.users.createIndex({ email: 1 })  // every email is unique

// Low selectivity — poor index candidate
db.orders.createIndex({ status: 1 })  // only 3-4 distinct values

// For low-selectivity fields, use compound indexes
db.orders.createIndex({ status: 1, created_at: -1 })
// Now status filters, created_at provides ordering
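Selectivity can be estimated directly as distinct values over total documents; closer to 1 means the index narrows results more. Sample values below are made up:

```javascript
// Selectivity = distinct values / total documents
function selectivity(values) {
  return new Set(values).size / values.length;
}

console.log(selectivity(["a@x.com", "b@x.com", "c@x.com", "d@x.com"])); // 1   (every value unique)
console.log(selectivity(["active", "active", "done", "active"]));       // 0.5 (few distinct values)
```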

CRUD Optimization

Query Optimization

// Use projection to return only needed fields
db.users.find({ active: true }, { name: 1, email: 1, _id: 0 })

// Use limit() to avoid fetching unnecessary documents
db.logs.find({ level: "error" }).sort({ timestamp: -1 }).limit(100)

// Avoid $where (executes JavaScript, can't use indexes)
// Bad:
db.users.find({ $where: "this.age > 18" })
// Good:
db.users.find({ age: { $gt: 18 } })

// Use $exists carefully: { $exists: false } cannot use a sparse index at all,
// because documents missing the field aren't in the index. Note that
// { $ne: null } is also hard to index efficiently; prefer equality or range
// predicates where your schema allows it
db.users.find({ phone: { $ne: null } })

Write Optimization

// Bulk writes are much faster than individual writes
const bulk = db.products.initializeUnorderedBulkOp();
for (let i = 0; i < 10000; i++) {
  bulk.insert({ sku: `SKU-${i}`, price: Math.random() * 100 });
}
bulk.execute();

// Use ordered: false for better parallelism when order doesn't matter
db.products.bulkWrite(operations, { ordered: false })

// Write concern trade-offs
// w: 1 (default) — fast, acknowledged by primary
// w: "majority" — slower, safer, acknowledged by majority of replica set
db.orders.insertOne(order, { writeConcern: { w: "majority", wtimeout: 5000 } })
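When loading very large datasets, a common pattern is to split the operations into fixed-size chunks so no single bulk call gets unwieldy. A plain-JS sketch (the 1000-document batch size is a convention, not a MongoDB-mandated limit):

```javascript
// Split a large array of documents into fixed-size batches
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const docs = Array.from({ length: 2500 }, (_, i) => ({ sku: `SKU-${i}` }));
const batches = chunk(docs, 1000);
console.log(batches.map(b => b.length));  // [ 1000, 1000, 500 ]
// In mongosh: batches.forEach(b => db.products.insertMany(b, { ordered: false }))
```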

Performance on Clusters

Replica Sets

// Read from secondaries to distribute read load
db.getMongo().setReadPref("secondaryPreferred")

// Or per-query
db.users.find({ active: true }).readPref("secondary")

// Read concern levels
db.orders.find().readConcern("majority")  // reads committed data

Sharding

// Enable sharding on a database
sh.enableSharding("myapp")

// Shard a collection on a field
sh.shardCollection("myapp.orders", { user_id: "hashed" })

// Check shard distribution
db.orders.getShardDistribution()
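Why "hashed"? Hashing destroys the natural ordering of a monotonically increasing key like user_id, so consecutive inserts scatter across shards instead of all landing on the shard that owns the highest range. A toy illustration with a made-up multiplicative hash and 3 shards (MongoDB's actual hashed index uses an MD5-based hash, not this one):

```javascript
// Toy hash (Knuth multiplicative hashing), purely illustrative
function toyHash(n) {
  return (n * 2654435761) % 2 ** 32;
}

// Route 9000 sequential user_ids onto 3 shards by hashed key
const shards = [0, 0, 0];
for (let userId = 0; userId < 9000; userId++) {
  shards[toyHash(userId) % 3]++;
}
console.log(shards);  // roughly even across the 3 shards
```

With a ranged (non-hashed) shard key the same 9000 sequential IDs would all hit one shard, creating a hot spot.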

Setting Up the Lab Environment

# Import sample data for practice
mongoimport \
  --port 27017 \
  --db m201 \
  --drop \
  --collection people \
  people.json

mongoimport \
  --port 27017 \
  --db m201 \
  --drop \
  --collection restaurants \
  restaurants.json

// Verify data loaded (in mongosh; count() is deprecated in favor of countDocuments())
use m201
db.people.countDocuments({ email: { $exists: true } })
db.restaurants.countDocuments()

Performance Checklist

  • Working set (data + indexes) fits in RAM
  • All query fields are indexed
  • Compound indexes match query patterns (leftmost prefix rule)
  • explain("executionStats") shows IXSCAN, not COLLSCAN
  • totalDocsExamined ≈ nReturned (no over-scanning)
  • Bulk writes used for large data loads
  • Write concern matches durability requirements
  • TTL indexes used for time-expiring data
  • RAID 10 or NVMe SSD for storage
