MongoDB Resiliency: Connection Pooling, Error Handling, and Robust Configuration

Introduction

A MongoDB application that works in development can fail in production in unexpected ways: connection exhaustion, network timeouts, replica set failovers, and write failures. This guide covers the patterns and configuration options that make MongoDB applications robust in production.

Connection Pooling

Creating a new database connection for every request is expensive: it involves a TCP handshake, TLS negotiation, and authentication. Connection pooling solves this by maintaining a pool of pre-established connections that requests can reuse.

How It Works

Application                    MongoDB
    │                              │
    │  [Pool: 5 idle connections]  │
    │                              │
Request 1 ──→ [borrow connection] ──→ query ──→ [return connection]
Request 2 ──→ [borrow connection] ──→ query ──→ [return connection]
Request 3 ──→ [borrow connection] ──→ query ──→ [return connection]

Benefits:

  • Subsequent requests are served immediately (no connection overhead)
  • Handles traffic spikes by queuing requests when the pool is full
  • Reduces load on MongoDB server
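
The borrow/return cycle in the diagram can be sketched with a toy pool. This only illustrates the idea and is not how the MongoDB driver implements its pool; `TinyPool`, `acquire`, and `release` are hypothetical names.

```javascript
// Toy connection pool: illustrates borrow/return and the wait queue.
// NOT the driver's implementation; all names here are hypothetical.
class TinyPool {
  constructor(createConnection, maxSize) {
    this.createConnection = createConnection; // factory for new connections
    this.maxSize = maxSize;
    this.idle = [];     // connections ready to be borrowed
    this.total = 0;     // connections created so far
    this.waiters = [];  // resolvers for requests waiting on a free connection
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop(); // reuse: no setup cost
    if (this.total < this.maxSize) {
      this.total++;
      return this.createConnection();                 // pool not full: open one
    }
    // Pool is full: wait until another request returns a connection
    return new Promise(resolve => this.waiters.push(resolve));
  }

  release(conn) {
    const next = this.waiters.shift();
    if (next) next(conn);       // hand it straight to the oldest waiter
    else this.idle.push(conn);  // otherwise park it for later reuse
  }
}
```

In the real driver, the waiting step is bounded by waitQueueTimeoutMS rather than waiting forever as this sketch does.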

Configuring the Pool

const { MongoClient } = require('mongodb');

const client = new MongoClient(process.env.MONGODB_URI, {
  // Maximum connections in the pool (default: 100)
  maxPoolSize: 100,

  // Minimum connections to keep open (default: 0)
  minPoolSize: 5,

  // How long a connection can be idle before being closed (ms)
  maxIdleTimeMS: 60000,

  // How long to wait for a connection from the pool (ms)
  waitQueueTimeoutMS: 5000,
});

Pool Size Guidelines

Application Type                   Recommended maxPoolSize
Small API (< 100 req/s)            10-20
Medium API (100-1000 req/s)        50-100
High-traffic API (> 1000 req/s)    100-200
Background workers                 5-10

Note: More connections isn’t always better. Each connection uses memory on the MongoDB server. The default of 100 is appropriate for most applications.
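
One way to sanity-check a pool size against measured latency is Little's law: the number of operations in flight is roughly the request rate multiplied by the time each one holds a connection. The sketch below is a rough estimate under that assumption; `estimatePoolSize`, `totalServerConnections`, and the `headroom` factor are hypothetical names, not driver settings.

```javascript
// Rough pool sizing via Little's law: in-flight operations = arrival rate x time held.
// Hypothetical helpers; measure real latency before trusting any number.
function estimatePoolSize(requestsPerSecond, avgQueryMs, headroom = 2) {
  const concurrent = (requestsPerSecond * avgQueryMs) / 1000;
  return Math.max(1, Math.ceil(concurrent * headroom));
}

// Every app instance opens its own pool, so the server sees the sum of all pools.
function totalServerConnections(instances, maxPoolSize) {
  return instances * maxPoolSize;
}

console.log(estimatePoolSize(500, 20));      // 500 req/s x 0.02 s x 2 = 20
console.log(totalServerConnections(8, 100)); // 800
```

The second function matters when scaling horizontally: eight instances with the default pool size can open up to 800 connections against the same deployment.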

Robust Client Configuration

Always Use Connection Pooling

// BAD: creating a new client per request
app.get('/movies', async (req, res) => {
  const client = new MongoClient(uri);  // new connection every request!
  await client.connect();
  // ...
  await client.close();
});

// GOOD: reuse a single client (connection pool)
const client = new MongoClient(uri, { maxPoolSize: 50 });
await client.connect();  // once at startup

app.get('/movies', async (req, res) => {
  const db = client.db('mflix');  // reuses pool
  // ...
});

Always Specify wtimeout with Majority Writes

Write concern controls how many replica set members must acknowledge a write before it’s considered successful. w: 'majority' is the safest option, but without a timeout your application can hang indefinitely if the majority can’t be reached:

// BAD: no timeout, so the write can block indefinitely
await collection.insertOne(doc, { writeConcern: { w: 'majority' } });

// GOOD: stop waiting for majority acknowledgment after 2.5 seconds
await collection.insertOne(doc, {
  writeConcern: { w: 'majority', wtimeoutMS: 2500 }
});

When to use w: 'majority': Any write where data loss is unacceptable, such as user accounts, financial transactions, and orders.

When w: 1 is acceptable: Logging, analytics, and caches, where occasional data loss is tolerable.
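
A write concern timeout does not mean the write failed: the primary may already have applied it, and only the majority confirmation is missing. The sketch below treats that case as "outcome unknown". It assumes the driver reports the timeout as a MongoWriteConcernError carrying server code 64 (WriteConcernFailed), which is worth verifying against your driver version; `insertWithMajority` is a hypothetical helper that takes the collection as an argument.

```javascript
// Sketch: treat a write concern timeout as "outcome unknown", not "write failed".
// Assumes the driver reports it as MongoWriteConcernError (server code 64).
async function insertWithMajority(collection, doc) {
  try {
    const result = await collection.insertOne(doc, {
      writeConcern: { w: 'majority', wtimeoutMS: 2500 },
    });
    return { status: 'committed', id: result.insertedId };
  } catch (err) {
    if (err.name === 'MongoWriteConcernError' || err.code === 64) {
      // The primary may have applied the write; a majority just did not
      // confirm it in time. Do not blindly insert the document again.
      return { status: 'unconfirmed', error: err };
    }
    throw err; // anything else is a genuine failure
  }
}
```

Callers can then decide what "unconfirmed" means for them, for example reading the document back before retrying.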

Configure serverSelectionTimeout

serverSelectionTimeoutMS controls how long the driver waits to find an available server before throwing an error:

const client = new MongoClient(uri, {
  // Default is 30 seconds, which is too long for most web apps
  serverSelectionTimeoutMS: 5000,  // fail fast: 5 seconds

  // How long to wait for a socket operation
  socketTimeoutMS: 45000,

  // How long to wait for initial connection
  connectTimeoutMS: 10000,
});
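
Failing fast is only useful if the failure reaches the caller in a sensible form. As a minimal sketch, assuming an Express app, an error-handling middleware can turn the driver's MongoServerSelectionError into a 503. The middleware name and the Retry-After value are illustrative choices, not part of the driver.

```javascript
// Express-style error middleware (hypothetical name). It checks err.name so the
// sketch does not need to import the driver's error classes.
function mongoUnavailableHandler(err, req, res, next) {
  if (err && err.name === 'MongoServerSelectionError') {
    // No suitable server was found within serverSelectionTimeoutMS
    res.set('Retry-After', '5');
    return res.status(503).json({ error: 'Database temporarily unavailable' });
  }
  next(err); // let other handlers deal with everything else
}

// Usage, registered after all routes:
//   app.use(mongoUnavailableHandler);
```

Route handlers need to forward errors with next(err) (or rethrow them, on Express versions that forward rejected promises) for this middleware to see them.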

Error Handling

Handle Connection Errors at Startup

async function startServer() {
  try {
    await client.connect();
    console.log('Connected to MongoDB');

    app.listen(PORT, () => console.log(`Server on port ${PORT}`));
  } catch (err) {
    console.error('Failed to connect to MongoDB:', err.message);
    process.exit(1);  // don't start the server if DB is unavailable
  }
}

Handle Errors in Route Handlers

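const { ObjectId } = require('mongodb');

// Collection handle taken from the shared client
const movies = client.db('mflix').collection('movies');

app.get('/movies/:id', async (req, res) => {
  try {
    const movie = await movies.findOne({ _id: new ObjectId(req.params.id) });

    if (!movie) {
      return res.status(404).json({ error: 'Movie not found' });
    }

    res.json(movie);
  } catch (err) {
    // An invalid ObjectId string throws BSONError (BSONTypeError in older bson versions)
    if (err.name === 'BSONError' || err.name === 'BSONTypeError') {
      return res.status(400).json({ error: 'Invalid ID format' });
    }
    console.error('Database error:', err);
    res.status(500).json({ error: 'Internal server error' });
  }
});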

Handle Write Errors

const { MongoServerError } = require('mongodb');

async function createUser(userData) {
  try {
    const result = await users.insertOne(userData);
    return result.insertedId;
  } catch (err) {
    if (err instanceof MongoServerError && err.code === 11000) {
      // Duplicate key error (e.g., unique email constraint)
      throw new Error('Email already exists');
    }
    throw err;  // re-throw unexpected errors
  }
}

Common MongoDB Error Codes

Code     Meaning                          Handling
11000    Duplicate key                    Return 409 Conflict
121      Document validation failed       Return 400 Bad Request
50       Operation exceeded time limit    Retry or return 503
13       Unauthorized                     Return 401
26       Namespace not found              Check collection name
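
The table can be turned into a small lookup so that route handlers do not repeat the same checks. This is a sketch: `mapMongoError` and the response messages are hypothetical, and code 26 is deliberately left to fall through to 500 because it indicates a bug in the application rather than a client mistake.

```javascript
// Maps the error codes from the table to HTTP responses.
// Hypothetical helper; adjust statuses and messages to your API's conventions.
const STATUS_BY_CODE = {
  11000: { status: 409, message: 'Resource already exists' },
  121: { status: 400, message: 'Document failed validation' },
  50: { status: 503, message: 'Database operation timed out' },
  13: { status: 401, message: 'Unauthorized' },
};

function mapMongoError(err) {
  // Unknown or missing codes fall back to a generic 500
  return STATUS_BY_CODE[err?.code] ?? { status: 500, message: 'Internal server error' };
}

// Usage inside a catch block:
//   const { status, message } = mapMongoError(err);
//   res.status(status).json({ error: message });
```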

Retry Logic for Transient Errors

async function withRetry(operation, maxRetries = 3, delayMs = 1000) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const isTransient = err.hasErrorLabel?.('TransientTransactionError') ||
                          err.hasErrorLabel?.('UnknownTransactionCommitResult');

      if (!isTransient || attempt === maxRetries) {
        throw err;
      }

      console.warn(`Attempt ${attempt} failed, retrying in ${delayMs}ms...`);
      await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
    }
  }
}

// Usage
const result = await withRetry(() =>
  collection.insertOne(document, { writeConcern: { w: 'majority', wtimeoutMS: 2500 } })
);
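
The TransientTransactionError and UnknownTransactionCommitResult labels are attached to errors raised inside transactions, and the driver's session.withTransaction() helper already retries on both. A minimal sketch of that built-in path follows; `transferCredits`, the `accounts` collection, and the `credits` field are hypothetical and exist only for illustration.

```javascript
// Sketch: session.withTransaction() retries the callback on transient
// transaction errors and retries the commit when its result is unknown.
// transferCredits and the 'accounts' collection are hypothetical.
async function transferCredits(client, fromId, toId, amount) {
  await client.withSession(session =>
    session.withTransaction(async () => {
      const accounts = client.db('mflix').collection('accounts');
      // Every operation must pass the session to take part in the transaction
      await accounts.updateOne({ _id: fromId }, { $inc: { credits: -amount } }, { session });
      await accounts.updateOne({ _id: toId }, { $inc: { credits: amount } }, { session });
    }, { writeConcern: { w: 'majority', wtimeoutMS: 2500 } })
  );
}
```

Because the callback may run more than once, it should contain only database operations that are safe to repeat within a fresh transaction attempt.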

Principle of Least Privilege

Database users should have only the permissions they need and nothing more.

Create Role-Specific Users

// In MongoDB shell or Atlas UI:

// Read-only user for reporting
db.createUser({
  user: "reporter",
  pwd: "...",
  roles: [{ role: "read", db: "mflix" }]
});

// Read-write user for the application
db.createUser({
  user: "app_user",
  pwd: "...",
  roles: [{ role: "readWrite", db: "mflix" }]
});

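// Admin user for migrations only. dbAdmin covers indexes and collection
// settings; add readWrite as well if migrations also modify documents.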
db.createUser({
  user: "migrator",
  pwd: "...",
  roles: [{ role: "dbAdmin", db: "mflix" }]
});

Use Different Credentials per Environment

# .env.development
MONGODB_URI=mongodb://dev_user:dev_pass@localhost:27017/mflix_dev

# .env.production
MONGODB_URI=mongodb+srv://app_user:<password>@<cluster-host>/mflix

Monitoring and Health Checks

// Health check endpoint
app.get('/health', async (req, res) => {
  try {
    // Ping the database
    await client.db('admin').command({ ping: 1 });
    res.json({
      status: 'healthy',
      database: 'connected',
      timestamp: new Date().toISOString()
    });
  } catch (err) {
    res.status(503).json({
      status: 'unhealthy',
      database: 'disconnected',
      error: err.message
    });
  }
});

// Monitor connection pool events
client.on('connectionPoolCreated', (event) => {
  console.log('Connection pool created:', event.address);
});

client.on('connectionCheckOutFailed', (event) => {
  console.warn('Connection checkout failed:', event.reason);
});
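
Pool events can also be aggregated into simple counters, which makes it easier to see how close the application gets to maxPoolSize. A sketch, assuming the driver's connectionCheckedOut and connectionCheckedIn events; `trackPoolUsage` and the shape of the stats object are hypothetical.

```javascript
// Sketch: derive in-use connection counts from pool events.
// trackPoolUsage is a hypothetical helper; `client` is the shared MongoClient.
function trackPoolUsage(client) {
  const stats = { inUse: 0, peakInUse: 0, checkoutFailures: 0 };

  client.on('connectionCheckedOut', () => {
    stats.inUse++;
    stats.peakInUse = Math.max(stats.peakInUse, stats.inUse);
  });
  client.on('connectionCheckedIn', () => {
    stats.inUse--;
  });
  client.on('connectionCheckOutFailed', () => {
    stats.checkoutFailures++;
  });

  return stats; // live object: read it from a metrics endpoint or a timer
}
```

A peakInUse value that regularly reaches maxPoolSize, or a growing checkoutFailures count, suggests the pool is too small or that queries are holding connections for too long.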

Production Checklist

  • Single MongoClient instance reused across requests
  • maxPoolSize configured for expected load
  • serverSelectionTimeoutMS set (5-10 seconds for web apps)
  • wtimeoutMS set for all majority writes
  • Error handling in every database operation
  • Duplicate key errors handled gracefully (return 409, not 500)
  • Separate database users per role (app, reporting, admin)
  • Connection string in environment variables, not source code
  • Health check endpoint that pings the database
  • Graceful shutdown that closes the connection pool
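
The last checklist item could look something like the following. It is a minimal sketch, assuming an HTTP server returned by app.listen() and the shared MongoClient; `registerShutdown` is a hypothetical helper, and the `proc` parameter exists only so the function can be exercised without a real process.

```javascript
// Sketch of a graceful shutdown (registerShutdown is a hypothetical helper).
// `server` is the value returned by app.listen(); `client` is the shared MongoClient.
function registerShutdown(server, client, proc = process) {
  let shuttingDown = false;

  async function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`${signal} received, shutting down`);
    // Stop accepting new requests and let in-flight ones finish
    await new Promise(resolve => server.close(resolve));
    // Then close the pooled connections
    await client.close();
    proc.exit(0);
  }

  proc.on('SIGTERM', () => shutdown('SIGTERM'));
  proc.on('SIGINT', () => shutdown('SIGINT'));
  return shutdown;
}
```

Closing the HTTP server before the client matters: requests still in flight need the pool until they finish.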
