⚡ Calmops

Database Setup & Deployment: PostgreSQL & MongoDB Hosting Strategies & Zero-Downtime Migrations


Choosing where and how to deploy your database is one of the most critical infrastructure decisions you’ll make. Whether you’re launching a new product or migrating from legacy systems, the wrong choice can result in downtime, data loss, or prohibitive costs. This comprehensive guide covers hosting options, deployment strategies, and zero-downtime migration techniques for both PostgreSQL and MongoDB.


Table of Contents

  1. Hosting Options Overview
  2. PostgreSQL Hosting Strategies
  3. MongoDB Hosting Strategies
  4. Key Decision Factors
  5. Database Migration Strategies
  6. Zero-Downtime Migrations
  7. Backup & Disaster Recovery
  8. Monitoring & Performance Optimization
  9. Common Pitfalls & How to Avoid Them
  10. Tools & Frameworks for Migrations
  11. Conclusion

Hosting Options Overview

When deploying a database, you have three primary models: self-hosted, managed services, and serverless/hybrid. Each model trades off control for operational burden.

Self-Hosted

What it is: You provision servers (VMs, bare metal) and manage the entire database stack (installation, patching, backups, replication).

Pros:

  • Maximum control over configuration and optimization
  • Potentially lower long-term costs at scale (after initial setup)
  • No vendor lock-in
  • Full visibility into system internals

Cons:

  • High operational overhead (24/7 monitoring, patching, failover management)
  • Initial expertise required (database administration, Linux, networking)
  • Scaling requires additional provisioning and configuration
  • Disaster recovery is your responsibility

Best for: Large organizations with dedicated DBA teams, cost-sensitive projects at scale, workloads with unusual requirements


Managed Services (DBaaS)

What it is: Cloud providers (AWS RDS, Azure Database, Google Cloud SQL, Heroku Postgres, etc.) handle infrastructure, patching, and backups. You manage application connections and database-level configuration.

Pros:

  • Automated backups and point-in-time recovery
  • Automated patching and version upgrades
  • Built-in high availability (multi-AZ, failover)
  • Monitoring and alerting included
  • Scaling with a few clicks or API calls

Cons:

  • Higher per-instance cost than self-hosted
  • Limited customization (e.g., kernel parameters)
  • Potential vendor lock-in
  • Network latency if provisioned in different regions

Best for: SaaS companies, startups, teams without DBA expertise, applications needing uptime guarantees (99.9%+)


Serverless & Hybrid Models

What it is: Databases that scale automatically without provisioning. Examples: AWS Aurora Serverless, MongoDB Atlas, Supabase, PlanetScale (MySQL), Vercel Postgres.

Pros:

  • Auto-scaling based on demand (only pay for what you use)
  • Near-zero operational overhead
  • Global distribution options
  • Quick to get started (minutes vs hours)

Cons:

  • Cold start latency for idle instances
  • Unpredictable pricing during traffic spikes
  • Less control over instance behavior
  • Sometimes higher per-request costs for low-traffic apps

Best for: Prototype/MVP projects, bursty workloads, globally-distributed applications, startups optimizing for time-to-market


PostgreSQL Hosting Strategies

Self-Hosted PostgreSQL

For organizations choosing self-hosted PostgreSQL, here’s a production-ready setup:

Architecture: Primary + Standby Replicas

┌─────────────────────────────────────────────┐
│        Application Layer (Load Balanced)    │
└─────────────────────────────────────────────┘
              ↓              ↓              ↓
┌─────────────────────────────────────────────┐
│    Connection Pooler (PgBouncer / pgcat)    │
└─────────────────────────────────────────────┘
              ↓ (writes)    ↓ (reads)
        ┌──────────────┐  ┌──────────────┐
        │ Primary (RW) │  │ Standby (RO) │
        │  PostgreSQL  │  │  PostgreSQL  │
        └──────────────┘  └──────────────┘
              ↓ (WAL shipping / replication)
        ┌──────────────┐
        │ Standby #2   │
        │ (Async Rep)  │
        └──────────────┘
              ↓
        ┌──────────────┐
        │ Backup Node  │
        │ (Scheduled)  │
        └──────────────┘

Basic Server Setup

# Install PostgreSQL on Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

# Start and enable PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Switch to postgres user and create database
sudo -u postgres psql
postgres=# CREATE DATABASE myapp;
postgres=# CREATE USER appuser WITH ENCRYPTED PASSWORD 'secure_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE myapp TO appuser;
postgres=# \q

Enable Replication (Primary)

Edit /etc/postgresql/14/main/postgresql.conf:

# Replication settings
wal_level = replica                    # WAL detail sufficient for physical replication
max_wal_senders = 3                    # Max standby replicas
wal_keep_size = 2GB                    # Keep WAL segments
hot_standby = on                       # Allow queries on standby

Edit /etc/postgresql/14/main/pg_hba.conf to allow replication:

# Standby replication connections (PostgreSQL 14+ defaults to scram-sha-256)
host    replication     replication_user   10.0.1.10/32   scram-sha-256
host    replication     replication_user   10.0.1.11/32   scram-sha-256

Restart PostgreSQL:

sudo systemctl restart postgresql

Set Up Standby (Read Replica)

# On standby server, stop PostgreSQL if running
sudo systemctl stop postgresql

# Take base backup from primary
sudo -u postgres pg_basebackup -h 10.0.1.5 -D /var/lib/postgresql/14/main \
  -U replication_user -v -P -W -R

# Start standby in recovery mode
sudo systemctl start postgresql

Verify replication:

# On primary, check replication status
sudo -u postgres psql
postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;
 client_addr  |   state   | sync_state
--------------+-----------+------------
 10.0.1.10    | streaming | async
 10.0.1.11    | streaming | async

Connection Pooling with PgBouncer

Install PgBouncer:

sudo apt-get install pgbouncer

Configure /etc/pgbouncer/pgbouncer.ini:

[databases]
myapp = host=10.0.1.5 port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction          # Connection pooling mode
max_client_conn = 1000           # Max client connections
default_pool_size = 25           # Connections per database
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3

; Logging
log_connections = 1
log_disconnections = 1
log_pooler_stats = 1

Start PgBouncer:

sudo systemctl start pgbouncer
sudo systemctl enable pgbouncer

Applications now connect to PgBouncer (port 6432) instead of PostgreSQL directly, gaining connection efficiency.
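Sizing default_pool_size is often guesswork. A common rule of thumb (a heuristic, not a PgBouncer feature) is that a database server can usefully serve about (2 × CPU cores) + effective spindle count active connections, with SSDs behaving like a small spindle count. A sketch with illustrative numbers:

```javascript
// Rough pool-sizing heuristic: useful connections ≈ (2 × cores) + spindles.
// SSD-backed servers are commonly modeled with a spindle count of 1.
function suggestPoolSize(cpuCores, effectiveSpindles = 1) {
  if (cpuCores < 1) throw new Error('cpuCores must be >= 1');
  return 2 * cpuCores + effectiveSpindles;
}

// Split the suggested total across app instances sharing the pooler.
function perInstancePoolSize(totalPoolSize, appInstances) {
  return Math.max(1, Math.floor(totalPoolSize / appInstances));
}
```

For example, an 8-core SSD-backed server suggests a pool of roughly 17 back-end connections; four app instances sharing the pooler would get about 4 each. Measure under real load before settling on a value.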


Managed PostgreSQL (AWS RDS / Azure Database / Google Cloud SQL)

AWS RDS for PostgreSQL

Creating a Multi-AZ deployment (recommended for production):

aws rds create-db-instance \
  --db-instance-identifier myapp-postgres \
  --db-instance-class db.t4g.large \
  --engine postgres \
  --master-username postgres \
  --master-user-password 'SecurePassword123!' \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az \
  --no-publicly-accessible \
  --vpc-security-group-ids sg-12345678 \
  --db-subnet-group-name default-vpc-group \
  --backup-retention-period 30 \
  --preferred-backup-window "03:00-04:00" \
  --preferred-maintenance-window "sun:04:00-sun:05:00" \
  --enable-cloudwatch-logs-exports '["postgresql"]' \
  --enable-iam-database-authentication

Key parameters:

  • --multi-az: Enables automatic failover to standby in another AZ
  • --backup-retention-period 30: Keep 30 days of automated backups
  • --enable-iam-database-authentication: Use IAM roles instead of passwords

Monitoring RDS Performance

# Check database metrics using AWS CLI
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=myapp-postgres \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-02T00:00:00Z \
  --period 300 \
  --statistics Average,Maximum

Connection String

postgresql://postgres:password@<rds-endpoint>:5432/myapp
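Rather than string-splitting connection strings in application code, Node's built-in URL class parses them, since postgresql:// URLs follow generic URL syntax. A defensive sketch (the hostname in the usage example is hypothetical):

```javascript
// Parse a PostgreSQL connection string with Node's built-in URL class.
function parsePostgresUrl(connString) {
  const url = new URL(connString);
  if (url.protocol !== 'postgresql:' && url.protocol !== 'postgres:') {
    throw new Error(`Not a PostgreSQL URL: ${url.protocol}`);
  }
  return {
    user: decodeURIComponent(url.username),
    host: url.hostname,
    port: url.port ? Number(url.port) : 5432,  // default PostgreSQL port
    database: url.pathname.replace(/^\//, ''),
    // Deliberately omit url.password: never log or serialize credentials.
  };
}
```

Keeping the password out of the returned config object makes it harder to leak credentials through logs or error reports.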

Serverless PostgreSQL

AWS Aurora Serverless v2

Aurora Serverless scales automatically based on workload. No capacity planning needed.

aws rds create-db-cluster \
  --db-cluster-identifier myapp-aurora \
  --engine aurora-postgresql \
  --engine-version 15.2 \
  --database-name myapp \
  --master-username postgres \
  --master-user-password 'SecurePassword123!' \
  --serverlessv2-scaling-configuration MinCapacity=0.5,MaxCapacity=4 \
  --backup-retention-period 30 \
  --enable-http-endpoint

Scaling behavior:

  • Minimum capacity: 0.5 ACU (Aurora Capacity Unit) = ~1GB memory
  • Maximum capacity: 4 ACU = ~8GB memory
  • Scales up and down in fine-grained ACU increments based on CPU, memory, and connection load

Pricing: Billed per ACU-hour of capacity used; rates vary by region, so check current AWS pricing.
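To sanity-check serverless spend before committing, a back-of-envelope model helps. The rate and usage profile in the example are assumptions, not quoted AWS prices:

```javascript
// Back-of-envelope Aurora Serverless v2 cost model.
// ratePerAcuHour is an assumption (varies by region); check current AWS pricing.
function estimateMonthlyCost({ baselineAcu, peakAcu, peakHoursPerDay, ratePerAcuHour }) {
  const hoursPerMonth = 730;              // average hours in a month
  const peakHours = peakHoursPerDay * 30.4;
  const baselineHours = hoursPerMonth - peakHours;
  const acuHours = baselineAcu * baselineHours + peakAcu * peakHours;
  return Number((acuHours * ratePerAcuHour).toFixed(2));
}
```

A cluster idling at 0.5 ACU with a few hours of 4-ACU peak per day lands in the low hundreds of ACU-hours per month; plug in your region's actual rate to compare against a fixed-size instance.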

Vercel Postgres

For Next.js applications, Vercel Postgres provides integrated serverless PostgreSQL:

// Using Vercel SDK
import { sql } from '@vercel/postgres';

export default async function handler(req, res) {
  try {
    const result = await sql`SELECT * FROM users LIMIT 10`;
    res.status(200).json(result.rows);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
}

MongoDB Hosting Strategies

Self-Hosted MongoDB

Replica Set Setup (High Availability)

A replica set is MongoDB’s native replication mechanism. Minimum 3 nodes recommended for production.

Docker Compose Setup

version: '3.8'
services:
  mongo1:
    image: mongo:7.0
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo1_data:/data/db
      # Shared keyfile for internal auth: openssl rand -base64 756 > mongo-keyfile; chmod 400
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile

  mongo2:
    image: mongo:7.0
    ports:
      - "27018:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo2_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile

  mongo3:
    image: mongo:7.0
    ports:
      - "27019:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo3_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile

volumes:
  mongo1_data:
  mongo2_data:
  mongo3_data:

Initialize the Replica Set

# Connect to primary
mongosh mongodb://admin:password123@localhost:27017

# Initialize replica set
rs.initiate({
  _id: "myreplicaset",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", priority: 1 }
  ]
})

# Check replica set status
rs.status()

Connection String for Replica Set

Inside the Docker network, where every member listens on 27017:

mongodb://appuser:password@mongo1:27017,mongo2:27017,mongo3:27017/myapp?replicaSet=myreplicaset&authSource=admin

From the host machine, use the published ports instead: localhost:27017,localhost:27018,localhost:27019.

With automatic failover and read preference:

const client = new MongoClient(uri, {
  replicaSet: 'myreplicaset',
  retryWrites: true,
  readPreference: 'secondaryPreferred',  // Read from replicas if available
});

Managed MongoDB (MongoDB Atlas)

MongoDB Atlas is MongoDB’s official DBaaS, with global distribution and enterprise features built-in.

Creating a Cluster

# Using the Atlas CLI (successor to mongocli; install from https://www.mongodb.com/docs/atlas/cli/)
atlas clusters create production-cluster \
  --projectId 63f2e9c8d1234567890abcde \
  --provider AWS \
  --region US_EAST_1 \
  --tier M10 \
  --diskSizeGB 100 \
  --mdbVersion 7.0 \
  --backup

Atlas Connection String

mongodb+srv://username:password@<cluster-name>.mongodb.net/myapp?retryWrites=true&w=majority

Key Features

  • Automated backups: Hourly snapshots, 35-day retention
  • Multi-region replica sets: High availability across regions
  • Sharding: Automatic horizontal scaling
  • Monitoring: Real-time metrics, slow query logs, index recommendations
  • Backup & Restore: Point-in-time restore, AWS S3 snapshots

Cost Optimization

  • M0 (free): 512MB storage, shared cluster, for development
  • M2 / M5: Shared clusters, $9-57/month, for testing/low-traffic
  • M10+: Dedicated clusters, $57+/month, production-grade

MongoDB Sharding (Horizontal Scaling)

When a single replica set reaches capacity, shard the data across multiple replica sets.

Sharded Cluster Architecture

┌─────────────────────────────┐
│ Application (MongoDB driver)│
└─────────────────────────────┘
              ↓
┌─────────────────────────────────────────┐
│         Mongos (Query Router)           │
│  Routes queries to appropriate shard    │
└─────────────────────────────────────────┘
      ↓        ↓        ↓
  ┌─────┐  ┌─────┐  ┌─────┐
  │Shard│  │Shard│  │Shard│
  │  1  │  │  2  │  │  3  │
  │ (RS)│  │ (RS)│  │ (RS)│
  └─────┘  └─────┘  └─────┘
      ↓        ↓        ↓
Config Servers (Store sharding metadata)

Enable Sharding on a Collection

// Run against mongos. First, enable sharding on the database
sh.enableSharding("myapp")

// Create an index to back the shard key (an important decision!)
db.orders.createIndex({ user_id: 1 })

// Enable sharding on the collection
sh.shardCollection("myapp.orders", { user_id: 1 })

// Verify sharding
sh.status()

Shard key considerations:

  • Choose a field that distributes data evenly (avoid monotonically increasing fields, such as timestamps or ObjectIds, as a standalone shard key)
  • Use compound shard keys (e.g., { user_id: 1, created_at: -1 }) for better distribution
  • Changing a shard key later is costly: live resharding exists since MongoDB 5.0, but it is resource-intensive, so choose carefully up front

Key Decision Factors

Use this decision matrix to choose the right hosting option for your use case:

Factor               | Self-Hosted          | Managed (RDS/Atlas)    | Serverless
---------------------|----------------------|------------------------|---------------------
Operational Overhead | High (24/7 DBA)      | Low (vendor handles)   | Minimal
Cost at Scale        | Low (after setup)    | Medium-High            | Variable (can spike)
Setup Time           | Days-Weeks           | Minutes-Hours          | Minutes
Customization        | Unlimited            | Limited                | Very Limited
Scalability          | Manual provisioning  | Click/API              | Auto-scaling
Uptime SLA           | Depends on you       | 99.95%+                | 99.95%+
Backup/Recovery      | Your responsibility  | Automated              | Automated
Security             | Your responsibility  | Shared responsibility  | Shared responsibility
Cost Predictability  | Predictable          | Predictable            | Variable
Vendor Lock-In       | None                 | High                   | High

Decision logic:

  1. Startup / MVP: → Serverless or managed services (get to market fast)
  2. SaaS / Production: → Managed services (reliability, compliance, monitoring)
  3. Cost-Sensitive + Scale: → Self-hosted + DBA team
  4. Regulated Industry: → Self-hosted or managed with compliance certifications
  5. Global Users: → Managed with multi-region or serverless with CDN
  6. Unpredictable Traffic: → Serverless (auto-scales) or managed with auto-scaling
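The decision logic above can be sketched as a function; the field names and return labels are illustrative, not a formal taxonomy:

```javascript
// A sketch of the hosting decision logic as a scoring function.
// Rules mirror the numbered list above; inputs are illustrative flags.
function recommendHosting({ stage, trafficPattern, hasDbaTeam, costSensitive, regulated }) {
  if (stage === 'mvp') return 'serverless-or-managed';          // rule 1
  if (regulated) return hasDbaTeam ? 'self-hosted' : 'managed-with-compliance'; // rule 4
  if (costSensitive && hasDbaTeam) return 'self-hosted';        // rule 3
  if (trafficPattern === 'bursty') return 'serverless';         // rule 6
  return 'managed';                                             // rule 2 default
}
```

Encoding the rules this way also forces the team to agree on rule precedence (here, compliance outranks cost), which the prose list leaves implicit.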

Database Migration Strategies

Migrating a database is one of the riskiest operations you’ll perform. Plan carefully and test exhaustively.

Pre-Migration Checklist

  • Database audit (schema, indices, stored procedures, data volume)
  • Network connectivity between source and target (firewall rules, security groups)
  • Backup source database (point-in-time recovery safety net)
  • Estimate migration duration (test on copy of production data)
  • Plan communication (notify users, prepare rollback plan)
  • Test rollback procedure (actually perform it on staging)
  • Monitor disk space on both source and target
  • Review application code for hardcoded connection strings
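For the duration estimate in the checklist, a simple throughput model gives a first approximation; the MB/s figures must come from measurements on a copy of production data, not guesses:

```javascript
// Rough dump-and-restore window: dump + transfer + restore, done serially.
// Throughput inputs are assumptions; measure them on a production-sized copy.
function estimateMigrationMinutes({ dataGB, dumpMBps, transferMBps, restoreMBps }) {
  const mb = dataGB * 1024;
  const seconds = mb / dumpMBps + mb / transferMBps + mb / restoreMBps;
  return Math.ceil(seconds / 60);
}
```

Index rebuilds and VACUUM/ANALYZE after restore often add meaningfully to this, so pad the window before announcing it.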

Simple Lift-and-Shift (Downtime Required)

For small databases (<10GB) with acceptable downtime window (30 mins - 2 hours):

PostgreSQL Dump & Restore

# Step 1: Dump source database
pg_dump -h source.db.com -U postgres -d myapp --verbose \
  --format=directory --jobs=4 \
  -f /tmp/dump_myapp

# Step 2: Compress for transfer (optional)
tar -czf dump_myapp.tar.gz /tmp/dump_myapp

# Step 3: (At target) Stop application
# Update connection strings in app config
# Point to new database host

# Step 4: Restore to target
pg_restore -h target.db.com -U postgres -d myapp \
  --verbose --jobs=4 \
  /tmp/dump_myapp

# Step 5: Verify data
psql -h target.db.com -U postgres -d myapp -c "SELECT COUNT(*) FROM users;"

# Step 6: Restart application
systemctl restart myapp

MongoDB Dump & Restore

# Step 1: Dump source database
mongodump --uri="mongodb://admin:password@source.db.com/myapp?authSource=admin" \
  --out=/tmp/mongo_dump

# Step 2: Restore to target
mongorestore --uri="mongodb://admin:password@target.db.com/?authSource=admin" \
  /tmp/mongo_dump

# Step 3: Verify data
mongosh "mongodb://admin:password@target.db.com/myapp?authSource=admin" \
  --eval "db.users.countDocuments()"

# Step 4: Update application connection strings and restart

Zero-Downtime Migrations

For large databases or critical systems, zero-downtime migration is essential.

PostgreSQL: Dual-Write & Parallel Running

Phase 1 (24h before cutover):
  ├─ Start replication from source → target
  ├─ Monitor replication lag (target catching up)
  └─ Verify data consistency

Phase 2 (1h before cutover):
  ├─ Enable dual-write mode in application
  │  (writes go to both source and target)
  ├─ Continue reads from source
  └─ Verify dual-write is working

Phase 3 (Cutover, 5-15 mins):
  ├─ Stop application briefly
  ├─ Final sync of replication
  ├─ Verify data consistency (source == target)
  ├─ Switch reads+writes to target
  ├─ Resume application
  └─ Monitor for issues (30 mins)

Phase 4 (If issues):
  ├─ Quickly switch back to source
  ├─ Diagnose issue
  └─ Fix and retry phase 3

Implementation Pattern (Node.js + Prisma)

// Dual-write wrapper (TypeScript; one Prisma client per database)
import { PrismaClient } from '@prisma/client';

class DualWriteDb {
  private dualWriteEnabled = false;

  constructor(
    private sourceDb: PrismaClient,
    private targetDb: PrismaClient,
  ) {}

  async create(table: string, data: any) {
    // Always write to source (current primary)
    const result = await this.sourceDb[table].create({ data });

    // If dual-write enabled, also write to target
    if (this.dualWriteEnabled) {
      try {
        await this.targetDb[table].create({ data });
      } catch (error) {
        // Log but don't fail the request
        console.error(`Dual-write to target failed: ${error.message}`);
        await notifyOpsTeam(`Dual-write failure: ${error.message}`); // app-specific alert hook
      }
    }

    return result;
  }

  async read(table: string, where: any) {
    // Read from source during migration
    return this.sourceDb[table].findUnique({ where });
  }

  enableDualWrite() {
    this.dualWriteEnabled = true;
    console.log('Dual-write mode enabled');
  }

  disableDualWrite() {
    this.dualWriteEnabled = false;
    console.log('Dual-write mode disabled');
  }

  async switchToTarget() {
    // Verify consistency first
    const sourceCount = await this.sourceDb.users.count();
    const targetCount = await this.targetDb.users.count();

    if (sourceCount !== targetCount) {
      throw new Error('Data count mismatch! Aborting switch.');
    }

    // Switch primary database
    // (In practice, update environment variables and reconnect)
    console.log('Switched primary to target database');
  }
}

Using PostgreSQL Streaming Replication for Zero-Downtime

# On target server, set up as standby (parallel to live system)
sudo -u postgres pg_basebackup -h source.db.com \
  -D /var/lib/postgresql/15/main -U replication -v -P -W -R

# Start standby (begins streaming from source)
sudo systemctl start postgresql

# Monitor replication lag on target
watch -n 1 'sudo -u postgres psql -c "SELECT pg_last_xact_replay_timestamp();"'
# Should show current time (no lag) once caught up

# When ready for cutover:
# 1. Promote standby to primary
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/15/main

# 2. Update application connection string
# 3. Point primary back to old server as standby (if needed for rollback)

MongoDB: Replica Set Expansion + Resync

# Phase 1: Add new members to the existing replica set (keep them out of
# elections until initial sync finishes)
rs.add({ host: "new-member-1:27017", priority: 0, votes: 0 })
rs.add({ host: "new-member-2:27017", priority: 0, votes: 0 })
rs.add({ host: "new-member-3:27017", priority: 0, votes: 0 })

# Wait for initial sync to complete
rs.status()  # new members should reach SECONDARY; compare their optimeDate to the primary's

# Phase 2: Restore normal priority and votes on the new members
cfg = rs.conf()
# ...set priority: 1 and votes: 1 on the new members in cfg.members...
rs.reconfig(cfg)

# Phase 3: Update the application connection string to include the new members

# Phase 4: Once stable, remove the old members one at a time
rs.remove("old-member-1:27017")

Backup & Disaster Recovery

PostgreSQL Backup Strategy

Automated Backups (RDS / Managed Services)

RDS handles this automatically:

  • Continuous backups (30-day retention by default)
  • Automated daily snapshots
  • Point-in-time recovery (restore to any second in retention window)

Self-Hosted: WAL Archiving

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'
archive_timeout = 300

With cloud storage:

# Install WAL-E (or WAL-G for better compression)
pip install wal-e

# Configure for S3
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export WALE_S3_PREFIX=s3://my-backups/postgres-wal

# Update archive_command
archive_command = 'wal-e wal-push %p'

Full Backups

# Weekly full backup using pg_basebackup
pg_basebackup -h localhost -U postgres -D /backup/pg_backup_$(date +%Y%m%d) \
  -v -Ft -X stream -z

# Backup to S3 with encryption
tar -czf - /backup/pg_backup_$(date +%Y%m%d) | \
  aws s3 cp - s3://my-backups/pg-full-$(date +%Y%m%d).tar.gz \
  --sse AES256

Test Restore Procedure (Critical!)

# Periodically, actually restore from backup to verify it works
# Never assume backups are good without testing

# Restore from WAL archive
wal-e backup-fetch /restore_dir LATEST
pg_ctl -D /restore_dir start

MongoDB Backup Strategy

Atlas Automatic Backups

MongoDB Atlas includes:

  • Continuous backups (hourly snapshots)
  • 35-day retention window
  • Point-in-time restore (any second)
  • Download for off-site storage

Self-Hosted: mongodump + Object Storage

#!/bin/bash
# Daily backup script

BACKUP_DIR="/backups/mongo_$(date +%Y%m%d)"
mkdir -p $BACKUP_DIR

# Backup all databases
mongodump --uri="mongodb://admin:pass@localhost/admin" \
  --out=$BACKUP_DIR

# Compress
tar -czf $BACKUP_DIR.tar.gz $BACKUP_DIR

# Upload to S3
aws s3 cp $BACKUP_DIR.tar.gz s3://my-backups/ \
  --sse AES256 --storage-class GLACIER_IR

# Keep local copy for 7 days
find /backups -name "mongo_*.tar.gz" -mtime +7 -delete
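The script above keeps a flat seven days locally. A grandfather-father-son policy retains more history for similar storage by thinning older backups. A sketch of the keep/delete decision (the tier windows are illustrative and should match your RPO requirements):

```javascript
// Grandfather-father-son retention sketch: keep every backup under 7 days,
// Sunday backups under 4 weeks, and first-of-month backups under a year.
function shouldKeep(backupDate, now) {
  const days = Math.floor((now - backupDate) / 86400000);
  if (days < 7) return true;                                     // daily tier
  if (days < 28 && backupDate.getUTCDay() === 0) return true;    // weekly tier (Sundays)
  if (days < 365 && backupDate.getUTCDate() === 1) return true;  // monthly tier
  return false;
}
```

A nightly cron job can list backup timestamps, apply shouldKeep, and delete the rest, replacing the single `find -mtime +7 -delete` rule with tiered history.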

Test Restore

# Extract backup
tar -xzf mongo_20250101.tar.gz

# Restore to separate instance/database
mongorestore --uri="mongodb://admin:pass@localhost:27018" \
  --drop mongo_20250101

# Validate data
mongosh "mongodb://admin:pass@localhost:27018/myapp" \
  --eval "db.users.countDocuments()"

Monitoring & Performance Optimization

PostgreSQL Monitoring

Key Metrics to Track

-- Slow queries (queries > 1 second)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Cache hit ratio (should be > 99%)
SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit) as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as cache_ratio
FROM pg_statio_user_tables;

-- Connection count
SELECT count(*) FROM pg_stat_activity;

-- Missing indices (high sequential scans)
SELECT schemaname, tablename, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_scan DESC;

Automated Monitoring with Prometheus

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['localhost:9187']  # postgres_exporter

# docker-compose.yml for postgres_exporter
services:
  postgres_exporter:
    image: prometheuscommunity/postgres-exporter
    environment:
      DATA_SOURCE_NAME: "postgresql://postgres:password@postgres:5432/myapp?sslmode=disable"
    ports:
      - "9187:9187"

MongoDB Monitoring

Key Metrics

// Connection pool status
db.adminCommand({ connPoolStats: 1 })

// Current operations running longer than 5 seconds
db.currentOp({ secs_running: { $gt: 5 } })

// Database stats (data size, index size)
db.stats()

// Replica set status
rs.status()

// Replication lag: compare each secondary's optimeDate to the primary's
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
status.members
  .filter(m => m.state === 2)
  .map(m => ({ name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 }))

Atlas Monitoring Dashboard

MongoDB Atlas provides:

  • Real-time metrics (CPU, memory, network, IOPS)
  • Slow query logs (queries > 100ms by default)
  • Index recommendations
  • Storage metrics

Common Pitfalls & How to Avoid Them

Pitfall 1: Insufficient Backup Testing

Problem: You have backups, but when you need to restore, they’re corrupted or incomplete.

Prevention:

  • Monthly restore tests to a staging environment
  • Automated backup integrity checks
  • Multiple backup copies (local + cloud)
  • Document your RTO/RPO requirements
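To make the RTO/RPO requirement concrete, check the backup schedule against it: under snapshot-only backups, worst-case data loss is roughly the snapshot interval plus the time it takes to notice the failure. A sketch (continuous WAL/oplog archiving shrinks this dramatically):

```javascript
// Does a snapshot-only backup schedule satisfy the RPO target?
function meetsRpo({ snapshotIntervalMin, detectionLagMin, rpoMin }) {
  const worstCaseLossMin = snapshotIntervalMin + detectionLagMin;
  return { worstCaseLossMin, ok: worstCaseLossMin <= rpoMin };
}
```

For example, hourly snapshots with five minutes of detection lag cannot meet a 15-minute RPO; that gap is exactly what WAL archiving or point-in-time recovery exists to close.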

Pitfall 2: Connection Pool Exhaustion During Migration

Problem: Application connection pool fills up, new requests queue and timeout.

Prevention:

// Implement connection pooling with limits (node-postgres)
const { Pool } = require('pg');

const pool = new Pool({
  max: 20,                        // Max connections
  idleTimeoutMillis: 30000,       // Close idle after 30s
  connectionTimeoutMillis: 2000,  // Fail fast when the pool is exhausted
});

// Monitor pool status (these counters are part of the public pg.Pool API)
setInterval(() => {
  console.log(`Pool: ${pool.totalCount} total, ${pool.idleCount} idle, ${pool.waitingCount} waiting`);
}, 5000);

Pitfall 3: Shard Key Regret

Problem: You chose a bad MongoDB shard key and can’t change it without major work.

Prevention:

  • Analyze your query patterns before sharding
  • Avoid monotonically increasing values (timestamps as primary key)
  • Use compound keys: { tenant_id: 1, user_id: 1 }
  • Plan for future scaling when choosing initial shard key

Pitfall 4: Skipping Capacity Planning

Problem: Database runs out of disk space during peak traffic.

Prevention:

# Monitor disk usage weekly
df -h /data

# Set up alerts
# Alert if > 80% full
# Auto-scale storage if using cloud (RDS, Atlas)

# Calculate growth rate
# If growing 5GB/month, plan upgrade now instead of emergency later
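The growth-rate comment above can be turned into a runway calculation. Linear growth is an assumption; real growth is often super-linear, so treat the result as an upper bound on the time you have:

```javascript
// Months of disk runway left before hitting the alert threshold,
// assuming linear growth (an optimistic simplification).
function monthsUntilFull({ diskGB, usedGB, growthGBPerMonth, alertThreshold = 0.8 }) {
  const budgetGB = diskGB * alertThreshold - usedGB;
  if (budgetGB <= 0) return 0;  // already past the alert threshold
  return Math.floor(budgetGB / growthGBPerMonth);
}
```

A 500GB volume at 300GB used and growing 5GB/month has about 20 months before the 80% alert fires; schedule the upgrade well inside that window.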

Pitfall 5: Not Monitoring Replication Lag

Problem: Read replicas fall behind, serving stale data without anyone knowing.

Prevention:

-- PostgreSQL: Check replication lag frequently
SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- If lag > 1 second, investigate network or heavy write load

// MongoDB: Check replication lag on each secondary
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
status.members
  .filter(m => m.state === 2)  // secondaries only
  .forEach(m => {
    const lagMs = primary.optimeDate - m.optimeDate;
    console.log(`${m.name}: ${lagMs}ms lag`);
  });

Pitfall 6: Hardcoded Database Credentials

Problem: Credentials in code, exposed in Git history or logs.

Prevention:

# Use environment variables
DATABASE_URL="postgresql://user:pass@host:5432/db"

# Or use secrets management
# AWS Secrets Manager, Azure Key Vault, HashiCorp Vault

# For RDS: Use IAM authentication (no stored password; requires SSL)
PGPASSWORD=$(aws rds generate-db-auth-token \
  --hostname mydb.123456789.us-east-1.rds.amazonaws.com \
  --port 5432 --region us-east-1 --username postgres) \
psql "host=mydb.123456789.us-east-1.rds.amazonaws.com user=postgres dbname=myapp sslmode=require"

Pitfall 7: No Rollback Plan

Problem: Migration fails partway, but you can’t quickly rollback.

Prevention:

  • Always have a detailed, tested rollback procedure
  • Keep the source database running for at least 48 hours post-migration
  • DNS/load balancer switch (fast rollback) vs code deployment (slower)
  • For critical systems, maintain dual-run for 1-2 weeks

# Example: DNS-based failover for quick rollback
# In DNS provider (use a short TTL, e.g. 60s):
#   db.myapp.com → 10.0.1.10 (new)
#   If issues: update to → 10.0.1.5 (old) within ~1 minute

Tools & Frameworks for Migrations

Database Migration Tools

Tool            | Databases                     | Language            | Strengths
----------------|-------------------------------|---------------------|--------------------------------------
Flyway          | Multi (Postgres, MySQL, etc.) | Java-based, SQL     | Simple, version-based, team-friendly
Liquibase       | Multi                         | XML/YAML/JSON       | Complex changesets, rollback support
Prisma Migrate  | Postgres, MySQL, SQLite       | TypeScript          | Auto-generated, app-aware migrations
Alembic         | Any via SQLAlchemy            | Python              | Powerful, good for complex migrations
node-pg-migrate | PostgreSQL                    | Node.js/JavaScript  | Lightweight, good for JS projects
Knex.js         | Multi                         | JavaScript          | Flexible, integrates with app code

PostgreSQL Migration Tools

pg_upgrade (for major version upgrades):

# Upgrade PostgreSQL 14 → 15 (in-place); stop both clusters first,
# and run with --check for a dry run before the real upgrade
/usr/lib/postgresql/15/bin/pg_upgrade \
  -b /usr/lib/postgresql/14/bin \
  -B /usr/lib/postgresql/15/bin \
  -d /var/lib/postgresql/14/main \
  -D /var/lib/postgresql/15/main \
  -U postgres

Foreign Data Wrappers (FDW) (for cross-database replication):

-- Create foreign table pointing to source database
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER source_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '10.0.1.5', dbname 'myapp', port '5432');

-- Credentials the local server uses to connect to the source
CREATE USER MAPPING FOR CURRENT_USER
  SERVER source_db
  OPTIONS (user 'postgres', password 'secret');

-- Empty column list + INHERITS copies the column definitions from users
CREATE FOREIGN TABLE source_users ()
  INHERITS (users)
  SERVER source_db
  OPTIONS (schema_name 'public', table_name 'users');

-- Sync data
INSERT INTO users SELECT * FROM source_users WHERE NOT EXISTS (...);

MongoDB Migration Tools

MongoDB Database Tools:

  • mongodump / mongorestore - Backup & restore
  • mongoimport / mongoexport - JSON/CSV import/export
  • Atlas Functions (formerly Stitch) - serverless functions for data transformation during migration

Atlas Migration Service:

# MongoDB Atlas Live Migration (zero-downtime)
# Handles replication, validation, and switchover automatically.
# Live Migration is typically driven from the Atlas UI; the CLI form below
# is illustrative, so check the current Atlas CLI docs for exact commands.

atlas clusters migrate create \
  --projectId <project-id> \
  --sourceClusterConnectionString <source-connection-string> \
  --targetClusterName <target-cluster-name>

Conclusion

Choosing the right database hosting and executing migrations flawlessly are critical for maintaining system reliability. Here are the key takeaways:

Hosting Decision Summary

  • Startups / MVPs: Use managed services (Heroku Postgres, MongoDB Atlas) or serverless (Aurora Serverless, Vercel Postgres)
  • High-scale / cost-sensitive: Invest in self-hosted + DBA team (with proper tooling)
  • SaaS / critical systems: Managed services with 99.95%+ uptime SLAs
  • Global applications: Managed services with multi-region replication or serverless

Migration Best Practices

  1. Always test migrations on production-scale data in a staging environment
  2. Have a detailed, tested rollback plan before you start
  3. Use dual-write patterns for large, high-traffic databases
  4. Monitor replication lag constantly during cutover
  5. Run automated backup and restore tests monthly
  6. Implement connection pooling and monitoring before migration
  7. Plan capacity in advance (don’t run out of disk space during migration)
  8. Communicate clearly with all stakeholders (ops, product, support)

Tools & Automation

  • Use migration frameworks (Prisma, Flyway, Alembic) instead of ad-hoc SQL
  • Implement infrastructure-as-code (Terraform, CloudFormation) for reproducibility
  • Automate backup & recovery testing with scripts
  • Set up monitoring and alerting (Prometheus, DataDog, New Relic) before migration

Post-Migration

  • Monitor the new database for 48+ hours (performance, replication, errors)
  • Keep source database online for 48-72 hours for quick rollback if needed
  • Update documentation with new connection strings, credentials, topology
  • Verify that all application features work with new database
  • Collect lessons learned and update runbooks for next migration

With careful planning, the right tools, and a culture of testing, you can execute database migrations with confidence and minimal risk to production systems.

