⚡ Calmops

Database Setup & Deployment: PostgreSQL & MongoDB Hosting Strategies & Zero-Downtime Migrations


Choosing where and how to deploy your database is one of the most critical infrastructure decisions you’ll make. Whether you’re launching a new product or migrating from legacy systems, the wrong choice can result in downtime, data loss, or prohibitive costs. This comprehensive guide covers hosting options, deployment strategies, and zero-downtime migration techniques for both PostgreSQL and MongoDB.


Table of Contents

  1. Hosting Options Overview
  2. PostgreSQL Hosting Strategies
  3. MongoDB Hosting Strategies
  4. Key Decision Factors
  5. Database Migration Strategies
  6. Zero-Downtime Migrations
  7. Backup & Disaster Recovery
  8. Monitoring & Performance Optimization
  9. Common Pitfalls & How to Avoid Them
  10. Tools & Frameworks for Migrations
  11. Conclusion

Hosting Options Overview

When deploying a database, you have three primary models: self-hosted, managed services, and serverless/hybrid. Each model trades off control for operational burden.

Self-Hosted

What it is: You provision servers (VMs, bare metal) and manage the entire database stack (installation, patching, backups, replication).

Pros:

  • Maximum control over configuration and optimization
  • Potentially lower long-term costs at scale (after initial setup)
  • No vendor lock-in
  • Full visibility into system internals

Cons:

  • High operational overhead (24/7 monitoring, patching, failover management)
  • Initial expertise required (database administration, Linux, networking)
  • Scaling requires additional provisioning and configuration
  • Disaster recovery is your responsibility

Best for: Large organizations with dedicated DBA teams, cost-sensitive projects at scale, workloads with unusual requirements


Managed Services (DBaaS)

What it is: Cloud providers (AWS RDS, Azure Database, Google Cloud SQL, Heroku Postgres, etc.) handle infrastructure, patching, and backups. You manage application connections and database-level configuration.

Pros:

  • Automated backups and point-in-time recovery
  • Automated patching and version upgrades
  • Built-in high availability (multi-AZ, failover)
  • Monitoring and alerting included
  • Scaling with a few clicks or API calls

Cons:

  • Higher per-instance cost than self-hosted
  • Limited customization (e.g., kernel parameters)
  • Potential vendor lock-in
  • Network latency if provisioned in different regions

Best for: SaaS companies, startups, teams without DBA expertise, applications needing uptime guarantees (99.9%+)


Serverless & Hybrid Models

What it is: Databases that scale automatically without provisioning. Examples: AWS Aurora Serverless, MongoDB Atlas, Supabase, PlanetScale (MySQL), Vercel Postgres.

Pros:

  • Auto-scaling based on demand (only pay for what you use)
  • Near-zero operational overhead
  • Global distribution options
  • Quick to get started (minutes vs hours)

Cons:

  • Cold start latency for idle instances
  • Unpredictable pricing during traffic spikes
  • Less control over instance behavior
  • Sometimes higher per-request costs for low-traffic apps

Best for: Prototype/MVP projects, bursty workloads, globally-distributed applications, startups optimizing for time-to-market


PostgreSQL Hosting Strategies

Self-Hosted PostgreSQL

For organizations choosing self-hosted PostgreSQL, here’s a production-ready setup:

Architecture: Primary + Standby Replicas

┌─────────────────────────────────────────────┐
│        Application Layer (Load Balanced)    │
└─────────────────────────────────────────────┘
              ↓              ↓              ↓
┌─────────────────────────────────────────────┐
│    Connection Pooler (PgBouncer / pgcat)    │
└─────────────────────────────────────────────┘
              ↓ (writes)    ↓ (reads)
        ┌──────────────┐  ┌──────────────┐
        │ Primary (RW) │  │ Standby (RO) │
        │  PostgreSQL  │  │  PostgreSQL  │
        └──────────────┘  └──────────────┘
              ↓ (WAL shipping / replication)
        ┌──────────────┐
        │ Standby #2   │
        │ (Async Rep)  │
        └──────────────┘
              ↓
        ┌──────────────┐
        │ Backup Node  │
        │ (Scheduled)  │
        └──────────────┘

Basic Server Setup

# Install PostgreSQL on Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

# Start and enable PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Switch to postgres user and create database
sudo -u postgres psql
postgres=# CREATE DATABASE myapp;
postgres=# CREATE USER appuser WITH ENCRYPTED PASSWORD 'secure_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE myapp TO appuser;
postgres=# \q

Enable Replication (Primary)

Edit /etc/postgresql/14/main/postgresql.conf:

# Replication settings
wal_level = replica                    # WAL detail sufficient for physical replication
max_wal_senders = 3                    # Max standby replicas
wal_keep_size = 2GB                    # Keep WAL segments
hot_standby = on                       # Allow queries on standby

Edit /etc/postgresql/14/main/pg_hba.conf to allow replication:

# Standby replication connections (PostgreSQL 14+ defaults to scram-sha-256)
host    replication     replication_user   10.0.1.10/32   scram-sha-256
host    replication     replication_user   10.0.1.11/32   scram-sha-256

Restart PostgreSQL:

sudo systemctl restart postgresql

Set Up Standby (Read Replica)

# On standby server, stop PostgreSQL if running
sudo systemctl stop postgresql

# Take base backup from primary
sudo -u postgres pg_basebackup -h 10.0.1.5 -D /var/lib/postgresql/14/main \
  -U replication_user -v -P -W -R

# Start standby in recovery mode
sudo systemctl start postgresql

Verify replication:

# On primary, check replication status
sudo -u postgres psql
postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;
 client_addr  |   state   | sync_state
--------------+-----------+------------
 10.0.1.10    | streaming | async
 10.0.1.11    | streaming | async

Connection Pooling with PgBouncer

Install PgBouncer:

sudo apt-get install pgbouncer

Configure /etc/pgbouncer/pgbouncer.ini:

[databases]
myapp = host=10.0.1.5 port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction          # Connection pooling mode
max_client_conn = 1000           # Max client connections
default_pool_size = 25           # Connections per database
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3

; Logging
log_connections = 1
log_disconnections = 1
log_pooler_stats = 1

Start PgBouncer:

sudo systemctl start pgbouncer
sudo systemctl enable pgbouncer

Applications now connect to PgBouncer (port 6432) instead of PostgreSQL directly, gaining connection efficiency.
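Sizing default_pool_size is often guesswork. A common rule of thumb (a heuristic, not a PgBouncer feature) is that a database server can usefully serve about (2 × CPU cores) + effective spindle count active connections, with SSDs behaving like a small spindle count. A sketch with illustrative numbers:

```javascript
// Rough pool-sizing heuristic: useful connections ≈ (2 × cores) + spindles.
// SSD-backed servers are commonly modeled with a spindle count of 1.
function suggestPoolSize(cpuCores, effectiveSpindles = 1) {
  if (cpuCores < 1) throw new Error('cpuCores must be >= 1');
  return 2 * cpuCores + effectiveSpindles;
}

// Split the suggested total across app instances sharing the pooler.
function perInstancePoolSize(totalPoolSize, appInstances) {
  return Math.max(1, Math.floor(totalPoolSize / appInstances));
}
```

For example, an 8-core SSD-backed server suggests a pool of roughly 17 back-end connections; four app instances sharing the pooler would get about 4 each. Measure under real load before settling on a value.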


Managed PostgreSQL (AWS RDS / Azure Database / Google Cloud SQL)

AWS RDS for PostgreSQL

Creating a Multi-AZ deployment (recommended for production):

aws rds create-db-instance \
  --db-instance-identifier myapp-postgres \
  --db-instance-class db.t4g.large \
  --engine postgres \
  --master-username postgres \
  --master-user-password 'SecurePassword123!' \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az \
  --no-publicly-accessible \
  --vpc-security-group-ids sg-12345678 \
  --db-subnet-group-name default-vpc-group \
  --backup-retention-period 30 \
  --preferred-backup-window "03:00-04:00" \
  --preferred-maintenance-window "sun:04:00-sun:05:00" \
  --enable-cloudwatch-logs-exports '["postgresql"]' \
  --enable-iam-database-authentication

Key parameters:

  • --multi-az: Enables automatic failover to standby in another AZ
  • --backup-retention-period 30: Keep 30 days of automated backups
  • --enable-iam-database-authentication: Use IAM roles instead of passwords

Monitoring RDS Performance

# Check database metrics using AWS CLI
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=myapp-postgres \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-02T00:00:00Z \
  --period 300 \
  --statistics Average,Maximum

Connection String

postgresql://postgres:password@<rds-endpoint>:5432/myapp
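Rather than string-splitting connection strings in application code, Node's built-in URL class parses them, since postgresql:// URLs follow generic URL syntax. A defensive sketch (the hostname in the usage example is hypothetical):

```javascript
// Parse a PostgreSQL connection string with Node's built-in URL class.
function parsePostgresUrl(connString) {
  const url = new URL(connString);
  if (url.protocol !== 'postgresql:' && url.protocol !== 'postgres:') {
    throw new Error(`Not a PostgreSQL URL: ${url.protocol}`);
  }
  return {
    user: decodeURIComponent(url.username),
    host: url.hostname,
    port: url.port ? Number(url.port) : 5432,  // default PostgreSQL port
    database: url.pathname.replace(/^\//, ''),
    // Deliberately omit url.password: never log or serialize credentials.
  };
}
```

Keeping the password out of the returned config object makes it harder to leak credentials through logs or error reports.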

Serverless PostgreSQL

AWS Aurora Serverless v2

Aurora Serverless scales automatically based on workload. No capacity planning needed.

aws rds create-db-cluster \
  --db-cluster-identifier myapp-aurora \
  --engine aurora-postgresql \
  --engine-version 15.2 \
  --database-name myapp \
  --master-username postgres \
  --master-user-password 'SecurePassword123!' \
  --serverlessv2-scaling-configuration MinCapacity=0.5,MaxCapacity=4 \
  --backup-retention-period 30 \
  --enable-http-endpoint

Scaling behavior:

  • Minimum capacity: 0.5 ACU (Aurora Capacity Unit) = ~1GB memory
  • Maximum capacity: 4 ACU = ~8GB memory
  • Scales up and down in fine-grained ACU increments based on CPU, memory, and connection load

Pricing: Billed per ACU-hour of capacity used; rates vary by region, so check current AWS pricing.
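To sanity-check serverless spend before committing, a back-of-envelope model helps. The rate and usage profile in the example are assumptions, not quoted AWS prices:

```javascript
// Back-of-envelope Aurora Serverless v2 cost model.
// ratePerAcuHour is an assumption (varies by region); check current AWS pricing.
function estimateMonthlyCost({ baselineAcu, peakAcu, peakHoursPerDay, ratePerAcuHour }) {
  const hoursPerMonth = 730;              // average hours in a month
  const peakHours = peakHoursPerDay * 30.4;
  const baselineHours = hoursPerMonth - peakHours;
  const acuHours = baselineAcu * baselineHours + peakAcu * peakHours;
  return Number((acuHours * ratePerAcuHour).toFixed(2));
}
```

A cluster idling at 0.5 ACU with a few hours of 4-ACU peak per day lands in the low hundreds of ACU-hours per month; plug in your region's actual rate to compare against a fixed-size instance.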

Vercel Postgres

For Next.js applications, Vercel Postgres provides integrated serverless PostgreSQL:

// Using Vercel SDK
import { sql } from '@vercel/postgres';

export default async function handler(req, res) {
  try {
    const result = await sql`SELECT * FROM users LIMIT 10`;
    res.status(200).json(result.rows);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
}

MongoDB Hosting Strategies

Self-Hosted MongoDB

Replica Set Setup (High Availability)

A replica set is MongoDB’s native replication mechanism. Minimum 3 nodes recommended for production.

Docker Compose Setup

version: '3.8'
services:
  mongo1:
    image: mongo:7.0
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo1_data:/data/db
      # Shared keyfile for internal auth: openssl rand -base64 756 > mongo-keyfile; chmod 400
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile

  mongo2:
    image: mongo:7.0
    ports:
      - "27018:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo2_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile

  mongo3:
    image: mongo:7.0
    ports:
      - "27019:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo3_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile

volumes:
  mongo1_data:
  mongo2_data:
  mongo3_data:

Initialize the Replica Set

# Connect to primary
mongosh mongodb://admin:password123@localhost:27017

# Initialize replica set
rs.initiate({
  _id: "myreplicaset",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", priority: 1 }
  ]
})

# Check replica set status
rs.status()

Connection String for Replica Set

Inside the Docker network, where every member listens on 27017:

mongodb://appuser:password@mongo1:27017,mongo2:27017,mongo3:27017/myapp?replicaSet=myreplicaset&authSource=admin

From the host machine, use the published ports instead: localhost:27017,localhost:27018,localhost:27019.

With automatic failover and read preference:

const client = new MongoClient(uri, {
  replicaSet: 'myreplicaset',
  retryWrites: true,
  readPreference: 'secondaryPreferred',  // Read from replicas if available
});

Managed MongoDB (MongoDB Atlas)

MongoDB Atlas is MongoDB’s official DBaaS, with global distribution and enterprise features built-in.

Creating a Cluster

# Using the Atlas CLI (successor to mongocli; install from https://www.mongodb.com/docs/atlas/cli/)
atlas clusters create production-cluster \
  --projectId 63f2e9c8d1234567890abcde \
  --provider AWS \
  --region US_EAST_1 \
  --tier M10 \
  --diskSizeGB 100 \
  --mdbVersion 7.0 \
  --backup

Atlas Connection String

mongodb+srv://username:password@<cluster-name>.mongodb.net/myapp?retryWrites=true&w=majority

Key Features

  • Automated backups: Hourly snapshots, 35-day retention
  • Multi-region replica sets: High availability across regions
  • Sharding: Automatic horizontal scaling
  • Monitoring: Real-time metrics, slow query logs, index recommendations
  • Backup & Restore: Point-in-time restore, AWS S3 snapshots

Cost Optimization

  • M0 (free): 512MB storage, shared cluster, for development
  • M2 / M5: Shared clusters, $9-57/month, for testing/low-traffic
  • M10+: Dedicated clusters, $57+/month, production-grade

MongoDB Sharding (Horizontal Scaling)

When a single replica set reaches capacity, shard the data across multiple replica sets.

Sharded Cluster Architecture

┌─────────────────────────────┐
│ Application (MongoDB driver)│
└─────────────────────────────┘
              ↓
┌─────────────────────────────────────────┐
│         Mongos (Query Router)           │
│  Routes queries to appropriate shard    │
└─────────────────────────────────────────┘
      ↓        ↓        ↓
  ┌─────┐  ┌─────┐  ┌─────┐
  │Shard│  │Shard│  │Shard│
  │  1  │  │  2  │  │  3  │
  │ (RS)│  │ (RS)│  │ (RS)│
  └─────┘  └─────┘  └─────┘
      ↓        ↓        ↓
Config Servers (Store sharding metadata)

Enable Sharding on a Collection

// Run against mongos. First, enable sharding on the database
sh.enableSharding("myapp")

// Create an index to back the shard key (an important decision!)
db.orders.createIndex({ user_id: 1 })

// Enable sharding on the collection
sh.shardCollection("myapp.orders", { user_id: 1 })

// Verify sharding
sh.status()

Shard key considerations:

  • Choose a field that distributes data evenly (avoid monotonically increasing fields, such as timestamps or ObjectIds, as a standalone shard key)
  • Use compound shard keys (e.g., { user_id: 1, created_at: -1 }) for better distribution
  • Changing a shard key later is costly: live resharding exists since MongoDB 5.0, but it is resource-intensive, so choose carefully up front

Key Decision Factors

Use this decision matrix to choose the right hosting option for your use case:

Factor               | Self-Hosted          | Managed (RDS/Atlas)    | Serverless
---------------------|----------------------|------------------------|---------------------
Operational Overhead | High (24/7 DBA)      | Low (vendor handles)   | Minimal
Cost at Scale        | Low (after setup)    | Medium-High            | Variable (can spike)
Setup Time           | Days-Weeks           | Minutes-Hours          | Minutes
Customization        | Unlimited            | Limited                | Very Limited
Scalability          | Manual provisioning  | Click/API              | Auto-scaling
Uptime SLA           | Depends on you       | 99.95%+                | 99.95%+
Backup/Recovery      | Your responsibility  | Automated              | Automated
Security             | Your responsibility  | Shared responsibility  | Shared responsibility
Cost Predictability  | Predictable          | Predictable            | Variable
Vendor Lock-In       | None                 | High                   | High

Decision logic:

  1. Startup / MVP: → Serverless or managed services (get to market fast)
  2. SaaS / Production: → Managed services (reliability, compliance, monitoring)
  3. Cost-Sensitive + Scale: → Self-hosted + DBA team
  4. Regulated Industry: → Self-hosted or managed with compliance certifications
  5. Global Users: → Managed with multi-region or serverless with CDN
  6. Unpredictable Traffic: → Serverless (auto-scales) or managed with auto-scaling
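The decision logic above can be sketched as a function; the field names and return labels are illustrative, not a formal taxonomy:

```javascript
// A sketch of the hosting decision logic as a scoring function.
// Rules mirror the numbered list above; inputs are illustrative flags.
function recommendHosting({ stage, trafficPattern, hasDbaTeam, costSensitive, regulated }) {
  if (stage === 'mvp') return 'serverless-or-managed';          // rule 1
  if (regulated) return hasDbaTeam ? 'self-hosted' : 'managed-with-compliance'; // rule 4
  if (costSensitive && hasDbaTeam) return 'self-hosted';        // rule 3
  if (trafficPattern === 'bursty') return 'serverless';         // rule 6
  return 'managed';                                             // rule 2 default
}
```

Encoding the rules this way also forces the team to agree on rule precedence (here, compliance outranks cost), which the prose list leaves implicit.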

Database Migration Strategies

Migrating a database is one of the riskiest operations you’ll perform. Plan carefully and test exhaustively.

Pre-Migration Checklist

  • Database audit (schema, indices, stored procedures, data volume)
  • Network connectivity between source and target (firewall rules, security groups)
  • Backup source database (point-in-time recovery safety net)
  • Estimate migration duration (test on copy of production data)
  • Plan communication (notify users, prepare rollback plan)
  • Test rollback procedure (actually perform it on staging)
  • Monitor disk space on both source and target
  • Review application code for hardcoded connection strings
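For the duration estimate in the checklist, a simple throughput model gives a first approximation; the MB/s figures must come from measurements on a copy of production data, not guesses:

```javascript
// Rough dump-and-restore window: dump + transfer + restore, done serially.
// Throughput inputs are assumptions; measure them on a production-sized copy.
function estimateMigrationMinutes({ dataGB, dumpMBps, transferMBps, restoreMBps }) {
  const mb = dataGB * 1024;
  const seconds = mb / dumpMBps + mb / transferMBps + mb / restoreMBps;
  return Math.ceil(seconds / 60);
}
```

Index rebuilds and VACUUM/ANALYZE after restore often add meaningfully to this, so pad the window before announcing it.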

Simple Lift-and-Shift (Downtime Required)

For small databases (<10GB) with acceptable downtime window (30 mins - 2 hours):

PostgreSQL Dump & Restore

# Step 1: Dump source database
pg_dump -h source.db.com -U postgres -d myapp --verbose \
  --format=directory --jobs=4 \
  -f /tmp/dump_myapp

# Step 2: Compress for transfer (optional)
tar -czf dump_myapp.tar.gz /tmp/dump_myapp

# Step 3: (At target) Stop application
# Update connection strings in app config
# Point to new database host

# Step 4: Restore to target
pg_restore -h target.db.com -U postgres -d myapp \
  --verbose --jobs=4 \
  /tmp/dump_myapp

# Step 5: Verify data
psql -h target.db.com -U postgres -d myapp -c "SELECT COUNT(*) FROM users;"

# Step 6: Restart application
systemctl restart myapp

MongoDB Dump & Restore

# Step 1: Dump source database
mongodump --uri="mongodb://admin:password@source.db.com/myapp?authSource=admin" \
  --out=/tmp/mongo_dump

# Step 2: Restore to target
mongorestore --uri="mongodb://admin:password@target.db.com/?authSource=admin" \
  /tmp/mongo_dump

# Step 3: Verify data
mongosh "mongodb://admin:password@target.db.com/myapp?authSource=admin" \
  --eval "db.users.countDocuments()"

# Step 4: Update application connection strings and restart

Zero-Downtime Migrations

For large databases or critical systems, zero-downtime migration is essential.

PostgreSQL: Dual-Write & Parallel Running

Phase 1 (24h before cutover):
  ├─ Start replication from source → target
  ├─ Monitor replication lag (target catching up)
  └─ Verify data consistency

Phase 2 (1h before cutover):
  ├─ Enable dual-write mode in application
  │  (writes go to both source and target)
  ├─ Continue reads from source
  └─ Verify dual-write is working

Phase 3 (Cutover, 5-15 mins):
  ├─ Stop application briefly
  ├─ Final sync of replication
  ├─ Verify data consistency (source == target)
  ├─ Switch reads+writes to target
  ├─ Resume application
  └─ Monitor for issues (30 mins)

Phase 4 (If issues):
  ├─ Quickly switch back to source
  ├─ Diagnose issue
  └─ Fix and retry phase 3

Implementation Pattern (Node.js + Prisma)

// Dual-write wrapper (TypeScript; one Prisma client per database)
import { PrismaClient } from '@prisma/client';

class DualWriteDb {
  private dualWriteEnabled = false;

  constructor(
    private sourceDb: PrismaClient,
    private targetDb: PrismaClient,
  ) {}

  async create(table: string, data: any) {
    // Always write to source (current primary)
    const result = await this.sourceDb[table].create({ data });

    // If dual-write enabled, also write to target
    if (this.dualWriteEnabled) {
      try {
        await this.targetDb[table].create({ data });
      } catch (error) {
        // Log but don't fail the request
        console.error(`Dual-write to target failed: ${error.message}`);
        await notifyOpsTeam(`Dual-write failure: ${error.message}`); // app-specific alert hook
      }
    }

    return result;
  }

  async read(table: string, where: any) {
    // Read from source during migration
    return this.sourceDb[table].findUnique({ where });
  }

  enableDualWrite() {
    this.dualWriteEnabled = true;
    console.log('Dual-write mode enabled');
  }

  disableDualWrite() {
    this.dualWriteEnabled = false;
    console.log('Dual-write mode disabled');
  }

  async switchToTarget() {
    // Verify consistency first
    const sourceCount = await this.sourceDb.users.count();
    const targetCount = await this.targetDb.users.count();

    if (sourceCount !== targetCount) {
      throw new Error('Data count mismatch! Aborting switch.');
    }

    // Switch primary database
    // (In practice, update environment variables and reconnect)
    console.log('Switched primary to target database');
  }
}

Using PostgreSQL Streaming Replication for Zero-Downtime

# On target server, set up as standby (parallel to live system)
sudo -u postgres pg_basebackup -h source.db.com \
  -D /var/lib/postgresql/15/main -U replication -v -P -W -R

# Start standby (begins streaming from source)
sudo systemctl start postgresql

# Monitor replication lag on target
watch -n 1 'sudo -u postgres psql -c "SELECT pg_last_xact_replay_timestamp();"'
# Should show current time (no lag) once caught up

# When ready for cutover:
# 1. Promote standby to primary
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/15/main

# 2. Update application connection string
# 3. Point primary back to old server as standby (if needed for rollback)

MongoDB: Replica Set Expansion + Resync

# Phase 1: Add new members to the existing replica set (keep them out of
# elections until initial sync finishes)
rs.add({ host: "new-member-1:27017", priority: 0, votes: 0 })
rs.add({ host: "new-member-2:27017", priority: 0, votes: 0 })
rs.add({ host: "new-member-3:27017", priority: 0, votes: 0 })

# Wait for initial sync to complete
rs.status()  # new members should reach SECONDARY; compare their optimeDate to the primary's

# Phase 2: Restore normal priority and votes on the new members
cfg = rs.conf()
# ...set priority: 1 and votes: 1 on the new members in cfg.members...
rs.reconfig(cfg)

# Phase 3: Update the application connection string to include the new members

# Phase 4: Once stable, remove the old members one at a time
rs.remove("old-member-1:27017")

Backup & Disaster Recovery

PostgreSQL Backup Strategy

Automated Backups (RDS / Managed Services)

RDS handles this automatically:

  • Continuous backups (30-day retention by default)
  • Automated daily snapshots
  • Point-in-time recovery (restore to any second in retention window)

Self-Hosted: WAL Archiving

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'
archive_timeout = 300

With cloud storage:

# Install WAL-E (or WAL-G for better compression)
pip install wal-e

# Configure for S3
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export WALE_S3_PREFIX=s3://my-backups/postgres-wal

# Update archive_command
archive_command = 'wal-e wal-push %p'

Full Backups

# Weekly full backup using pg_basebackup
pg_basebackup -h localhost -U postgres -D /backup/pg_backup_$(date +%Y%m%d) \
  -v -Ft -X stream -z

# Backup to S3 with encryption
tar -czf - /backup/pg_backup_$(date +%Y%m%d) | \
  aws s3 cp - s3://my-backups/pg-full-$(date +%Y%m%d).tar.gz \
  --sse AES256

Test Restore Procedure (Critical!)

# Periodically, actually restore from backup to verify it works
# Never assume backups are good without testing

# Restore from WAL archive
wal-e backup-fetch /restore_dir LATEST
pg_ctl -D /restore_dir start

MongoDB Backup Strategy

Atlas Automatic Backups

MongoDB Atlas includes:

  • Continuous backups (hourly snapshots)
  • 35-day retention window
  • Point-in-time restore (any second)
  • Download for off-site storage

Self-Hosted: mongodump + Object Storage

#!/bin/bash
# Daily backup script

BACKUP_DIR="/backups/mongo_$(date +%Y%m%d)"
mkdir -p $BACKUP_DIR

# Backup all databases
mongodump --uri="mongodb://admin:pass@localhost/admin" \
  --out=$BACKUP_DIR

# Compress
tar -czf $BACKUP_DIR.tar.gz $BACKUP_DIR

# Upload to S3
aws s3 cp $BACKUP_DIR.tar.gz s3://my-backups/ \
  --sse AES256 --storage-class GLACIER_IR

# Keep local copy for 7 days
find /backups -name "mongo_*.tar.gz" -mtime +7 -delete
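The script above keeps a flat seven days locally. A grandfather-father-son policy retains more history for similar storage by thinning older backups. A sketch of the keep/delete decision (the tier windows are illustrative and should match your RPO requirements):

```javascript
// Grandfather-father-son retention sketch: keep every backup under 7 days,
// Sunday backups under 4 weeks, and first-of-month backups under a year.
function shouldKeep(backupDate, now) {
  const days = Math.floor((now - backupDate) / 86400000);
  if (days < 7) return true;                                     // daily tier
  if (days < 28 && backupDate.getUTCDay() === 0) return true;    // weekly tier (Sundays)
  if (days < 365 && backupDate.getUTCDate() === 1) return true;  // monthly tier
  return false;
}
```

A nightly cron job can list backup timestamps, apply shouldKeep, and delete the rest, replacing the single `find -mtime +7 -delete` rule with tiered history.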

Test Restore

# Extract backup
tar -xzf mongo_20250101.tar.gz

# Restore to separate instance/database
mongorestore --uri="mongodb://admin:pass@localhost:27018" \
  --drop mongo_20250101

# Validate data
mongosh "mongodb://admin:pass@localhost:27018/myapp" \
  --eval "db.users.countDocuments()"

Monitoring & Performance Optimization

PostgreSQL Monitoring

Key Metrics to Track

-- Slow queries (queries > 1 second)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Cache hit ratio (should be > 99%)
SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit) as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as cache_ratio
FROM pg_statio_user_tables;

-- Connection count
SELECT count(*) FROM pg_stat_activity;

-- Missing indices (high sequential scans)
SELECT schemaname, tablename, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_scan DESC;

Automated Monitoring with Prometheus

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['localhost:9187']  # postgres_exporter

# docker-compose.yml for postgres_exporter
services:
  postgres_exporter:
    image: prometheuscommunity/postgres-exporter
    environment:
      DATA_SOURCE_NAME: "postgresql://postgres:password@postgres:5432/myapp?sslmode=disable"
    ports:
      - "9187:9187"

MongoDB Monitoring

Key Metrics

// Connection pool status
db.adminCommand({ connPoolStats: 1 })

// Current operations running longer than 5 seconds
db.currentOp({ secs_running: { $gt: 5 } })

// Database stats (data size, index size)
db.stats()

// Replica set status
rs.status()

// Replication lag: compare each secondary's optimeDate to the primary's
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
status.members
  .filter(m => m.state === 2)
  .map(m => ({ name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 }))

Atlas Monitoring Dashboard

MongoDB Atlas provides:

  • Real-time metrics (CPU, memory, network, IOPS)
  • Slow query logs (queries > 100ms by default)
  • Index recommendations
  • Storage metrics

Common Pitfalls & How to Avoid Them

Pitfall 1: Insufficient Backup Testing

Problem: You have backups, but when you need to restore, they’re corrupted or incomplete.

Prevention:

  • Monthly restore tests to a staging environment
  • Automated backup integrity checks
  • Multiple backup copies (local + cloud)
  • Document your RTO/RPO requirements
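To make the RTO/RPO requirement concrete, check the backup schedule against it: under snapshot-only backups, worst-case data loss is roughly the snapshot interval plus the time it takes to notice the failure. A sketch (continuous WAL/oplog archiving shrinks this dramatically):

```javascript
// Does a snapshot-only backup schedule satisfy the RPO target?
function meetsRpo({ snapshotIntervalMin, detectionLagMin, rpoMin }) {
  const worstCaseLossMin = snapshotIntervalMin + detectionLagMin;
  return { worstCaseLossMin, ok: worstCaseLossMin <= rpoMin };
}
```

For example, hourly snapshots with five minutes of detection lag cannot meet a 15-minute RPO; that gap is exactly what WAL archiving or point-in-time recovery exists to close.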

Pitfall 2: Connection Pool Exhaustion During Migration

Problem: Application connection pool fills up, new requests queue and timeout.

Prevention:

// Implement connection pooling with limits (node-postgres)
const { Pool } = require('pg');

const pool = new Pool({
  max: 20,                        // Max connections
  idleTimeoutMillis: 30000,       // Close idle after 30s
  connectionTimeoutMillis: 2000,  // Fail fast when the pool is exhausted
});

// Monitor pool status (these counters are part of the public pg.Pool API)
setInterval(() => {
  console.log(`Pool: ${pool.totalCount} total, ${pool.idleCount} idle, ${pool.waitingCount} waiting`);
}, 5000);

Pitfall 3: Shard Key Regret

Problem: You chose a bad MongoDB shard key and can’t change it without major work.

Prevention:

  • Analyze your query patterns before sharding
  • Avoid monotonically increasing values (timestamps as primary key)
  • Use compound keys: { tenant_id: 1, user_id: 1 }
  • Plan for future scaling when choosing initial shard key

Pitfall 4: Skipping Capacity Planning

Problem: Database runs out of disk space during peak traffic.

Prevention:

# Monitor disk usage weekly
df -h /data

# Set up alerts
# Alert if > 80% full
# Auto-scale storage if using cloud (RDS, Atlas)

# Calculate growth rate
# If growing 5GB/month, plan upgrade now instead of emergency later
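The growth-rate comment above can be turned into a runway calculation. Linear growth is an assumption; real growth is often super-linear, so treat the result as an upper bound on the time you have:

```javascript
// Months of disk runway left before hitting the alert threshold,
// assuming linear growth (an optimistic simplification).
function monthsUntilFull({ diskGB, usedGB, growthGBPerMonth, alertThreshold = 0.8 }) {
  const budgetGB = diskGB * alertThreshold - usedGB;
  if (budgetGB <= 0) return 0;  // already past the alert threshold
  return Math.floor(budgetGB / growthGBPerMonth);
}
```

A 500GB volume at 300GB used and growing 5GB/month has about 20 months before the 80% alert fires; schedule the upgrade well inside that window.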

Pitfall 5: Not Monitoring Replication Lag

Problem: Read replicas fall behind, serving stale data without anyone knowing.

Prevention:

-- PostgreSQL: Check replication lag frequently
SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- If lag > 1 second, investigate network or heavy write load

// MongoDB: Check replication lag on each secondary
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
status.members
  .filter(m => m.state === 2)  // secondaries only
  .forEach(m => {
    const lagMs = primary.optimeDate - m.optimeDate;
    console.log(`${m.name}: ${lagMs}ms lag`);
  });

Pitfall 6: Hardcoded Database Credentials

Problem: Credentials in code, exposed in Git history or logs.

Prevention:

# Use environment variables
DATABASE_URL="postgresql://user:pass@host:5432/db"

# Or use secrets management
# AWS Secrets Manager, Azure Key Vault, HashiCorp Vault

# For RDS: Use IAM authentication (no stored password; requires SSL)
PGPASSWORD=$(aws rds generate-db-auth-token \
  --hostname mydb.123456789.us-east-1.rds.amazonaws.com \
  --port 5432 --region us-east-1 --username postgres) \
psql "host=mydb.123456789.us-east-1.rds.amazonaws.com user=postgres dbname=myapp sslmode=require"

Pitfall 7: No Rollback Plan

Problem: Migration fails partway, but you can’t quickly rollback.

Prevention:

  • Always have a detailed, tested rollback procedure
  • Keep the source database running for at least 48 hours post-migration
  • DNS/load balancer switch (fast rollback) vs code deployment (slower)
  • For critical systems, maintain dual-run for 1-2 weeks

# Example: DNS-based failover for quick rollback
# In DNS provider (use a short TTL, e.g. 60s):
#   db.myapp.com → 10.0.1.10 (new)
#   If issues: update to → 10.0.1.5 (old) within ~1 minute

Tools & Frameworks for Migrations

Database Migration Tools

Tool            | Databases                     | Language            | Strengths
----------------|-------------------------------|---------------------|--------------------------------------
Flyway          | Multi (Postgres, MySQL, etc.) | Java-based, SQL     | Simple, version-based, team-friendly
Liquibase       | Multi                         | XML/YAML/JSON       | Complex changesets, rollback support
Prisma Migrate  | Postgres, MySQL, SQLite       | TypeScript          | Auto-generated, app-aware migrations
Alembic         | Any via SQLAlchemy            | Python              | Powerful, good for complex migrations
node-pg-migrate | PostgreSQL                    | Node.js/JavaScript  | Lightweight, good for JS projects
Knex.js         | Multi                         | JavaScript          | Flexible, integrates with app code

PostgreSQL Migration Tools

pg_upgrade (for major version upgrades):

# Upgrade PostgreSQL 14 → 15 (in-place); stop both clusters first,
# and run with --check for a dry run before the real upgrade
/usr/lib/postgresql/15/bin/pg_upgrade \
  -b /usr/lib/postgresql/14/bin \
  -B /usr/lib/postgresql/15/bin \
  -d /var/lib/postgresql/14/main \
  -D /var/lib/postgresql/15/main \
  -U postgres

Foreign Data Wrappers (FDW) (for cross-database replication):

-- Create foreign table pointing to source database
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER source_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '10.0.1.5', dbname 'myapp', port '5432');

-- Credentials the local server uses to connect to the source
CREATE USER MAPPING FOR CURRENT_USER
  SERVER source_db
  OPTIONS (user 'postgres', password 'secret');

-- Empty column list + INHERITS copies the column definitions from users
CREATE FOREIGN TABLE source_users ()
  INHERITS (users)
  SERVER source_db
  OPTIONS (schema_name 'public', table_name 'users');

-- Sync data
INSERT INTO users SELECT * FROM source_users WHERE NOT EXISTS (...);

MongoDB Migration Tools

MongoDB Database Tools:

  • mongodump / mongorestore - Backup & restore
  • mongoimport / mongoexport - JSON/CSV import/export
  • Atlas Functions (formerly Stitch) - serverless functions for data transformation during migration

Atlas Migration Service:

# MongoDB Atlas Live Migration (zero-downtime)
# Handles replication, validation, and switchover automatically.
# Live Migration is typically driven from the Atlas UI; the CLI form below
# is illustrative, so check the current Atlas CLI docs for exact commands.

atlas clusters migrate create \
  --projectId <project-id> \
  --sourceClusterConnectionString <source-connection-string> \
  --targetClusterName <target-cluster-name>

Conclusion

Choosing the right database hosting and executing migrations flawlessly are critical for maintaining system reliability. Here are the key takeaways:

Hosting Decision Summary

  • Startups / MVPs: Use managed services (Heroku Postgres, MongoDB Atlas) or serverless (Aurora Serverless, Vercel Postgres)
  • High-scale / cost-sensitive: Invest in self-hosted + DBA team (with proper tooling)
  • SaaS / critical systems: Managed services with 99.95%+ uptime SLAs
  • Global applications: Managed services with multi-region replication or serverless

Migration Best Practices

  1. Always test migrations on production-scale data in a staging environment
  2. Have a detailed, tested rollback plan before you start
  3. Use dual-write patterns for large, high-traffic databases
  4. Monitor replication lag constantly during cutover
  5. Run automated backup and restore tests monthly
  6. Implement connection pooling and monitoring before migration
  7. Plan capacity in advance (don’t run out of disk space during migration)
  8. Communicate clearly with all stakeholders (ops, product, support)

Tools & Automation

  • Use migration frameworks (Prisma, Flyway, Alembic) instead of ad-hoc SQL
  • Implement infrastructure-as-code (Terraform, CloudFormation) for reproducibility
  • Automate backup & recovery testing with scripts
  • Set up monitoring and alerting (Prometheus, DataDog, New Relic) before migration

Post-Migration

  • Monitor the new database for 48+ hours (performance, replication, errors)
  • Keep source database online for 48-72 hours for quick rollback if needed
  • Update documentation with new connection strings, credentials, topology
  • Verify that all application features work with new database
  • Collect lessons learned and update runbooks for next migration

With careful planning, the right tools, and a culture of testing, you can execute database migrations with confidence and minimal risk to production systems.

