Database Setup & Deployment: PostgreSQL & MongoDB Hosting Strategies & Zero-Downtime Migrations
Choosing where and how to deploy your database is one of the most critical infrastructure decisions you’ll make. Whether you’re launching a new product or migrating from legacy systems, the wrong choice can result in downtime, data loss, or prohibitive costs. This comprehensive guide covers hosting options, deployment strategies, and zero-downtime migration techniques for both PostgreSQL and MongoDB.
Table of Contents
- Hosting Options Overview
- PostgreSQL Hosting Strategies
- MongoDB Hosting Strategies
- Key Decision Factors
- Database Migration Strategies
- Zero-Downtime Migrations
- Backup & Disaster Recovery
- Monitoring & Performance Optimization
- Common Pitfalls & How to Avoid Them
- Tools & Frameworks for Migrations
- Conclusion
Hosting Options Overview
When deploying a database, you have three primary models: self-hosted, managed services, and serverless/hybrid. Each model trades off control for operational burden.
Self-Hosted
What it is: You provision servers (VMs, bare metal) and manage the entire database stack (installation, patching, backups, replication).
Pros:
- Maximum control over configuration and optimization
- Potentially lower long-term costs at scale (after initial setup)
- No vendor lock-in
- Full visibility into system internals
Cons:
- High operational overhead (24/7 monitoring, patching, failover management)
- Initial expertise required (database administration, Linux, networking)
- Scaling requires additional provisioning and configuration
- Disaster recovery is your responsibility
Best for: Large organizations with dedicated DBA teams, cost-sensitive projects at scale, workloads with unusual requirements
Managed Services (DBaaS)
What it is: Cloud providers (AWS RDS, Azure Database, Google Cloud SQL, Heroku Postgres, etc.) handle infrastructure, patching, and backups. You manage application connection and configurations.
Pros:
- Automated backups and point-in-time recovery
- Automated patching and version upgrades
- Built-in high availability (multi-AZ, failover)
- Monitoring and alerting included
- Scaling with a few clicks or API calls
Cons:
- Higher per-instance cost than self-hosted
- Limited customization (e.g., kernel parameters)
- Potential vendor lock-in
- Network latency if provisioned in different regions
Best for: SaaS companies, startups, teams without DBA expertise, applications needing uptime guarantees (99.9%+)
Serverless & Hybrid Models
What it is: Databases that scale automatically without provisioning. Examples: AWS Aurora Serverless, MongoDB Atlas, Supabase, PlanetScale (MySQL), Vercel Postgres.
Pros:
- Auto-scaling based on demand (only pay for what you use)
- Near-zero operational overhead
- Global distribution options
- Quick to get started (minutes vs hours)
Cons:
- Cold start latency for idle instances
- Unpredictable pricing during traffic spikes
- Less control over instance behavior
- Sometimes higher per-request costs for low-traffic apps
Best for: Prototype/MVP projects, bursty workloads, globally-distributed applications, startups optimizing for time-to-market
PostgreSQL Hosting Strategies
Self-Hosted PostgreSQL
For organizations choosing self-hosted PostgreSQL, here’s a production-ready setup:
Architecture: Primary + Standby Replicas
┌─────────────────────────────────────────────┐
│      Application Layer (Load Balanced)      │
└─────────────────────────────────────────────┘
                      │
┌─────────────────────────────────────────────┐
│    Connection Pooler (PgBouncer / pgcat)    │
└─────────────────────────────────────────────┘
      │ (writes)               │ (reads)
┌──────────────┐       ┌──────────────┐
│ Primary (RW) │──────▶│ Standby (RO) │
│  PostgreSQL  │       │  PostgreSQL  │
└──────────────┘       └──────────────┘
      │ (WAL shipping / replication)
┌──────────────┐
│  Standby #2  │
│ (Async Rep)  │
└──────────────┘
      │
┌──────────────┐
│ Backup Node  │
│ (Scheduled)  │
└──────────────┘
Basic Server Setup
# Install PostgreSQL on Ubuntu/Debian
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib
# Start and enable PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Switch to postgres user and create database
sudo -u postgres psql
postgres=# CREATE DATABASE myapp;
postgres=# CREATE USER appuser WITH ENCRYPTED PASSWORD 'secure_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE myapp TO appuser;
postgres=# \q
Enable Replication (Primary)
Edit /etc/postgresql/14/main/postgresql.conf:
# Replication settings
wal_level = replica # Enable WAL logging
max_wal_senders = 3 # Max standby replicas
wal_keep_size = 2GB # Keep WAL segments
hot_standby = on # Allow queries on standby
Edit /etc/postgresql/14/main/pg_hba.conf to allow replication:
# Standby replication connection
host replication replication_user 10.0.1.10/32 md5
host replication replication_user 10.0.1.11/32 md5
Restart PostgreSQL:
sudo systemctl restart postgresql
Set Up Standby (Read Replica)
# On standby server, stop PostgreSQL if running
sudo systemctl stop postgresql
# Take base backup from primary
sudo -u postgres pg_basebackup -h 10.0.1.5 -D /var/lib/postgresql/14/main \
-U replication_user -v -P -W -R
# Start standby in recovery mode
sudo systemctl start postgresql
Verify replication:
# On primary, check replication status
sudo -u postgres psql
postgres=# SELECT client_addr, state, sync_state FROM pg_stat_replication;
 client_addr |   state   | sync_state
─────────────┼───────────┼────────────
 10.0.1.10   | streaming | async
 10.0.1.11   | streaming | async
Connection Pooling with PgBouncer
Install PgBouncer:
sudo apt-get install pgbouncer
Configure /etc/pgbouncer/pgbouncer.ini:
[databases]
myapp = host=10.0.1.5 port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction # Connection pooling mode
max_client_conn = 1000 # Max client connections
default_pool_size = 25 # Connections per database
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
; Logging
log_connections = 1
log_disconnections = 1
log_pooler_stats = 1
Start PgBouncer:
sudo systemctl start pgbouncer
sudo systemctl enable pgbouncer
Applications now connect to PgBouncer (port 6432) instead of PostgreSQL directly, gaining connection efficiency.
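On the application side, the only change is pointing the client at the pooler. A minimal Node.js sketch (the host, credentials, and the pooledConnectionConfig helper are illustrative placeholders, not part of the setup above):

```javascript
// Sketch: connection settings aimed at PgBouncer rather than Postgres.
// All hostnames and credentials here are hypothetical.
function pooledConnectionConfig({ host, user, password, database }) {
  return {
    host,        // the PgBouncer host, not the Postgres primary
    port: 6432,  // PgBouncer's default listen port
    user,
    password,
    database,
    // Keep the app-side pool small: PgBouncer multiplexes server
    // connections, so each app instance needs only a few client slots.
    max: 10,
  };
}

const config = pooledConnectionConfig({
  host: '10.0.1.20',
  user: 'appuser',
  password: 'secure_password',
  database: 'myapp',
});
console.log(config.port); // 6432
```

One caveat with pool_mode = transaction: session-level features (prepared statements, advisory locks, LISTEN/NOTIFY) do not survive across transactions, so check that your driver settings are compatible before enabling it.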
Managed PostgreSQL (AWS RDS / Azure Database / Google Cloud SQL)
AWS RDS for PostgreSQL
Creating a Multi-AZ deployment (recommended for production):
aws rds create-db-instance \
--db-instance-identifier myapp-postgres \
--db-instance-class db.t4g.large \
--engine postgres \
--master-username postgres \
--master-user-password 'SecurePassword123!' \
--allocated-storage 100 \
--storage-type gp3 \
--multi-az \
--no-publicly-accessible \
--vpc-security-group-ids sg-12345678 \
--db-subnet-group-name default-vpc-group \
--backup-retention-period 30 \
--preferred-backup-window "03:00-04:00" \
--preferred-maintenance-window "sun:04:00-sun:05:00" \
--enable-cloudwatch-logs-exports '["postgresql"]' \
--enable-iam-database-authentication
Key parameters:
- --multi-az: Enables automatic failover to a standby in another AZ
- --backup-retention-period 30: Keep 30 days of automated backups
- --enable-iam-database-authentication: Use IAM roles instead of passwords
Monitoring RDS Performance
# Check database metrics using AWS CLI
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=myapp-postgres \
--start-time 2025-01-01T00:00:00Z \
--end-time 2025-01-02T00:00:00Z \
--period 300 \
--statistics Average,Maximum
Connection String
postgresql://postgres:<password>@myapp-postgres.<random-id>.us-east-1.rds.amazonaws.com:5432/myapp
Serverless PostgreSQL
AWS Aurora Serverless v2
Aurora Serverless scales automatically based on workload. No capacity planning needed.
aws rds create-db-cluster \
--db-cluster-identifier myapp-aurora \
--engine aurora-postgresql \
--engine-version 15.2 \
--database-name myapp \
--master-username postgres \
--master-user-password 'SecurePassword123!' \
--serverlessv2-scaling-configuration MinCapacity=0.5,MaxCapacity=4 \
--backup-retention-period 30 \
--enable-http-endpoint
Scaling behavior:
- Minimum capacity: 0.5 ACU (Aurora Capacity Unit) = ~1GB memory
- Maximum capacity: 4 ACU = ~8GB memory
- Scales in fine-grained 0.5-ACU increments, reacting within seconds to CPU, memory, and connection load
Pricing: Pay per ACU-second; on the order of $0.12/ACU-hour for Serverless v2 in us-east-1 (region-dependent, so check current pricing).
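To make the pricing concrete, a back-of-envelope estimate (the rate and the monthlyAuroraCost helper below are assumptions for illustration; check current regional pricing):

```javascript
// Rough monthly cost: average ACUs held x hours in a month x rate.
// The default rate here is an assumed figure, not an official price.
function monthlyAuroraCost({ avgAcus, hoursPerMonth = 730, pricePerAcuHour = 0.12 }) {
  return avgAcus * hoursPerMonth * pricePerAcuHour;
}

// A workload idling at the 0.5 ACU floor for a whole month:
console.log(monthlyAuroraCost({ avgAcus: 0.5 }).toFixed(2)); // "43.80"
```

Running the same numbers against a small provisioned instance tells you where the break-even point sits for your traffic pattern.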
Vercel Postgres
For Next.js applications, Vercel Postgres provides integrated serverless PostgreSQL:
// Using Vercel SDK
import { sql } from '@vercel/postgres';

export default async function handler(req, res) {
  try {
    const result = await sql`SELECT * FROM users LIMIT 10`;
    res.status(200).json(result.rows);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
}
MongoDB Hosting Strategies
Self-Hosted MongoDB
Replica Set Setup (High Availability)
A replica set is MongoDB’s native replication mechanism. Minimum 3 nodes recommended for production.
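The three-node minimum comes from election math: a primary needs votes from a strict majority of members. A tiny helper (the faultTolerance name is invented for illustration) makes the arithmetic explicit:

```javascript
// A replica set stays writable only while a strict majority of voting
// members is up. faultTolerance() = members you can lose and still
// retain a majority.
function faultTolerance(members) {
  const majority = Math.floor(members / 2) + 1;
  return members - majority;
}

console.log(faultTolerance(2)); // 0 (a 2-node set tolerates no failures)
console.log(faultTolerance(3)); // 1
console.log(faultTolerance(5)); // 2
```

Note that even member counts buy nothing: 4 nodes tolerate the same single failure as 3, which is why MongoDB recommends odd-sized sets (or an arbiter).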
Docker Compose Setup
version: '3.8'
# Note: --auth in a replica set requires the SAME keyfile on every member
# (generate once: openssl rand -base64 756 > mongo-keyfile && chmod 400 mongo-keyfile)
services:
  mongo1:
    image: mongo:7.0
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo1_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile
  mongo2:
    image: mongo:7.0
    ports:
      - "27018:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo2_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile
  mongo3:
    image: mongo:7.0
    ports:
      - "27019:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: password123
    volumes:
      - mongo3_data:/data/db
      - ./mongo-keyfile:/etc/mongo/keyfile:ro
    command: >
      mongod --replSet myreplicaset
      --bind_ip_all
      --auth
      --keyFile /etc/mongo/keyfile
volumes:
  mongo1_data:
  mongo2_data:
  mongo3_data:
Initialize the Replica Set
# Connect to primary
mongosh mongodb://admin:password123@localhost:27017
// Initialize the replica set (run inside mongosh)
rs.initiate({
_id: "myreplicaset",
members: [
{ _id: 0, host: "mongo1:27017", priority: 2 },
{ _id: 1, host: "mongo2:27017", priority: 1 },
{ _id: 2, host: "mongo3:27017", priority: 1 }
]
})
// Check replica set status
rs.status()
Connection String for Replica Set
mongodb://appuser:password@mongo1:27017,mongo2:27017,mongo3:27017/myapp?replicaSet=myreplicaset&authSource=admin
(Container hostnames use the internal port 27017 for every member; from the host machine use localhost:27017,27018,27019 instead.)
With automatic failover and read preference:
const client = new MongoClient(uri, {
  replicaSet: 'myreplicaset',
  retryWrites: true,
  readPreference: 'secondaryPreferred', // Read from replicas if available
});
Managed MongoDB (MongoDB Atlas)
MongoDB Atlas is MongoDB’s official DBaaS, with global distribution and enterprise features built-in.
Creating a Cluster
# Using the Atlas CLI (install from https://www.mongodb.com/docs/atlas/cli/)
atlas clusters create production-cluster \
  --projectId 63f2e9c8d1234567890abcde \
  --provider AWS \
  --region US_EAST_1 \
  --tier M10 \
  --diskSizeGB 100 \
  --mdbVersion 7.0 \
  --backup
Atlas Connection String
mongodb+srv://username:<password>@production-cluster.<cluster-id>.mongodb.net/myapp?retryWrites=true&w=majority
Key Features
- Automated backups: Hourly snapshots, 35-day retention
- Multi-region replica sets: High availability across regions
- Sharding: Automatic horizontal scaling
- Monitoring: Real-time metrics, slow query logs, index recommendations
- Backup & Restore: Point-in-time restore, AWS S3 snapshots
Cost Optimization
- M0 (free): 512MB storage, shared cluster, for development
- M2 / M5: Shared clusters, $9-57/month, for testing/low-traffic
- M10+: Dedicated clusters, $57+/month, production-grade
MongoDB Sharding (Horizontal Scaling)
When a single replica set reaches capacity, shard the data across multiple replica sets.
Sharded Cluster Architecture
      ┌─────────────────────────────┐
      │  Application (driver/shell) │
      └─────────────────────────────┘
                    │
  ┌───────────────────────────────────────┐
  │        Mongos (Query Router)          │
  │  Routes queries to appropriate shard  │
  └───────────────────────────────────────┘
       │            │            │
   ┌───────┐    ┌───────┐    ┌───────┐
   │ Shard │    │ Shard │    │ Shard │
   │   1   │    │   2   │    │   3   │
   │ (RS)  │    │ (RS)  │    │ (RS)  │
   └───────┘    └───────┘    └───────┘
       │            │            │
   Config Servers (store sharding metadata)
Enable Sharding on a Collection
// First, enable sharding on the database
db.adminCommand({ enableSharding: "myapp" })
// Create an index to back the shard key (an important decision!)
db.orders.createIndex({ user_id: 1 })
// Enable sharding on the collection
db.adminCommand({
shardCollection: "myapp.orders",
key: { user_id: 1 }
})
// Verify sharding
sh.status()
Shard key considerations:
- Choose a field that distributes data evenly (avoid monotonically increasing fields like timestamps as the shard key)
- Use compound shard keys (e.g., { user_id: 1, created_at: -1 }) for better distribution
- Changing a shard key later is expensive: MongoDB 5.0+ supports resharding via reshardCollection, but it is a heavyweight operation, so choose carefully up front
Key Decision Factors
Use this decision matrix to choose the right hosting option for your use case:
| Factor | Self-Hosted | Managed (RDS/Atlas) | Serverless |
|---|---|---|---|
| Operational Overhead | High (24/7 DBA) | Low (vendor handles) | Minimal |
| Cost at Scale | Low (after setup) | Medium-High | Variable (can spike) |
| Setup Time | Days-Weeks | Minutes-Hours | Minutes |
| Customization | Unlimited | Limited | Very Limited |
| Scalability | Manual provisioning | Click/API | Auto-scaling |
| Uptime SLA | Depends on you | 99.95%+ | 99.95%+ |
| Backup/Recovery | Your responsibility | Automated | Automated |
| Security | Your responsibility | Shared responsibility | Shared responsibility |
| Cost Predictability | Predictable | Predictable | Variable |
| Vendor Lock-In | None | High | High |
Decision logic:
- Startup / MVP: → Serverless or managed services (get to market fast)
- SaaS / Production: → Managed services (reliability, compliance, monitoring)
- Cost-Sensitive + Scale: → Self-hosted + DBA team
- Regulated Industry: → Self-hosted or managed with compliance certifications
- Global Users: → Managed with multi-region or serverless with CDN
- Unpredictable Traffic: → Serverless (auto-scales) or managed with auto-scaling
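The same logic can be sketched as a first-pass lookup (grossly simplified; recommendHosting and its inputs are invented for illustration, and real decisions weigh compliance, budget, and team skills together):

```javascript
// Encodes the decision bullets above as a rough recommendation.
function recommendHosting({ stage, trafficPattern, hasDbaTeam = false }) {
  if (stage === 'mvp') return 'serverless';                  // time-to-market first
  if (trafficPattern === 'bursty') return 'serverless';      // auto-scaling wins
  if (hasDbaTeam && stage === 'scale') return 'self-hosted'; // cost at scale
  return 'managed';                                          // safe default
}

console.log(recommendHosting({ stage: 'mvp' }));                                  // "serverless"
console.log(recommendHosting({ stage: 'production', trafficPattern: 'steady' })); // "managed"
```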
Database Migration Strategies
Migrating a database is one of the riskiest operations you’ll perform. Plan carefully and test exhaustively.
Pre-Migration Checklist
- Database audit (schema, indices, stored procedures, data volume)
- Network connectivity between source and target (firewall rules, security groups)
- Backup source database (point-in-time recovery safety net)
- Estimate migration duration (test on copy of production data)
- Plan communication (notify users, prepare rollback plan)
- Test rollback procedure (actually perform it on staging)
- Monitor disk space on both source and target
- Review application code for hardcoded connection strings
Simple Lift-and-Shift (Downtime Required)
For small databases (<10GB) with acceptable downtime window (30 mins - 2 hours):
PostgreSQL Dump & Restore
# Step 1: Dump source database
pg_dump -h source.db.com -U postgres -d myapp --verbose \
--format=directory --jobs=4 \
-f /tmp/dump_myapp
# Step 2: Compress for transfer (optional)
tar -czf dump_myapp.tar.gz /tmp/dump_myapp
# Step 3: (At target) Stop application
# Update connection strings in app config
# Point to new database host
# Step 4: Restore to target
pg_restore -h target.db.com -U postgres -d myapp \
--verbose --jobs=4 \
/tmp/dump_myapp
# Step 5: Verify data
psql -h target.db.com -U postgres -d myapp -c "SELECT COUNT(*) FROM users;"
# Step 6: Restart application
systemctl restart myapp
MongoDB Dump & Restore
# Step 1: Dump source database
mongodump --uri="mongodb://admin:<password>@source-host:27017/myapp" \
  --authenticationDatabase admin --out=/tmp/mongo_dump
# Step 2: Restore to target
mongorestore --uri="mongodb://admin:<password>@target-host:27017" \
  --authenticationDatabase admin \
  /tmp/mongo_dump
# Step 3: Verify data
mongosh "mongodb://admin:<password>@target-host:27017/myapp" \
  --eval "db.users.countDocuments()"
# Step 4: Update application connection strings and restart
Zero-Downtime Migrations
For large databases or critical systems, zero-downtime migration is essential.
PostgreSQL: Dual-Write & Parallel Running
Phase 1 (24h before cutover):
├─ Start replication from source → target
├─ Monitor replication lag (target catching up)
└─ Verify data consistency

Phase 2 (1h before cutover):
├─ Enable dual-write mode in application
│  (writes go to both source and target)
├─ Continue reads from source
└─ Verify dual-write is working

Phase 3 (Cutover, 5-15 mins):
├─ Stop application briefly
├─ Final sync of replication
├─ Verify data consistency (source == target)
├─ Switch reads+writes to target
├─ Resume application
└─ Monitor for issues (30 mins)

Phase 4 (If issues):
├─ Quickly switch back to source
├─ Diagnose issue
└─ Fix and retry phase 3
Implementation Pattern (Node.js + Prisma)
// Dual-write wrapper (notifyOpsTeam is a stand-in for your alerting hook)
class DualWriteDb {
  private sourceDb: PrismaClient;
  private targetDb: PrismaClient;
  private dualWriteEnabled = false;

  constructor(sourceDb: PrismaClient, targetDb: PrismaClient) {
    this.sourceDb = sourceDb;
    this.targetDb = targetDb;
  }

  async create(table: string, data: any) {
    // Always write to source (current primary)
    const result = await this.sourceDb[table].create({ data });
    // If dual-write enabled, also write to target
    if (this.dualWriteEnabled) {
      try {
        await this.targetDb[table].create({ data });
      } catch (error) {
        // Log but don't fail the request
        console.error(`Dual-write to target failed: ${error.message}`);
        await notifyOpsTeam(`Dual-write failure: ${error.message}`);
      }
    }
    return result;
  }

  async read(table: string, where: any) {
    // Read from source during migration
    return this.sourceDb[table].findUnique({ where });
  }

  enableDualWrite() {
    this.dualWriteEnabled = true;
    console.log('Dual-write mode enabled');
  }

  disableDualWrite() {
    this.dualWriteEnabled = false;
    console.log('Dual-write mode disabled');
  }

  async switchToTarget() {
    // Verify consistency first
    const sourceCount = await this.sourceDb.users.count();
    const targetCount = await this.targetDb.users.count();
    if (sourceCount !== targetCount) {
      throw new Error('Data count mismatch! Aborting switch.');
    }
    // Switch primary database
    // (In practice, update environment variables and reconnect)
    console.log('Switched primary to target database');
  }
}
Using PostgreSQL Streaming Replication for Zero-Downtime
# On target server, set up as standby (parallel to live system)
sudo -u postgres pg_basebackup -h source.db.com \
-D /var/lib/postgresql/15/main -U replication -v -P -W -R
# Start standby (begins streaming from source)
sudo systemctl start postgresql
# Monitor replication lag on target
watch -n 1 'sudo -u postgres psql -c "SELECT pg_last_xact_replay_timestamp();"'
# Should show current time (no lag) once caught up
# When ready for cutover:
# 1. Promote standby to primary
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/15/main
# 2. Update application connection string
# 3. Point primary back to old server as standby (if needed for rollback)
MongoDB: Replica Set Expansion + Resync
# Phase 1: Add new members to the replica set (while the old members serve traffic)
rs.add({ host: "new-member-1:27017" })
rs.add({ host: "new-member-2:27017" })
rs.add({ host: "new-member-3:27017" })

# Wait for initial sync to finish and replication lag to reach zero
rs.status()  # compare each member's optimeDate against the primary's

# Phase 2: (If moving to a sharded cluster) start a mongos router against the new config servers
mongos --configdb <config-replica-set>/<host1:port>,<host2:port> --bind_ip_all

# Phase 3: Switch the application connection string to the new members (or mongos)

# Phase 4: Once stable, remove the old members
rs.remove("old-member-1:27017")
Backup & Disaster Recovery
PostgreSQL Backup Strategy
Automated Backups (RDS / Managed Services)
RDS handles this automatically:
- Continuous backups (retention configurable from 1 to 35 days)
- Automated daily snapshots
- Point-in-time recovery (restore to any second in retention window)
Self-Hosted: WAL Archiving
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'
archive_timeout = 300
With cloud storage:
# Install WAL-E (or WAL-G for better compression)
pip install wal-e
# Configure for S3
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export WALE_S3_PREFIX=s3://my-backups/postgres-wal
# Update archive_command
archive_command = 'wal-e wal-push %p'
Full Backups
# Weekly full backup using pg_basebackup
pg_basebackup -h localhost -U postgres -D /backup/pg_backup_$(date +%Y%m%d) \
-v -Ft -X stream -z
# Backup to S3 with encryption
tar -czf - /backup/pg_backup_$(date +%Y%m%d) | \
aws s3 cp - s3://my-backups/pg-full-$(date +%Y%m%d).tar.gz \
--sse AES256
Test Restore Procedure (Critical!)
# Periodically, actually restore from backup to verify it works
# Never assume backups are good without testing
# Restore from WAL archive
wal-e backup-fetch /restore_dir LATEST
pg_ctl -D /restore_dir start
MongoDB Backup Strategy
Atlas Automatic Backups
MongoDB Atlas includes:
- Continuous backups (hourly snapshots)
- 35-day retention window
- Point-in-time restore (any second)
- Download for off-site storage
Self-Hosted: mongodump + Object Storage
#!/bin/bash
# Daily backup script
BACKUP_DIR="/backups/mongo_$(date +%Y%m%d)"
mkdir -p $BACKUP_DIR
# Backup all databases
mongodump --uri="mongodb://admin:pass@localhost/admin" \
--out=$BACKUP_DIR
# Compress
tar -czf $BACKUP_DIR.tar.gz $BACKUP_DIR
# Upload to S3
aws s3 cp $BACKUP_DIR.tar.gz s3://my-backups/ \
--sse AES256 --storage-class GLACIER_IR
# Keep local copy for 7 days
find /backups -name "mongo_*.tar.gz" -mtime +7 -delete
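The find -mtime cleanup above can be mirrored in application code when backups live in object storage rather than on local disk. A sketch (expiredBackups is a hypothetical helper; sample names and dates are illustrative):

```javascript
// Returns the names of backups older than the retention window.
function expiredBackups(backups, nowMs, retentionDays = 7) {
  const cutoffMs = nowMs - retentionDays * 24 * 60 * 60 * 1000;
  return backups.filter(b => b.mtime < cutoffMs).map(b => b.name);
}

const now = Date.parse('2025-01-15T00:00:00Z');
const backups = [
  { name: 'mongo_20250101.tar.gz', mtime: Date.parse('2025-01-01T00:00:00Z') },
  { name: 'mongo_20250114.tar.gz', mtime: Date.parse('2025-01-14T00:00:00Z') },
];
console.log(expiredBackups(backups, now)); // only the Jan 1 backup is past retention
```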
Test Restore
# Extract backup
tar -xzf mongo_20250101.tar.gz
# Restore to separate instance/database
mongorestore --uri="mongodb://admin:pass@localhost:27018" \
--drop mongo_20250101
# Validate data
mongosh "mongodb://admin:pass@localhost:27018/myapp" \
--eval "db.users.countDocuments()"
Monitoring & Performance Optimization
PostgreSQL Monitoring
Key Metrics to Track
-- Slow queries (queries > 1 second)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Cache hit ratio (should be > 99%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as cache_ratio
FROM pg_statio_user_tables;
-- Connection count
SELECT count(*) FROM pg_stat_activity;
-- Missing indices (high sequential scans)
SELECT schemaname, tablename, seq_scan, seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_scan DESC;
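The cache-hit query returns raw block counts; the ratio arithmetic behind the 99% rule of thumb looks like this (cacheHitRatio is an illustrative helper and the counts are made-up sample values):

```javascript
// Fraction of reads served from shared_buffers rather than disk.
function cacheHitRatio(heapRead, heapHit) {
  return heapHit / (heapHit + heapRead);
}

const ratio = cacheHitRatio(1200, 980000); // sample block counts
console.log(ratio.toFixed(4)); // "0.9988"
console.log(ratio > 0.99);     // true: healthy buffer cache
```

If the ratio sits below 0.99 under steady load, the usual suspects are an undersized shared_buffers, missing indexes forcing large scans, or a working set that has outgrown memory.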
Automated Monitoring with Prometheus
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['localhost:9187']  # postgres_exporter

# docker-compose.yml for postgres_exporter
services:
  postgres_exporter:
    image: prometheuscommunity/postgres-exporter
    environment:
      DATA_SOURCE_NAME: "postgresql://postgres:password@postgres:5432/myapp?sslmode=disable"
    ports:
      - "9187:9187"
MongoDB Monitoring
Key Metrics
// Connection pool status
db.adminCommand({ connPoolStats: 1 })
// Current operations running longer than 5 seconds
db.currentOp({ secs_running: { $gte: 5 } })
// Database stats (data size, index size)
db.stats()
// Replica set status
rs.status()
// Replication lag per secondary (run in mongosh)
const s = rs.status();
const primary = s.members.find(m => m.state === 1);
s.members.filter(m => m.state === 2)
  .map(m => ({ name: m.name, lagMs: primary.optimeDate - m.optimeDate }))
Atlas Monitoring Dashboard
MongoDB Atlas provides:
- Real-time metrics (CPU, memory, network, IOPS)
- Slow query logs (queries > 100ms by default)
- Index recommendations
- Storage metrics
Common Pitfalls & How to Avoid Them
Pitfall 1: Insufficient Backup Testing
Problem: You have backups, but when you need to restore, they’re corrupted or incomplete.
Prevention:
- Monthly restore tests to a staging environment
- Automated backup integrity checks
- Multiple backup copies (local + cloud)
- Document your RTO/RPO requirements
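The RTO/RPO bullet deserves a number: worst-case RPO follows directly from backup cadence. A sketch (worstCaseRpoMinutes is an invented helper):

```javascript
// Worst-case data-loss window. Snapshots alone bound RPO by their
// interval; continuous WAL archiving bounds it by archive_timeout.
function worstCaseRpoMinutes({ snapshotIntervalMin, walArchiveTimeoutSec = null }) {
  if (walArchiveTimeoutSec !== null) return walArchiveTimeoutSec / 60;
  return snapshotIntervalMin;
}

console.log(worstCaseRpoMinutes({ snapshotIntervalMin: 60 })); // 60
console.log(worstCaseRpoMinutes({ snapshotIntervalMin: 60, walArchiveTimeoutSec: 300 })); // 5
```

In other words, hourly snapshots alone mean accepting up to an hour of lost writes; the archive_timeout = 300 setting shown earlier shrinks that to five minutes.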
Pitfall 2: Connection Pool Exhaustion During Migration
Problem: Application connection pool fills up, new requests queue and timeout.
Prevention:
// Implement connection pooling with limits (node-postgres)
const pool = new Pool({
  max: 20,                       // Max connections
  idleTimeoutMillis: 30000,      // Close idle clients after 30s
  connectionTimeoutMillis: 2000, // Fail fast when no connection is free
});

// Monitor pool status (counters exposed by node-postgres)
setInterval(() => {
  console.log(`Pool: ${pool.totalCount} total, ${pool.idleCount} idle, ${pool.waitingCount} waiting`);
}, 5000);
Pitfall 3: Shard Key Regret
Problem: You chose a bad MongoDB shard key and can’t change it without major work.
Prevention:
- Analyze your query patterns before sharding
- Avoid monotonically increasing values (timestamps as primary key)
- Use compound keys: { tenant_id: 1, user_id: 1 }
- Plan for future scaling when choosing the initial shard key
Pitfall 4: Skipping Capacity Planning
Problem: Database runs out of disk space during peak traffic.
Prevention:
# Monitor disk usage weekly
df -h /data
# Set up alerts
# Alert if > 80% full
# Auto-scale storage if using cloud (RDS, Atlas)
# Calculate growth rate
# If growing 5GB/month, plan upgrade now instead of emergency later
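Translating the growth-rate advice into a runway number (monthsUntilFull and the figures below are illustrative):

```javascript
// Months before disk usage crosses the alert threshold.
function monthsUntilFull({ capacityGb, usedGb, growthGbPerMonth, threshold = 0.8 }) {
  const headroomGb = capacityGb * threshold - usedGb;
  return Math.max(0, headroomGb / growthGbPerMonth);
}

// 500GB volume at 350GB used, growing 5GB/month, alerting at 80% full:
console.log(monthsUntilFull({ capacityGb: 500, usedGb: 350, growthGbPerMonth: 5 })); // 10
```

A result of zero means you are already past the alert threshold and the upgrade is an emergency, not a plan.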
Pitfall 5: Not Monitoring Replication Lag
Problem: Read replicas fall behind, serving stale data without anyone knowing.
Prevention:
-- PostgreSQL: Check replication lag frequently
SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- If lag > 1 second, investigate network or heavy write load
// MongoDB: Check replication lag (run in mongosh)
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
status.members
  .filter(m => m.state === 2) // Secondaries only
  .forEach(m => {
    const lagMs = primary.optimeDate - m.optimeDate;
    console.log(`${m.name}: ${lagMs}ms behind primary`);
  });
Pitfall 6: Hardcoded Database Credentials
Problem: Credentials in code, exposed in Git history or logs.
Prevention:
# Use environment variables
DATABASE_URL="postgresql://user:pass@host:5432/db"
# Or use secrets management
# AWS Secrets Manager, Azure Key Vault, HashiCorp Vault
# For RDS: Use IAM authentication (no password needed)
PGPASSWORD=$(aws rds generate-db-auth-token --hostname mydb.123456789.us-east-1.rds.amazonaws.com --port 5432 --region us-east-1 --username postgres) \
psql "host=mydb.123456789.us-east-1.rds.amazonaws.com port=5432 user=postgres dbname=myapp sslmode=require"
Pitfall 7: No Rollback Plan
Problem: Migration fails partway, but you can’t quickly rollback.
Prevention:
- Always have a detailed, tested rollback procedure
- Keep source database running for minimum 48 hours post-migration
- DNS/load balancer switch (fast rollback) vs code deployment (slower)
- For critical systems, maintain dual-run for 1-2 weeks
# Example: DNS-based failover for quick rollback
# In DNS provider:
#   db.myapp.com → 10.0.1.10 (new)
#   If issues: update to → 10.0.1.5 (old) within 1 minute
Tools & Frameworks for Migrations
Database Migration Tools
| Tool | Database | Language | Strengths |
|---|---|---|---|
| Flyway | Multi (Postgres, MySQL, etc) | Java-based, SQL scripts | Simple, version-based, team-friendly |
| Liquibase | Multi | XML/YAML/JSON | Complex changesets, rollback support |
| Prisma Migrate | Postgres, MySQL, SQLite | TypeScript | Auto-generated, app-aware migrations |
| Alembic | Any via SQLAlchemy | Python | Powerful, good for complex migrations |
| node-pg-migrate | PostgreSQL | Node.js/JavaScript | Lightweight, good for JS projects |
| Knex.js | Multi | JavaScript | Flexible, integrates with app code |
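One subtlety with version-based tools such as Flyway: plain lexicographic sorting puts V10 before V2, so migration ordering must be numeric. A sketch of the correct sort (sortMigrations is a hypothetical helper; the real tools handle this internally, but home-grown runners often get it wrong):

```javascript
// Sorts Flyway-style V<version>__<description>.sql filenames by
// numeric version rather than string order.
function sortMigrations(files) {
  const version = f => parseInt(f.match(/^V(\d+)__/)[1], 10);
  return [...files].sort((a, b) => version(a) - version(b));
}

const files = ['V10__add_index.sql', 'V2__add_users.sql', 'V1__init.sql'];
console.log(sortMigrations(files)); // V1, V2, V10 (a string sort would give V1, V10, V2)
```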
PostgreSQL Migration Tools
pg_upgrade (for major version upgrades):
# Upgrade PostgreSQL 14 → 15 (in-place)
/usr/lib/postgresql/15/bin/pg_upgrade \
  -b /usr/lib/postgresql/14/bin \
  -B /usr/lib/postgresql/15/bin \
  -d /var/lib/postgresql/14/main \
  -D /var/lib/postgresql/15/main \
  -U postgres -O '-c max_connections=250'
Foreign Data Wrappers (FDW) (for cross-database replication):
-- Create foreign table pointing to source database
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER source_db
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '10.0.1.5', dbname 'myapp', port '5432');
-- Map the local role to credentials on the source database
CREATE USER MAPPING FOR CURRENT_USER
  SERVER source_db
  OPTIONS (user 'appuser', password 'secure_password');
CREATE FOREIGN TABLE source_users ()
  INHERITS (users)
  SERVER source_db
  OPTIONS (schema_name 'public', table_name 'users');
-- Sync data
INSERT INTO users SELECT * FROM source_users WHERE NOT EXISTS (...);
MongoDB Migration Tools
MongoDB Database Tools:
- mongodump / mongorestore: backup & restore
- mongoimport / mongoexport: JSON/CSV import and export
- Atlas Functions (formerly Stitch): serverless functions for data transformation during migration
Atlas Migration Service:
MongoDB Atlas Live Migration performs a zero-downtime move into Atlas: it replicates continuously from your source deployment, validates the data, and lets you cut over when lag reaches zero. It is driven from the Atlas UI ("Migrate Data to this Cluster") or the Atlas Admin API rather than a single CLI command, so plan the cutover window around that workflow.
Conclusion
Choosing the right database hosting and executing migrations flawlessly are critical for maintaining system reliability. Here are the key takeaways:
Hosting Decision Summary
- Startups / MVPs: Use managed services (Heroku Postgres, MongoDB Atlas) or serverless (Aurora Serverless, Vercel Postgres)
- High-scale / cost-sensitive: Invest in self-hosted + DBA team (with proper tooling)
- SaaS / critical systems: Managed services with 99.95%+ uptime SLAs
- Global applications: Managed services with multi-region replication or serverless
Migration Best Practices
- Always test migrations on production-scale data in a staging environment
- Have a detailed, tested rollback plan before you start
- Use dual-write patterns for large, high-traffic databases
- Monitor replication lag constantly during cutover
- Run automated backup and restore tests monthly
- Implement connection pooling and monitoring before migration
- Plan capacity in advance (don’t run out of disk space during migration)
- Communicate clearly with all stakeholders (ops, product, support)
Tools & Automation
- Use migration frameworks (Prisma, Flyway, Alembic) instead of ad-hoc SQL
- Implement infrastructure-as-code (Terraform, CloudFormation) for reproducibility
- Automate backup & recovery testing with scripts
- Set up monitoring and alerting (Prometheus, DataDog, New Relic) before migration
Post-Migration
- Monitor the new database for 48+ hours (performance, replication, errors)
- Keep source database online for 48-72 hours for quick rollback if needed
- Update documentation with new connection strings, credentials, topology
- Verify that all application features work with new database
- Collect lessons learned and update runbooks for next migration
With careful planning, the right tools, and a culture of testing, you can execute database migrations with confidence and minimal risk to production systems.
Further Reading & Resources
- PostgreSQL Replication & Failover
- AWS RDS Best Practices
- MongoDB Replica Sets
- MongoDB Atlas Backup & Restore
- Flyway Migration Tool
- Prisma Migrate Docs
- High-Performance PostgreSQL
- MongoDB Performance Best Practices
- Disaster Recovery Planning Guide