Introduction
Deploying software is risky. Every change can introduce bugs, cause downtime, or degrade the user experience. Traditional “big bang” deployments, where you replace 100% of traffic at once, are increasingly unacceptable for modern applications that demand high availability.
This guide covers deployment strategies that minimize risk while enabling frequent releases. You’ll learn blue-green deployments for instant rollbacks, canary releases for gradual traffic shifting, and progressive delivery patterns that combine deployment strategies with feature flags and experimentation.
Understanding Deployment Strategies
The Risk Spectrum
| Strategy | Risk Level | Rollback Speed | Complexity |
|---|---|---|---|
| Manual All-at-Once | High | Slow | Low |
| Blue-Green | Low | Instant | Medium |
| Canary | Very Low | Fast | Medium |
| Progressive | Very Low | Instant | High |
| Ring-based | Low | Fast | Medium |
Blue-Green Deployments
Concept
Blue-green deployment maintains two identical production environments. At any time, only one serves live traffic. When deploying, you switch traffic from the active environment to the updated one:
Blue (Active)   ← Traffic ← Users
Green (Standby) ← Deployment
After deployment:
Blue (Standby)
Green (Active)  ← Traffic ← Users
Kubernetes Blue-Green with Ingress
# deployment-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: app
image: myapp:2.0.0
ports:
- containerPort: 8080
---
# service-green.yaml
apiVersion: v1
kind: Service
metadata:
name: app-green
spec:
selector:
app: myapp
version: green
ports:
- port: 80
targetPort: 8080
# ingress.yaml - Switch traffic by updating weight
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
annotations:
nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "100" # 100% to green (a primary, non-canary ingress for app-blue must also exist)
spec:
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: app-green
port:
number: 80
AWS Elastic Beanstalk Blue-Green
# Create new environment (green)
aws elasticbeanstalk create-environment \
--application-name myapp \
--environment-name green \
--solution-stack-name "64bit Amazon Linux 2 v3.6.0 running Node.js 18" \
--version-label v2.0.0
# Swap CNAME to switch traffic
aws elasticbeanstalk swap-environment-cnames \
--source-environment-name blue \
--destination-environment-name green
# If issues - swap back immediately
aws elasticbeanstalk swap-environment-cnames \
--source-environment-name green \
--destination-environment-name blue
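Before swapping, you should verify that the green environment is actually healthy. The CLI commands above can be wrapped in a small health-gated script; this is a sketch using boto3, where the `blue`/`green` names match the placeholder environments above and the timeout values are arbitrary assumptions:

```python
# Sketch: gate the CNAME swap on the green environment's health.
import time


def wait_until_healthy(eb, env_name, timeout=600, interval=15):
    """Poll until the environment's health is Green, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = eb.describe_environments(EnvironmentNames=[env_name])
        envs = resp.get("Environments", [])
        if envs and envs[0].get("Health") == "Green":
            return True
        time.sleep(interval)
    return False


def swap_if_healthy(eb, source="blue", destination="green"):
    """Swap CNAMEs only once the destination environment is healthy."""
    if not wait_until_healthy(eb, destination):
        raise RuntimeError(f"{destination} never became healthy; not swapping")
    eb.swap_environment_cnames(
        SourceEnvironmentName=source,
        DestinationEnvironmentName=destination,
    )
```

In practice you would call `swap_if_healthy(boto3.client("elasticbeanstalk"))` from your pipeline; passing the client in keeps the helpers easy to test with a stub.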
Blue-Green with Database Changes
Database migrations require special handling:
// Zero-downtime migration strategy (Migration, deploy, and
// switchTraffic are illustrative helpers, not a specific library)
async function deployWithMigration() {
  const migration = new Migration();
// 1. Deploy new application version (reads from both old and new schemas)
await deploy('v2.0.0');
// 2. Run backward-compatible database migration
await migration.addColumnNullable('users', 'new_field');
// 3. Backfill data
await migration.backfill('new_field');
// 4. Make column non-nullable
await migration.makeColumnNotNullable('users', 'new_field');
// 5. Switch traffic
await switchTraffic('green');
// 6. Remove old code paths after confirmation
await deploy('v2.0.1'); // No longer reads old field
}
Canary Releases
Concept
Canary releases gradually shift a small percentage of traffic to the new version, monitoring for errors before increasing:
10% Traffic → v2.0.0 (Canary)
90% Traffic → v1.9.0 (Stable)
      ↓
Monitor Metrics
      ↓
50% Traffic → v2.0.0
      ↓
100% Traffic → v2.0.0
Kubernetes Canary with Service Mesh
# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-canary
spec:
replicas: 1 # Smaller for canary
selector:
matchLabels:
app: myapp
version: canary
template:
metadata:
labels:
app: myapp
version: canary
spec:
containers:
- name: app
image: myapp:2.0.0-canary
---
# istio-virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp.example.com
http:
- route:
- destination:
host: myapp
subset: stable
weight: 90
- destination:
host: myapp
subset: canary
weight: 10
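Promoting the canary means rewriting the weights in the VirtualService above. A sketch of doing that programmatically with the Kubernetes Python client's `CustomObjectsApi` (the pure `build_routes` helper constructs the route block; cluster access and the `myapp` names are assumptions carried over from the manifest):

```python
# Sketch: shift canary weight by patching the Istio VirtualService.

def build_routes(host, canary_weight):
    """Return the http route block for a given canary percentage."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary weight must be between 0 and 100")
    return [{
        "route": [
            {"destination": {"host": host, "subset": "stable"},
             "weight": 100 - canary_weight},
            {"destination": {"host": host, "subset": "canary"},
             "weight": canary_weight},
        ]
    }]


def set_canary_weight(api, name, namespace, weight):
    """api is a kubernetes.client.CustomObjectsApi instance."""
    api.patch_namespaced_custom_object(
        group="networking.istio.io",
        version="v1beta1",
        namespace=namespace,
        plural="virtualservices",
        name=name,
        body={"spec": {"http": build_routes(name, weight)}},
    )
```

Keeping the route construction pure makes it trivial to unit-test that the weights always sum to 100 before anything touches the cluster.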
Flagger: Automated Canary Deployments
# Canary CRD - Flagger automates the process
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: myapp
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
service:
port: 80
analysis:
interval: 1m
threshold: 10
maxWeight: 50
stepWeight: 10
# Metrics that determine success
    metrics:
    - name: request-success-rate
      interval: 1m
      thresholdRange:
        min: 99
    - name: request-duration
      interval: 30s
      thresholdRange:
        max: 500
Progressive Delivery in Practice
# Custom canary controller logic (sketch: TrafficManager,
# ObservabilityClient, and CanaryFailed are placeholders for your
# own infrastructure clients)
import asyncio

class CanaryController:
    def __init__(self):
        self.traffic_manager = TrafficManager()
        self.observability = ObservabilityClient()
        self.error_threshold = 0.01    # 1% error rate
        self.latency_threshold = 500   # 500 ms p99
async def promote_canary(self, canary_version, stable_version):
weights = [1, 5, 10, 25, 50, 100]
for weight in weights:
# Update traffic weight
await self.traffic_manager.set_weight(canary_version, weight)
# Wait and analyze metrics
await asyncio.sleep(60) # 1 minute per stage
metrics = await self.observability.get_metrics(
canary_version,
window='5m'
)
# Check error rate
if metrics['error_rate'] > self.error_threshold:
await self.rollback(canary_version)
raise CanaryFailed(f"Error rate {metrics['error_rate']} exceeded threshold")
# Check latency
if metrics['p99_latency'] > self.latency_threshold:
await self.rollback(canary_version)
raise CanaryFailed(f"Latency {metrics['p99_latency']}ms exceeded threshold")
# All checks passed - complete promotion
await self.promote_to_stable(canary_version)
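The hard-coded `[1, 5, 10, 25, 50, 100]` list works, but the ramp can also be generated. A small sketch, where `start` and `factor` are tunable assumptions rather than recommended values:

```python
# Sketch: generate a roughly geometric traffic ramp ending at cap.

def ramp_schedule(start=1, factor=2.5, cap=100):
    """Return increasing traffic weights, ending exactly at cap."""
    weights = []
    weight = float(start)
    while weight < cap:
        weights.append(round(weight))
        weight *= factor
    weights.append(cap)
    return weights
```

A geometric ramp keeps early stages small (where most regressions surface) while still reaching full traffic in a handful of steps.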
Feature Flags and Progressive Rollout
Feature Flag Implementation
// Feature flag service
class FeatureFlagService {
constructor(redis, userService) {
this.redis = redis;
this.userService = userService;
}
async isEnabled(flag, userId = null) {
const config = await this.redis.get(`feature:${flag}`);
if (!config) return false;
const { enabled, rollout } = JSON.parse(config);
if (!enabled) return false;
// No user-specific targeting
if (!rollout || rollout.type === 'all') return true;
    // Percentage rollout (needs a user id for stable bucketing)
    if (rollout.type === 'percentage' && userId != null) {
      // Consistent hashing keeps a given user in the same bucket
      const hash = this.hash(`${flag}:${userId}`);
      return hash < rollout.percentage;
    }
// User targeting
if (rollout.type === 'users' && userId) {
return rollout.users.includes(userId);
}
return false;
}
hash(str) {
    // Simple, non-cryptographic hash bucketing the key into 0-99
let hash = 0;
for (let i = 0; i < str.length; i++) {
hash = ((hash << 5) - hash) + str.charCodeAt(i);
hash |= 0;
}
return Math.abs(hash) % 100;
}
}
// Usage in code
const flags = new FeatureFlagService(redis, userService);
app.get('/api/new-feature', async (req, res) => {
const userId = req.user?.id;
if (await flags.isEnabled('new-checkout-flow', userId)) {
return res.json(await getNewCheckoutFlow(req));
}
return res.json(await getOldCheckoutFlow(req));
});
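One caveat with the simple hash above: it is only consistent within services that share the implementation. If a Python service must agree with the Node service on which users see a flag, every service needs the same language-independent hash. A sketch using MD5 purely for stable, uniform bucketing (not security); the flag and user names are illustrative:

```python
# Sketch: language-independent percentage bucketing for feature flags.
import hashlib


def bucket(flag, user_id, buckets=100):
    """Deterministically map (flag, user) to a bucket in 0-99."""
    digest = hashlib.md5(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % buckets


def is_enabled(flag, user_id, percentage):
    """Same user always lands in the same bucket for a given flag."""
    return bucket(flag, user_id) < percentage
```

Because the bucket depends on the flag name as well as the user, different flags roll out to different (uncorrelated) slices of the user base.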
LaunchDarkly Integration
// server.js with LaunchDarkly
const LaunchDarkly = require('launchdarkly-node-server-sdk');
const ldClient = LaunchDarkly.init(process.env.LD_SDK_KEY);
await ldClient.waitForInitialization(); // wrap in an async bootstrap function if top-level await is unavailable
app.get('/api/dashboard', async (req, res) => {
const user = {
key: req.session?.userId,
email: req.user?.email,
custom: {
plan: req.user?.plan
}
};
  // The Node server SDK exposes a single variation() call for all flag types
  const showNewDashboard = await ldClient.variation(
    'new-dashboard-design',
    user,
    false
  );
  const analyticsPercentile = await ldClient.variation(
    'analytics-percentile',
    user,
    99
  );
  if (showNewDashboard) {
    return res.json(getNewDashboard(req.user, analyticsPercentile));
}
return res.json(getOldDashboard(req.user));
});
Feature Flag Best Practices
// Clean up old feature flags
async function cleanupOldFlags() {
const staleFlags = await ldClient.allFlagsState({
key: 'system'
});
  // Find flags that have been serving 100% for 30+ days
  // (getFlagMetadata is an illustrative helper; the server SDK is
  // read-only, so archiving a flag happens through LaunchDarkly's
  // REST API or dashboard, not through ldClient)
  for (const [flag, value] of Object.entries(staleFlags.allValues())) {
    const metadata = await getFlagMetadata(flag);
    if (value && metadata.enabled100PercentSince) {
      const daysSince = (Date.now() - metadata.enabled100PercentSince) / (1000 * 60 * 60 * 24);
      if (daysSince > 30) {
        console.log(`Flag ready to retire: ${flag}`);
        // Archive via the REST API, then delete code paths in a cleanup PR
      }
}
}
}
Database Changes in Continuous Deployment
Expand-Contract Pattern
-- Phase 1: Expand (add new column nullable)
ALTER TABLE users
ADD COLUMN new_email VARCHAR(255);
-- Phase 2: Migrate (backfill)
UPDATE users
SET new_email = email
WHERE new_email IS NULL;
-- Phase 3: Switch (update code to write new column)
-- Code change: write to both columns
-- Phase 4: Contract (remove old column after verification)
ALTER TABLE users
DROP COLUMN email;
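The single `UPDATE` in phase 2 can hold a long lock on a large table. Backfilling in small batches avoids that; here is a sketch demonstrated with sqlite3 (against Postgres you would run the same loop, committing per batch, with the batch size tuned to your table):

```python
# Sketch: backfill in small batches so the migration never holds a
# long lock on a big table.
import sqlite3


def backfill_email(conn, batch_size=1000):
    """Copy email -> new_email in batches until no rows remain."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users SET new_email = email
            WHERE rowid IN (
                SELECT rowid FROM users
                WHERE new_email IS NULL LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Committing after every batch keeps each transaction short, so readers and writers are never blocked for more than one batch's worth of work.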
Schema Migration Tools
// db-migrate configuration
// database.json
{
"dev": {
"driver": "pg",
"connectionString": "postgres://localhost/dev"
},
"prod": {
"driver": "pg",
"connectionString": "postgres://prod:5432/app"
}
}
// migrations/20240312000000-add-user-preferences.js
exports.up = function(db) {
return db.addColumn('users', 'preferences', {
type: 'jsonb',
defaultValue: '{}'
});
};
exports.down = function(db) {
return db.removeColumn('users', 'preferences');
};
# Run migrations against prod (connection string is a placeholder)
export DATABASE_URL=postgres://app_user:secret@prod-db:5432/app
db-migrate up -e prod --verbose
Automated Rollback
Kubernetes Rollback
# Rollback to previous revision
kubectl rollout undo deployment/myapp
# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=3
# Watch rollback progress
kubectl rollout status deployment/myapp
ArgoCD Rollback
# ArgoCD rollback
argocd app rollback myapp 1
# View history
argocd app history myapp
Custom Rollback Controller
# rollback_controller.py (sketch: MonitoringClient and DeploymentError
# are illustrative placeholders for your own monitoring stack)
import asyncio

from kubernetes import client, config
from prometheus_client import Counter

ROLLBACK_COUNT = Counter('deployments_rollback_total', 'Total rollbacks')

class RollbackController:
    def __init__(self):
        config.load_incluster_config()
        self.apps = client.AppsV1Api()
        self.monitoring = MonitoringClient()
async def monitor_deployment(self, deployment_name, namespace='default'):
"""Monitor deployment and auto-rollback on failure."""
for attempt in range(60): # 5 minutes
deployment = self.apps.read_namespaced_deployment(
deployment_name, namespace
)
status = deployment.status
# Check if rollout is stuck
if status.updated_replicas != status.replicas:
await asyncio.sleep(5)
continue
# Check for errors
if status.collision_count is not None:
await self.rollback(deployment_name, namespace)
ROLLBACK_COUNT.inc()
raise DeploymentError("Collision detected - rolled back")
# Check metrics for errors
error_rate = await self.monitoring.get_error_rate(deployment_name)
if error_rate > 0.05: # 5% error threshold
await self.rollback(deployment_name, namespace)
ROLLBACK_COUNT.inc()
raise DeploymentError(f"High error rate: {error_rate} - rolled back")
await asyncio.sleep(5)
    async def rollback(self, deployment_name, namespace):
        # AppsV1Api has no rollback endpoint; shelling out to
        # `kubectl rollout undo` is the simplest reliable equivalent
        proc = await asyncio.create_subprocess_exec(
            'kubectl', 'rollout', 'undo',
            f'deployment/{deployment_name}', '-n', namespace
        )
        await proc.wait()
Deployment Observability
Deployment Metrics
# Track deployment health
import time
from functools import wraps

from prometheus_client import Counter, Histogram, Gauge
deployments_total = Counter(
'deployments_total',
'Total deployments',
['version', 'environment', 'status']
)
deployment_duration = Histogram(
'deployment_duration_seconds',
'Deployment duration',
['version', 'environment']
)
canary_traffic_weight = Gauge(
'canary_traffic_weight',
'Current canary traffic percentage',
['service']
)
def track_deployment_metrics(func):
@wraps(func)
async def wrapper(*args, **kwargs):
version = kwargs.get('version', 'unknown')
environment = kwargs.get('environment', 'unknown')
start = time.time()
try:
result = await func(*args, **kwargs)
deployments_total.labels(
version=version,
environment=environment,
status='success'
).inc()
return result
except Exception as e:
deployments_total.labels(
version=version,
environment=environment,
status='failure'
).inc()
raise
finally:
deployment_duration.labels(
version=version,
environment=environment
).observe(time.time() - start)
return wrapper
Deployment Events
// Send deployment notifications to Slack
const { IncomingWebhook } = require('@slack/webhook');
const slack = new IncomingWebhook(process.env.SLACK_WEBHOOK_URL);
async function notifyDeployment(event) {
const color = event.status === 'success' ? 'good' : 'danger';
await slack.send({
attachments: [{
color,
fields: [
{ title: 'Service', value: event.service, short: true },
{ title: 'Version', value: event.version, short: true },
{ title: 'Environment', value: event.environment, short: true },
{ title: 'Status', value: event.status, short: true },
{ title: 'Duration', value: `${event.duration}s`, short: true },
{ title: 'Author', value: event.author, short: true }
],
footer: 'Deployment Pipeline',
ts: Date.now() / 1000
}]
});
}
GitOps Deployment
ArgoCD Application
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp
namespace: argocd
spec:
project: production
source:
repoURL: https://github.com/myorg/manifests.git
targetRevision: HEAD
path: myapp/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=foreground
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
Flux Configuration
# flux/myapp.yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
name: myapp
namespace: flux-system
spec:
interval: 1m
url: https://github.com/myorg/manifests
ref:
branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
name: myapp
namespace: flux-system
spec:
interval: 10m
sourceRef:
kind: GitRepository
name: myapp
path: ./myapp
prune: true
Comparison and Decision Matrix
| Strategy | Best For | Complexity | Rollback Time | Traffic Risk |
|---|---|---|---|---|
| Blue-Green | Database changes, complete rewrites | Medium | Instant | All or nothing |
| Canary | New features, API changes | Medium | Fast | Gradual |
| Feature Flags | A/B testing, quick on/off | Low | Instant | Controlled |
| Progressive | Large scale, multi-region | High | Instant | Minimal |
Conclusion
Modern deployment strategies are essential for teams that want to ship frequently without sacrificing stability. Start with blue-green for simpler use cases and database changes, graduate to canary releases for riskier feature deployments, and combine with feature flags for maximum control.
The key principles remain constant: automate everything, measure everything, and always have a working rollback. With proper observability and automated rollbacks, you can deploy with confidence multiple times per day.