⚡ Calmops

Continuous Deployment Strategies: Blue-Green, Canary, and Progressive Delivery

Introduction

Deploying software is risky. Every change has the potential to introduce bugs, cause downtime, or negatively impact user experience. Traditional “big bang” deployments where you replace 100% of traffic at once are increasingly unacceptable for modern applications that demand high availability.

This guide covers deployment strategies that minimize risk while enabling frequent releases. You’ll learn blue-green deployments for instant rollbacks, canary releases for gradual traffic shifting, and progressive delivery patterns that combine deployment strategies with feature flags and experimentation.

Understanding Deployment Strategies

The Risk Spectrum

| Strategy | Risk Level | Rollback Speed | Complexity |
|---|---|---|---|
| Manual All-at-Once | High | Slow | Low |
| Blue-Green | Low | Instant | Medium |
| Canary | Very Low | Fast | Medium |
| Progressive | Very Low | Instant | High |
| Ring-based | Low | Fast | Medium |

Blue-Green Deployments

Concept

Blue-green deployment maintains two identical production environments. At any time, only one serves live traffic. When deploying, you switch traffic from the active environment to the updated one:

Blue (Active)   ← Traffic ← Users
Green (Standby) ← Deployment

After deployment:
Blue (Standby)
Green (Active)  ← Traffic ← Users
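The cutover itself reduces to an atomic pointer swap gated by a health check. A minimal Python sketch of that idea (the router shape and `health_check` callable are illustrative assumptions, not any particular tool's API):

```python
def switch_traffic(router, target_env, health_check):
    """Point live traffic at target_env only if it passes a health check.

    router is a dict with an 'active' key; health_check is a callable
    returning True when the standby environment is ready to serve.
    """
    if not health_check(target_env):
        raise RuntimeError(
            f"{target_env} failed health check; keeping {router['active']}"
        )
    previous = router["active"]
    router["active"] = target_env  # the atomic switch
    return previous  # kept warm for instant rollback


# Example: flip from blue to green behind a passing health check
router = {"active": "blue"}
previous = switch_traffic(router, "green", lambda env: True)
# router["active"] is now "green"; "blue" stays warm for rollback
```

The point of the sketch is that the switch is a single assignment: rollback is the same assignment in reverse, which is why blue-green rollbacks are effectively instant.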

Kubernetes Blue-Green with Ingress

# deployment-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:2.0.0
        ports:
        - containerPort: 8080
---
# service-green.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-green
spec:
  selector:
    app: myapp
    version: green
  ports:
  - port: 80
    targetPort: 8080
---
# ingress.yaml - Switch traffic by updating weight
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "100"  # 100% to green (a primary ingress routing to app-blue must also exist)
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-green
            port:
              number: 80

AWS Elastic Beanstalk Blue-Green

# Create new environment (green)
aws elasticbeanstalk create-environment \
  --application-name myapp \
  --environment-name green \
  --solution-stack-name "64bit Amazon Linux 2 v3.6.0 running Node.js 18" \
  --version-label v2.0.0

# Swap CNAME to switch traffic
aws elasticbeanstalk swap-environment-cnames \
  --source-environment-name blue \
  --destination-environment-name green

# If issues - swap back immediately
aws elasticbeanstalk swap-environment-cnames \
  --source-environment-name green \
  --destination-environment-name blue
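The swap-and-swap-back pattern above is easy to wrap in an automated guard: swap, verify, revert on failure. A Python sketch with the platform-specific pieces injected as callables (in practice `swap` would shell out to the `aws elasticbeanstalk swap-environment-cnames` command shown above; the callable names are illustrative):

```python
import time


def swap_with_verification(swap, swap_back, healthy, checks=3, interval=0):
    """Swap traffic, then poll a health check; revert automatically on failure.

    swap, swap_back and healthy are injected callables so the pattern
    stays independent of any one platform's CLI or SDK.
    """
    swap()
    for _ in range(checks):
        if not healthy():
            swap_back()  # immediate revert, mirroring the CLI example
            return False
        time.sleep(interval)
    return True


# Example with fakes: a swap that succeeds and a check that passes
state = {"live": "blue", "reverted": False}
ok = swap_with_verification(
    swap=lambda: state.update(live="green"),
    swap_back=lambda: state.update(live="blue", reverted=True),
    healthy=lambda: True,
)
# ok is True and state["live"] is "green"
```

Injecting the callables also makes the guard trivially unit-testable without touching AWS.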

Blue-Green with Database Changes

Database migrations require special handling:

// Zero-downtime migration strategy
async function deployWithMigration() {
    const migration = new Migration();
    
    // 1. Run the backward-compatible database migration first
    //    (old code simply ignores the new nullable column)
    await migration.addColumnNullable('users', 'new_field');
    
    // 2. Deploy new application version (writes both old and new
    //    fields, tolerates NULLs in new_field)
    await deploy('v2.0.0');
    
    // 3. Backfill rows written before the deploy
    await migration.backfill('new_field');
    
    // 4. Make column non-nullable once every row has a value
    await migration.makeColumnNotNullable('users', 'new_field');
    
    // 5. Switch traffic
    await switchTraffic('green');
    
    // 6. Remove old code paths after confirmation
    await deploy('v2.0.1');  // No longer reads old field
}

Canary Releases

Concept

Canary releases gradually shift a small percentage of traffic to the new version, monitoring for errors before increasing:

10% Traffic → v2.0.0 (Canary)
90% Traffic → v1.9.0 (Stable)
         ↓
   Monitor Metrics
         ↓
50% Traffic → v2.0.0
         ↓
100% Traffic → v2.0.0

Kubernetes Canary with Service Mesh

# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1  # Smaller for canary
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      containers:
      - name: app
        image: myapp:2.0.0-canary
---
# istio-virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.example.com
  http:
  - route:
    - destination:
        host: myapp
        subset: stable
      weight: 90
    - destination:
        host: myapp
        subset: canary
      weight: 10
---
# istio-destination-rule.yaml - defines the subsets referenced above
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
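Shifting canary traffic then becomes a patch to the VirtualService's route weights. A small Python helper that builds such a patch body (the structure mirrors the manifest above; actually applying it, e.g. via `kubectl patch` or the Kubernetes client's CustomObjectsApi, is omitted here):

```python
def virtual_service_weight_patch(host, canary_weight):
    """Build a VirtualService spec patch sending canary_weight% to the canary subset."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return {
        "spec": {
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": "stable"},
                     "weight": 100 - canary_weight},
                    {"destination": {"host": host, "subset": "canary"},
                     "weight": canary_weight},
                ]
            }]
        }
    }


patch = virtual_service_weight_patch("myapp", 10)
# route weights: stable 90, canary 10, matching the manifest above
```

Keeping the patch construction pure makes each weight step easy to log and test before it touches the mesh.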

Flagger: Automated Canary Deployments

# Canary CRD - Flagger automates the process
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 10
    
    # Metrics that determine success
    metrics:
    - name: request-success-rate
      interval: 1m
      target:
        value: 99
    - name: request-duration
      interval: 1m
      target:
        histogram:
          0.99: 500

Progressive Delivery in Practice

# Custom canary controller logic
import asyncio

class CanaryController:
    def __init__(self):
        # TrafficManager, ObservabilityClient and CanaryFailed are
        # project-specific names assumed to be defined elsewhere
        self.traffic_manager = TrafficManager()
        self.observability = ObservabilityClient()
        self.error_threshold = 0.01  # 1% error rate
        self.latency_threshold = 500  # 500ms p99
    async def promote_canary(self, canary_version, stable_version):
        weights = [1, 5, 10, 25, 50, 100]
        
        for weight in weights:
            # Update traffic weight
            await self.traffic_manager.set_weight(canary_version, weight)
            
            # Wait and analyze metrics
            await asyncio.sleep(60)  # 1 minute per stage
            
            metrics = await self.observability.get_metrics(
                canary_version,
                window='5m'
            )
            
            # Check error rate
            if metrics['error_rate'] > self.error_threshold:
                await self.rollback(canary_version)
                raise CanaryFailed(f"Error rate {metrics['error_rate']} exceeded threshold")
            
            # Check latency
            if metrics['p99_latency'] > self.latency_threshold:
                await self.rollback(canary_version)
                raise CanaryFailed(f"Latency {metrics['p99_latency']}ms exceeded threshold")
        
        # All checks passed - complete promotion
        await self.promote_to_stable(canary_version)

Feature Flags and Progressive Rollout

Feature Flag Implementation

// Feature flag service
class FeatureFlagService {
    constructor(redis, userService) {
        this.redis = redis;
        this.userService = userService;
    }
    
    async isEnabled(flag, userId = null) {
        const config = await this.redis.get(`feature:${flag}`);
        if (!config) return false;
        
        const { enabled, rollout } = JSON.parse(config);
        if (!enabled) return false;
        
        // No user-specific targeting
        if (!rollout || rollout.type === 'all') return true;
        
        // Percentage rollout (anonymous users without a userId all hash
        // into the same bucket, so pass a stable userId where possible)
        if (rollout.type === 'percentage') {
            // Consistent hashing keeps a given user in the same bucket
            const hash = this.hash(`${flag}:${userId}`);
            return hash < rollout.percentage;
        }
        
        // User targeting
        if (rollout.type === 'users' && userId) {
            return rollout.users.includes(userId);
        }
        
        return false;
    }
    
    hash(str) {
        // Simple string hash mapped to 0-99
        let hash = 0;
        for (let i = 0; i < str.length; i++) {
            hash = ((hash << 5) - hash) + str.charCodeAt(i);
            hash |= 0;
        }
        return Math.abs(hash) % 100;
    }
}

// Usage in code
const flags = new FeatureFlagService(redis, userService);

app.get('/api/new-feature', async (req, res) => {
    const userId = req.user?.id;
    
    if (await flags.isEnabled('new-checkout-flow', userId)) {
        return res.json(await getNewCheckoutFlow(req));
    }
    
    return res.json(await getOldCheckoutFlow(req));
});
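The same consistent-bucketing idea in Python, using a cryptographic hash for a more even spread than the simple string hash above (the function names are illustrative):

```python
import hashlib


def rollout_bucket(flag, user_id):
    """Map (flag, user) deterministically into a 0-99 bucket.

    Because the hash is stable, a user's bucket never changes, so
    ramping a rollout percentage up never kicks out users who already
    have the feature.
    """
    key = f"{flag}:{user_id}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(flag, user_id, percentage):
    """A user is in a P% rollout when their bucket is below P."""
    return rollout_bucket(flag, user_id) < percentage


# The bucket is stable across calls, so a 10% -> 50% ramp only ever
# adds users to the feature
b = rollout_bucket("new-checkout-flow", "user-42")
assert b == rollout_bucket("new-checkout-flow", "user-42")
```

Hashing on `flag:user` rather than `user` alone also decorrelates rollouts, so the same 10% of users don't receive every new feature first.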

LaunchDarkly Integration

// server.js with LaunchDarkly
const LaunchDarkly = require('launchdarkly-node-server-sdk');
const ldClient = LaunchDarkly.init(process.env.LD_SDK_KEY);

await ldClient.waitForInitialization();  // wrap in an async bootstrap when using CommonJS

app.get('/api/dashboard', async (req, res) => {
    const user = {
        key: req.session?.userId,
        email: req.user?.email,
        custom: {
            plan: req.user?.plan
        }
    };
    
    // The Node server SDK evaluates all flag types through variation()
    const showNewDashboard = await ldClient.variation(
        'new-dashboard-design', 
        user, 
        false
    );
    
    const newAnalytics = await ldClient.variation(
        'analytics-percentile',
        user,
        99
    );
    
    if (showNewDashboard) {
        return res.json(getNewDashboard(req.user, newAnalytics));
    }
    
    return res.json(getOldDashboard(req.user));
});

Feature Flag Best Practices

// Clean up old feature flags
// Note: the server SDK only evaluates flags; archiving or disabling
// them happens through the LaunchDarkly REST API or dashboard,
// represented here by an assumed archiveFlag() helper.
async function cleanupOldFlags() {
    const state = await ldClient.allFlagsState({ key: 'system' });
    
    // Retire flags that have been serving 100% "on" for 30+ days
    for (const [flag, value] of Object.entries(state.allValues())) {
        const metadata = await getFlagMetadata(flag);  // assumed helper
        
        if (value && metadata.enabled100PercentSince) {
            const daysSince = (Date.now() - metadata.enabled100PercentSince) / (1000 * 60 * 60 * 24);
            
            if (daysSince > 30) {
                await archiveFlag(flag);  // REST API call, not an SDK method
                console.log(`Retired flag: ${flag}`);
                
                // Remove the dead code paths in the next release
            }
        }
    }
}

Database Changes in Continuous Deployment

Expand-Contract Pattern

-- Phase 1: Expand (add new column, nullable)
ALTER TABLE users 
ADD COLUMN new_email VARCHAR(255);

-- Phase 2: Dual-write (deploy code that writes BOTH columns,
-- so no rows drift while the backfill runs)

-- Phase 3: Migrate (backfill rows written before the dual-write deploy)
UPDATE users 
SET new_email = email 
WHERE new_email IS NULL;

-- Phase 4: Switch (deploy code that reads new_email)

-- Phase 5: Contract (remove old column after verification)
ALTER TABLE users 
DROP COLUMN email;
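On a large table, a single UPDATE like the backfill above can hold locks for minutes; production backfills usually run in small keyed batches instead. A sketch that generates the batch statements (table and column names follow the example above; the batch size is an assumption to tune):

```python
def backfill_batches(max_id, batch_size=10_000,
                     table="users", src="email", dst="new_email"):
    """Yield batched UPDATE statements keyed by id range.

    Small batches keep each transaction short, so the backfill never
    blocks foreground writes for long; the IS NULL guard makes every
    batch safe to re-run.
    """
    start = 1
    while start <= max_id:
        end = start + batch_size - 1
        yield (
            f"UPDATE {table} SET {dst} = {src} "
            f"WHERE id BETWEEN {start} AND {end} AND {dst} IS NULL;"
        )
        start = end + 1


batches = list(backfill_batches(max_id=25_000, batch_size=10_000))
# three statements; the final range runs slightly past max_id, which is
# harmless since no rows exist there
```

Each statement would typically run in its own transaction with a short pause between batches to let replication catch up.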

Schema Migration Tools

// db-migrate configuration
// database.json
{
  "dev": {
    "driver": "pg",
    "connectionString": "postgres://localhost/dev"
  },
  "prod": {
    "driver": "pg",
    "connectionString": "postgres://prod:5432/app"
  }
}
// migrations/20240312000000-add-user-preferences.js
exports.up = function(db) {
    return db.addColumn('users', 'preferences', {
        type: 'jsonb',
        defaultValue: '{}'
    });
};

exports.down = function(db) {
    return db.removeColumn('users', 'preferences');
};
# Run migrations
export DATABASE_URL=postgres://prod:5432/app
db-migrate up --verbose

Automated Rollback

Kubernetes Rollback

# Rollback to previous revision
kubectl rollout undo deployment/myapp

# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=3

# Watch rollback progress
kubectl rollout status deployment/myapp

ArgoCD Rollback

# ArgoCD rollback
argocd app rollback myapp 1

# View history
argocd app history myapp

Custom Rollback Controller

# rollback_controller.py
import asyncio

from kubernetes import client, config
from prometheus_client import Counter

ROLLBACK_COUNT = Counter('deployments_rollback_total', 'Total rollbacks')

class RollbackController:
    def __init__(self):
        config.load_incluster_config()
        self.apps = client.AppsV1Api()
        self.monitoring = MonitoringClient()  # project-specific metrics client, assumed elsewhere
    
    async def monitor_deployment(self, deployment_name, namespace='default'):
        """Monitor deployment and auto-rollback on failure."""
        
        for attempt in range(60):  # 5 minutes
            deployment = self.apps.read_namespaced_deployment(
                deployment_name, namespace
            )
            
            status = deployment.status
            
            # Check if rollout is still in progress
            if status.updated_replicas != status.replicas:
                await asyncio.sleep(5)
                continue
            
            # Check for a failed rollout (progress deadline exceeded)
            conditions = status.conditions or []
            if any(c.type == 'Progressing' and c.reason == 'ProgressDeadlineExceeded'
                   for c in conditions):
                await self.rollback(deployment_name, namespace)
                ROLLBACK_COUNT.inc()
                raise DeploymentError("Progress deadline exceeded - rolled back")
            
            # Check metrics for errors
            error_rate = await self.monitoring.get_error_rate(deployment_name)
            if error_rate > 0.05:  # 5% error threshold
                await self.rollback(deployment_name, namespace)
                ROLLBACK_COUNT.inc()
                raise DeploymentError(f"High error rate: {error_rate} - rolled back")
            
            await asyncio.sleep(5)
    
    async def rollback(self, deployment_name, namespace):
        # apps/v1 removed the rollback subresource, so the simplest
        # reliable rollback is to shell out to kubectl rollout undo
        import subprocess
        subprocess.run(
            ['kubectl', 'rollout', 'undo',
             f'deployment/{deployment_name}', '-n', namespace],
            check=True
        )

Deployment Observability

Deployment Metrics

# Track deployment health
import time
from functools import wraps

from prometheus_client import Counter, Histogram, Gauge

deployments_total = Counter(
    'deployments_total',
    'Total deployments',
    ['version', 'environment', 'status']
)

deployment_duration = Histogram(
    'deployment_duration_seconds',
    'Deployment duration',
    ['version', 'environment']
)

canary_traffic_weight = Gauge(
    'canary_traffic_weight',
    'Current canary traffic percentage',
    ['service']
)

def track_deployment_metrics(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        version = kwargs.get('version', 'unknown')
        environment = kwargs.get('environment', 'unknown')
        
        start = time.time()
        try:
            result = await func(*args, **kwargs)
            deployments_total.labels(
                version=version,
                environment=environment,
                status='success'
            ).inc()
            return result
        except Exception as e:
            deployments_total.labels(
                version=version,
                environment=environment,
                status='failure'
            ).inc()
            raise
        finally:
            deployment_duration.labels(
                version=version,
                environment=environment
            ).observe(time.time() - start)
    
    return wrapper

Deployment Events

// Send deployment notifications to Slack via an incoming webhook
const webhookUrl = process.env.SLACK_WEBHOOK_URL;

async function notifyDeployment(event) {
    const color = event.status === 'success' ? 'good' : 'danger';
    
    await fetch(webhookUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            attachments: [{
                color,
                fields: [
                    { title: 'Service', value: event.service, short: true },
                    { title: 'Version', value: event.version, short: true },
                    { title: 'Environment', value: event.environment, short: true },
                    { title: 'Status', value: event.status, short: true },
                    { title: 'Duration', value: `${event.duration}s`, short: true },
                    { title: 'Author', value: event.author, short: true }
                ],
                footer: 'Deployment Pipeline',
                ts: Math.floor(Date.now() / 1000)
            }]
        })
    });
}

GitOps Deployment

ArgoCD Application

# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/manifests.git
    targetRevision: HEAD
    path: myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Flux Configuration

# flux/myapp.yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/manifests
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: myapp
  path: ./myapp
  prune: true

Comparison and Decision Matrix

| Strategy | Best For | Complexity | Rollback Time | Traffic Risk |
|---|---|---|---|---|
| Blue-Green | Database changes, complete rewrites | Medium | Instant | All or nothing |
| Canary | New features, API changes | Medium | Fast | Gradual |
| Feature Flags | A/B testing, quick on/off | Low | Instant | Controlled |
| Progressive | Large scale, multi-region | High | Instant | Minimal |

Conclusion

Modern deployment strategies are essential for teams that want to ship frequently without sacrificing stability. Start with blue-green for simpler use cases and database changes, graduate to canary releases for riskier feature deployments, and combine with feature flags for maximum control.

The key principles remain constant: automate everything, measure everything, and always have a working rollback. With proper observability and automated rollbacks, you can deploy with confidence multiple times per day.
