Progressive Delivery: Canary Deployments with Argo Rollouts and Flagger

Introduction

Traditional deployment strategies push changes to all users simultaneously, risking widespread outages if something goes wrong. Progressive delivery mitigates this risk by gradually rolling out changes, monitoring metrics, and automatically rolling back if issues are detected.

This article explores progressive delivery patterns, Argo Rollouts, Flagger implementation, and strategies for safe Kubernetes deployments.

Understanding Progressive Delivery

What is Progressive Delivery?

Progressive delivery extends continuous delivery with automated traffic shifting and analysis. Instead of binary success/failure, it enables:

  • Gradual rollout: Shift traffic incrementally
  • Real-time analysis: Monitor metrics during rollout
  • Automated rollback: Revert on detection of issues
  • Feature experiments: A/B testing with actual traffic
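The control loop these capabilities imply can be sketched in a few lines. This is hypothetical logic, not either controller's actual implementation: shift traffic step by step, check a success-rate metric at each step, and revert after a run of failed checks.

```python
# Hypothetical progressive-delivery control loop: shift traffic step by
# step, check one success-rate sample per step, and roll back after
# `failure_limit` consecutive failed checks.

def run_rollout(weights, metric_samples, success_threshold=0.95, failure_limit=3):
    """Return ("promoted", final_weight) or ("rolled_back", 0)."""
    failures = 0
    final = 0
    for weight, sample in zip(weights, metric_samples):
        final = weight
        if sample >= success_threshold:
            failures = 0  # healthy step: reset the failure counter
        else:
            failures += 1
            if failures >= failure_limit:
                return ("rolled_back", 0)  # all traffic back to stable
    return ("promoted", final)
```

The key property is the one the bullet list describes: failure is detected mid-rollout, while most traffic still hits the stable version.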

Deployment Strategies Comparison

| Strategy    | Risk     | Complexity | Rollback Time | Use Case               |
|-------------|----------|------------|---------------|------------------------|
| Recreate    | High     | Low        | Seconds       | Development            |
| Rolling     | Medium   | Low        | Minutes       | Standard updates       |
| Blue-Green  | Low      | Medium     | Seconds       | Critical updates       |
| Canary      | Very Low | High       | Seconds       | Risky changes          |
| Progressive | Very Low | High       | Instant       | Data-driven decisions  |

Argo Rollouts

Argo Rollouts is a Kubernetes controller that provides declarative rollout strategies with advanced traffic routing capabilities.

Installation

# Install Argo Rollouts controller
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install kubectl plugin
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

Basic Rollout Configuration

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: {duration: 10m}
        - setWeight: 20
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 80
        - pause: {duration: 10m}
        - setWeight: 100
      canaryService: canary-service
      stableService: stable-service
      trafficRouting:
        nginx:
          stableIngress: app-ingress
          additionalIngressAnnotations:
            canary-by-header: X-Canary
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myapp:v2.0.0
          ports:
            - containerPort: 8080
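The steps above form a fixed schedule. This small sketch (step values copied from the manifest, with pause durations simplified to integer minutes) expands them into a traffic timeline:

```python
# Expand canary steps into (elapsed_minutes, traffic_weight) pairs.
# Only setWeight and pause steps are handled; pause durations are minutes.

def schedule(steps):
    timeline, elapsed = [], 0
    for step in steps:
        if "setWeight" in step:
            timeline.append((elapsed, step["setWeight"]))
        elif "pause" in step:
            elapsed += step["pause"]
    return timeline

steps = [
    {"setWeight": 5}, {"pause": 10},
    {"setWeight": 20}, {"pause": 10},
    {"setWeight": 50}, {"pause": 10},
    {"setWeight": 80}, {"pause": 10},
    {"setWeight": 100},
]
```

With these steps the canary reaches full traffic no earlier than 40 minutes after the rollout starts, and an operator (or an analysis run) can abort at any of the four pauses.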

Analysis Template

Define automated validation:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m])) 
            / 
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
    - name: latency
      interval: 1m
      successCondition: result[0] <= 100
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            histogram_quantile(0.95, 
              sum(rate(http_request_duration_ms_bucket{service="{{args.service-name}}"}[5m])) 
              by (le)
            )
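To make the first query concrete, here is a sketch of the arithmetic it performs, with sample counter rates assumed: the rate of 2xx responses divided by the rate of all responses.

```python
# Mirror of the success-rate query: 2xx request rate / total request rate.
# `status_rates` maps a status class (e.g. "2xx", "5xx") to requests/sec.

def success_rate(status_rates):
    total = sum(status_rates.values())
    return status_rates.get("2xx", 0.0) / total if total else 1.0
```

A result of 0.95 sits exactly on the successCondition boundary (`result[0] >= 0.95`), so that check still passes; one more failure in a hundred requests would fail it.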

Connecting Analysis to Rollout

spec:
  strategy:
    canary:
      analysis:
        templates:
          - templateName: success-rate
        args:
          - name: service-name
            value: canary-service
      steps:
        - setWeight: 10
        - analysis:
            templates:
              - templateName: success-rate
        - pause: {duration: 5m}
        - setWeight: 30
        - analysis:
            templates:
              - templateName: success-rate
        - pause: {duration: 5m}
        - setWeight: 100

Flagger

Flagger is a progressive delivery operator that works with service meshes and ingress controllers to automate canary releases.

Installation

# Install Flagger for Istio
kubectl apply -k github.com/fluxcd/flagger//kustomize/istio

# Install Flagger for Linkerd
kubectl apply -k github.com/fluxcd/flagger//kustomize/linkerd

# Install Flagger for NGINX Ingress (generic Kubernetes ingress)
kubectl apply -k github.com/fluxcd/flagger//kustomize/kubernetes

Canary with Istio

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
    targetPort: 8080
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        interval: 1m
        # Minimum 95% success rate required
        thresholdRange:
          min: 95
      - name: request-duration
        interval: 1m
        # Maximum 500ms p99 latency
        thresholdRange:
          max: 500
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.default/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://my-app.default/"
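With stepWeight: 10 and maxWeight: 50, Flagger raises the canary weight once per healthy analysis interval. A quick sketch of that progression:

```python
# Weight progression implied by stepWeight/maxWeight: one increase per
# healthy analysis interval until maxWeight is reached.

def promotion_steps(step_weight, max_weight):
    weights, current = [], 0
    while current < max_weight:
        current = min(current + step_weight, max_weight)
        weights.append(current)
    return weights
```

So this canary needs at least five healthy one-minute intervals before it is eligible for promotion, and the stable version keeps at least half the traffic throughout.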

Progressive Deployment with Analysis

# Custom metrics are defined as MetricTemplate resources...
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
spec:
  provider:
    type: prometheus
    address: http://prometheus:9090
  query: |
    100 * sum(rate(http_requests_total{
      namespace="default",
      job="my-app-canary",
      status=~"5.*"
    }[2m]))
    /
    sum(rate(http_requests_total{
      namespace="default",
      job="my-app-canary"
    }[2m]))
---
# ...and referenced from the canary analysis
spec:
  analysis:
    # Extended validation period
    interval: 2m
    iterations: 10
    # Automated rollback after 5 failed checks
    threshold: 5
    metrics:
      - name: error-rate
        templateRef:
          name: error-rate
        # Fail when more than 1% of requests error
        thresholdRange:
          max: 1
        interval: 2m
      - name: business-metric
        # Custom metric from your application, e.g. a MetricTemplate
        # wrapping: sum(rate(orders_created{service="flagger"}[2m]))
        templateRef:
          name: orders-created
        thresholdRange:
          min: 1
        interval: 2m

A/B Testing Configuration

spec:
  analysis:
    # A/B testing with header-based routing: requests matching any rule
    # are pinned to the canary for the duration of the test
    match:
      - headers:
          x-canary:
            exact: "true"
      - headers:
          x-user-id:
            regex: "^A.*"
    iterations: 10
    interval: 1m
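The routing decision those match rules describe can be sketched as follows (header names taken from the example above, logic simplified):

```python
import re

# Route a request to "canary" when either match rule applies:
# an exact x-canary: "true" header, or an x-user-id matching ^A.*

def route(headers):
    if headers.get("x-canary") == "true":
        return "canary"
    if re.match(r"^A.*", headers.get("x-user-id", "")):
        return "canary"
    return "stable"
```

Because routing is based on request attributes rather than a random weight, the same user always sees the same version, which is what makes the A/B comparison valid.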

Traffic Management

Istio Virtual Service Integration

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    - route:
        - destination:
            host: my-app-stable
          weight: 90
        - destination:
            host: my-app-canary
          weight: 10

Nginx Ingress Canary

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-canary
                port:
                  number: 80
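The canary-weight: "10" annotation sends roughly 10% of requests to the canary backend. A deterministic stand-in for that split (round-robin over 100 slots, whereas the real controller decides per request probabilistically):

```python
# Deterministic stand-in for weight-based traffic splitting: the first
# `canary_weight` slots of every 100 requests go to the canary backend.

def make_router(canary_weight):
    state = {"count": 0}
    def route():
        slot = state["count"] % 100
        state["count"] += 1
        return "canary" if slot < canary_weight else "stable"
    return route
```

Note the contrast with the A/B example: weight-based splitting gives no stickiness, so a user may see both versions across requests unless session affinity is configured.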

Monitoring and Observability

Prometheus Metrics

# Flagger metrics
- name: canary_status
  help: Canary deployment status
  type: gauge
  metric: flagger_canary_status{namespace, name}

# Argo Rollouts metrics
- name: rollout_phase
  help: Current rollout phase
  type: gauge
  metric: rollout_phase{namespace, name, phase}

Grafana Dashboard

apiVersion: v1
kind: ConfigMap
metadata:
  name: progressive-delivery-dashboard
data:
  dashboard.json: |
    {
      "panels": [
        {
          "title": "Canary Traffic Weight",
          "type": "timeseries",
          "targets": [
            {
              "expr": "kube_deployment_spec_replicas{namespace=~\"$namespace\"}"
            }
          ]
        },
        {
          "title": "Error Rate by Version",
          "type": "timeseries",
          "targets": [
            {
              "expr": "rate(http_requests_total{service=~\".*-canary\"}[5m])"
            }
          ]
        },
        {
          "title": "Latency P99",
          "type": "timeseries",
          "targets": [
            {
              "expr": "histogram_quantile(0.99, rate(http_request_duration_ms_bucket[5m]))"
            }
          ]
        }
      ]
    }

Best Practices

1. Start Small

steps:
  - setWeight: 1      # Start with 1%
  - pause: {duration: 5m}
  - setWeight: 5       # Increase to 5%
  - pause: {duration: 10m}
  - setWeight: 20      # Then 20%

2. Define Clear Metrics

metrics:
  # Business metrics first
  - name: conversion-rate
    thresholdRange:
      min: 98

  # Then reliability (max 1% errors)
  - name: error-rate
    thresholdRange:
      max: 1

  # Then performance (max 500ms p99)
  - name: latency-p99
    thresholdRange:
      max: 500
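A sketch of evaluating gates in that priority order, reporting the first one that fails (threshold values are illustrative, not from any controller):

```python
# Evaluate metric gates in priority order; the first failing gate
# names the reason a rollback would be triggered.

GATES = [
    ("conversion-rate", lambda v: v >= 0.98),  # business first
    ("error-rate",      lambda v: v <= 0.01),  # then reliability
    ("latency-p99",     lambda v: v <= 500),   # then performance (ms)
]

def evaluate(observed, gates=GATES):
    for name, ok in gates:
        if not ok(observed[name]):
            return ("failed", name)
    return ("passed", None)
```

Ordering matters in reporting: if conversions drop, that is the failure you want surfaced, even when latency is also degraded.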

3. Implement Automated Rollback

spec:
  analysis:
    # Fail fast: roll back after 3 failed checks
    threshold: 3
    interval: 1m
    
    # Notify but don't block (event webhooks are fire-and-forget)
    webhooks:
      - name: slack-notify
        type: event
        url: https://hooks.slack.com/services/xxx
        timeout: 30s

4. Use Feature Flags

Combine progressive delivery with feature flags:

// Gradual feature rollout
if (flags.isEnabled('new-checkout') && canaryVersion === 'v2') {
  return <NewCheckout />;
}
return <StandardCheckout />;

Patterns and Strategies

Blue-Green with Manual Approval

spec:
  strategy:
    blueGreen:
      activeService: active
      previewService: preview
      autoPromotionEnabled: false  # Manual approval required

Multi-Step Canary

spec:
  strategy:
    canary:
      steps:
        # Initial smoke test
        - setWeight: 1
        - analysis:
            templates:
              - templateName: smoke-test
            args:
              - name: service-name
                value: my-app-canary
        - pause: {duration: 2m}
        
        # Expand to small group
        - setWeight: 5
        - analysis:
            templates:
              - templateName: basic-metrics
        - pause: {duration: 5m}
        
        # Broader rollout
        - setWeight: 20
        - analysis:
            templates:
              - templateName: full-metrics
        - pause: {duration: 10m}
        
        # Full rollout
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100

Conclusion

Progressive delivery transforms deployments from risky events into controlled experiments. Argo Rollouts and Flagger provide powerful Kubernetes-native mechanisms for canary analysis, automated rollback, and traffic management. Start with simple canary configurations and evolve toward sophisticated multi-metric analysis as your deployment confidence grows.
