Testing in Production: Feature Flags and Canary Releases

Published: February 23, 2026 Updated: May 24, 2026 Larry Qu 10 min read

Introduction

Testing in production sounds risky, but some bugs only surface under real traffic, real data, and real infrastructure. Feature flags and canary releases let you expose new code to a small, controlled slice of users and back out quickly if something goes wrong.


Why Test in Production?

┌─────────────────────────────────────────────────────────────┐
│            Testing in Production Benefits                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ✓ Real users, real data, real conditions                  │
│  ✓ Find issues CI can't catch                              │
│  ✓ Faster feedback loops                                   │
│  ✓ A/B test new features                                   │
│  ✓ Instant rollback if issues occur                         │
│                                                             │
│  Risks:                                                     │
│  ✗ Users affected by bugs                                   │
│  ✗ Potential service disruptions                           │
│                                                             │
│  Solution: FEATURE FLAGS + CANARY                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Feature Flags

Basic Implementation

// Feature flag service
class FeatureFlags {
  private flags = new Map<string, boolean>();
  
  enable(feature: string) {
    this.flags.set(feature, true);
  }
  
  disable(feature: string) {
    this.flags.set(feature, false);
  }
  
  isEnabled(feature: string): boolean {
    return this.flags.get(feature) ?? false;
  }
}

const flags = new FeatureFlags();

// Usage in code (for example, inside a React component)
function Dashboard() {
  if (flags.isEnabled('new-dashboard')) {
    return <NewDashboard />;
  }
  return <LegacyDashboard />;
}

With Providers

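// Use LaunchDarkly, Split, or Statsig
// The SDK exposes `init` as a module-level function, so import the namespace
import * as LaunchDarkly from 'launchdarkly-node-server-sdk';

const client = LaunchDarkly.init(process.env.LD_KEY!);

// Check feature flag; `false` is the fallback used if the flag cannot be evaluated
async function checkFlag(userId: string, flag: string): Promise<boolean> {
  const value = await client.variation(flag, {
    key: userId
  }, false);
  return value;
}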

// In Express route
app.get('/dashboard', async (req, res) => {
  const userId = req.user.id;
  const useNewDashboard = await checkFlag(userId, 'new-dashboard');
  
  if (useNewDashboard) {
    return res.render('dashboard-new');
  }
  return res.render('dashboard-legacy');
});

Gradual Rollout

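// Percentage rollout
// Simple 32-bit string hash (same scheme as Java's String.hashCode)
function hashCode(value: string): number {
  let hash = 0;
  for (let i = 0; i < value.length; i++) {
    hash = (Math.imul(hash, 31) + value.charCodeAt(i)) | 0;
  }
  return hash;
}

function isInRollout(userId: string, percentage: number): boolean {
  // Deterministic selection: the same user always lands in the same bucket
  const bucket = Math.abs(hashCode(userId)) % 100;
  return bucket < percentage;
}

// Usage: decide per user at request time instead of flipping a global switch
const useNewCheckout = isInRollout(userId, 10); // 10% rollout

if (useNewCheckout) {
  renderNewCheckout();     // hypothetical handler
} else {
  renderLegacyCheckout();  // hypothetical handler
}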
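
One refinement worth considering: if every flag buckets users by user ID alone, the same 10% of users receive every new feature first. The sketch below salts the hash with the flag name so each flag gets an independent assignment. The FNV-1a-style hash and the `isInFlagRollout` helper are illustrative choices, not something the original code prescribes.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; returns an unsigned integer
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Hypothetical helper: the bucket depends on both the flag name and the user ID
function isInFlagRollout(flag: string, userId: string, percentage: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < percentage;
}

// Raising the percentage only adds users; nobody already in the rollout drops out
const inAt10 = isInFlagRollout('new-checkout', 'user-42', 10);
const inAt50 = isInFlagRollout('new-checkout', 'user-42', 50);
console.log({ inAt10, inAt50 });
```

Because the bucket is a pure function of the flag name and user ID, no per-user state needs to be stored to keep the experience stable across requests.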

Canary Deployments

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   Canary Deployment                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                    Load Balancer                              │
│                         │                                    │
│           ┌─────────────┼─────────────┐                     │
│           │             │             │                     │
│      ┌────▼────┐   ┌────▼────┐   ┌────▼────┐              │
│      │ Canary  │   │ Canary  │   │  Main   │              │
│      │   v2    │   │   v2    │   │   v1    │              │
│      │  10%    │   │  10%    │   │  80%    │              │
│      └─────────┘   └─────────┘   └─────────┘              │
│           │             │             │                     │
│           └─────────────┼─────────────┘                     │
│                         ▼                                    │
│              Monitoring & Metrics                             │
│                         │                                    │
│              Promote or Rollback                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
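
To make the diagram concrete, here is a minimal sketch of the weighted routing decision a load balancer makes for each request. The `Backend` type and `pickBackend` function are illustrative names and do not correspond to any real load balancer API; the random value is passed in so the behavior is easy to test.

```typescript
interface Backend {
  name: string;
  weight: number; // relative share of traffic, e.g. percent
}

// Walks the cumulative weights and returns the backend whose range contains `roll`.
// `roll` is expected to be in [0, 1), for example from Math.random().
function pickBackend(backends: Backend[], roll: number): Backend {
  if (backends.length === 0) {
    throw new Error('at least one backend is required');
  }
  const total = backends.reduce((sum, b) => sum + b.weight, 0);
  let threshold = 0;
  for (const backend of backends) {
    threshold += backend.weight / total;
    if (roll < threshold) {
      return backend;
    }
  }
  return backends[backends.length - 1]; // guards against floating-point rounding
}

const backends: Backend[] = [
  { name: 'canary-v2-a', weight: 10 },
  { name: 'canary-v2-b', weight: 10 },
  { name: 'main-v1', weight: 80 },
];

console.log(pickBackend(backends, Math.random()).name);
```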

Kubernetes Canary

# kubernetes/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
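  template:
    metadata:
      labels:              # must match spec.selector.matchLabels
        app: myapp
        version: canary
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          ports:
            - containerPort: 8080
---
# The Service selects only `app: myapp`, so it balances across stable and
# canary pods; the traffic split follows the replica ratio.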
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080

Argo Rollouts

# argo-rollouts.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
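  replicas: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2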
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 10m}
        - setWeight: 30
        - pause: {duration: 10m}
        - setWeight: 100
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable

Monitoring for Issues

Key Metrics

# Metrics to monitor
metrics:
  - "Error rate (should stay low)"
  - "Latency (p50, p95, p99)"
  - "HTTP status codes (4xx, 5xx)"
  - "Business metrics (conversions, signups)"
  - "User feedback"
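
As a rough illustration of how the latency and error-rate figures in this list are derived from raw samples, the sketch below uses the nearest-rank percentile definition. The helper names and sample values are made up for the example; a production system would normally read these numbers from its metrics backend instead.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer percentiles give exact ranks
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Treats "no traffic" as a zero error rate rather than dividing by zero
function errorRate(errors: number, requests: number): number {
  return requests === 0 ? 0 : errors / requests;
}

const latenciesMs = [120, 95, 310, 101, 99, 87, 450, 105, 98, 102];
console.log({
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
  errorRate: errorRate(3, 1000),
});
```

With only ten samples, p95 and p99 both collapse to the slowest request, which is one reason tail percentiles need a reasonably large window before they mean much.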

Automated Rollback

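// Automated canary analysis
const MIN_REQUESTS = 1000; // assumed minimum sample before trusting the ratios

async function analyzeCanary() {
  const canary = await getMetrics('canary');
  const baseline = await getMetrics('stable'); // assumed name for the current version

  // Too little traffic to judge: keep waiting instead of promoting blindly
  if (canary.requests < MIN_REQUESTS) {
    return;
  }

  const errorRate = canary.errors / canary.requests;
  const latencyP99 = canary.latency.p99;
  const baselineLatency = baseline.latency.p99;

  // Rollback if error rate > 1%
  if (errorRate > 0.01) {
    await rollbackCanary();
    await alert('Canary rolled back - error rate exceeded 1%');
    return;
  }

  // Rollback if p99 latency is more than 50% above the stable baseline
  if (latencyP99 > baselineLatency * 1.5) {
    await rollbackCanary();
    await alert('Canary rolled back - latency degradation');
    return;
  }

  // Promote if metrics look good
  await promoteCanary();
}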
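
The thresholds above are easier to unit test when the decision is separated from the side effects. The following is a minimal sketch under stated assumptions: the `WindowMetrics` shape, the `judgeCanary` name, and the minimum sample size of 1,000 requests are all hypothetical, and the 1% and 1.5x limits are simply copied from the code above.

```typescript
type Verdict = 'promote' | 'rollback' | 'wait';

interface WindowMetrics {
  requests: number;
  errors: number;
  latencyP99Ms: number;
}

const MAX_ERROR_RATE = 0.01;       // 1%, as above
const MAX_LATENCY_RATIO = 1.5;     // 50% slower than baseline, as above
const MIN_SAMPLE_REQUESTS = 1000;  // assumption: below this, the data is too noisy

// Pure function: no I/O, so every branch can be covered by a plain unit test
function judgeCanary(canary: WindowMetrics, baseline: WindowMetrics): Verdict {
  if (canary.requests < MIN_SAMPLE_REQUESTS) return 'wait';
  if (canary.errors / canary.requests > MAX_ERROR_RATE) return 'rollback';
  if (canary.latencyP99Ms > baseline.latencyP99Ms * MAX_LATENCY_RATIO) return 'rollback';
  return 'promote';
}

const stable: WindowMetrics = { requests: 90000, errors: 180, latencyP99Ms: 500 };
console.log(judgeCanary({ requests: 10000, errors: 20, latencyP99Ms: 520 }, stable));
```

The caller then maps the verdict to `promoteCanary()`, `rollbackCanary()`, or another analysis round.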

Advanced A/B Testing and Statistical Analysis

Experiment Design

Proper A/B testing requires understanding statistical fundamentals:

interface Variant {
  name: string;
}

interface ExperimentConfig {
  name: string;
  variants: Variant[];
  minimumDetectableEffect: number;  // relative lift, e.g., 0.05 for a 5% improvement
  significanceLevel: number;        // e.g., 0.05 for 95% confidence
  statisticalPower: number;         // e.g., 0.80 for 80% power
}

// z-scores for common settings (two-sided significance, one-sided power)
const Z_ALPHA: Record<number, number> = { 0.1: 1.645, 0.05: 1.96, 0.01: 2.576 };
const Z_BETA: Record<number, number> = { 0.8: 0.84, 0.9: 1.28 };

function calculateSampleSize(config: ExperimentConfig): number {
  const { minimumDetectableEffect, significanceLevel, statisticalPower } = config;
  const zAlpha = Z_ALPHA[significanceLevel];
  const zBeta = Z_BETA[statisticalPower];
  if (zAlpha === undefined || zBeta === undefined) {
    throw new Error('Unsupported significance level or statistical power');
  }

  // Assume baseline conversion rate of 5%
  const baselineRate = 0.05;
  const variantRate = baselineRate * (1 + minimumDetectableEffect);

  // Required sample size per variant for comparing two proportions
  const sampleSize =
    Math.pow(zAlpha + zBeta, 2) *
    (baselineRate * (1 - baselineRate) + variantRate * (1 - variantRate)) /
    Math.pow(variantRate - baselineRate, 2);

  return Math.ceil(sampleSize);
}

// The required sample size determines how long the experiment must run
const requiredSample = calculateSampleSize({
  name: "checkout-redesign",
  variants: [{ name: "control" }, { name: "variant" }],
  minimumDetectableEffect: 0.1,  // Detect 10% relative change
  significanceLevel: 0.05,
  statisticalPower: 0.80,
});
// About 31,200 users per variant. With 10,000 daily visitors split evenly
// between the two variants, the experiment needs roughly 7 days.
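
Working the formula through by hand for this example, with a baseline rate p_1 = 0.05 and a target rate p_2 = 0.05 × 1.1 = 0.055, shows where the duration estimate comes from:

```latex
n = \frac{(z_{\alpha/2} + z_{\beta})^2 \,\bigl[\,p_1(1-p_1) + p_2(1-p_2)\,\bigr]}{(p_2 - p_1)^2}
  = \frac{(1.96 + 0.84)^2 \,(0.0475 + 0.051975)}{(0.005)^2}
  = \frac{7.84 \times 0.099475}{0.000025}
  \approx 31{,}196 \ \text{users per variant}
```

Assuming the 10,000 daily visitors are split evenly, each variant receives 5,000 users per day, so collecting the sample takes 31,196 / 5,000 ≈ 6.2 days, which rounds up to 7 full days for the whole experiment.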

Bayesian Analysis for Faster Results

Bayesian analysis reports a direct probability that the variant beats the control, which many teams find easier to act on than a frequentist p-value:

class BayesianABTest {
  // Beta-Binomial model for conversion rates
  evaluate(control: { conversions: number; visitors: number },
           variant: { conversions: number; visitors: number }) {
    // Simulate posterior distributions using Beta distribution
    const simulations = 100000;
    let variantWins = 0;

    for (let i = 0; i < simulations; i++) {
      const controlRate = this.sampleBeta(
        control.conversions + 1,
        control.visitors - control.conversions + 1
      );
      const variantRate = this.sampleBeta(
        variant.conversions + 1,
        variant.visitors - variant.conversions + 1
      );

      if (variantRate > controlRate) {
        variantWins++;
      }
    }

    return {
      probabilityVariantIsBetter: variantWins / simulations,
      controlRate: control.conversions / control.visitors,
      variantRate: variant.conversions / variant.visitors,
      lift: ((variant.conversions / variant.visitors) /
             (control.conversions / control.visitors) - 1) * 100
    };
  }

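  // Draw from Beta(alpha, beta) as a ratio of two Gamma draws
  private sampleBeta(alpha: number, beta: number): number {
    const x = this.sampleGamma(alpha);
    const y = this.sampleGamma(beta);
    return x / (x + y);
  }

  // Marsaglia-Tsang method for Gamma(shape, 1). Valid for shape >= 1,
  // which always holds here because of the +1 from the uniform prior.
  private sampleGamma(shape: number): number {
    const d = shape - 1 / 3;
    const c = 1 / Math.sqrt(9 * d);
    for (;;) {
      const x = this.sampleNormal();
      const v = Math.pow(1 + c * x, 3);
      if (v <= 0) continue;
      const u = Math.random();
      if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) {
        return d * v;
      }
    }
  }

  // Box-Muller transform for a standard normal draw
  private sampleNormal(): number {
    const u1 = 1 - Math.random(); // in (0, 1], avoids log(0)
    const u2 = Math.random();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  }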
}
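
A win probability alone hides how uncertain each estimate is. As a complement, the sketch below summarizes the same Beta(conversions + 1, non-conversions + 1) posterior with its mean and an approximate 95% credible interval. It relies on a normal approximation, which is an assumption that only holds reasonably well once each arm has at least a few dozen conversions and non-conversions; `summarizePosterior` is an illustrative name.

```typescript
interface PosteriorSummary {
  mean: number;
  lower: number; // approximate 2.5% quantile
  upper: number; // approximate 97.5% quantile
}

function summarizePosterior(conversions: number, visitors: number): PosteriorSummary {
  // Uniform Beta(1, 1) prior, matching the +1 terms in the simulation above
  const alpha = conversions + 1;
  const beta = visitors - conversions + 1;
  const total = alpha + beta;

  // Closed-form mean and variance of a Beta(alpha, beta) distribution
  const mean = alpha / total;
  const variance = (alpha * beta) / (total * total * (total + 1));
  const halfWidth = 1.96 * Math.sqrt(variance);

  return {
    mean,
    lower: Math.max(0, mean - halfWidth),
    upper: Math.min(1, mean + halfWidth),
  };
}

console.log('control', summarizePosterior(500, 10000));
console.log('variant', summarizePosterior(560, 10000));
```

Heavily overlapping intervals are a hint that the experiment needs more data, even when the win probability already looks favorable.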

Shadow Testing: Test with Production Traffic

Shadow testing (dark launching) sends a copy of sampled production requests to a new service and discards its responses, so users only ever see output from the existing service:

interface ShadowConfig {
  enabled: boolean;
  captureRate: number;  // 0.0 to 1.0
  shadowService: string;
  timeout: number;      // milliseconds
}

class ShadowTester {
  async testEndpoint(
    originalRequest: Request,
    shadowConfig: ShadowConfig
  ): Promise<Response> {
    // Sample traffic for shadow testing. Clone before handling, because a
    // request body can only be read once.
    const sampled = shadowConfig.enabled && Math.random() < shadowConfig.captureRate;
    const shadowCopy = sampled ? originalRequest.clone() : null;

    // Always serve the original response (handle() is application-specific)
    const originalResponse = await this.handle(originalRequest);

    if (shadowCopy) {
      // Fire and forget: shadow request with timeout
      this.shadowRequest(shadowCopy, originalResponse.status, shadowConfig).catch(err => {
        console.error(`Shadow test failed: ${err.message}`);
        // Never fail the original request
      });
    }

    return originalResponse;
  }

  private async shadowRequest(
    request: Request,
    originalStatus: number,
    config: ShadowConfig
  ): Promise<void> {
    try {
      // GET and HEAD requests must not carry a body
      const hasBody = request.method !== 'GET' && request.method !== 'HEAD';
      const body = hasBody ? await request.arrayBuffer() : undefined;

      const start = performance.now();
      const shadowResponse = await fetch(config.shadowService, {
        method: request.method,
        headers: request.headers,
        body,
        signal: AbortSignal.timeout(config.timeout)
      });
      const latency = performance.now() - start;

      // Compare responses
      await this.recordComparison({
        statusMatch: originalStatus === shadowResponse.status,
        latencyMs: latency,
        shadowStatus: shadowResponse.status,
        originalStatus
      });
    } catch (error) {
      await this.recordError(error);
    }
  }

  private async recordComparison(data: any): Promise<void> {
    // Store in time-series database for analysis
    await metricsClient.increment("shadow_test.comparison", data);
  }
}
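
Status codes are a coarse signal. A minimal sketch of a body-level comparison follows, assuming both services return JSON and that some fields, such as timestamps or request IDs, legitimately differ between them. The `diffResponses` helper and the ignored field names are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns the paths at which the two payloads differ, skipping ignored keys
function diffResponses(
  original: Json,
  shadow: Json,
  ignoredKeys: Set<string>,
  path = '$'
): string[] {
  if (Array.isArray(original) && Array.isArray(shadow)) {
    const diffs: string[] = [];
    const length = Math.max(original.length, shadow.length);
    for (let i = 0; i < length; i++) {
      if (i >= original.length || i >= shadow.length) {
        diffs.push(`${path}[${i}]`); // element exists on one side only
      } else {
        diffs.push(...diffResponses(original[i], shadow[i], ignoredKeys, `${path}[${i}]`));
      }
    }
    return diffs;
  }
  if (isObject(original) && isObject(shadow)) {
    const diffs: string[] = [];
    const keys = new Set([...Object.keys(original), ...Object.keys(shadow)]);
    for (const key of keys) {
      if (ignoredKeys.has(key)) continue;
      if (!(key in original) || !(key in shadow)) {
        diffs.push(`${path}.${key}`); // key exists on one side only
      } else {
        diffs.push(...diffResponses(original[key], shadow[key], ignoredKeys, `${path}.${key}`));
      }
    }
    return diffs;
  }
  return original === shadow ? [] : [path];
}

const fromOriginal = { id: 1, items: [{ sku: 'x', qty: 2 }], requestId: 'abc' };
const fromShadow = { id: 1, items: [{ sku: 'x', qty: 3 }], requestId: 'def' };
console.log(diffResponses(fromOriginal, fromShadow, new Set(['requestId'])));
```

Recording the differing paths rather than a single match flag makes it much easier to tell an intentional behavior change from a regression.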

Synthetic Production Monitoring

Run synthetic transactions against production to detect issues before real users do:

class SyntheticMonitor {
  async runCheck(): Promise<CheckResult> {
    const checks = [
      this.checkHealthEndpoint(),
      this.checkCriticalFlow(),
      this.checkLatency(),
      this.checkDatabaseConnectivity(),
    ];

    const results = await Promise.allSettled(checks);
    const failures = results.filter(r => r.status === 'rejected');

    if (failures.length > 0) {
      await this.alertOnFailure(failures);
      return { passed: false, failures };
    }

    return { passed: true, failures: [] };
  }

  private async checkCriticalFlow(): Promise<void> {
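    // Simulate a complete user journey. Use a dedicated test account and a
    // sandbox payment method so synthetic orders never reach billing or fulfillment.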
    const session = await this.createSession();
    const product = await this.searchProduct(session, "widget");
    const cart = await this.addToCart(session, product.id);
    const order = await this.checkout(session, cart.id);

    // If we got an order ID without errors, the flow works
    if (!order.id) {
      throw new Error("Critical checkout flow failed");
    }
  }

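  private async checkLatency(): Promise<void> {
    const thresholds: Record<string, { quantile: number; max: number }> = {
      p50: { quantile: 0.5, max: 200 },  // milliseconds
      p95: { quantile: 0.95, max: 500 },
      p99: { quantile: 0.99, max: 1000 },
    };

    const latencies = await this.getRecentLatencies("checkout", 1000);
    if (latencies.length === 0) {
      throw new Error("No latency samples available");
    }
    const sorted = [...latencies].sort((a, b) => a - b);

    // Check every threshold, not only p99
    for (const [name, { quantile, max }] of Object.entries(thresholds)) {
      const index = Math.min(sorted.length - 1, Math.floor(sorted.length * quantile));
      if (sorted[index] > max) {
        throw new Error(`${name} latency exceeded ${max}ms`);
      }
    }
  }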

  private async alertOnFailure(failures: PromiseRejectedResult[]): Promise<void> {
    await alertingClient.send({
      severity: "critical",
      title: "Synthetic monitor failure",
      detail: `${failures.length} checks failed`,
      timestamp: new Date().toISOString(),
    });
  }
}
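
`Promise.allSettled` returns results in input order but carries no check names, so an alert that only says "2 checks failed" is hard to act on. A small sketch, with `summarizeChecks` and the `Settled` type as illustrative names, pairs each result with the name of the check that produced it:

```typescript
// Structural stand-in for the built-in PromiseSettledResult type
type Settled =
  | { status: 'fulfilled'; value: unknown }
  | { status: 'rejected'; reason: unknown };

interface CheckSummary {
  passed: boolean;
  failures: { name: string; message: string }[];
}

// names[i] must describe the check that produced results[i]
function summarizeChecks(names: string[], results: Settled[]): CheckSummary {
  const failures: { name: string; message: string }[] = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      const reason = result.reason;
      failures.push({
        name: names[i],
        message: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return { passed: failures.length === 0, failures };
}

const summary = summarizeChecks(
  ['health', 'critical-flow', 'latency'],
  [
    { status: 'fulfilled', value: undefined },
    { status: 'rejected', reason: new Error('Critical checkout flow failed') },
    { status: 'fulfilled', value: undefined },
  ]
);
console.log(summary);
```

The `failures` array can then go straight into the alert detail, so the person on call sees which journey broke and why.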

Blue-Green Deployment with Traffic Mirroring

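Blue-green deployments run two identical environments side by side: one serves live traffic while the other hosts the new version. Mirroring a copy of real requests to the idle environment exercises the new version before cutover, and the mirrored responses are discarded: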

# Kubernetes blue-green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      color: blue
  template:
    metadata:
      labels:
        app: myapp
        color: blue
    spec:
      containers:
        - name: app
          image: myapp:v2.0.0  # New version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      color: green
  template:
    metadata:
      labels:
        app: myapp
        color: green
    spec:
      containers:
        - name: app
          image: myapp:v1.9.9  # Old version
---
# Traffic mirroring: send a copy of real traffic to blue.
# Assumes Services named myapp-green and myapp-blue front the two Deployments.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp-green
          weight: 100
      mirror:
        host: myapp-blue
      mirrorPercentage:
        value: 10.0  # Mirror 10% of traffic
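
Once mirrored traffic looks healthy, the cutover itself can be a selector flip on a Service that fronts both Deployments. The sketch below assumes a single Service selecting pods by the `app` and `color` labels; the manifests above do not define that Service, so treat it as an assumption. `buildCutoverPatch` is a hypothetical helper that only builds the patch body.

```typescript
type Color = 'blue' | 'green';

interface ServiceSelectorPatch {
  spec: { selector: { app: string; color: Color } };
}

// Builds a merge-patch body that points the Service at the target color
function buildCutoverPatch(app: string, target: Color): ServiceSelectorPatch {
  return { spec: { selector: { app, color: target } } };
}

// Rolling back is the same operation with the previous color
function otherColor(color: Color): Color {
  return color === 'blue' ? 'green' : 'blue';
}

const cutover = buildCutoverPatch('myapp', 'blue');
const rollback = buildCutoverPatch('myapp', otherColor('blue'));
console.log(JSON.stringify(cutover));
console.log(JSON.stringify(rollback));
```

The resulting JSON could be applied with `kubectl patch service myapp --type merge -p '<json>'`. Keeping the old environment running until the new one has proven itself is what makes the rollback a single patch.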

Feature Flag Lifecycle Management

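Every flag adds a code path that must be tested and maintained, so flags left in place after a rollout finishes turn into technical debt. Tracking creation and expiry dates makes cleanup enforceable: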

class FlagLifecycle {
  private flags: Map<string, FlagDefinition> = new Map();

  registerFlag(flag: FlagDefinition): void {
    flag.createdAt = new Date();
    flag.status = 'active';
    this.flags.set(flag.name, flag);
  }

  async enforceCleanup(): Promise<void> {
    const now = new Date();

    for (const [name, flag] of this.flags) {
      // Check expiration
      if (flag.expiresAt && now > flag.expiresAt) {
        if (flag.rolloutPercentage === 100) {
          // Fully rolled out — remove flag code
          await this.scheduleRemoval(flag);
        } else {
          // Expired but not fully rolled out — alert
          await this.alertExpiredFlag(flag);
        }
      }

      // Check staleness
      const age = now.getTime() - flag.createdAt.getTime();
      if (age > 90 * 24 * 60 * 60 * 1000) {  // 90 days
        await this.alertStaleFlag(flag, age);
      }
    }
  }

  private async scheduleRemoval(flag: FlagDefinition): Promise<void> {
    // Create a ticket/issue for flag removal
    await issueTracker.create({
      title: `Remove feature flag: ${flag.name}`,
      description: `Flag ${flag.name} is at 100% rollout and should be removed from codebase.`,
      labels: ['flag-cleanup', 'tech-debt'],
      priority: 'medium'
    });
  }

  async generateFlagReport(): Promise<FlagReport> {
    const total = this.flags.size;
    const active = [...this.flags.values()].filter(f => f.status === 'active').length;
    const stale = [...this.flags.values()].filter(f => {
      const age = new Date().getTime() - f.createdAt.getTime();
      return age > 90 * 24 * 60 * 60 * 1000;
    }).length;

    return {
      totalFlags: total,
      activeFlags: active,
      staleFlags: stale,
      cleanupRate: total > 0 ? ((total - stale) / total) * 100 : 100,
      oldestFlag: [...this.flags.entries()]
        .sort((a, b) => a[1].createdAt.getTime() - b[1].createdAt.getTime())[0]?.[0]
    };
  }
}

// Centralized flag management
const flagDashboard = {
  productionFlags: [
    { name: "new-checkout", rollout: 100, age: 45, status: "cleanup-ready" },
    { name: "dark-mode", rollout: 50, age: 120, status: "stale-review" },
    { name: "ai-recs", rollout: 10, age: 30, status: "active-experiment" },
    { name: "legacy-ui", rollout: 0, age: 200, status: "deprecated-remove" },
  ]
};
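
The `status` values in this dashboard can be derived rather than maintained by hand. The rules below are an assumption inferred from the four sample rows and the 90-day staleness threshold used earlier, not a documented policy; `classifyFlag` is an illustrative name.

```typescript
type FlagStatus =
  | 'cleanup-ready'
  | 'stale-review'
  | 'active-experiment'
  | 'deprecated-remove';

const STALE_AFTER_DAYS = 90; // same threshold as the staleness check

// rollout is a percentage (0-100); ageDays is the flag's age in days
function classifyFlag(rollout: number, ageDays: number): FlagStatus {
  if (rollout === 100) return 'cleanup-ready';
  if (ageDays > STALE_AFTER_DAYS) {
    // Old and switched off everywhere: the flag and its code can go
    return rollout === 0 ? 'deprecated-remove' : 'stale-review';
  }
  return 'active-experiment';
}

const sampleFlags = [
  { name: 'new-checkout', rollout: 100, age: 45 },
  { name: 'dark-mode', rollout: 50, age: 120 },
  { name: 'ai-recs', rollout: 10, age: 30 },
  { name: 'legacy-ui', rollout: 0, age: 200 },
];

for (const flag of sampleFlags) {
  console.log(flag.name, classifyFlag(flag.rollout, flag.age));
}
```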

Production Canary Analysis with Prometheus

Automate canary promotion decisions with metrics-based analysis:

# Argo Rollouts analysis template
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: canary-analysis
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.99
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"2.."
            }[5m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[5m]))

    - name: error-rate
      interval: 1m
      successCondition: result[0] <= 0.01
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"5.."
            }[5m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[5m]))

    - name: latency-p99
      interval: 1m
      # The histogram is recorded in seconds, so 0.5 means 500 ms
      successCondition: result[0] <= 0.5
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{
                service="{{args.service-name}}"
              }[5m])) by (le))

Key Takeaways

  • Feature flags - Toggle features without deploying
  • Canary releases - Test with a small percentage of traffic first
  • Monitor metrics - Error rate, latency, business metrics
  • Automated rollback - React quickly to issues
  • Shadow testing - Validate new services with real traffic, zero user impact
  • Synthetic monitoring - Detect failures before users notice
  • Bayesian A/B testing - More intuitive than frequentist, faster decisions
  • Flag lifecycle - Clean up flags to prevent technical debt
