Introduction
Testing in production sounds risky, but it’s often the only way to find bugs that show up only under real traffic, real data, and real infrastructure. Practices like feature flags and canary releases limit how many users a bad change can reach and make it quick to turn off.
Why Test in Production?
┌─────────────────────────────────────────────────────────────┐
│ Testing in Production Benefits │
├─────────────────────────────────────────────────────────────┤
│ │
│ ✓ Real users, real data, real conditions │
│ ✓ Find issues CI can't catch │
│ ✓ Faster feedback loops │
│ ✓ A/B test new features │
│ ✓ Instant rollback if issues occur │
│ │
│ Risks: │
│ ✗ Users affected by bugs │
│ ✗ Potential service disruptions │
│ │
│ Solution: FEATURE FLAGS + CANARY │
│ │
└─────────────────────────────────────────────────────────────┘
Feature Flags
Basic Implementation
// Feature flag service: an in-memory map of flag name -> on/off
class FeatureFlags {
  private flags = new Map<string, boolean>();
  enable(feature: string): void {
    this.flags.set(feature, true);
  }
  disable(feature: string): void {
    this.flags.set(feature, false);
  }
  // Unknown flags default to off, so a typo never turns a feature on
  isEnabled(feature: string): boolean {
    return this.flags.get(feature) ?? false;
  }
}
const flags = new FeatureFlags();
// Usage in a React component (NewDashboard and LegacyDashboard are defined elsewhere)
function Dashboard() {
  if (flags.isEnabled('new-dashboard')) {
    return <NewDashboard />;
  }
  return <LegacyDashboard />;
}
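The in-memory class above starts with every flag off and forgets its state on restart. One way to seed it is sketched below, assuming the flags arrive as a JSON object of booleans (for example from an environment variable). The format and the `loadFlags` name are illustrative and do not come from any library.

```typescript
// Hypothetical helper: parse a JSON object such as {"new-dashboard": true}
// into a flag map. Non-boolean values are ignored rather than coerced.
function loadFlags(json: string): Map<string, boolean> {
  const flags = new Map<string, boolean>();
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return flags; // malformed config: fall back to "everything off"
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return flags;
  }
  for (const [name, value] of Object.entries(parsed)) {
    if (typeof value === 'boolean') {
      flags.set(name, value);
    }
  }
  return flags;
}

const seeded = loadFlags('{"new-dashboard": true, "new-checkout": false, "beta": "yes"}');
console.log(seeded.get('new-dashboard')); // true
console.log(seeded.has('beta'));          // false
```

Falling back to an empty map on bad input keeps the "unknown flags are off" default, so a broken config cannot switch a feature on.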
With Providers
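// Use a hosted provider such as LaunchDarkly, Split, or Statsig
import * as LaunchDarkly from 'launchdarkly-node-server-sdk';
const client = LaunchDarkly.init(process.env.LD_KEY!);
// Check a feature flag for one user; the last argument to variation() is the
// default returned when the flag cannot be evaluated
async function checkFlag(userId: string, flag: string): Promise<boolean> {
  // Resolves immediately once the client has finished connecting
  await client.waitForInitialization();
  return client.variation(flag, { key: userId }, false);
}
// In an Express route (assumes an existing `app` and auth middleware that sets req.user)
app.get('/dashboard', async (req, res) => {
  const userId = req.user.id;
  const useNewDashboard = await checkFlag(userId, 'new-dashboard');
  if (useNewDashboard) {
    return res.render('dashboard-new');
  }
  return res.render('dashboard-legacy');
});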
Gradual Rollout
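// Percentage rollout
// 32-bit string hash (same scheme as Java's String.hashCode)
function hashCode(value: string): number {
  let hash = 0;
  for (let i = 0; i < value.length; i++) {
    hash = (Math.imul(hash, 31) + value.charCodeAt(i)) | 0;
  }
  return hash;
}
function isInRollout(userId: string, percentage: number): boolean {
  // Deterministic: the same user always lands in the same bucket (0-99)
  const bucket = Math.abs(hashCode(userId)) % 100;
  return bucket < percentage;
}
// Usage
const rollout = isInRollout(userId, 10); // 10% rollout
if (rollout) {
  // enableFeature is a placeholder for whatever turns the feature on for this user
  enableFeature('new-checkout');
}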
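Hashing the user ID alone puts the same users in the first 10% of every rollout. A common variation, sketched below, mixes the feature name into the hash so each feature draws a different slice of users. The FNV-1a hash and the `bucketFor` and `isInFeatureRollout` names are choices made for this sketch, not part of the code above.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small, deterministic, and
// good enough for bucketing (it is not a cryptographic hash).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map (feature, user) to a stable bucket in the range 0-99.
function bucketFor(feature: string, userId: string): number {
  return fnv1a(`${feature}:${userId}`) % 100;
}

function isInFeatureRollout(feature: string, userId: string, percentage: number): boolean {
  return bucketFor(feature, userId) < percentage;
}

// The same inputs always give the same answer.
const included = isInFeatureRollout('new-checkout', 'user-42', 10);
console.log(included === isInFeatureRollout('new-checkout', 'user-42', 10)); // true
```

Because a user's bucket never changes, raising the percentage only adds users: anyone included at 10% is still included at 50%.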
Canary Deployments
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Canary Deployment │
├─────────────────────────────────────────────────────────────┤
│ │
│ Load Balancer │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ │ │ │ │
│ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ │
│ │ Canary │ │ Canary │ │ Main │ │
│ │ v2 │ │ v2 │ │ v1 │ │
│ │ 10% │ │ 10% │ │ 80% │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │
│ └─────────────┼─────────────┘ │
│ ▼ │
│ Monitoring & Metrics │
│ │ │
│ Promote or Rollback │
│ │
└─────────────────────────────────────────────────────────────┘
Kubernetes Canary
# kubernetes/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:              # must match spec.selector.matchLabels
        app: myapp
        version: canary
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp             # no version label, so canary pods receive traffic too
  ports:
    - port: 80
      targetPort: 8080
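The Service above selects on `app: myapp` only. Assuming the stable pods carry the same label and requests are balanced evenly across pods, the canary's share of traffic is roughly its share of replicas. A quick sketch of that arithmetic follows; the function names are illustrative.

```typescript
// Approximate share of traffic (in percent) that reaches canary pods when one
// Service balances evenly across all pods of both versions.
function canaryTrafficPercent(canaryReplicas: number, stableReplicas: number): number {
  const total = canaryReplicas + stableReplicas;
  if (total === 0) return 0;
  return (canaryReplicas * 100) / total;
}

// Smallest number of stable replicas that keeps the canary at or below a target share.
function stableReplicasFor(canaryReplicas: number, targetPercent: number): number {
  if (targetPercent <= 0 || targetPercent > 100) {
    throw new RangeError('targetPercent must be in (0, 100]');
  }
  return Math.ceil((canaryReplicas * (100 - targetPercent)) / targetPercent);
}

console.log(canaryTrafficPercent(1, 9)); // 10
console.log(stableReplicasFor(1, 10));   // 9
```

With replica-based splitting, the smallest possible canary share is one pod out of the total, which is one reason to use a traffic-weighted tool when finer control is needed.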
Argo Rollouts
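# argo-rollouts.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  # A Rollout needs a selector and pod template, like a Deployment;
  # these mirror the Kubernetes example above
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          ports:
            - containerPort: 8080
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 10m}
        - setWeight: 30
        - pause: {duration: 10m}
        - setWeight: 100
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable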
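To see what the steps above mean over time, the sketch below walks the same sequence and records when each weight takes effect. It is only a model of the schedule, not how Argo Rollouts is implemented: the `Step` type and `timeline` function are invented for illustration, pauses are given in minutes, and the time pods need to become ready is ignored.

```typescript
type Step = { setWeight: number } | { pauseMinutes: number };

// Walk the steps in order and record when each traffic weight takes effect.
function timeline(steps: Step[]): Array<{ atMinute: number; weight: number }> {
  const result: Array<{ atMinute: number; weight: number }> = [];
  let elapsed = 0;
  for (const step of steps) {
    if ('setWeight' in step) {
      result.push({ atMinute: elapsed, weight: step.setWeight });
    } else {
      elapsed += step.pauseMinutes;
    }
  }
  return result;
}

const plan = timeline([
  { setWeight: 10 }, { pauseMinutes: 10 },
  { setWeight: 30 }, { pauseMinutes: 10 },
  { setWeight: 100 },
]);
console.log(plan);
// [ { atMinute: 0, weight: 10 }, { atMinute: 10, weight: 30 }, { atMinute: 20, weight: 100 } ]
```

Under these assumptions the new version reaches all traffic about 20 minutes after the rollout starts, provided nothing aborts it along the way.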
Monitoring for Issues
Key Metrics
# Metrics to monitor
metrics:
- "Error rate (should stay low)"
- "Latency (p50, p95, p99)"
- "HTTP status codes (4xx, 5xx)"
- "Business metrics (conversions, signups)"
- "User feedback"
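As a rough illustration of how two of these numbers are derived from raw request data, the sketch below computes latency percentiles with the nearest-rank method and a 5xx error rate. In practice a metrics system usually does this for you; the function names and sample values are made up for the example.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Share of responses with a 5xx status code.
function serverErrorRate(statusCodes: number[]): number {
  if (statusCodes.length === 0) return 0;
  const errors = statusCodes.filter((code) => code >= 500 && code <= 599).length;
  return errors / statusCodes.length;
}

const latenciesMs = [12, 15, 11, 240, 14, 13, 16, 18, 17, 19];
console.log(percentile(latenciesMs, 50)); // 15
console.log(percentile(latenciesMs, 99)); // 240
console.log(serverErrorRate([200, 200, 500, 404])); // 0.25
```

The single 240 ms outlier barely moves p50 but dominates p99, which is why tail percentiles are worth watching alongside the median.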
Automated Rollback
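// Automated canary analysis
// getMetrics, rollbackCanary, promoteCanary, and alert are your own integrations;
// getMetrics('stable') is assumed to return metrics for the current production version
async function analyzeCanary() {
  const canary = await getMetrics('canary');
  const baseline = await getMetrics('stable');
  // No canary traffic yet: nothing to judge, and avoids dividing by zero
  if (canary.requests === 0) {
    return;
  }
  const errorRate = canary.errors / canary.requests;
  // Rollback if error rate > 1%
  if (errorRate > 0.01) {
    await rollbackCanary();
    await alert('Canary rolled back - error rate exceeded 1%');
    return;
  }
  // Rollback if p99 latency is more than 50% above the baseline
  if (canary.latency.p99 > baseline.latency.p99 * 1.5) {
    await rollbackCanary();
    await alert('Canary rolled back - latency degradation');
    return;
  }
  // Promote if metrics look good
  await promoteCanary();
}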
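One way to make the thresholds above easy to unit-test is to separate the decision from the side effects. The sketch below is a pure function that returns a verdict and leaves rollback, promotion, and alerting to the caller. The `Metrics` shape, the `judgeCanary` name, and the `minRequests` default of 500 are assumptions made for illustration.

```typescript
interface Metrics {
  requests: number;
  errors: number;
  p99Ms: number;
}

type Verdict = 'promote' | 'rollback' | 'wait';

// Same rules as the analysis above, plus a minimum sample size so that a
// handful of requests cannot trigger a promotion or a rollback.
function judgeCanary(canary: Metrics, baseline: Metrics, minRequests = 500): Verdict {
  if (canary.requests < minRequests) return 'wait';
  const errorRate = canary.errors / canary.requests;
  if (errorRate > 0.01) return 'rollback';              // more than 1% errors
  if (canary.p99Ms > baseline.p99Ms * 1.5) return 'rollback'; // p99 more than 50% slower
  return 'promote';
}

const stable: Metrics = { requests: 9000, errors: 20, p99Ms: 250 };
console.log(judgeCanary({ requests: 1000, errors: 5, p99Ms: 300 }, stable));  // promote
console.log(judgeCanary({ requests: 1000, errors: 20, p99Ms: 300 }, stable)); // rollback
console.log(judgeCanary({ requests: 100, errors: 0, p99Ms: 300 }, stable));   // wait
```

A caller would run this on a schedule and act only on 'promote' or 'rollback', treating 'wait' as "check again later".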
Key Takeaways
- Feature flags - Toggle features without deploying
- Canary releases - Test with a small percentage of traffic first
- Monitor metrics - Error rate, latency, business metrics
- Automated rollback - React quickly to issues