Introduction
Testing in production sounds risky, but some bugs only appear under real traffic, real data, and real infrastructure. Feature flags and canary releases let you expose a change to a small share of users first and pull it back quickly if something goes wrong.
Why Test in Production?
Benefits:
- Real users, real data, real conditions
- Find issues CI can't catch
- Faster feedback loops
- A/B test new features
- Instant rollback if issues occur

Risks:
- Users affected by bugs
- Potential service disruptions

Solution: feature flags + canary releases
Feature Flags
Basic Implementation
// Feature flag service: an in-memory map of flag name -> on/off
class FeatureFlags {
  private flags = new Map<string, boolean>();

  enable(feature: string) {
    this.flags.set(feature, true);
  }

  disable(feature: string) {
    this.flags.set(feature, false);
  }

  // Unknown flags default to off
  isEnabled(feature: string): boolean {
    return this.flags.get(feature) ?? false;
  }
}

const flags = new FeatureFlags();

// Usage in a React component
function Dashboard() {
  if (flags.isEnabled('new-dashboard')) {
    return <NewDashboard />;
  }
  return <LegacyDashboard />;
}
With Providers
// Use a hosted provider such as LaunchDarkly, Split, or Statsig
import * as LaunchDarkly from 'launchdarkly-node-server-sdk';

const client = LaunchDarkly.init(process.env.LD_KEY!);

// Check a feature flag for a given user; evaluates to false if the flag is unavailable
async function checkFlag(userId: string, flag: string) {
  await client.waitForInitialization(); // resolves once flag data has loaded
  return client.variation(flag, { key: userId }, false);
}

// In an Express route (assumes `app` exists and auth middleware sets req.user)
app.get('/dashboard', async (req, res) => {
  const userId = req.user.id;
  const useNewDashboard = await checkFlag(userId, 'new-dashboard');
  if (useNewDashboard) {
    return res.render('dashboard-new');
  }
  return res.render('dashboard-legacy');
});
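The flag lookup sits on the request path, so it helps to decide up front what happens when the provider call fails. The following is a minimal sketch, assuming a hypothetical `FlagClient` interface shaped like the `client.variation` call above; `evaluateWithFallback` is an illustrative name, not part of any provider SDK, and many SDKs already return the default value on failure by themselves.

```typescript
// Hypothetical minimal interface, modelled on the variation() call used above.
interface FlagClient {
  variation(
    flag: string,
    user: { key: string },
    defaultValue: boolean,
  ): Promise<boolean>;
}

// Resolve a flag, falling back to `defaultValue` if the provider call throws.
// Defaulting to false means an outage serves the legacy code path.
async function evaluateWithFallback(
  client: FlagClient,
  flag: string,
  userId: string,
  defaultValue: boolean = false,
): Promise<boolean> {
  try {
    return await client.variation(flag, { key: userId }, defaultValue);
  } catch {
    return defaultValue;
  }
}
```

With this wrapper, a route handler can call `evaluateWithFallback(client, 'new-dashboard', userId)` and never has to handle a rejected promise from the flag service.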
Gradual Rollout
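// Simple deterministic string hash (Java-style hashCode)
function hashCode(value: string): number {
  let hash = 0;
  for (let i = 0; i < value.length; i++) {
    hash = (Math.imul(hash, 31) + value.charCodeAt(i)) | 0;
  }
  return hash;
}

// Percentage rollout: the same user always lands in the same bucket
function isInRollout(userId: string, percentage: number): boolean {
  const bucket = Math.abs(hashCode(userId)) % 100;
  return bucket < percentage;
}

// Usage
if (isInRollout(userId, 10)) { // 10% rollout
  enableFeature('new-checkout'); // enableFeature is application-specific
}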
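Hashing the user ID alone means the same users fall into the first 10% of every rollout. One common variation is to mix the feature name into the hash so each feature gets its own bucketing. This is a sketch under assumptions: it uses the FNV-1a hash (applied to UTF-16 code units rather than bytes), and `bucketFor` and `isInFeatureRollout` are hypothetical names.

```typescript
// FNV-1a, a small non-cryptographic 32-bit hash. Any stable hash would do.
// Note: this variant hashes UTF-16 code units, not UTF-8 bytes.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Bucket in [0, 100) derived from both the feature name and the user ID.
function bucketFor(feature: string, userId: string): number {
  return fnv1a(`${feature}:${userId}`) % 100;
}

// A user is in the rollout when their bucket is below the percentage.
function isInFeatureRollout(
  feature: string,
  userId: string,
  percentage: number,
): boolean {
  return bucketFor(feature, userId) < percentage;
}
```

Because a user's bucket never changes, raising the percentage from 10 to 30 only adds users; everyone who already had the feature keeps it.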
Canary Deployments
Architecture
                 Load Balancer
                       |
        +--------------+--------------+
        |              |              |
   +---------+    +---------+    +---------+
   | Canary  |    | Canary  |    |  Main   |
   |   v2    |    |   v2    |    |   v1    |
   |   10%   |    |   10%   |    |   80%   |
   +---------+    +---------+    +---------+
        |              |              |
        +--------------+--------------+
                       |
                       v
              Monitoring & Metrics
                       |
                       v
               Promote or Rollback
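The split in the diagram comes down to weighted selection at the load balancer. Real load balancers implement this themselves; the sketch below only illustrates the idea, and `Backend` and `pickBackend` are hypothetical names.

```typescript
interface Backend {
  name: string;
  weight: number; // relative share of traffic
}

// Pick a backend for one request. `random` must be in [0, 1); it is a
// parameter so the selection can be tested deterministically.
function pickBackend(backends: Backend[], random: number = Math.random()): Backend {
  const total = backends.reduce((sum, b) => sum + b.weight, 0);
  if (total <= 0) throw new Error("total weight must be positive");

  let threshold = random * total;
  for (const backend of backends) {
    if (threshold < backend.weight) return backend;
    threshold -= backend.weight;
  }
  return backends[backends.length - 1]; // guards against floating-point drift
}

// The weights from the diagram above.
const backends: Backend[] = [
  { name: "canary-v2-a", weight: 10 },
  { name: "canary-v2-b", weight: 10 },
  { name: "main-v1", weight: 80 },
];
```

Unlike the hash-based rollout, this selection is per request, so the same user may hit the canary on one request and the main version on the next unless the load balancer also uses sticky sessions.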
Kubernetes Canary
# kubernetes/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:            # must match spec.selector.matchLabels
        app: myapp
        version: canary
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp           # matches stable and canary pods alike
  ports:
    - port: 80
      targetPort: 8080
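Because the Service selects on `app: myapp` only, it sends traffic to every matching pod, so the canary's share of traffic is roughly its share of replicas. The sketch below works through that arithmetic. It assumes requests are spread evenly across pods and that a stable Deployment (not shown above) carries the same `app: myapp` label; `canaryShare` and `canaryReplicasFor` are hypothetical helper names.

```typescript
// Approximate fraction of traffic reaching the canary when a Service
// spreads requests evenly across all matching pods.
function canaryShare(stableReplicas: number, canaryReplicas: number): number {
  const total = stableReplicas + canaryReplicas;
  return total === 0 ? 0 : canaryReplicas / total;
}

// Smallest canary replica count whose share reaches at least
// `targetPercent` (an integer in [0, 100)) for a given stable replica count.
function canaryReplicasFor(stableReplicas: number, targetPercent: number): number {
  if (targetPercent < 0 || targetPercent >= 100) {
    throw new RangeError("targetPercent must be in [0, 100)");
  }
  // c / (s + c) >= p / 100  =>  c >= p * s / (100 - p)
  return Math.ceil((targetPercent * stableReplicas) / (100 - targetPercent));
}
```

With 9 stable replicas, one canary replica gives a 10% share. With only 4 stable replicas, the smallest possible canary already takes about 20% of traffic, which is why replica-based canaries are coarse for small deployments.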
Argo Rollouts
# argo-rollouts.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 10m}
        - setWeight: 30
        - pause: {duration: 10m}
        - setWeight: 100
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable
Monitoring for Issues
Key Metrics
# Metrics to monitor
metrics:
- "Error rate (should stay low)"
- "Latency (p50, p95, p99)"
- "HTTP status codes (4xx, 5xx)"
- "Business metrics (conversions, signups)"
- "User feedback"
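The p50, p95, and p99 figures are percentiles of the latency distribution. Monitoring systems usually estimate them from histograms; the sketch below uses the simpler nearest-rank method on raw samples just to show what the numbers mean. `percentile` is a hypothetical helper name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. `p` must be in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");

  const sorted = [...samples].sort((a, b) => a - b); // copy, then sort ascending
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Five request latencies in milliseconds, one of them very slow.
const latenciesMs = [120, 80, 100, 2000, 90];
const p50 = percentile(latenciesMs, 50); // 100
const p99 = percentile(latenciesMs, 99); // 2000
```

The median looks healthy here while p99 exposes the slow request, which is why tail percentiles are worth watching during a canary.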
Automated Rollback
// Automated canary analysis.
// getMetrics, rollbackCanary, promoteCanary, and alert are application-specific helpers.
async function analyzeCanary() {
  const metrics = await getMetrics('canary');
  const baseline = await getMetrics('stable'); // assumed: metrics for the current production version
  const errorRate = metrics.requests > 0 ? metrics.errors / metrics.requests : 0;
  const latencyP99 = metrics.latency.p99;

  // Roll back if error rate > 1%
  if (errorRate > 0.01) {
    await rollbackCanary();
    await alert('Canary rolled back - error rate exceeded 1%');
    return;
  }

  // Roll back if p99 latency is more than 50% above the baseline
  if (latencyP99 > baseline.latency.p99 * 1.5) {
    await rollbackCanary();
    await alert('Canary rolled back - latency degradation');
    return;
  }

  // Promote if metrics look good
  await promoteCanary();
}
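The check above acts on whatever traffic the canary has seen so far, and with only a handful of requests a single error can push the error rate past 1%. One way to handle that is to separate the decision from its side effects and add a minimum-sample guard. This is a sketch, not a definitive implementation: `decideCanary`, `Snapshot`, `Verdict`, and the `minRequests` value of 500 are assumptions, while the 1% and 50% thresholds come from the code above.

```typescript
interface Snapshot {
  requests: number;
  errors: number;
  latencyP99: number; // milliseconds
}

type Verdict = "promote" | "rollback" | "wait";

interface Thresholds {
  minRequests: number; // hypothetical guard: do not judge on too little traffic
  maxErrorRate: number; // 0.01 = 1%
  maxLatencyRatio: number; // 1.5 = canary p99 may be at most 50% above baseline
}

const defaultThresholds: Thresholds = {
  minRequests: 500,
  maxErrorRate: 0.01,
  maxLatencyRatio: 1.5,
};

// Pure decision function: no I/O, so it is easy to unit test.
function decideCanary(
  canary: Snapshot,
  baseline: Snapshot,
  t: Thresholds = defaultThresholds,
): Verdict {
  if (canary.requests === 0 || canary.requests < t.minRequests) return "wait";
  if (canary.errors / canary.requests > t.maxErrorRate) return "rollback";
  if (canary.latencyP99 > baseline.latencyP99 * t.maxLatencyRatio) return "rollback";
  return "promote";
}
```

The caller then maps `"rollback"` and `"promote"` to the rollback and promote helpers and simply checks again later on `"wait"`.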
Key Takeaways
- Feature flags: toggle features without redeploying
- Canary releases: expose a new version to a small percentage of traffic first
- Monitor metrics: error rate, latency, business metrics
- Automated rollback: react quickly when metrics degrade