Introduction
Feature flags transform how you ship software. From simple on/off switches to sophisticated experimentation platforms, they enable safe releases and data-driven product decisions.
Key statistics (commonly cited industry figures):
- Companies using feature flags deploy 30x more frequently
- 73% of enterprises use feature management
- A/B testing improves conversion by 20-30%
- Feature flags reduce rollback time from hours to seconds
What Are Feature Flags and Why You Need Them
A feature flag (also called a feature toggle) is a technique that allows you to change your application’s behavior without deploying new code. Think of it as a switch that controls whether a feature is visible or active for your users.
The Core Problem Feature Flags Solve
Without feature flags, you face a difficult choice:
- Ship big features all at once: risky, hard to roll back, all-or-nothing
- Hold unfinished features in long-lived branches: slow progress, painful merges
Feature flags eliminate this trade-off by decoupling deployment from release.
Real-World Use Cases
Gradual Rollout: Release a feature to 1% of users first, then 5%, 10%, 50%, and finally 100%. If something breaks, you flip the switch back instead of deploying a fix.
A/B Testing: Show different versions of a feature to different users and measure which performs better. This is how companies optimize conversion rates and user experience.
Kill Switches: Emergency off-ramps for features that cause issues in production. No more frantic deployments to fix problems.
Canary Releases: Test new infrastructure or database changes on a small subset of users before full rollout.
Dark Launches: Deploy features before they’re ready for public use, allowing internal testing while keeping them hidden from customers.
Feature Flag Architecture
                         Feature Flag System

  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
  │    SDKs     │────▶│  Dashboard  │────▶│  Analytics  │
  │(Web, Mobile)│     │  (Manage,   │     │   (Track,   │
  │             │     │  Monitor)   │     │  Analyze)   │
  └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                             ▼
                  ┌─────────────────────┐
                  │    Flag Service     │
                  │  (Evaluate, Rules)  │
                  └──────────┬──────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
  │    Redis    │     │  Database   │     │   Config    │
  │    Cache    │     │   (Rules)   │     │  (Static)   │
  └─────────────┘     └─────────────┘     └─────────────┘
Core Concepts and Terminology
Before diving into implementation, let’s clarify the key concepts that form the foundation of feature flag systems.
Feature Flag
A feature flag is a switch that controls whether a feature is enabled or disabled. In its simplest form, it's a boolean value (true/false) that determines whether a code path executes.
Example: A new checkout page feature flag might look like new_checkout_enabled = true.
Variant
In A/B testing, variants are different versions of a feature. Instead of just true/false, variants allow multiple options.
Example: A button color test might have variants: control (blue), variant_a (green), variant_b (red).
Context
Context is the information about the current user or request that determines which variant they should see. This includes user ID, plan, location, device, and any other relevant attributes.
Example Context:
{
"user_id": "user_123",
"tenant_id": "tenant_456",
"plan": "enterprise",
"country": "US",
"device": "mobile"
}
Rollout Percentage
The percentage of users who should see a feature. This enables gradual rollouts where you start with 1% and increase over time.
Example: 5% rollout means 5 out of every 100 users see the feature.
Targeting Rules
Specific conditions that determine who sees a feature. This goes beyond simple percentages to include user attributes.
Example: “Enable feature for enterprise users in the US with more than 100 employees.”
Evaluation
The process of determining which variant a user should see based on their context and the flag’s rules.
Example: User with ID “user_123” evaluates to “variant_a” based on consistent hashing.
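The consistent-hashing idea behind evaluation fits in a few lines: hash the flag key plus the user ID into a bucket from 0 to 99, then compare against the rollout percentage. A minimal sketch (function names are illustrative):

```python
import hashlib

def bucket_for(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.md5(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_enabled(flag_key: str, user_id: str, rollout_percentage: int) -> bool:
    """User sees the feature if their bucket falls below the rollout cutoff."""
    return bucket_for(flag_key, user_id) < rollout_percentage

# The same user always lands in the same bucket for a given flag,
# so raising the percentage only ever adds users, never shuffles them.
```

Because the bucket depends only on the flag key and user ID, a user who was in the 5% rollout stays enabled when you raise it to 10%.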
Backing Out
The ability to quickly disable a feature without redeploying code. This is the emergency off-ramp that makes feature flags valuable.
Example: If a feature causes errors, flip the switch to false and the feature disappears instantly.
Implementation
Simple Feature Flag Service
#!/usr/bin/env python3
"""Feature flag service."""
import hashlib


class FeatureFlag:
    """Base class for feature flag evaluation."""

    def __init__(self, key: str, default: bool = False):
        self.key = key
        self.default = default

    def evaluate(self, context: dict) -> bool:
        """Evaluate flag for given context."""
        raise NotImplementedError


class SimpleFeatureFlag(FeatureFlag):
    """Simple on/off flag."""

    def __init__(self, key: str, enabled: bool = False):
        super().__init__(key)
        self.enabled = enabled

    def evaluate(self, context: dict) -> bool:
        return self.enabled


class PercentageFeatureFlag(FeatureFlag):
    """Percentage-based rollout."""

    def __init__(self, key: str, percentage: int = 0):
        super().__init__(key)
        self.percentage = max(0, min(100, percentage))

    def evaluate(self, context: dict) -> bool:
        user_id = context.get('user_id', 'anonymous')
        # Consistent hashing: the same user always lands in the same bucket
        hash_value = int(hashlib.md5(f"{self.key}:{user_id}".encode()).hexdigest(), 16)
        return hash_value % 100 < self.percentage


class TargetingFeatureFlag(FeatureFlag):
    """Targeting specific users/groups."""

    def __init__(self, key: str, rules: dict):
        super().__init__(key)
        self.rules = rules

    def evaluate(self, context: dict) -> bool:
        # Check specific users
        if context.get('user_id') in self.rules.get('users', []):
            return True
        # Check user attributes
        for attr, values in self.rules.get('attributes', {}).items():
            if context.get(attr) in values:
                return True
        # Fall back to percentage rollout if no rules match
        percentage = self.rules.get('percentage', 0)
        user_id = context.get('user_id', 'anonymous')
        hash_value = int(hashlib.md5(f"{self.key}:{user_id}".encode()).hexdigest(), 16)
        return hash_value % 100 < percentage
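A quick sanity check on the bucketing scheme above: if md5 distributes users evenly, a 20% rollout should admit roughly one user in five. This self-contained snippet simulates 10,000 users with the same hashing logic (the flag name and 20% figure are just examples):

```python
import hashlib

def in_rollout(flag_key: str, user_id: str, percentage: int) -> bool:
    # Same md5 bucketing scheme as PercentageFeatureFlag above
    h = int(hashlib.md5(f"{flag_key}:{user_id}".encode()).hexdigest(), 16)
    return h % 100 < percentage

# Count how many of 10,000 simulated users a 20% rollout enables
enabled = sum(in_rollout("new_checkout", f"user_{i}", 20) for i in range(10_000))
print(f"{enabled} of 10000 users in a 20% rollout")  # close to 2000
```

The count lands near 2,000 with only a small deviation, which is what makes percentage rollouts trustworthy at scale.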
Evaluation Engine
#!/usr/bin/env python3
"""Feature flag evaluation engine."""
import hashlib
from typing import Dict

# SimpleFeatureFlag, PercentageFeatureFlag, and TargetingFeatureFlag
# are the classes defined in the previous snippet.


class FeatureEngine:
    """Feature flag evaluation engine."""

    def __init__(self):
        self.flags = {}
        self.cache = {}

    def load_flags(self, config: Dict):
        """Load flags from configuration."""
        for key, flag_config in config.items():
            flag_type = flag_config.get('type', 'simple')
            if flag_type == 'simple':
                self.flags[key] = SimpleFeatureFlag(key, flag_config.get('enabled', False))
            elif flag_type == 'percentage':
                self.flags[key] = PercentageFeatureFlag(key, flag_config.get('percentage', 0))
            elif flag_type == 'targeting':
                self.flags[key] = TargetingFeatureFlag(key, flag_config.get('rules', {}))

    def is_enabled(self, key: str, context: Dict = None) -> bool:
        """Check if feature is enabled."""
        context = context or {}
        if key not in self.flags:
            return False
        return self.flags[key].evaluate(context)

    def get_variant(self, key: str, context: Dict = None) -> str:
        """Get variant for A/B testing."""
        context = context or {}
        user_id = context.get('user_id', 'anonymous')
        variant_key = f"{key}:variant:{user_id}"
        if variant_key in self.cache:
            return self.cache[variant_key]
        # Hash user into a variant bucket (consistent across calls)
        hash_value = int(hashlib.md5(f"{key}:{user_id}".encode()).hexdigest(), 16)
        flag = self.flags.get(key)
        variants = getattr(flag, 'variants', ['control', 'variant_a'])
        variant = variants[hash_value % len(variants)]
        self.cache[variant_key] = variant
        return variant

    def track_event(self, key: str, context: Dict, event: str):
        """Track feature event for analytics."""
        # In production, send to your analytics pipeline instead of printing
        print(f"Event: {event} | Feature: {key} | User: {context.get('user_id')}")
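The engine's `load_flags` expects one entry per flag, keyed by flag name, with a `type` selecting the flag class. A configuration dict it could consume might look like this (flag names and values are illustrative):

```python
# One entry per flag; "type" selects simple, percentage, or targeting
flag_config = {
    "new_dashboard": {"type": "simple", "enabled": True},
    "new_checkout": {"type": "percentage", "percentage": 20},
    "beta_api": {
        "type": "targeting",
        "rules": {
            "users": ["user_123"],                     # always enabled for these IDs
            "attributes": {"plan": ["enterprise"]},    # or for matching attributes
            "percentage": 5,                           # else a 5% fallback rollout
        },
    },
}

# engine = FeatureEngine()
# engine.load_flags(flag_config)
# engine.is_enabled("new_dashboard", {"user_id": "user_123"})
```

Keeping the whole flag inventory in one declarative structure like this makes it easy to version-control flag state alongside code.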
Integration
#!/usr/bin/env python3
"""Feature flag middleware."""


class FeatureMiddleware:
    """FastAPI/Starlette ASGI middleware."""

    def __init__(self, app, feature_engine):
        self.app = app
        self.engine = feature_engine

    async def __call__(self, scope, receive, send):
        if scope['type'] == 'http':
            # Extract context from request
            context = self.get_context(scope)
            # Add evaluated feature flags to the ASGI scope
            scope['features'] = {
                key: self.engine.is_enabled(key, context)
                for key in self.engine.flags
            }
        await self.app(scope, receive, send)

    def get_context(self, scope) -> dict:
        """Extract context from request headers."""
        headers = dict(scope.get('headers', []))
        return {
            'user_id': headers.get(b'x-user-id', b'anonymous').decode(),
            'email': headers.get(b'x-user-email', b'').decode(),
            'plan': headers.get(b'x-user-plan', b'free').decode(),
            'ip': scope.get('client', ('', 0))[0]
        }


# Usage in FastAPI (the middleware placed the flags in the ASGI scope)
@app.get("/dashboard")
async def dashboard(request: Request):
    if request.scope.get('features', {}).get('new_dashboard'):
        return NewDashboard()
    return OldDashboard()


@app.get("/checkout")
async def checkout(request: Request):
    variant = request.app.state.feature_engine.get_variant(
        'checkout_redesign',
        {'user_id': get_user_id(request)}
    )
    if variant == 'variant_a':
        return CheckoutVariantA()
    return CheckoutControl()
A/B Testing
Experimentation Platform
#!/usr/bin/env python3
"""A/B testing implementation."""
import hashlib
from typing import Dict, List
from datetime import datetime


class Experiment:
    """A/B test experiment."""

    def __init__(self, name: str, variants: List[Dict],
                 allocation: Dict[str, int] = None):
        self.name = name
        self.variants = variants
        # Default: split traffic evenly (integer division may leave a remainder)
        self.allocation = allocation or {
            v['name']: 100 // len(variants) for v in variants
        }

    def assign_variant(self, user_id: str) -> str:
        """Assign user to variant with consistent hashing."""
        # hashlib (not the built-in hash()) keeps buckets stable across processes
        hash_value = int(hashlib.md5(
            f"{self.name}:{user_id}".encode()).hexdigest(), 16) % 100
        cumulative = 0
        for variant in self.variants:
            cumulative += self.allocation[variant['name']]
            if hash_value < cumulative:
                return variant['name']
        return self.variants[0]['name']


class ExperimentTracker:
    """Track experiment metrics."""

    def __init__(self, db):
        self.db = db

    def track_impression(self, experiment: str, variant: str, user_id: str):
        """Track experiment impression."""
        self.db.execute("""
            INSERT INTO experiment_impressions (experiment, variant, user_id, timestamp)
            VALUES (?, ?, ?, ?)
        """, [experiment, variant, user_id, datetime.utcnow()])

    def track_conversion(self, experiment: str, variant: str,
                         user_id: str, metric: str, value: float):
        """Track conversion."""
        self.db.execute("""
            INSERT INTO experiment_conversions (experiment, variant, user_id, metric, value, timestamp)
            VALUES (?, ?, ?, ?, ?, ?)
        """, [experiment, variant, user_id, metric, value, datetime.utcnow()])

    def get_results(self, experiment: str) -> Dict:
        """Calculate experiment results."""
        impressions = self.db.query("""
            SELECT variant, COUNT(*) as impressions
            FROM experiment_impressions
            WHERE experiment = ?
            GROUP BY variant
        """, [experiment])
        conversions = self.db.query("""
            SELECT variant,
                   COUNT(*) as conversions,
                   SUM(value) as total_value
            FROM experiment_conversions
            WHERE experiment = ?
            GROUP BY variant
        """, [experiment])
        results = {}
        for imp in impressions:
            variant = imp['variant']
            conv = next((c for c in conversions if c['variant'] == variant), {})
            n_impressions = imp['impressions']
            n_conversions = conv.get('conversions', 0)
            results[variant] = {
                'impressions': n_impressions,
                'conversions': n_conversions,
                'conversion_rate': (n_conversions / n_impressions * 100
                                    if n_impressions > 0 else 0),
                'total_value': conv.get('total_value', 0)
            }
        return results
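The cumulative-allocation walk inside `assign_variant` is easy to show standalone: hash the user to a number in [0, 100), then walk the allocation table until the cumulative share exceeds it. A self-contained sketch with an illustrative 50/25/25 split:

```python
import hashlib

# Illustrative allocation: shares should sum to 100
allocation = [("control", 50), ("variant_a", 25), ("variant_b", 25)]

def assign(experiment: str, user_id: str) -> str:
    """Deterministically assign a user to a variant by cumulative share."""
    bucket = int(hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest(), 16) % 100
    cumulative = 0
    for name, share in allocation:
        cumulative += share
        if bucket < cumulative:
            return name
    return allocation[0][0]  # fallback if shares sum to less than 100
```

Buckets 0-49 land in control, 50-74 in variant_a, 75-99 in variant_b, and a given user's assignment never changes mid-experiment.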
Statistical Significance
#!/usr/bin/env python3
"""Statistical significance testing for A/B experiments."""
import math
from typing import Dict
from scipy import stats


class StatisticalSignificance:
    """Calculate statistical significance for A/B tests."""

    def calculate_z_score(self, control: Dict, variant: Dict) -> float:
        """Calculate Z-score for a two-proportion test."""
        control_conversions = control['conversions']
        control_impressions = control['impressions']
        control_rate = (control_conversions / control_impressions
                        if control_impressions > 0 else 0)
        variant_conversions = variant['conversions']
        variant_impressions = variant['impressions']
        variant_rate = (variant_conversions / variant_impressions
                        if variant_impressions > 0 else 0)
        # Pooled standard error
        pooled_rate = ((control_conversions + variant_conversions) /
                       (control_impressions + variant_impressions))
        standard_error = math.sqrt(
            pooled_rate * (1 - pooled_rate) *
            (1 / control_impressions + 1 / variant_impressions)
        )
        if standard_error == 0:
            return 0
        return (variant_rate - control_rate) / standard_error

    def calculate_p_value(self, z_score: float) -> float:
        """Calculate two-tailed p-value from Z-score."""
        return 2 * (1 - stats.norm.cdf(abs(z_score)))

    def is_significant(self, p_value: float, alpha: float = 0.05) -> bool:
        """Check if result is statistically significant."""
        return p_value < alpha

    def calculate_confidence_interval(self, rate: float, n: int,
                                      confidence: float = 0.95) -> Dict:
        """Calculate confidence interval for a conversion rate."""
        # Two-tailed critical value: 1.96 for 95% confidence
        z = stats.norm.ppf(1 - (1 - confidence) / 2)
        standard_error = math.sqrt(rate * (1 - rate) / n)
        margin_of_error = z * standard_error
        return {
            'lower': max(0, rate - margin_of_error),
            'upper': min(1, rate + margin_of_error),
            'margin_of_error': margin_of_error
        }
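A worked example makes the Z-score formula concrete. Suppose control gets 100 conversions from 1,000 impressions (10%) and the variant gets 120 from 1,000 (12%). This sketch uses only the standard library, with `math.erf` standing in for scipy's normal CDF:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

control_conv, control_imp = 100, 1000    # 10.0% conversion
variant_conv, variant_imp = 120, 1000    # 12.0% conversion

# Pooled rate and standard error, as in calculate_z_score above
pooled = (control_conv + variant_conv) / (control_imp + variant_imp)
se = math.sqrt(pooled * (1 - pooled) * (1 / control_imp + 1 / variant_imp))
z = (variant_conv / variant_imp - control_conv / control_imp) / se
p = 2 * (1 - norm_cdf(abs(z)))  # two-tailed p-value

print(f"z = {z:.2f}, p = {p:.3f}")
```

Note the result: despite a 20% relative lift, p comes out around 0.15, above the usual 0.05 threshold. At these sample sizes the difference could plausibly be noise, which is why sample-size planning matters.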
Sample Size Calculation
#!/usr/bin/env python3
"""Calculate required sample size for A/B tests."""
import math
from scipy import stats


class SampleSizeCalculator:
    """Calculate sample size for A/B tests."""

    def calculate_sample_size(self,
                              baseline_rate: float,
                              min_detectable_effect: float,
                              alpha: float = 0.05,
                              power: float = 0.8) -> int:
        """Calculate required sample size per variant.

        min_detectable_effect is relative: 0.1 means a 10% lift
        over the baseline rate.
        """
        # Z-scores for significance level and power
        z_alpha = stats.norm.ppf(1 - alpha / 2)
        z_beta = stats.norm.ppf(power)
        control_rate = baseline_rate
        variant_rate = baseline_rate * (1 + min_detectable_effect)
        # Two-proportion sample size formula
        sample_size = (
            (z_alpha + z_beta) ** 2 *
            (control_rate * (1 - control_rate) + variant_rate * (1 - variant_rate))
        ) / (variant_rate - control_rate) ** 2
        return math.ceil(sample_size)

    def calculate_duration(self, sample_size: int, daily_traffic: int) -> int:
        """Calculate test duration in days (daily_traffic is per variant)."""
        return math.ceil(sample_size / daily_traffic)
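Plugging in numbers: detecting a 10% relative lift on a 5% baseline at α = 0.05 and 80% power. The critical values 1.96 and 0.8416 are hardcoded here to keep the sketch free of the scipy dependency:

```python
import math

baseline = 0.05
variant = baseline * 1.10           # 10% relative lift -> 5.5%
z_alpha, z_beta = 1.96, 0.8416      # two-tailed 5% significance, 80% power

# Same two-proportion formula as calculate_sample_size above
n = ((z_alpha + z_beta) ** 2 *
     (baseline * (1 - baseline) + variant * (1 - variant))) / (variant - baseline) ** 2
print(math.ceil(n), "users per variant")
```

This works out to roughly 31,000 users per variant, so small lifts on low baseline rates demand substantial traffic, and it's worth computing this before launching a test rather than stopping when the numbers "look good."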
Gradual Rollout Strategies
Linear Rollout
#!/usr/bin/env python3
"""Linear rollout strategy."""
import hashlib
from datetime import datetime


class LinearRollout:
    """Linear rollout strategy."""

    def __init__(self, start_percentage: int = 1,
                 end_percentage: int = 100,
                 duration_hours: int = 24):
        self.start_percentage = start_percentage
        self.end_percentage = end_percentage
        self.duration_hours = duration_hours
        self.start_time = datetime.utcnow()

    def get_current_percentage(self) -> int:
        """Get current rollout percentage based on elapsed time."""
        elapsed = (datetime.utcnow() - self.start_time).total_seconds() / 3600
        if elapsed >= self.duration_hours:
            return self.end_percentage
        progress = elapsed / self.duration_hours
        percentage = (self.start_percentage +
                      (self.end_percentage - self.start_percentage) * progress)
        return int(percentage)

    def should_enable(self, user_id: str) -> bool:
        """Check if feature should be enabled for user."""
        current_percentage = self.get_current_percentage()
        hash_value = int(hashlib.md5(f"rollout:{user_id}".encode()).hexdigest(), 16)
        return hash_value % 100 < current_percentage
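The interpolation inside `get_current_percentage` is plain linear math, which this self-contained helper makes easy to check at a few points in time:

```python
def linear_percentage(start: int, end: int,
                      elapsed_hours: float, duration_hours: float) -> int:
    """Linearly interpolate the rollout percentage over the rollout window."""
    if elapsed_hours >= duration_hours:
        return end
    progress = elapsed_hours / duration_hours
    return int(start + (end - start) * progress)

# Over a 24-hour 1% -> 100% rollout:
print(linear_percentage(1, 100, 0, 24))    # start of window
print(linear_percentage(1, 100, 12, 24))   # halfway
print(linear_percentage(1, 100, 24, 24))   # window complete
```

Combined with consistent hashing, a rising percentage means users are only ever added to the rollout, never removed, until you deliberately roll back.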
S-Curve Rollout
#!/usr/bin/env python3
"""S-curve rollout strategy (slow start, fast middle, slow end)."""
import math
from datetime import datetime


class SCurveRollout:
    """S-curve rollout strategy."""

    def __init__(self, start_percentage: int = 1,
                 end_percentage: int = 100,
                 duration_hours: int = 72,
                 midpoint_percentage: int = 50):
        self.start_percentage = start_percentage
        self.end_percentage = end_percentage
        self.duration_hours = duration_hours
        self.midpoint_percentage = midpoint_percentage
        self.start_time = datetime.utcnow()

    def get_current_percentage(self) -> int:
        """Get current rollout percentage using a logistic S-curve."""
        elapsed = (datetime.utcnow() - self.start_time).total_seconds() / 3600
        if elapsed >= self.duration_hours:
            return self.end_percentage
        progress = elapsed / self.duration_hours
        midpoint = self.midpoint_percentage / 100
        # Logistic function; k controls how steep the curve is at the midpoint
        k = 10
        percentage = self.start_percentage + (
            (self.end_percentage - self.start_percentage) /
            (1 + math.exp(-k * (progress - midpoint)))
        )
        return int(percentage)
Canary Rollout
#!/usr/bin/env python3
"""Canary rollout strategy."""
import hashlib
from datetime import datetime


class CanaryRollout:
    """Canary rollout strategy with health checks."""

    def __init__(self, initial_percentage: int = 1,
                 increment_percentage: int = 5,
                 health_check_interval: int = 300):
        self.current_percentage = initial_percentage
        self.increment_percentage = increment_percentage
        self.health_check_interval = health_check_interval
        self.last_check = datetime.utcnow()
        self.errors = []

    def should_enable(self, user_id: str) -> bool:
        """Check if feature should be enabled for user."""
        hash_value = int(hashlib.md5(f"canary:{user_id}".encode()).hexdigest(), 16)
        return hash_value % 100 < self.current_percentage

    def record_error(self, error: str):
        """Record an error for canary analysis."""
        self.errors.append({
            'timestamp': datetime.utcnow(),
            'error': error
        })

    def should_increase_rollout(self) -> bool:
        """Check if rollout should increase, using a simple health heuristic."""
        if len(self.errors) < 10:
            return True
        # Share of all recorded errors that occurred in the last hour;
        # in production, compare errors against request volume instead
        recent_errors = [
            e for e in self.errors
            if (datetime.utcnow() - e['timestamp']).total_seconds() < 3600
        ]
        error_rate = len(recent_errors) / len(self.errors)
        return error_rate < 0.01

    def increase_rollout(self):
        """Increase rollout percentage, capped at 100."""
        self.current_percentage = min(
            self.current_percentage + self.increment_percentage,
            100
        )
Best Practices and Anti-Patterns
Good Patterns
1. Use Feature Flags for All New Features
- Deploy behind flags, flip when ready
- No direct production deployments of new features
2. Clean Up Old Flags
- Remove flags after 90 days
- Document flag lifecycle
- Use flag naming conventions
3. Monitor Flag Usage
- Track which flags are enabled
- Monitor performance impact
- Alert on unusual flag states
4. Test Flag Logic Thoroughly
- Unit test flag evaluation
- Test edge cases
- Verify rollback works
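Flag evaluation is pure logic, so it is cheap to unit-test. A minimal pytest-style sketch (the percentage bucketing is re-implemented inline so the test file is self-contained):

```python
import hashlib

def percentage_enabled(key: str, user_id: str, percentage: int) -> bool:
    """Same md5 bucketing as the percentage flag implementation."""
    h = int(hashlib.md5(f"{key}:{user_id}".encode()).hexdigest(), 16)
    return h % 100 < percentage

def test_zero_percent_disables_everyone():
    assert not any(percentage_enabled("f", f"u{i}", 0) for i in range(100))

def test_full_rollout_enables_everyone():
    assert all(percentage_enabled("f", f"u{i}", 100) for i in range(100))

def test_assignment_is_stable():
    # The same user must get the same answer on every evaluation
    assert percentage_enabled("f", "u1", 50) == percentage_enabled("f", "u1", 50)

test_zero_percent_disables_everyone()
test_full_rollout_enables_everyone()
test_assignment_is_stable()
```

The edge cases worth covering are exactly the rollback paths: 0% must disable everyone instantly, and assignments must not flap between evaluations.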
Bad Patterns
1. Flag Sprawl
- ❌ Creating flags for every small change
- ✅ Use flags for major features only
- ✅ Clean up flags regularly
2. Flag Debt
- ❌ Leaving flags in code for years
- ✅ Set expiration dates
- ✅ Remove flags after rollout
3. No Monitoring
- ❌ Not tracking flag performance
- ✅ Monitor error rates
- ✅ Track user feedback
4. Complex Flag Logic
- ❌ Deeply nested flag conditions
- ✅ Keep flag logic simple
- ✅ Use targeting rules instead
Conclusion
Feature flags are a fundamental technique for modern SaaS development. They enable safe deployments, data-driven decisions, and flexible release management. By implementing feature flags with gradual rollouts and A/B testing, you can ship faster while maintaining quality and reliability.
Key takeaways:
- Feature flags decouple deployment from release - Deploy code anytime, release when ready
- Gradual rollouts reduce risk - Start with 1%, increase over time, rollback instantly if needed
- A/B testing drives optimization - Make data-driven decisions about feature design
- Consistent hashing ensures stability - Users see the same variant throughout the experiment
- Statistical significance matters - Run tests long enough to get reliable results
- Clean up flags regularly - Flag sprawl creates technical debt and confusion
Start with simple on/off flags, then add percentage-based rollouts, and eventually implement full A/B testing. The investment in feature flag infrastructure pays dividends in faster iteration, better decision-making, and more reliable releases.