Introduction
Every codebase becomes legacy eventually. The challenge is managing it effectively while continuing to deliver business value. This guide covers strategies for understanding, improving, and modernizing legacy systems without causing disruptions.
Legacy code is code that’s difficult to change but still provides business value. The goal isn’t to rewrite everythingโit’s to make it maintainable.
Understanding Legacy Code
Characteristics
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Legacy Code Characteristics โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ โข No tests or few tests โ
โ โข Poor documentation โ
โ โข Spaghetti dependencies โ
โ โข Hardcoded values โ
โ โข Duplicated logic โ
โ โข Unknown business rules โ
โ โข No automated deployment โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Initial Assessment
# Code quality assessment checklist
assessment = {
"test_coverage": {
"critical": "0-20%",
"poor": "20-40%",
"moderate": "40-70%",
"good": "70-90%",
"excellent": "90%+"
},
"dependencies": {
"outdated": [],
"vulnerable": [],
"unused": []
},
"complexity": {
"files_per_module": "avg files",
"lines_per_file": "avg lines",
"cyclomatic_complexity": "avg complexity"
},
"documentation": {
"api_docs": "complete/partial/none",
"code_comments": "comprehensive/partial/none"
}
}
Strangler Fig Pattern
Gradual Migration
# Step 1: Set up router/facade
class LegacyRouter:
def __init__(self, old_system, new_system):
self.old = old_system
self.new = new_system
def route(self, request):
# Route to new or old based on feature flags
if feature_flags.is_enabled('new_checkout'):
return self.new.handle(request)
return self.old.handle(request)
# Step 2: Run in parallel
async def parallel_test(request):
# Send to both systems
old_result = await call_legacy(request)
new_result = await call_new(request)
# Compare results
if old_result != new_result:
log_mismatch(request, old_result, new_result)
# Return old result to client
return old_result
# Step 3: Gradual traffic shift
async def migrate_traffic(request):
percentage = feature_flags.get_percentage('new_checkout')
if random.random() * 100 < percentage:
return await call_new(request)
return await call_legacy(request)
Test Coverage Strategies
Characterization Tests
# Writing tests for existing code
def test_order_calculate_total():
"""Characterize existing behavior before refactoring."""
order = Order()
# Add items
order.add_item("Product A", 2, 10.00)
order.add_item("Product B", 1, 20.00)
# This is the CURRENT behavior - even if it seems wrong
result = order.calculate_total()
# Document what it actually does
# 2 * 10 + 1 * 20 = 40, but applies 10% discount
expected = 36.00
assert result == expected
Approval Testing
# Golden master testing
import subprocess
import filecmp
def test_output_stability():
"""Ensure output doesn't change unexpectedly."""
# Run with known input
result = subprocess.run(
["python", "legacy_script.py", "--input", "test_data.csv"],
capture_output=True
)
# Compare to known good output
assert filecmp.cmp("output.txt", "golden_output.txt")
Refactoring Safely
Boy Scout Rule
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Boy Scout Rule โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ "Always leave the code better than you found it" โ
โ โ
โ Examples: โ
โ โข Rename unclear variable names โ
โ โข Extract obvious methods โ
โ โข Add missing docstrings โ
โ โข Fix formatting โ
โ โข Remove dead code โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Incremental Refactoring
# Before: Long method
def process_order(order_data):
# Validate order
if not order_data.get('customer_id'):
return None
if not order_data.get('items'):
return None
# Calculate total
total = 0
for item in order_data['items']:
total += item['price'] * item['quantity']
# Apply discounts
if total > 100:
total *= 0.9
# Create order
order = Order(
customer_id=order_data['customer_id'],
total=total,
items=order_data['items']
)
# Save
db.save(order)
# Send confirmation
send_email(order.customer_id, "Order confirmed")
return order
# After: Extracted methods
def process_order(order_data):
validate_order(order_data)
order = create_order(order_data)
save_order(order)
send_confirmation(order)
return order
Feature Flags for Legacy Work
# Control refactoring with feature flags
feature_flags = {
"refactored_payment": False,
"refactored_order": False,
"new_database_schema": False
}
def calculate_total(order):
if feature_flags.get("refactored_order"):
return calculate_total_new(order)
return calculate_total_old(order)
def calculate_total_new(order):
"""New implementation using proper domain model."""
return order.subtotal + order.tax - order.discounts
def calculate_total_old(order):
"""Original implementation for comparison."""
total = 0
for item in order.items:
total += item.price * item.quantity
if total > 100:
total *= 0.9
return total
Documentation
Living Documentation
# Document business rules discovered
"""
Business Rules - Order Processing (discovered 2026-03-12)
========================================================
1. Orders with total > $100 receive 10% discount
- Confirmed by: Analysis of calculate_total() method
- Test case: order_total_100_dollars test
2. Discounts don't stack
- Confirmed by: No loop for multiple discounts
- Test case: multiple_discounts_test
3. Orders require at least one item
- Confirmed by: Validation check
- Test case: empty_order_rejected
Contributors: Jane (analysis), John (testing)
"""
# Code annotation
def calculate_total(order):
"""
Calculate order total.
Business Rule: Orders over $100 get 10% discount.
This discount does NOT stack with other discounts.
Args:
order: Order with items
Returns:
Decimal total after discounts
"""
Building Tests
Test Pyramid for Legacy
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Test Pyramid for Legacy Code โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ โโโโโโโโโโโโโ โ
โ โ E2E โ Few - expensive โ
โ โ Tests โ โ
โ โโโโโโโฌโโโโโโ โ
โ โโโโโโโโโดโโโโโโโโ โ
โ โ Integration โ Some - medium โ
โ โ Tests โ โ
โ โโโโโโโโโฌโโโโโโโโ โ
โ โโโโโโโโโโโโโดโโโโโโโโโโโโ โ
โ โ Unit Tests โ Many - fast โ
โ โ (add as you refactor) โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Modernization Strategies
Database Migration
# Parallel write
async def save_entity(entity):
# Write to both databases
await old_db.save(entity)
await new_db.save(entity)
# Verify consistency
old_result = await old_db.get(entity.id)
new_result = await new_db.get(entity.id)
if old_result != new_result:
log_inconsistency(entity.id)
# Gradual migration
async def read_entity(entity_id):
# Check if migrated
if await new_db.exists(entity_id):
return await new_db.get(entity_id)
# Fall back to old
return await old_db.get(entity_id)
Best Practices
- Don’t rewrite: Rewrite is risky; refactor incrementally
- Test first: Add tests before changing
- Use feature flags: Control rollout
- Measure twice, cut once: Understand before changing
- Document discoveries: Record what you learn
- One change at a time: Small, safe changes
Conclusion
Legacy code requires patience, careful analysis, and incremental improvement. By understanding the existing behavior, adding tests, and refactoring safely, you can modernize systems while maintaining business continuity.
Comments