Introduction
System design is the art and science of architecting software systems that can handle millions of users, process terabytes of data, and remain available 99.99% of the time. Whether you’re preparing for technical interviews at top tech companies or building production systems that serve real users, understanding system design fundamentals is essential for every software engineer.
In 2026, distributed systems have become the norm rather than the exception. Understanding how to design systems that scale horizontally, handle failures gracefully, and maintain consistency across geographic regions has become core knowledge for developers at all levels. This guide provides a comprehensive foundation in system design, covering the key concepts, patterns, and trade-offs that every engineer should understand.
Core Concepts in System Design
Scalability
Scalability refers to a system’s ability to handle increased load by adding resources. There are two primary approaches:
Vertical Scaling (Scale Up): Adding more power to existing machines: more CPU, RAM, or storage. Simpler to implement, but it has hard limits and leaves the machine as a single point of failure.
Horizontal Scaling (Scale Out): Adding more machines to handle increased load. More complex but virtually unlimited scaling potential. This is the preferred approach for large-scale systems.
```python
# Vertical scaling example - simply adding more resources
class VerticalScale:
    def __init__(self, cpu_cores, ram_gb):
        self.cpu_cores = cpu_cores
        self.ram_gb = ram_gb

    def scale_up(self, additional_cores, additional_ram):
        self.cpu_cores += additional_cores
        self.ram_gb += additional_ram


# Horizontal scaling example - adding more instances
class HorizontalScale:
    def __init__(self):
        self.instances = []

    def add_instance(self, instance):
        self.instances.append(instance)

    def remove_instance(self, instance_id):
        self.instances = [i for i in self.instances if i.id != instance_id]

    def distribute_request(self, request):
        # Simple round-robin load balancing
        instance = self.instances[request.id % len(self.instances)]
        return instance.handle(request)
```
Load Balancing
Load balancers distribute incoming traffic across multiple servers, ensuring no single server becomes overwhelmed. They also provide fault tolerance by detecting failed servers and routing traffic to healthy ones.
Types of Load Balancers:
- Layer 4 (Transport): Routes based on IP address and TCP/UDP port
- Layer 7 (Application): Routes based on HTTP content, headers, or cookies
```python
class LoadBalancer:
    def __init__(self, servers, algorithm='round_robin'):
        self.servers = servers
        self.algorithm = algorithm
        self.current_index = 0
        self.server_health = {s: True for s in servers}

    def get_server(self):
        healthy_servers = [s for s in self.servers if self.server_health[s]]
        if not healthy_servers:
            raise RuntimeError("No healthy servers available")
        if self.algorithm == 'round_robin':
            server = healthy_servers[self.current_index % len(healthy_servers)]
            self.current_index += 1
            return server
        elif self.algorithm == 'least_connections':
            return min(healthy_servers, key=lambda s: s.active_connections)
        elif self.algorithm == 'weighted':
            weights = self.get_server_weights()
            return self.weighted_selection(healthy_servers, weights)

    def mark_unhealthy(self, server):
        self.server_health[server] = False

    def mark_healthy(self, server):
        self.server_health[server] = True
```
Caching
Caching stores frequently accessed data in fast storage to reduce latency and database load. Effective caching can improve response times by orders of magnitude.
Caching Strategies:
```python
# Cache-Aside Pattern
class CacheAside:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def get(self, key):
        # Check cache first
        value = self.cache.get(key)
        if value is not None:
            return value
        # Cache miss - load from database
        value = self.database.get(key)
        # Store in cache for next time
        if value is not None:
            self.cache.set(key, value, ttl=3600)
        return value

    def set(self, key, value):
        self.database.set(key, value)
        self.cache.set(key, value, ttl=3600)

    def delete(self, key):
        self.database.delete(key)
        self.cache.delete(key)


# Write-Through Pattern
class WriteThrough:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def write(self, key, value):
        # Write to both cache and database synchronously
        self.cache.set(key, value)
        self.database.set(key, value)


# Write-Behind Pattern
class WriteBehind:
    def __init__(self, cache, database, queue):
        self.cache = cache
        self.database = database
        self.queue = queue

    def write(self, key, value):
        # Write to cache first, queue for async database write
        self.cache.set(key, value)
        self.queue.enqueue({'key': key, 'value': value})

    def flush(self):
        # Periodically write queued items to database
        while not self.queue.is_empty():
            item = self.queue.dequeue()
            self.database.set(item['key'], item['value'])
```
CAP Theorem
The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously: Consistency, Availability, and Partition tolerance.
- Consistency (C): All nodes see the same data at the same time
- Availability (A): Every request receives a response (even if stale)
- Partition tolerance (P): System continues to operate despite network failures
Since network partitions are inevitable in distributed systems, you must choose between CP (consistency during partitions) and AP (availability during partitions).
```python
# Consistency vs Availability Trade-off Example
class ConsistentSystem:
    """CP: Prioritizes consistency over availability"""

    def get(self, key):
        # Ensure all replicas agree before returning
        results = self.read_from_all_replicas(key)
        if self.all_equal(results):
            return results[0]
        raise InconsistentStateError("Replicas out of sync")

    def set(self, key, value):
        # Write to quorum before confirming
        self.write_to_quorum(key, value)


class AvailableSystem:
    """AP: Prioritizes availability over consistency"""

    def get(self, key):
        # Return from the first available replica (may be stale)
        for replica in self.replicas:
            if replica.is_available():
                return replica.get(key)
        raise NoAvailableReplicaError()

    def set(self, key, value):
        # Best-effort write to every reachable replica;
        # replicas converge later (eventual consistency)
        for replica in self.replicas:
            if replica.is_available():
                replica.set(key, value)
```
Database Design
SQL vs NoSQL
SQL Databases (PostgreSQL, MySQL):
- Structured data with schemas
- ACID transactions
- Complex queries with JOINs
- Vertical scaling
- Use when: Data relationships are complex, transactions required
NoSQL Databases (MongoDB, Cassandra, DynamoDB):
- Flexible schemas
- Horizontal scaling
- Eventual consistency
- High throughput writes
- Use when: Scale is critical, flexible schema needed
```python
# Example: Choosing database based on use case
# E-commerce platform with complex transactions
class EcommerceDatabase:
    def __init__(self):
        # Use SQL for transactional data
        self.order_db = SQLDatabase('postgresql://...')
        self.inventory_db = SQLDatabase('postgresql://...')
        # Use NoSQL for flexible product attributes
        self.product_catalog = NoSQLDatabase('mongodb://...')
        # Use cache for hot items
        self.hot_products = RedisCache()

    def create_order(self, order_data):
        # SQL transaction ensures atomicity within order_db.
        # Note: the inventory writes go to a separate database, so
        # true cross-database atomicity would need a saga or 2PC.
        with self.order_db.transaction():
            order_id = self.order_db.insert('orders', order_data)
            for item in order_data['items']:
                self.inventory_db.update(
                    'inventory',
                    {'quantity': -item['quantity']},
                    {'product_id': item['product_id']}
                )
            return order_id

    def get_product(self, product_id):
        # Check cache first
        product = self.hot_products.get(f'product:{product_id}')
        if product:
            return product
        # Fallback to NoSQL
        product = self.product_catalog.find_one({'_id': product_id})
        # Cache for next request
        if product:
            self.hot_products.set(f'product:{product_id}', product)
        return product
```
Database Replication
```python
import asyncio
import random

class ReplicationManager:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def write(self, data):
        # Write to primary
        self.primary.write(data)
        # Async replication to replicas
        for replica in self.replicas:
            asyncio.create_task(replica.replicate(data))

    def read(self, key):
        # Read from any replica for load distribution
        # (reads may be slightly stale under async replication)
        replica = random.choice(self.replicas)
        return replica.read(key)
```
Sharding
Sharding distributes data across multiple databases to enable horizontal scaling.
```python
class ShardedDatabase:
    def __init__(self, shard_count=4):
        self.shards = [Database() for _ in range(shard_count)]

    def _get_shard(self, key):
        # Simple hash-based partitioning. Note: plain modulo is not
        # consistent hashing; changing shard_count remaps most keys.
        hash_value = hash(key)
        return self.shards[hash_value % len(self.shards)]

    def write(self, key, value):
        shard = self._get_shard(key)
        shard.write(key, value)

    def read(self, key):
        shard = self._get_shard(key)
        return shard.read(key)
```
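Hash-modulo partitioning is simple but reshuffles almost every key when the shard count changes. Consistent hashing limits that churn: each node is placed at several points ("virtual nodes") on a ring, and a key maps to the first node clockwise from its hash, so adding or removing one node only remaps the keys that node owned. A minimal illustrative sketch (class and method names are my own, not from any particular library):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        # Stable hash (Python's built-in hash() is randomized per process)
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node for smoother balance
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise KeyError("empty ring")
        # First point clockwise from the key's hash (wrapping around)
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

Removing a node only affects keys that mapped to its points; every other key keeps its shard, which is why systems like Cassandra and DynamoDB build on this idea.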
Microservices Architecture
Service Communication
```python
# Synchronous communication (REST/gRPC)
class UserService:
    def __init__(self, http_client):
        self.http_client = http_client

    def get_user_with_orders(self, user_id):
        # Call user service
        user = self.http_client.get(f'/users/{user_id}')
        # Call order service
        orders = self.http_client.get(f'/orders?user_id={user_id}')
        return {'user': user, 'orders': orders}


# Asynchronous communication (Message Queue)
class OrderService:
    def __init__(self, message_queue):
        self.queue = message_queue

    async def create_order(self, order_data):
        # Publish event, don't wait for processing
        await self.queue.publish('order.created', order_data)
        return {'status': 'accepted', 'order_id': order_data['id']}

    async def handle_order_created(self, event):
        # Process order asynchronously
        order = event.data
        await self.process_payment(order)
        await self.reserve_inventory(order)
        await self.notify_customer(order)
```
API Gateway
```python
class APIGateway:
    def __init__(self):
        self.routes = {
            '/api/users': 'user-service:8001',
            '/api/orders': 'order-service:8002',
            '/api/products': 'product-service:8003',
        }
        self.rate_limiter = RateLimiter()
        self.auth = AuthService()

    async def handle_request(self, request):
        # Authentication
        if not await self.auth.verify(request.token):
            return 401, {'error': 'Unauthorized'}
        # Rate limiting
        if not self.rate_limiter.allow(request.client_id):
            return 429, {'error': 'Rate limit exceeded'}
        # Route to service
        service = self.routes.get(request.path)
        if not service:
            return 404, {'error': 'Not found'}
        return await self.forward(request, service)
```
Reliability Patterns
Circuit Breaker
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time > self.timeout:
                self.state = 'HALF_OPEN'
            else:
                raise CircuitOpenError()
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = 'CLOSED'

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'
```
Retry with Exponential Backoff
```python
import asyncio

async def retry_with_backoff(func, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return await func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1, 2, 4 seconds
            print(f"Retry {attempt + 1} after {delay}s")
            await asyncio.sleep(delay)
```
Monitoring and Observability
Key Metrics
```python
class MetricsCollector:
    def __init__(self):
        self.counters = {}
        self.gauges = {}
        self.histograms = {}

    def increment_counter(self, name, tags=None):
        key = (name, tuple(sorted((tags or {}).items())))
        self.counters[key] = self.counters.get(key, 0) + 1

    def set_gauge(self, name, value, tags=None):
        key = (name, tuple(sorted((tags or {}).items())))
        self.gauges[key] = value

    def record_histogram(self, name, value, tags=None):
        key = (name, tuple(sorted((tags or {}).items())))
        if key not in self.histograms:
            self.histograms[key] = []
        self.histograms[key].append(value)
```
Common System Design Questions
URL Shortener
Design a service like bit.ly that converts long URLs to short ones.
Key Components:
- Hash function for unique short codes
- Database mapping short codes to long URLs
- Counter service for sequential generation
- CDN for serving redirect responses
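The counter approach pairs naturally with base-62 encoding: each new URL takes the next counter value, encoded over `[0-9a-zA-Z]` into a short code. A minimal sketch (assuming a single global counter; a real service would shard or pre-allocate counter ranges):

```python
import string

# 62 URL-safe characters: digits, lowercase, uppercase
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n):
    """Encode a non-negative counter value as a short base-62 code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_base62(code):
    """Recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Six characters already cover 62^6 (about 56 billion) URLs, which is why short codes stay short for a long time.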
Twitter Timeline
Design a service that generates personalized tweet feeds.
Key Components:
- Fan-out on write vs read
- Cache for popular timelines
- Graph service for follower relationships
- Ranking algorithm for timeline
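Fan-out on write, the key trade-off above, can be sketched as follows: when a user posts, the tweet id is pushed into every follower's precomputed timeline, making reads a cheap lookup (all names here are hypothetical, and real systems special-case celebrity accounts with fan-out on read):

```python
from collections import defaultdict, deque

class Timelines:
    def __init__(self, max_len=800):
        self.followers = defaultdict(set)    # author -> follower ids
        self.timelines = defaultdict(deque)  # user -> tweet ids, newest first
        self.max_len = max_len

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, tweet_id):
        # Write path does the heavy lifting: one append per follower
        for follower in self.followers[author]:
            tl = self.timelines[follower]
            tl.appendleft(tweet_id)
            if len(tl) > self.max_len:
                tl.pop()  # keep only the most recent entries

    def timeline(self, user):
        # Read path is a cheap precomputed lookup
        return list(self.timelines[user])
```

The write cost grows with follower count, which is exactly why accounts with millions of followers are usually handled with fan-out on read instead.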
Distributed Cache
Design a cache system like Redis.
Key Components:
- Data structures (strings, lists, sets, sorted sets)
- Persistence (RDB, AOF)
- Replication
- Clustering
- Eviction policies
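Eviction policies decide what to drop when memory is full. A minimal LRU sketch using Python's `OrderedDict` (note that Redis itself approximates LRU by sampling a few keys rather than tracking exact recency, to save memory):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Other common policies include LFU (evict the least frequently used key) and plain TTL expiry; the right choice depends on the access pattern.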
Conclusion
System design is a vast topic that combines computer science fundamentals with practical engineering trade-offs. The concepts covered in this guide (scalability, load balancing, caching, databases, microservices, and reliability patterns) form the foundation for designing modern distributed systems.
Remember that system design is about making informed trade-offs based on specific requirements. There’s rarely a single “right” answer; understanding the pros and cons of different approaches allows you to choose the best solution for your specific use case.
Resources
- System Design Primer - GitHub
- High Scalability Blog
- Designing Data-Intensive Applications Book
- Grokking the System Design Interview