Distributed Systems Fundamentals: Complete Guide
Building distributed systems requires understanding fundamental challenges and patterns. This guide covers core concepts that every architect and engineer should know.
Why Distributed Systems?
Benefits:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ Scalability - Add more machines to handle load โ
โ โ Reliability - No single point of failure โ
โ โ Performance - Parallel processing across machines โ
โ โ Cost-effective - Use commodity hardware โ
โ โ Availability - Continue running during failures โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
The CAP Theorem
Understanding CAP
CAP Theorem states that a distributed system can only guarantee 2 of 3:
Consistency Availability Partition Tolerance
โ โ โ
โผ โผ โผ
All nodes see Every request System works
same data gets response during network
partitions
In reality: Partitions are inevitable, so we choose between:
- CP (Consistency + Partition Tolerance): e.g., ZooKeeper, etcd
- AP (Availability + Partition Tolerance): e.g., Cassandra, DynamoDB
Consistency Models
Strong Consistency:
โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ
โ A:0 โ Write โ โ โ
โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ
Write to B
โ All reads see write immediately
Eventual Consistency:
โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ
โ A:0 โ Write โ โ โ
โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ
Async replication
โ Reads may see stale data briefly
Read-your-writes Consistency:
โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ
โ A:0 โ Write โ โ โ
โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ
โ After write, same node reads see update
Consensus Algorithms
Raft Algorithm
Raft is a consensus algorithm designed to be understandable. It uses:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Raft Log โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Index โ Term โ Command โ
โ โโโโโโโผโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ 0 โ 1 โ AddServer("A") โ
โ 1 โ 1 โ AddServer("B") โ
โ 2 โ 2 โ AddClient("X") โ
โ 3 โ 2 โ RemoveServer("A") โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Raft States
โโโโโโโโโโโโโโโโ
โ โ
โโโโโโโ Leader โโโโโโโโ
โ โ โ โ
โ โโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโดโโโโโโ โโโโโโโโโดโโโโโ
โ โ โ โ
โผ โผ โผ โผ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
โ Candidate โ โ Follower โ โ Candidate โ
โ โ โ โ โ โ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ
(Election) (Normal ops) (Election)
Leader Election in Raft
# Simplified leader election
class Node:
def __init__(self, node_id):
self.node_id = node_id
self.current_term = 0
self.voted_for = None
self.state = "follower"
self.votes_received = 0
def start_election(self):
self.current_term += 1
self.state = "candidate"
self.voted_for = self.node_id
self.votes_received = 1
# Request votes from other nodes
for peer in self.peers:
if peer.request_vote(self.current_term, self.node_id):
self.votes_received += 1
# If majority, become leader
if self.votes_received > len(self.peers) / 2:
self.become_leader()
def become_leader(self):
self.state = "leader"
# Send heartbeats to prevent new elections
while True:
self.send_append_entries()
time.sleep(HEARTBEAT_INTERVAL)
Leader Election Patterns
Basic Leader Election
import etcd3
class LeaderElection:
def __init__(self, etcd_client, election_name, node_id):
self.client = etcd_client
self.name = election_name
self.node_id = node_id
self.is_leader = False
def elect(self):
# Try to create ephemeral node
try:
self.client.put(
f"/elections/{self.name}",
self.node_id,
prevExist=False
)
self.is_leader = True
print(f"Node {self.node_id} became leader")
return True
except Exception:
# Check who is leader
value, _ = self.client.get(f"/elections/{self.name}")
print(f"Current leader: {value}")
return False
def watch_for_election(self):
# Watch for leadership changes
watch_id = self.client.watch(
f"/elections/{self.name}",
self.on_leader_change
)
Redis-based Leader Election
import redis
class RedisLeaderElection:
def __init__(self, redis_client, election_name, node_id):
self.redis = redis_client
self.key = f"election:{election_name}"
self.node_id = node_id
def become_candidate(self):
# Use Redis SET NX with TTL
result = self.redis.set(
self.key,
self.node_id,
nx=True,
ex=10 # 10 second lease
)
if result:
return True
# Check current leader
leader = self.redis.get(self.key)
return leader.decode() if leader else None
def renew_lease(self):
# Extend our lease periodically
self.redis.expire(self.key, 10)
Fault Tolerance
Failure Detection
import asyncio
class FailureDetector:
def __init__(self, nodes, threshold=3):
self.nodes = {node: 0 for node in nodes}
self.threshold = threshold
def record_heartbeat(self, node):
self.nodes[node] = 0
def check_failures(self):
for node, misses in self.nodes.items():
self.nodes[node] += 1
if self.nodes[node] >= self.threshold:
self.mark_node_failed(node)
def mark_node_failed(self, node):
print(f"Node {node} is suspected to have failed")
# Trigger failover or alert
Circuit Breaker
import time
from enum import Enum
class CircuitState(Enum):
CLOSED = "closed"
OPEN = "open"
HALF_OPEN = "half_open"
class CircuitBreaker:
def __init__(self, failure_threshold=5, timeout=60):
self.failure_count = 0
self.success_count = 0
self.failure_threshold = failure_threshold
self.timeout = timeout
self.state = CircuitState.CLOSED
self.last_failure_time = None
def call(self, func, *args, **kwargs):
if self.state == CircuitState.OPEN:
if time.time() - self.last_failure_time > self.timeout:
self.state = CircuitState.HALF_OPEN
else:
raise Exception("Circuit breaker is OPEN")
try:
result = func(*args, **kwargs)
self.on_success()
return result
except Exception as e:
self.on_failure()
raise e
def on_success(self):
self.failure_count = 0
self.state = CircuitState.CLOSED
def on_failure(self):
self.failure_count += 1
self.last_failure_time = time.time()
if self.failure_count >= self.failure_threshold:
self.state = CircuitState.OPEN
Distributed Transactions
Two-Phase Commit (2PC)
Phase 1: Prepare
โโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโ
โ Coor- โโโโโโถโ Node 1 โ โ Node 2 โ
โ dinatorโโโโโโโ โโโโโโโ โ
โโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโ
โ Prepare โ
โ (Lock) โ
โโโโโโโโโโโโโ
Phase 2: Commit
โโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโ
โ Coor- โโโโโโถโ Node 1 โ โ Node 2 โ
โ dinatorโโโโโโถโ โโโโโโถโ โ
โโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโ
โ Commit/ โ
โ Rollback โ
โโโโโโโโโโโโโ
Saga Pattern
# Orchestration-based Saga
class OrderSaga:
def __init__(self):
self.steps = [
self.create_order,
self.reserve_inventory,
self.process_payment,
self.ship_order
]
async def execute(self, order_data):
completed_steps = []
try:
for step in self.steps:
await step(order_data)
completed_steps.append(step)
except Exception as e:
# Compensate in reverse order
for step in reversed(completed_steps):
await self.compensate(step, order_data)
raise
async def create_order(self, data):
order = await db.orders.create(data)
data['order_id'] = order.id
async def compensate(self, step, data):
compensations = {
self.create_order: lambda d: db.orders.cancel(d['order_id']),
self.reserve_inventory: lambda d: db.inventory.release(d['sku']),
self.process_payment: lambda d: db.payments.refund(d['payment_id']),
}
await compensations[step](data)
Data Replication
Primary-Replica Replication
Synchronous Replication:
โโโโโโโโโโ Write โโโโโโโโโโ Write โโโโโโโโโโ
โClient โโโโโโโโโถโPrimary โโโโโโโโโโถโReplica โ
โ โโโโโโโโโโ โโโโโโโโโโโ โ
โโโโโโโโโโ ACK โโโโโโโโโโ ACK โโโโโโโโโโ
Wait for all replicas
Asynchronous Replication:
โโโโโโโโโโ Write โโโโโโโโโโ Write โโโโโโโโโโ
โClient โโโโโโโโโถโPrimary โโโโโโโโโโถโReplica โ
โ โโโโโโโโโโ โ โ โ
โโโโโโโโโโ ACK โโโโโโโโโโ โโโโโโโโโโ
Immediate return
Conflict Resolution
# Last-Write-Wins
class LWWResolver:
def resolve(self, values):
return max(values, key=lambda v: v.timestamp)
# Vector Clocks
class VectorClock:
def __init__(self):
self.clock = {}
def increment(self, node):
self.clock[node] = self.clock.get(node, 0) + 1
def merge(self, other):
for node, time in other.clock.items():
self.clock[node] = max(self.clock.get(node, 0), time)
def happens_before(self, other):
# Check if self happened before other
return all(
self.clock.get(n, 0) <= other.clock.get(n, 0)
for n in set(self.clock.keys()) | set(other.clock.keys())
)
External Resources
- Designing Data-Intensive Applications
- Raft Paper
- Introduction to Reliable and Secure Distributed Systems
- Distributed Systems for Practitioners
Comments