Distributed Systems Fundamentals: Complete Guide

Building distributed systems requires understanding fundamental challenges and patterns. This guide covers core concepts that every architect and engineer should know.

Why Distributed Systems?

Benefits:
┌─────────────────────────────────────────────────────────┐
│  ✓ Scalability - Add more machines to handle load      │
│  ✓ Reliability - No single point of failure              │
│  ✓ Performance - Parallel processing across machines     │
│  ✓ Cost-effective - Use commodity hardware              │
│  ✓ Availability - Continue running during failures      │
└─────────────────────────────────────────────────────────┘

The CAP Theorem

Understanding CAP

CAP Theorem states that a distributed system can only guarantee 2 of 3:

    Consistency    Availability    Partition Tolerance
        │                │                │
        ▼                ▼                ▼
   All nodes see   Every request    System works
   same data      gets response    during network
                                    partitions

In reality: Partitions are inevitable, so we choose between:
- CP (Consistency + Partition Tolerance): e.g., ZooKeeper, etcd
- AP (Availability + Partition Tolerance): e.g., Cassandra, DynamoDB

Consistency Models

Strong Consistency:
┌─────┐ ──────┐ ──────┐ ──────┐
│ A:0 │ Write │       │       │
└─────┘ ──────┘ ──────┘ ──────┘
      Write to B
│ All reads see write immediately

Eventual Consistency:
┌─────┐ ──────┐ ──────┐ ──────┐
│ A:0 │ Write │       │       │
└─────┘ ──────┘ ──────┘ ──────┘
      Async replication
│ Reads may see stale data briefly

Read-your-writes Consistency:
┌─────┐ ──────┐ ──────┐ ──────┐
│ A:0 │ Write │       │       │
└─────┘ ──────┘ ──────┘ ──────┘
│ After write, same node reads see update

Consensus Algorithms

Raft Algorithm

Raft is a consensus algorithm designed to be understandable. It uses:

┌─────────────────────────────────────────────────────────┐
│                   Raft Log                              │
├─────────────────────────────────────────────────────────┤
│  Index │ Term │ Command                                 │
│  ──────┼──────┼────────────────────────────────────    │
│   0    │  1   │ AddServer("A")                        │
│   1    │  1   │ AddServer("B")                        │
│   2    │  2   │ AddClient("X")                        │
│   3    │  2   │ RemoveServer("A")                     │
└─────────────────────────────────────────────────────────┘

Raft States

                    ┌──────────────┐
                    │              │
              ┌─────│    Leader    │◀─────┐
              │     │              │      │
              │     └──────────────┘      │
              │                          │
        ┌─────┴─────┐            ┌───────┴────┐
        │           │            │            │
        ▼           ▼            ▼            ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Candidate   │ │  Follower    │ │  Candidate   │
│              │ │              │ │              │
└──────────────┘ └──────────────┘ └──────────────┘
    (Election)     (Normal ops)     (Election)

Leader Election in Raft

# Simplified leader election
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None
        self.state = "follower"
        self.votes_received = 0
    
    def start_election(self):
        self.current_term += 1
        self.state = "candidate"
        self.voted_for = self.node_id
        self.votes_received = 1
        
        # Request votes from other nodes
        for peer in self.peers:
            if peer.request_vote(self.current_term, self.node_id):
                self.votes_received += 1
        
        # If majority, become leader
        if self.votes_received > len(self.peers) / 2:
            self.become_leader()
    
    def become_leader(self):
        self.state = "leader"
        # Send heartbeats to prevent new elections
        while True:
            self.send_append_entries()
            time.sleep(HEARTBEAT_INTERVAL)

Leader Election Patterns

Basic Leader Election

import etcd3

class LeaderElection:
    def __init__(self, etcd_client, election_name, node_id):
        self.client = etcd_client
        self.name = election_name
        self.node_id = node_id
        self.is_leader = False
    
    def elect(self):
        # Try to create ephemeral node
        try:
            self.client.put(
                f"/elections/{self.name}",
                self.node_id,
                prevExist=False
            )
            self.is_leader = True
            print(f"Node {self.node_id} became leader")
            return True
        except Exception:
            # Check who is leader
            value, _ = self.client.get(f"/elections/{self.name}")
            print(f"Current leader: {value}")
            return False
    
    def watch_for_election(self):
        # Watch for leadership changes
        watch_id = self.client.watch(
            f"/elections/{self.name}",
            self.on_leader_change
        )

Redis-based Leader Election

import redis

class RedisLeaderElection:
    def __init__(self, redis_client, election_name, node_id):
        self.redis = redis_client
        self.key = f"election:{election_name}"
        self.node_id = node_id
    
    def become_candidate(self):
        # Use Redis SET NX with TTL
        result = self.redis.set(
            self.key,
            self.node_id,
            nx=True,
            ex=10  # 10 second lease
        )
        
        if result:
            return True
        
        # Check current leader
        leader = self.redis.get(self.key)
        return leader.decode() if leader else None
    
    def renew_lease(self):
        # Extend our lease periodically
        self.redis.expire(self.key, 10)

Fault Tolerance

Failure Detection

import asyncio

class FailureDetector:
    def __init__(self, nodes, threshold=3):
        self.nodes = {node: 0 for node in nodes}
        self.threshold = threshold
    
    def record_heartbeat(self, node):
        self.nodes[node] = 0
    
    def check_failures(self):
        for node, misses in self.nodes.items():
            self.nodes[node] += 1
            if self.nodes[node] >= self.threshold:
                self.mark_node_failed(node)
    
    def mark_node_failed(self, node):
        print(f"Node {node} is suspected to have failed")
        # Trigger failover or alert

Circuit Breaker

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.success_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.state = CircuitState.CLOSED
        self.last_failure_time = None
    
    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e
    
    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

Distributed Transactions

Two-Phase Commit (2PC)

Phase 1: Prepare
┌────────┐     ┌────────┐     ┌────────┐
│ Coor-  │────▶│ Node 1 │     │ Node 2 │
│ dinator│◀────│        │◀────│        │
└────────┘     └────────┘     └────────┘
                    │               
                    ▼               
              ┌───────────┐
              │  Prepare  │
              │  (Lock)   │
              └───────────┘

Phase 2: Commit
┌────────┐     ┌────────┐     ┌────────┐
│ Coor-  │────▶│ Node 1 │     │ Node 2 │
│ dinator│────▶│        │────▶│        │
└────────┘     └────────┘     └────────┘
                    │               
                    ▼               
              ┌───────────┐
              │ Commit/   │
              │ Rollback  │
              └───────────┘

Saga Pattern

# Orchestration-based Saga
class OrderSaga:
    def __init__(self):
        self.steps = [
            self.create_order,
            self.reserve_inventory,
            self.process_payment,
            self.ship_order
        ]
    
    async def execute(self, order_data):
        completed_steps = []
        
        try:
            for step in self.steps:
                await step(order_data)
                completed_steps.append(step)
        
        except Exception as e:
            # Compensate in reverse order
            for step in reversed(completed_steps):
                await self.compensate(step, order_data)
            raise
    
    async def create_order(self, data):
        order = await db.orders.create(data)
        data['order_id'] = order.id
    
    async def compensate(self, step, data):
        compensations = {
            self.create_order: lambda d: db.orders.cancel(d['order_id']),
            self.reserve_inventory: lambda d: db.inventory.release(d['sku']),
            self.process_payment: lambda d: db.payments.refund(d['payment_id']),
        }
        await compensations[step](data)

Data Replication

Primary-Replica Replication

Synchronous Replication:
┌────────┐ Write  ┌────────┐  Write  ┌────────┐
│Client  │───────▶│Primary │────────▶│Replica │
│        │◀───────│        │◀────────│        │
└────────┘ ACK    └────────┘   ACK   └────────┘
         Wait for all replicas

Asynchronous Replication:
┌────────┐ Write  ┌────────┐  Write  ┌────────┐
│Client  │───────▶│Primary │────────▶│Replica │
│        │◀───────│        │         │        │
└────────┘ ACK    └────────┘         └────────┘
         Immediate return

Conflict Resolution

# Last-Write-Wins
class LWWResolver:
    def resolve(self, values):
        return max(values, key=lambda v: v.timestamp)

# Vector Clocks
class VectorClock:
    def __init__(self):
        self.clock = {}
    
    def increment(self, node):
        self.clock[node] = self.clock.get(node, 0) + 1
    
    def merge(self, other):
        for node, time in other.clock.items():
            self.clock[node] = max(self.clock.get(node, 0), time)
    
    def happens_before(self, other):
        # Check if self happened before other
        return all(
            self.clock.get(n, 0) <= other.clock.get(n, 0)
            for n in set(self.clock.keys()) | set(other.clock.keys())
        )

Distributed Systems Fundamentals: Complete Guide

Distributed Systems Fundamentals: Complete Guide

Why Distributed Systems?

The CAP Theorem

Understanding CAP

Consistency Models

Consensus Algorithms

Raft Algorithm

Raft States

Leader Election in Raft

Leader Election Patterns

Basic Leader Election

Redis-based Leader Election

Fault Tolerance

Failure Detection

Circuit Breaker

Distributed Transactions

Two-Phase Commit (2PC)

Saga Pattern

Data Replication

Primary-Replica Replication

Conflict Resolution

External Resources

Comments