Skip to main content
โšก Calmops

Distributed Systems Fundamentals: Complete Guide

Distributed Systems Fundamentals: Complete Guide

Building distributed systems requires understanding fundamental challenges and patterns. This guide covers core concepts that every architect and engineer should know.

Why Distributed Systems?

Benefits:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  โœ“ Scalability - Add more machines to handle load      โ”‚
โ”‚  โœ“ Reliability - No single point of failure              โ”‚
โ”‚  โœ“ Performance - Parallel processing across machines     โ”‚
โ”‚  โœ“ Cost-effective - Use commodity hardware              โ”‚
โ”‚  โœ“ Availability - Continue running during failures      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

The CAP Theorem

Understanding CAP

CAP Theorem states that a distributed system can only guarantee 2 of 3:

    Consistency    Availability    Partition Tolerance
        โ”‚                โ”‚                โ”‚
        โ–ผ                โ–ผ                โ–ผ
   All nodes see   Every request    System works
   same data      gets response    during network
                                    partitions

In reality: Partitions are inevitable, so we choose between:
- CP (Consistency + Partition Tolerance): e.g., ZooKeeper, etcd
- AP (Availability + Partition Tolerance): e.g., Cassandra, DynamoDB

Consistency Models

Strong Consistency:
โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ A:0 โ”‚ Write โ”‚       โ”‚       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
      Write to B
โ”‚ All reads see write immediately

Eventual Consistency:
โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ A:0 โ”‚ Write โ”‚       โ”‚       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
      Async replication
โ”‚ Reads may see stale data briefly

Read-your-writes Consistency:
โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ A:0 โ”‚ Write โ”‚       โ”‚       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚ After write, same node reads see update

Consensus Algorithms

Raft Algorithm

Raft is a consensus algorithm designed to be understandable. It uses:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                   Raft Log                              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Index โ”‚ Term โ”‚ Command                                 โ”‚
โ”‚  โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€    โ”‚
โ”‚   0    โ”‚  1   โ”‚ AddServer("A")                        โ”‚
โ”‚   1    โ”‚  1   โ”‚ AddServer("B")                        โ”‚
โ”‚   2    โ”‚  2   โ”‚ AddClient("X")                        โ”‚
โ”‚   3    โ”‚  2   โ”‚ RemoveServer("A")                     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Raft States

                    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                    โ”‚              โ”‚
              โ”Œโ”€โ”€โ”€โ”€โ”€โ”‚    Leader    โ”‚โ—€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚     โ”‚              โ”‚      โ”‚
              โ”‚     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ”‚
              โ”‚                          โ”‚
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”            โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”
        โ”‚           โ”‚            โ”‚            โ”‚
        โ–ผ           โ–ผ            โ–ผ            โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Candidate   โ”‚ โ”‚  Follower    โ”‚ โ”‚  Candidate   โ”‚
โ”‚              โ”‚ โ”‚              โ”‚ โ”‚              โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    (Election)     (Normal ops)     (Election)

Leader Election in Raft

# Simplified leader election
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None
        self.state = "follower"
        self.votes_received = 0
    
    def start_election(self):
        self.current_term += 1
        self.state = "candidate"
        self.voted_for = self.node_id
        self.votes_received = 1
        
        # Request votes from other nodes
        for peer in self.peers:
            if peer.request_vote(self.current_term, self.node_id):
                self.votes_received += 1
        
        # If majority, become leader
        if self.votes_received > len(self.peers) / 2:
            self.become_leader()
    
    def become_leader(self):
        self.state = "leader"
        # Send heartbeats to prevent new elections
        while True:
            self.send_append_entries()
            time.sleep(HEARTBEAT_INTERVAL)

Leader Election Patterns

Basic Leader Election

import etcd3

class LeaderElection:
    def __init__(self, etcd_client, election_name, node_id):
        self.client = etcd_client
        self.name = election_name
        self.node_id = node_id
        self.is_leader = False
    
    def elect(self):
        # Try to create ephemeral node
        try:
            self.client.put(
                f"/elections/{self.name}",
                self.node_id,
                prevExist=False
            )
            self.is_leader = True
            print(f"Node {self.node_id} became leader")
            return True
        except Exception:
            # Check who is leader
            value, _ = self.client.get(f"/elections/{self.name}")
            print(f"Current leader: {value}")
            return False
    
    def watch_for_election(self):
        # Watch for leadership changes
        watch_id = self.client.watch(
            f"/elections/{self.name}",
            self.on_leader_change
        )

Redis-based Leader Election

import redis

class RedisLeaderElection:
    def __init__(self, redis_client, election_name, node_id):
        self.redis = redis_client
        self.key = f"election:{election_name}"
        self.node_id = node_id
    
    def become_candidate(self):
        # Use Redis SET NX with TTL
        result = self.redis.set(
            self.key,
            self.node_id,
            nx=True,
            ex=10  # 10 second lease
        )
        
        if result:
            return True
        
        # Check current leader
        leader = self.redis.get(self.key)
        return leader.decode() if leader else None
    
    def renew_lease(self):
        # Extend our lease periodically
        self.redis.expire(self.key, 10)

Fault Tolerance

Failure Detection

import asyncio

class FailureDetector:
    def __init__(self, nodes, threshold=3):
        self.nodes = {node: 0 for node in nodes}
        self.threshold = threshold
    
    def record_heartbeat(self, node):
        self.nodes[node] = 0
    
    def check_failures(self):
        for node, misses in self.nodes.items():
            self.nodes[node] += 1
            if self.nodes[node] >= self.threshold:
                self.mark_node_failed(node)
    
    def mark_node_failed(self, node):
        print(f"Node {node} is suspected to have failed")
        # Trigger failover or alert

Circuit Breaker

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.success_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.state = CircuitState.CLOSED
        self.last_failure_time = None
    
    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e
    
    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

Distributed Transactions

Two-Phase Commit (2PC)

Phase 1: Prepare
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Coor-  โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ Node 1 โ”‚     โ”‚ Node 2 โ”‚
โ”‚ dinatorโ”‚โ—€โ”€โ”€โ”€โ”€โ”‚        โ”‚โ—€โ”€โ”€โ”€โ”€โ”‚        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ”‚               
                    โ–ผ               
              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚  Prepare  โ”‚
              โ”‚  (Lock)   โ”‚
              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Phase 2: Commit
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Coor-  โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚ Node 1 โ”‚     โ”‚ Node 2 โ”‚
โ”‚ dinatorโ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚        โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ”‚               
                    โ–ผ               
              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚ Commit/   โ”‚
              โ”‚ Rollback  โ”‚
              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Saga Pattern

# Orchestration-based Saga
class OrderSaga:
    def __init__(self):
        self.steps = [
            self.create_order,
            self.reserve_inventory,
            self.process_payment,
            self.ship_order
        ]
    
    async def execute(self, order_data):
        completed_steps = []
        
        try:
            for step in self.steps:
                await step(order_data)
                completed_steps.append(step)
        
        except Exception as e:
            # Compensate in reverse order
            for step in reversed(completed_steps):
                await self.compensate(step, order_data)
            raise
    
    async def create_order(self, data):
        order = await db.orders.create(data)
        data['order_id'] = order.id
    
    async def compensate(self, step, data):
        compensations = {
            self.create_order: lambda d: db.orders.cancel(d['order_id']),
            self.reserve_inventory: lambda d: db.inventory.release(d['sku']),
            self.process_payment: lambda d: db.payments.refund(d['payment_id']),
        }
        await compensations[step](data)

Data Replication

Primary-Replica Replication

Synchronous Replication:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” Write  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  Write  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Client  โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚Primary โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚Replica โ”‚
โ”‚        โ”‚โ—€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚        โ”‚โ—€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ ACK    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   ACK   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         Wait for all replicas

Asynchronous Replication:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” Write  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  Write  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Client  โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚Primary โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚Replica โ”‚
โ”‚        โ”‚โ—€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚        โ”‚         โ”‚        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ ACK    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         Immediate return

Conflict Resolution

# Last-Write-Wins
class LWWResolver:
    def resolve(self, values):
        return max(values, key=lambda v: v.timestamp)

# Vector Clocks
class VectorClock:
    def __init__(self):
        self.clock = {}
    
    def increment(self, node):
        self.clock[node] = self.clock.get(node, 0) + 1
    
    def merge(self, other):
        for node, time in other.clock.items():
            self.clock[node] = max(self.clock.get(node, 0), time)
    
    def happens_before(self, other):
        # Check if self happened before other
        return all(
            self.clock.get(n, 0) <= other.clock.get(n, 0)
            for n in set(self.clock.keys()) | set(other.clock.keys())
        )

External Resources


Comments