Skip to main content

Distributed Systems Fundamentals: Complete Guide

Created: February 26, 2026 Larry Qu 6 min read

Building distributed systems requires understanding fundamental challenges and patterns. This guide covers core concepts that every architect and engineer should know.

Why Distributed Systems?

Benefits:
┌─────────────────────────────────────────────────────────┐
│  ✓ Scalability - Add more machines to handle load      │
│  ✓ Reliability - No single point of failure              │
│  ✓ Performance - Parallel processing across machines     │
│  ✓ Cost-effective - Use commodity hardware              │
│  ✓ Availability - Continue running during failures      │
└─────────────────────────────────────────────────────────┘

The CAP Theorem

Understanding CAP

CAP Theorem states that a distributed system can only guarantee 2 of 3:

    Consistency    Availability    Partition Tolerance
        │                │                │
        ▼                ▼                ▼
   All nodes see   Every request    System works
   same data      gets response    during network
                                    partitions

In reality: Partitions are inevitable, so we choose between:
- CP (Consistency + Partition Tolerance): e.g., ZooKeeper, etcd
- AP (Availability + Partition Tolerance): e.g., Cassandra, DynamoDB

Consistency Models

Strong Consistency:
┌─────┐ ──────┐ ──────┐ ──────┐
│ A:0 │ Write │       │       │
└─────┘ ──────┘ ──────┘ ──────┘
      Write to B
│ All reads see write immediately

Eventual Consistency:
┌─────┐ ──────┐ ──────┐ ──────┐
│ A:0 │ Write │       │       │
└─────┘ ──────┘ ──────┘ ──────┘
      Async replication
│ Reads may see stale data briefly

Read-your-writes Consistency:
┌─────┐ ──────┐ ──────┐ ──────┐
│ A:0 │ Write │       │       │
└─────┘ ──────┘ ──────┘ ──────┘
│ After write, same node reads see update

Consensus Algorithms

Raft Algorithm

Raft is a consensus algorithm designed to be understandable. It uses:

┌─────────────────────────────────────────────────────────┐
│                   Raft Log                              │
├─────────────────────────────────────────────────────────┤
│  Index │ Term │ Command                                 │
│  ──────┼──────┼────────────────────────────────────    │
│   0    │  1   │ AddServer("A")                        │
│   1    │  1   │ AddServer("B")                        │
│   2    │  2   │ AddClient("X")                        │
│   3    │  2   │ RemoveServer("A")                     │
└─────────────────────────────────────────────────────────┘

Raft States

                    ┌──────────────┐
                    │              │
              ┌─────│    Leader    │◀─────┐
              │     │              │      │
              │     └──────────────┘      │
              │                          │
        ┌─────┴─────┐            ┌───────┴────┐
        │           │            │            │
        ▼           ▼            ▼            ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Candidate   │ │  Follower    │ │  Candidate   │
│              │ │              │ │              │
└──────────────┘ └──────────────┘ └──────────────┘
    (Election)     (Normal ops)     (Election)

Leader Election in Raft

## Simplified leader election
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None
        self.state = "follower"
        self.votes_received = 0
    
    def start_election(self):
        self.current_term += 1
        self.state = "candidate"
        self.voted_for = self.node_id
        self.votes_received = 1
        
        # Request votes from other nodes
        for peer in self.peers:
            if peer.request_vote(self.current_term, self.node_id):
                self.votes_received += 1
        
        # If majority, become leader
        if self.votes_received > len(self.peers) / 2:
            self.become_leader()
    
    def become_leader(self):
        self.state = "leader"
        # Send heartbeats to prevent new elections
        while True:
            self.send_append_entries()
            time.sleep(HEARTBEAT_INTERVAL)

Leader Election Patterns

Basic Leader Election

import etcd3

class LeaderElection:
    def __init__(self, etcd_client, election_name, node_id):
        self.client = etcd_client
        self.name = election_name
        self.node_id = node_id
        self.is_leader = False
    
    def elect(self):
        # Try to create ephemeral node
        try:
            self.client.put(
                f"/elections/{self.name}",
                self.node_id,
                prevExist=False
            )
            self.is_leader = True
            print(f"Node {self.node_id} became leader")
            return True
        except Exception:
            # Check who is leader
            value, _ = self.client.get(f"/elections/{self.name}")
            print(f"Current leader: {value}")
            return False
    
    def watch_for_election(self):
        # Watch for leadership changes
        watch_id = self.client.watch(
            f"/elections/{self.name}",
            self.on_leader_change
        )

Redis-based Leader Election

import redis

class RedisLeaderElection:
    def __init__(self, redis_client, election_name, node_id):
        self.redis = redis_client
        self.key = f"election:{election_name}"
        self.node_id = node_id
    
    def become_candidate(self):
        # Use Redis SET NX with TTL
        result = self.redis.set(
            self.key,
            self.node_id,
            nx=True,
            ex=10  # 10 second lease
        )
        
        if result:
            return True
        
        # Check current leader
        leader = self.redis.get(self.key)
        return leader.decode() if leader else None
    
    def renew_lease(self):
        # Extend our lease periodically
        self.redis.expire(self.key, 10)

Fault Tolerance

Failure Detection

import asyncio

class FailureDetector:
    def __init__(self, nodes, threshold=3):
        self.nodes = {node: 0 for node in nodes}
        self.threshold = threshold
    
    def record_heartbeat(self, node):
        self.nodes[node] = 0
    
    def check_failures(self):
        for node, misses in self.nodes.items():
            self.nodes[node] += 1
            if self.nodes[node] >= self.threshold:
                self.mark_node_failed(node)
    
    def mark_node_failed(self, node):
        print(f"Node {node} is suspected to have failed")
        # Trigger failover or alert

Circuit Breaker

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.success_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.state = CircuitState.CLOSED
        self.last_failure_time = None
    
    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e
    
    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

Distributed Transactions

Two-Phase Commit (2PC)

Phase 1: Prepare
┌────────┐     ┌────────┐     ┌────────┐
│ Coor-  │────▶│ Node 1 │     │ Node 2 │
│ dinator│◀────│        │◀────│        │
└────────┘     └────────┘     └────────┘
              ┌───────────┐
              │  Prepare  │
              │  (Lock)   │
              └───────────┘

Phase 2: Commit
┌────────┐     ┌────────┐     ┌────────┐
│ Coor-  │────▶│ Node 1 │     │ Node 2 │
│ dinator│────▶│        │────▶│        │
└────────┘     └────────┘     └────────┘
              ┌───────────┐
              │ Commit/   │
              │ Rollback  │
              └───────────┘

Saga Pattern

## Orchestration-based Saga
class OrderSaga:
    def __init__(self):
        self.steps = [
            self.create_order,
            self.reserve_inventory,
            self.process_payment,
            self.ship_order
        ]
    
    async def execute(self, order_data):
        completed_steps = []
        
        try:
            for step in self.steps:
                await step(order_data)
                completed_steps.append(step)
        
        except Exception as e:
            # Compensate in reverse order
            for step in reversed(completed_steps):
                await self.compensate(step, order_data)
            raise
    
    async def create_order(self, data):
        order = await db.orders.create(data)
        data['order_id'] = order.id
    
    async def compensate(self, step, data):
        compensations = {
            self.create_order: lambda d: db.orders.cancel(d['order_id']),
            self.reserve_inventory: lambda d: db.inventory.release(d['sku']),
            self.process_payment: lambda d: db.payments.refund(d['payment_id']),
        }
        await compensations[step](/architecture/data)

Data Replication

Primary-Replica Replication

Synchronous Replication:
┌────────┐ Write  ┌────────┐  Write  ┌────────┐
│Client  │───────▶│Primary │────────▶│Replica │
│        │◀───────│        │◀────────│        │
└────────┘ ACK    └────────┘   ACK   └────────┘
         Wait for all replicas

Asynchronous Replication:
┌────────┐ Write  ┌────────┐  Write  ┌────────┐
│Client  │───────▶│Primary │────────▶│Replica │
│        │◀───────│        │         │        │
└────────┘ ACK    └────────┘         └────────┘
         Immediate return

Conflict Resolution

## Last-Write-Wins
class LWWResolver:
    def resolve(self, values):
        return max(values, key=lambda v: v.timestamp)

## Vector Clocks
class VectorClock:
    def __init__(self):
        self.clock = {}
    
    def increment(self, node):
        self.clock[node] = self.clock.get(node, 0) + 1
    
    def merge(self, other):
        for node, time in other.clock.items():
            self.clock[node] = max(self.clock.get(node, 0), time)
    
    def happens_before(self, other):
        # Check if self happened before other
        return all(
            self.clock.get(n, 0) <= other.clock.get(n, 0)
            for n in set(self.clock.keys()) | set(other.clock.keys())
        )

Conclusion

Distributed systems are built on foundational tradeoffs: consistency vs. availability (CAP), latency vs. throughput, and partition tolerance vs. coordination overhead. Understanding these tradeoffs is more important than memorizing specific algorithms. Choose the right consistency model for each data path—strong consistency where correctness matters, eventual consistency where availability and speed are paramount.

External Resources


Comments

Share this article

Scan to read on mobile

👍 Was this article helpful?