Introduction
System design is the art and science of architecting software systems that can handle millions of users, process terabytes of data, and remain available 99.99% of the time. Whether you’re preparing for technical interviews at top tech companies or building production systems that serve real users, understanding system design fundamentals is essential for every software engineer.
In 2026, distributed systems have become the norm rather than the exception. Understanding how to design systems that scale horizontally, handle failures gracefully, and maintain consistency across geographic regions has become core knowledge for developers at all levels. This guide provides a comprehensive foundation in system design, covering the key concepts, patterns, and trade-offs that every engineer should understand.
Core Concepts in System Design
Scalability
Scalability refers to a system’s ability to handle increased load by adding resources. There are two primary approaches:
Vertical Scaling (Scale Up): Adding more power to existing machines: more CPU, RAM, or storage. Simpler to implement, but it has hard limits and leaves the machine as a single point of failure.
Horizontal Scaling (Scale Out): Adding more machines to handle increased load. More complex but virtually unlimited scaling potential. This is the preferred approach for large-scale systems.
```python
# Vertical scaling example - simply adding more resources
class VerticalScale:
    def __init__(self, cpu_cores, ram_gb):
        self.cpu_cores = cpu_cores
        self.ram_gb = ram_gb

    def scale_up(self, additional_cores, additional_ram):
        self.cpu_cores += additional_cores
        self.ram_gb += additional_ram


# Horizontal scaling example - adding more instances
class HorizontalScale:
    def __init__(self):
        self.instances = []

    def add_instance(self, instance):
        self.instances.append(instance)

    def remove_instance(self, instance_id):
        self.instances = [i for i in self.instances if i.id != instance_id]

    def distribute_request(self, request):
        # Simple round-robin load balancing
        instance = self.instances[request.id % len(self.instances)]
        return instance.handle(request)
```
Load Balancing
Load balancers distribute incoming traffic across multiple servers, ensuring no single server becomes overwhelmed. They also provide fault tolerance by detecting failed servers and routing traffic to healthy ones.
Types of Load Balancers:
- Layer 4 (Transport): Routes based on IP address and TCP/UDP port
- Layer 7 (Application): Routes based on HTTP content, headers, or cookies
```python
class LoadBalancer:
    def __init__(self, servers, algorithm='round_robin'):
        self.servers = servers
        self.algorithm = algorithm
        self.current_index = 0
        self.server_health = {s: True for s in servers}

    def get_server(self):
        healthy_servers = [s for s in self.servers if self.server_health[s]]
        if not healthy_servers:
            raise RuntimeError("No healthy servers available")
        if self.algorithm == 'round_robin':
            server = healthy_servers[self.current_index % len(healthy_servers)]
            self.current_index += 1
            return server
        elif self.algorithm == 'least_connections':
            return min(healthy_servers, key=lambda s: s.active_connections)
        elif self.algorithm == 'weighted':
            weights = self.get_server_weights()
            return self.weighted_selection(healthy_servers, weights)

    def mark_unhealthy(self, server):
        self.server_health[server] = False

    def mark_healthy(self, server):
        self.server_health[server] = True
```
Caching
Caching stores frequently accessed data in fast storage to reduce latency and database load. Effective caching can improve response times by orders of magnitude.
Caching Strategies:
```python
# Cache-Aside Pattern
class CacheAside:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def get(self, key):
        # Check cache first
        value = self.cache.get(key)
        if value is not None:
            return value
        # Cache miss - load from database
        value = self.database.get(key)
        # Store in cache for next time
        if value is not None:
            self.cache.set(key, value, ttl=3600)
        return value

    def set(self, key, value):
        self.database.set(key, value)
        self.cache.set(key, value, ttl=3600)

    def delete(self, key):
        self.database.delete(key)
        self.cache.delete(key)


# Write-Through Pattern
class WriteThrough:
    def __init__(self, cache, database):
        self.cache = cache
        self.database = database

    def write(self, key, value):
        # Write to both cache and database synchronously
        self.cache.set(key, value)
        self.database.set(key, value)


# Write-Behind Pattern
class WriteBehind:
    def __init__(self, cache, database, queue):
        self.cache = cache
        self.database = database
        self.queue = queue

    def write(self, key, value):
        # Write to cache first, queue for async database write
        self.cache.set(key, value)
        self.queue.enqueue({'key': key, 'value': value})

    def flush(self):
        # Periodically write queued items to database
        while not self.queue.is_empty():
            item = self.queue.dequeue()
            self.database.set(item['key'], item['value'])
```
CAP Theorem
The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously: Consistency, Availability, and Partition tolerance.
- Consistency (C): All nodes see the same data at the same time
- Availability (A): Every request receives a response (even if stale)
- Partition tolerance (P): System continues to operate despite network failures
Since network partitions are inevitable in distributed systems, you must choose between CP (consistency during partitions) and AP (availability during partitions).
```python
# Consistency vs Availability Trade-off Example
class ConsistentSystem:
    """CP: Prioritizes consistency over availability"""

    def get(self, key):
        # Ensure all replicas agree before returning
        results = self.read_from_all_replicas(key)
        if self.all_equal(results):
            return results[0]
        raise InconsistentStateError("Replicas out of sync")

    def set(self, key, value):
        # Write to quorum before confirming
        self.write_to_quorum(key, value)


class AvailableSystem:
    """AP: Prioritizes availability over consistency"""

    def get(self, key):
        # Return from the first available replica (may be stale)
        for replica in self.replicas:
            if replica.is_available():
                return replica.get(key)
        raise NoAvailableReplicaError()

    def set(self, key, value):
        # Best-effort write to every reachable replica;
        # replicas converge later (eventual consistency)
        for replica in self.replicas:
            if replica.is_available():
                replica.set(key, value)
```
Database Design
SQL vs NoSQL
SQL Databases (PostgreSQL, MySQL):
- Structured data with schemas
- ACID transactions
- Complex queries with JOINs
- Vertical scaling
- Use when: Data relationships are complex, transactions required
NoSQL Databases (MongoDB, Cassandra, DynamoDB):
- Flexible schemas
- Horizontal scaling
- Eventual consistency
- High throughput writes
- Use when: Scale is critical, flexible schema needed
```python
# Example: Choosing database based on use case
# E-commerce platform with complex transactions
class EcommerceDatabase:
    def __init__(self):
        # Use SQL for transactional data
        self.order_db = SQLDatabase('postgresql://...')
        self.inventory_db = SQLDatabase('postgresql://...')
        # Use NoSQL for flexible product attributes
        self.product_catalog = NoSQLDatabase('mongodb://...')
        # Use cache for hot items
        self.hot_products = RedisCache()

    def create_order(self, order_data):
        # SQL transaction ensures atomicity within order_db.
        # Note: the inventory writes go to a separate database, so
        # true cross-database atomicity would need a saga or 2PC.
        with self.order_db.transaction():
            order_id = self.order_db.insert('orders', order_data)
            for item in order_data['items']:
                self.inventory_db.update(
                    'inventory',
                    {'quantity': -item['quantity']},
                    {'product_id': item['product_id']}
                )
            return order_id

    def get_product(self, product_id):
        # Check cache first
        product = self.hot_products.get(f'product:{product_id}')
        if product:
            return product
        # Fallback to NoSQL
        product = self.product_catalog.find_one({'_id': product_id})
        # Cache for next request
        if product:
            self.hot_products.set(f'product:{product_id}', product)
        return product
```
Database Replication
```python
import asyncio
import random

class ReplicationManager:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def write(self, data):
        # Write to primary
        self.primary.write(data)
        # Async replication to replicas
        for replica in self.replicas:
            asyncio.create_task(replica.replicate(data))

    def read(self, key):
        # Read from any replica for load distribution
        # (reads may be slightly stale under async replication)
        replica = random.choice(self.replicas)
        return replica.read(key)
```
Sharding
Sharding distributes data across multiple databases to enable horizontal scaling.
```python
class ShardedDatabase:
    def __init__(self, shard_count=4):
        self.shards = [Database() for _ in range(shard_count)]

    def _get_shard(self, key):
        # Simple hash-based partitioning. Note: plain modulo is not
        # consistent hashing; changing shard_count remaps most keys.
        hash_value = hash(key)
        return self.shards[hash_value % len(self.shards)]

    def write(self, key, value):
        shard = self._get_shard(key)
        shard.write(key, value)

    def read(self, key):
        shard = self._get_shard(key)
        return shard.read(key)
```
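Hash-modulo partitioning is simple but reshuffles almost every key when the shard count changes. Consistent hashing limits that churn: each node is placed at several points ("virtual nodes") on a ring, and a key maps to the first node clockwise from its hash, so adding or removing one node only remaps the keys that node owned. A minimal illustrative sketch (class and method names are my own, not from any particular library):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        # Stable hash (Python's built-in hash() is randomized per process)
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node for smoother balance
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise KeyError("empty ring")
        # First point clockwise from the key's hash (wrapping around)
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

Removing a node only affects keys that mapped to its points; every other key keeps its shard, which is why systems like Cassandra and DynamoDB build on this idea.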
Microservices Architecture
Service Communication
```python
# Synchronous communication (REST/gRPC)
class UserService:
    def __init__(self, http_client):
        self.http_client = http_client

    def get_user_with_orders(self, user_id):
        # Call user service
        user = self.http_client.get(f'/users/{user_id}')
        # Call order service
        orders = self.http_client.get(f'/orders?user_id={user_id}')
        return {'user': user, 'orders': orders}


# Asynchronous communication (Message Queue)
class OrderService:
    def __init__(self, message_queue):
        self.queue = message_queue

    async def create_order(self, order_data):
        # Publish event, don't wait for processing
        await self.queue.publish('order.created', order_data)
        return {'status': 'accepted', 'order_id': order_data['id']}

    async def handle_order_created(self, event):
        # Process order asynchronously
        order = event.data
        await self.process_payment(order)
        await self.reserve_inventory(order)
        await self.notify_customer(order)
```
API Gateway
```python
class APIGateway:
    def __init__(self):
        self.routes = {
            '/api/users': 'user-service:8001',
            '/api/orders': 'order-service:8002',
            '/api/products': 'product-service:8003',
        }
        self.rate_limiter = RateLimiter()
        self.auth = AuthService()

    async def handle_request(self, request):
        # Authentication
        if not await self.auth.verify(request.token):
            return 401, {'error': 'Unauthorized'}
        # Rate limiting
        if not self.rate_limiter.allow(request.client_id):
            return 429, {'error': 'Rate limit exceeded'}
        # Route to service
        service = self.routes.get(request.path)
        if not service:
            return 404, {'error': 'Not found'}
        return await self.forward(request, service)
```
Reliability Patterns
Circuit Breaker
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time > self.timeout:
                self.state = 'HALF_OPEN'
            else:
                raise CircuitOpenError()
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = 'CLOSED'

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'
```
Retry with Exponential Backoff
```python
import asyncio

async def retry_with_backoff(func, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return await func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1, 2, 4 seconds
            print(f"Retry {attempt + 1} after {delay}s")
            await asyncio.sleep(delay)
```
Monitoring and Observability
Key Metrics
```python
class MetricsCollector:
    def __init__(self):
        self.counters = {}
        self.gauges = {}
        self.histograms = {}

    def increment_counter(self, name, tags=None):
        key = (name, tuple(sorted((tags or {}).items())))
        self.counters[key] = self.counters.get(key, 0) + 1

    def set_gauge(self, name, value, tags=None):
        key = (name, tuple(sorted((tags or {}).items())))
        self.gauges[key] = value

    def record_histogram(self, name, value, tags=None):
        key = (name, tuple(sorted((tags or {}).items())))
        if key not in self.histograms:
            self.histograms[key] = []
        self.histograms[key].append(value)
```
Common System Design Questions
URL Shortener
Design a service like bit.ly that converts long URLs to short ones.
Key Components:
- Hash function for unique short codes
- Database mapping short codes to long URLs
- Counter service for sequential generation
- CDN for serving redirect responses
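The counter approach pairs naturally with base-62 encoding: each new URL takes the next counter value, encoded over `[0-9a-zA-Z]` into a short code. A minimal sketch (assuming a single global counter; a real service would shard or pre-allocate counter ranges):

```python
import string

# 62 URL-safe characters: digits, lowercase, uppercase
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n):
    """Encode a non-negative counter value as a short base-62 code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_base62(code):
    """Recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Six characters already cover 62^6 (about 56 billion) URLs, which is why short codes stay short for a long time.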
Twitter Timeline
Design a service that generates personalized tweet feeds.
Key Components:
- Fan-out on write vs read
- Cache for popular timelines
- Graph service for follower relationships
- Ranking algorithm for timeline
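Fan-out on write, the key trade-off above, can be sketched as follows: when a user posts, the tweet id is pushed into every follower's precomputed timeline, making reads a cheap lookup (all names here are hypothetical, and real systems special-case celebrity accounts with fan-out on read):

```python
from collections import defaultdict, deque

class Timelines:
    def __init__(self, max_len=800):
        self.followers = defaultdict(set)    # author -> follower ids
        self.timelines = defaultdict(deque)  # user -> tweet ids, newest first
        self.max_len = max_len

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, tweet_id):
        # Write path does the heavy lifting: one append per follower
        for follower in self.followers[author]:
            tl = self.timelines[follower]
            tl.appendleft(tweet_id)
            if len(tl) > self.max_len:
                tl.pop()  # keep only the most recent entries

    def timeline(self, user):
        # Read path is a cheap precomputed lookup
        return list(self.timelines[user])
```

The write cost grows with follower count, which is exactly why accounts with millions of followers are usually handled with fan-out on read instead.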
Distributed Cache
Design a cache system like Redis.
Key Components:
- Data structures (strings, lists, sets, sorted sets)
- Persistence (RDB, AOF)
- Replication
- Clustering
- Eviction policies
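Eviction policies decide what to drop when memory is full. A minimal LRU sketch using Python's `OrderedDict` (note that Redis itself approximates LRU by sampling a few keys rather than tracking exact recency, to save memory):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Other common policies include LFU (evict the least frequently used key) and plain TTL expiry; the right choice depends on the access pattern.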
Conclusion
System design is a vast topic that combines computer science fundamentals with practical engineering trade-offs. The concepts covered in this guide (scalability, load balancing, caching, databases, microservices, and reliability patterns) form the foundation for designing modern distributed systems.
Remember that system design is about making informed trade-offs based on specific requirements. There’s rarely a single “right” answer; understanding the pros and cons of different approaches allows you to choose the best solution for your specific use case.
Resources
- System Design Primer - GitHub
- High Scalability Blog
- Designing Data-Intensive Applications Book
- Grokking the System Design Interview