Understanding System Design: Scalable Applications

Introduction

System design is the process of defining the architecture, components, and data flow of a system. Whether you’re building a startup MVP or an enterprise platform, understanding system design principles helps you create scalable, maintainable applications. This guide covers the fundamentals: scaling, load balancing, caching, database design, microservices, and reliability.

Scalability Fundamentals

Horizontal vs Vertical Scaling

Vertical Scaling (Scale Up)

Add more resources (CPU, RAM, disk) to an existing machine
Simple to implement
Bounded by hardware limits
The machine remains a single point of failure

Horizontal Scaling (Scale Out)

Add more machines
More complex to operate (needs load balancing and data coordination)
Better fault tolerance
Preferred for large systems

Load Balancing

A load balancer distributes incoming traffic across servers:

# Simple round-robin
class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current = 0
    
    def get_server(self):
        server = self.servers[self.current]
        self.current = (self.current + 1) % len(self.servers)
        return server

Load Balancing Algorithms

Round Robin: Sequential distribution
Least Connections: Fewest active requests
IP Hash: Same client to same server
Weighted: Performance-based distribution
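
The other algorithms in this list can be sketched in a few lines each. The following is a minimal illustration, not a production balancer; the class and function names (LeastConnectionsBalancer, ip_hash, weighted_round_robin) are hypothetical, and the weights are assumed to be small positive integers.

```python
import hashlib
from itertools import cycle


class LeastConnectionsBalancer:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        # Track the number of active requests per server
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by insertion order of the servers
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes
        self.active[server] -= 1


def ip_hash(servers, client_ip):
    """Map a client IP to the same server on every call."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def weighted_round_robin(weights):
    """Cycle through servers in proportion to their integer weights."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)
```

Note that ip_hash remaps most clients whenever the server list changes size; consistent hashing is the usual way to reduce that churn.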

Caching

Store frequently accessed data for fast retrieval:

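# Cache-aside pattern
# Assumes `redis` is a connected redis-py client, `db` is a SQLAlchemy
# session, and `User` has a to_dict() method (all hypothetical here).
import json

def get_user(user_id):
    # Check cache first
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: fetch from database
    user = db.query(User).get(user_id)
    if user is None:
        return None

    # Store a JSON-serializable dict in the cache for one hour,
    # and return the same shape on both hit and miss
    data = user.to_dict()
    redis.setex(cache_key, 3600, json.dumps(data))
    return data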

Cache Strategies

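Strategy	Description	Use Case
Cache-aside	App reads the cache and fills it on a miss	Read-heavy
Write-through	Each write goes to cache and DB together	Reads that must see recent writes
Write-behind	Write to cache, flush to DB asynchronously	Write-heavy
TTL	Time-based expiration	Any (combine with the above)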
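
To make the difference between write-through and write-behind concrete, here is a minimal in-memory sketch. Plain dicts stand in for both the cache and the database, and the class names are hypothetical; a real write-behind cache would flush from a background worker and would have to handle crashes before the flush.

```python
class WriteThroughCache:
    """Every write updates the backing store and the cache together."""

    def __init__(self, store):
        self.store = store   # dict standing in for the database
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value   # write the database first
        self.cache[key] = value   # then keep the cache in sync

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value


class WriteBehindCache:
    """Writes land in the cache and reach the store only on flush()."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()   # keys written but not yet persisted

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # In practice this runs asynchronously, e.g. on a timer
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off shows up in put(): write-through pays the database latency on every write, while write-behind returns immediately but can lose unflushed data.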

Database Design

SQL vs NoSQL

SQL	NoSQL
Structured data, fixed schema	Flexible schema
ACID transactions	Often eventual consistency (varies by store)
Complex queries and joins	Simpler, mostly key-based queries
Usually scaled vertically first	Designed for horizontal scaling

When to Use SQL

Transactional data
Complex relationships
Structured data
ACID requirements

When to Use NoSQL

Unstructured data
High write throughput
Horizontal scaling needed
Schema flexibility

Database Patterns

Read Replicas

-- With read replicas, the application (or a proxy) routes each statement:
-- reads go to a replica, writes go to the primary.
SELECT * FROM users;  -- Goes to a replica
INSERT INTO users...  -- Goes to the primary
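
The routing decision itself usually lives in the application or a proxy. A minimal sketch, assuming connections are represented by simple labels and that only plain SELECT statements are safe to send to a replica (ReplicaRouter is a hypothetical name):

```python
import itertools


class ReplicaRouter:
    """Send SELECT statements to replicas (round-robin), everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, all traffic falls back to the primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Because replicas lag behind the primary, a read that must observe the caller's own recent write is typically routed to the primary instead.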

Sharding

Split data across databases:

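Range-based: split by key range, e.g. users A-M on one shard, N-Z on another
Hash-based: shard = user_id % n, where n is the number of shards
Directory-based: a lookup service maps each key to its shard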
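
Each of the three strategies reduces to a small function that maps a key to a shard index. The sketch below is illustrative only; the function names and the default shard are assumptions, and the directory is a plain dict standing in for a lookup service.

```python
import bisect
import zlib


def hash_shard(user_id, num_shards):
    """Hash-based: integer IDs map directly with modulo."""
    return user_id % num_shards


def hash_shard_str(key, num_shards):
    """Hash-based for string keys; crc32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def range_shard(name, boundaries):
    """Range-based: boundaries=["N"] puts A-M on shard 0 and N-Z on shard 1."""
    return bisect.bisect_right(boundaries, name[:1].upper())


def directory_shard(key, directory, default=0):
    """Directory-based: look the key up in an explicit mapping."""
    return directory.get(key, default)
```

With modulo hashing, changing num_shards moves most keys to a different shard, which is why resharding plans often use consistent hashing or a directory.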

Microservices

Monolith vs Microservices

Monolith	Microservices
Single deployable	Independently deployable services
Simple initially	More operational complexity, more flexibility
Shared database	Each service owns its data
Scales as one unit	Services scale individually

Communication Patterns

Synchronous (REST/gRPC)

# REST call
def get_user_with_orders(user_id):
    user = user_service.get(user_id)
    orders = order_service.get_by_user(user_id)
    return {"user": user, "orders": orders}

Asynchronous (Message Queue)

# Publish an event after saving; consumers react to it asynchronously
def create_order(order_data):
    order = save_order(order_data)
    message_queue.publish("order_created", {
        "order_id": order.id,
        "user_id": order.user_id
    })
    return order
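
The consuming side is what makes this pattern asynchronous: the publisher returns before any handler runs. The following in-memory sketch shows that decoupling. InMemoryQueue and its methods are hypothetical stand-ins for a real broker client, and drain() plays the role of a separate consumer process.

```python
from collections import defaultdict, deque


class InMemoryQueue:
    """Toy message queue: publish() only enqueues, drain() delivers."""

    def __init__(self):
        self._handlers = defaultdict(list)   # topic -> list of callbacks
        self._pending = deque()              # messages not yet delivered

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # Returns immediately; no handler runs here
        self._pending.append((topic, payload))

    def drain(self):
        """Deliver all pending messages and return the number of handler calls."""
        delivered = 0
        while self._pending:
            topic, payload = self._pending.popleft()
            for handler in self._handlers[topic]:
                handler(payload)
                delivered += 1
        return delivered
```

A real broker adds what this sketch omits: persistence, acknowledgements, and redelivery when a consumer fails.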

Service Discovery

# Kubernetes Service: gives pods labeled app=user-service a stable name
# and forwards port 80 to container port 8080
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 80
    targetPort: 8080

Reliability

Fault Tolerance

Circuit breakers
Retries with backoff
Timeouts
Fallbacks

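# Circuit breaker
# reset_timeout is an added assumption so that an open circuit can recover
import time

class CircuitBreaker:
    def __init__(self, threshold=5, reset_timeout=30.0):
        self.failures = 0
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.opened_at = None
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            # Fail fast until the timeout elapses, then allow one trial call
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open")
            self.state = "half-open"

        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise

        # Success closes the circuit and resets the failure count
        self.failures = 0
        self.state = "closed"
        return result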
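
Retries with backoff, also listed above, pair naturally with a circuit breaker. A minimal sketch, assuming exponential backoff with full jitter; the function name, parameter names, and default values are illustrative, and the sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time


def retry_with_backoff(func, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: propagate the last error to the caller
            if attempt == attempts - 1:
                raise
            # Delay cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the cap to avoid retry storms
            sleep(random.uniform(0, cap))
```

Retries are only safe for idempotent operations; retrying a non-idempotent write can apply it twice.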

Redundancy

Multiple availability zones
Data replication
Failover mechanisms

Monitoring

Metrics collection
Logging aggregation
Distributed tracing
Alerting

Common System Designs

URL Shortener

User → API → Database
      ↓
   Cache
      ↓
   Redirect

Components:

Hash function for short codes
Database for mapping
Cache for popular URLs
301 (permanent) or 302 (temporary) redirects
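
One possible way to derive short codes with a hash function is sketched below. It assumes SHA-256 as the hash, a base62 alphabet, and a 7-character code; none of these choices come from a specific product, and short_code is a hypothetical name.

```python
import hashlib
import string

# 62 URL-safe characters: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n):
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def short_code(url, length=7):
    """Derive a fixed-length code from the first 8 bytes of the URL's SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big")
    # Pad short encodings, then truncate to the requested length
    return encode_base62(n).rjust(length, ALPHABET[0])[:length]
```

Truncated hashes can collide, so the database insert still needs a uniqueness check, with a fallback such as rehashing the URL with a salt.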

Twitter Feed

User → Load Balancer → API Servers
         ↓
      Cache (user feeds)
         ↓
    Database + Follower graph
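
The "Cache (user feeds)" box suggests feeds are precomputed per user. One common way to do that is fan-out on write: each new post is pushed into every follower's cached feed. The sketch below assumes that approach; FeedService, FEED_LIMIT, and the in-memory structures are hypothetical stand-ins for the cache and follower graph.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # assumed cap on cached entries per user


class FeedService:
    """Fan-out on write: posts are copied into followers' feeds at post time."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of followers
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        # Write cost grows with the author's follower count
        for follower in self.followers[author]:
            self.feeds[follower].appendleft((author, text))

    def feed(self, user, limit=20):
        # Reads are cheap: the feed is already assembled, newest first
        return list(self.feeds[user])[:limit]
```

Fan-out on write makes reads fast but is expensive for accounts with very large followings, which is why some designs merge those authors' posts at read time instead.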

Real-time Chat

User → WebSocket → API Server
           ↓
       Redis (presence)
           ↓
      Message Queue
           ↓
    Database + Push notifications
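
Presence tracking in this design usually means "has this user sent a heartbeat recently?". With Redis that is typically a key with an expiry; the in-memory sketch below mimics the same idea. PresenceTracker, the 30-second TTL, and the injectable clock are assumptions made for illustration.

```python
import time


class PresenceTracker:
    """A user counts as online if a heartbeat arrived within the last ttl seconds."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock        # injectable so tests can control time
        self._last_seen = {}      # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id):
        # Called periodically by the client over its WebSocket connection
        self._last_seen[user_id] = self.clock()

    def is_online(self, user_id):
        seen = self._last_seen.get(user_id)
        return seen is not None and self.clock() - seen < self.ttl
```

Expiry-based presence needs no explicit "disconnect" event, so it still works when a client drops off the network without closing its connection.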

Design Process

1. Requirements Clarification

What features are needed?
What’s the scale?
What’s the timeline?
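
Answering "what's the scale?" usually comes down to a quick back-of-the-envelope estimate. The helper below is a rough sketch; the function name, its parameters, and the peak factor of 3 are assumptions, and the inputs in the usage note are made-up example numbers.

```python
def estimate(daily_active_users, requests_per_user, writes_per_user,
             bytes_per_write, peak_factor=3):
    """Rough request-rate and storage estimate from daily usage figures."""
    seconds_per_day = 86_400
    avg_rps = daily_active_users * requests_per_user / seconds_per_day
    daily_storage_gb = daily_active_users * writes_per_user * bytes_per_write / 1e9
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_factor,       # assumed peak-to-average ratio
        "storage_gb_per_year": daily_storage_gb * 365,
    }
```

For example, estimate(1_000_000, 20, 2, 500) gives roughly 231 requests per second on average and 365 GB of new data per year, which is enough to decide whether a single database is plausible.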

2. High-Level Design

Components and services
Data flow
APIs

3. Deep Dive

Database schema
Caching strategy
Scaling approach

4. Bottlenecks

Where are potential failures?
How to handle them?

5. Wrap Up

Summary of design
Trade-offs discussed

Key Concepts Summary

Concept	Purpose
Load balancing	Distribute traffic
Caching	Speed up reads
Database sharding	Horizontal scale
Message queues	Async communication
Circuit breakers	Fault tolerance
CDN	Static content delivery

Conclusion

System design skills improve with practice. Start with fundamentals, study real-world systems, and apply concepts in your projects. Understanding these patterns helps you make better architecture decisions.