System Design Interview: Building a URL Shortener

Introduction

URL shorteners like bit.ly are classic system design interview questions. This guide walks through designing one, covering everything from basic functionality to scaling considerations.

Requirements

Functional Requirements

Shorten long URLs
Redirect to original URL
Custom aliases (optional)
Analytics (optional)

Non-Functional Requirements

High availability
Low latency
Scalability
Durability

High-Level Design

Components

User → Load Balancer → API Server
                             ↓
                        Cache (Redis)
                             ↓
                        Database

API Design

Endpoint	Method	Description
/shorten	POST	Create short URL
/{short_code}	GET	Redirect to long URL
/short/{short_code}	GET	Get info (optional)

Database Design

Schema

CREATE TABLE urls (
    id BIGINT PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    click_count INT DEFAULT 0
);

CREATE INDEX idx_short_code ON urls(short_code);

Storage

Relational: PostgreSQL, MySQL
NoSQL: DynamoDB, Cassandra
Primary key: auto-increment or UUID

Short Code Generation

Methods

1. Hash-Based

import hashlib
import base64

def shorten_url(url):
    hash = hashlib.md5(url.encode()).digest()
    code = base64.urlsafe_encode(hash)[:6]
    return code

Collision possible
Not sequential
Security issue (predictable)

2. Random Generation

import random
import string

def generate_code(length=6):
    chars = string.ascii_letters + string.digits
    return ''.join(random.choice(chars) for _ in range(length))

Check uniqueness
Need retry on collision

3. Counter-Based

Current count: 1,000,000
Base62: "15wd" (fast)

Sequential (not guessable)
Single point of failure
Need distributed counter

Caching

Redis Implementation

def get_long_url(short_code):
    # Check cache first
    long_url = redis.get(f"url:{short_code}")
    if long_url:
        return long_url
    
    # Fetch from database
    long_url = db.query(short_code)
    
    # Cache for future
    redis.setex(f"url:{short_code}", 3600, long_url)
    return long_url

Cache Strategy

LRU eviction
TTL: 1-24 hours
Cache popular URLs

Scaling

Read-Heavy Optimization

Cache heavily
CDN for static assets
Read replicas

Write-Heavy Optimization

Batch inserts
Async processing
Queue-based

Sharding

By short_code:

Hash-based sharding
Consistent hashing

Deep Dive Questions

How would you handle 10x traffic?

Horizontal scaling
More cache
Read replicas
Rate limiting

How to prevent abuse?

Rate limiting per IP
Require auth for custom URLs
Detect spam patterns

Analytics implementation?

Async event logging
Batch processing
Time-series DB

Custom aliases?

User authentication
Rate limits per user
Conflict resolution

Implementation Example

from flask import Flask, redirect, request
import redis
import uuid

app = Flask(__name__)
db = redis.Redis()

@app.route('/shorten', methods=['POST'])
def shorten():
    long_url = request.json['url']
    short_code = str(uuid.uuid4())[:6]
    
    # Store in database
    db.set(f"url:{short_code}", long_url)
    db.set(f"rev:{long_url}", short_code)
    
    return {'short_url': f'https://short.io/{short_code}'}

@app.route('/<short_code>')
def redirect_url(short_code):
    long_url = db.get(f"url:{short_code}")
    if long_url:
        return redirect(long_url)
    return 'Not found', 404

Detailed Database Design

Schema with Indexes

CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE,
    click_count BIGINT DEFAULT 0,
    last_accessed_at TIMESTAMP WITH TIME ZONE,
    is_active BOOLEAN DEFAULT TRUE
);

CREATE UNIQUE INDEX idx_short_code ON urls(short_code);
CREATE INDEX idx_user_id ON urls(user_id);
CREATE INDEX idx_created_at ON urls(created_at DESC);
CREATE INDEX idx_expires_at ON urls(expires_at) WHERE expires_at IS NOT NULL;
CREATE INDEX idx_last_accessed ON urls(last_accessed_at) WHERE is_active = TRUE;

-- Analytics table for tracking
CREATE TABLE url_analytics (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) NOT NULL REFERENCES urls(short_code),
    accessed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    ip_address INET,
    user_agent TEXT,
    referer TEXT,
    country_code VARCHAR(2),
    device_type VARCHAR(20)  -- desktop, mobile, tablet
);

CREATE INDEX idx_analytics_code ON url_analytics(short_code, accessed_at);
CREATE INDEX idx_analytics_date ON url_analytics(accessed_at);

Partitioning Strategy

For billions of rows, use time-based partitioning:

-- Partition by month for analytics
CREATE TABLE url_analytics_2026_01 PARTITION OF url_analytics
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE url_analytics_2026_02 PARTITION OF url_analytics
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Storage Estimation

Metric	Daily Volume	Annual Volume
New URLs	1 million	365 million
Redirects	100 million	36.5 billion
Analytics events	100 million	36.5 billion
Storage (URLs)	100 MB	36.5 GB
Storage (analytics)	10 GB	3.65 TB

Short Code Generation in Depth

Base62 Encoding

Use Base62 (0-9, a-z, A-Z) for maximum compression with minimum length:

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    """Convert a number to Base62 string."""
    if num == 0:
        return BASE62[0]
    
    chars = []
    while num > 0:
        num, remainder = divmod(num, 62)
        chars.append(BASE62[remainder])
    
    return ''.join(reversed(chars))

def decode_base62(code: str) -> int:
    """Convert Base62 string back to number."""
    num = 0
    for char in code:
        num = num * 62 + BASE62.index(char)
    return num

# Generate short code from a distributed ID
short_code = encode_base62(snowflake_id)

Capacity of Different Code Lengths

Length	Combinations	Max URLs	Notes
4	62^4 = 14.7M	14.7 million	Too small for production
5	62^5 = 916M	916 million	Good for most applications
6	62^6 = 56.8B	56.8 billion	Excellent for large scale
7	62^7 = 3.5T	3.5 trillion	Overkill for most use cases

Distributed ID Generation (Snowflake)

Avoid collisions and sequential guessing with a Snowflake-like ID:

import time
import threading

class SnowflakeGenerator:
    """Distributed unique ID generator."""
    
    def __init__(self, worker_id: int, datacenter_id: int = 0):
        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()
        
        # Bit layout: 1 sign + 41 timestamp + 5 datacenter + 5 worker + 12 sequence
        self.epoch = 1700000000000  # Custom epoch
    
    def next_id(self) -> int:
        with self.lock:
            timestamp = int(time.time() * 1000)
            
            if timestamp < self.last_timestamp:
                raise Exception("Clock moved backwards")
            
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 4095
                if self.sequence == 0:
                    timestamp = self.wait_next_ms()
            else:
                self.sequence = 0
            
            self.last_timestamp = timestamp
            
            return ((timestamp - self.epoch) << 22) | \
                   (self.datacenter_id << 17) | \
                   (self.worker_id << 12) | \
                   self.sequence
    
    def wait_next_ms(self) -> int:
        timestamp = int(time.time() * 1000)
        while timestamp <= self.last_timestamp:
            timestamp = int(time.time() * 1000)
        return timestamp

Advanced Caching Strategy

Multi-Layer Cache

import redis
import aiocache
from typing import Optional

class MultiLayerCache:
    """L1: Local memory, L2: Redis, L3: Database."""

    def __init__(self, redis_client: redis.Redis):
        self.local_cache = {}
        self.redis = redis_client
        self.local_ttl = 60       # 1 minute for L1
        self.redis_ttl = 3600     # 1 hour for L2
        self.max_local_size = 10000

    async def get(self, short_code: str) -> Optional[str]:
        # L1: Local memory (fastest)
        if short_code in self.local_cache:
            entry = self.local_cache[short_code]
            if entry['expires_at'] > time.time():
                return entry['url']
            del self.local_cache[short_code]

        # L2: Redis (fast)
        long_url = await self.redis.get(f"url:{short_code}")
        if long_url:
            # Promote to L1
            if len(self.local_cache) < self.max_local_size:
                self.local_cache[short_code] = {
                    'url': long_url,
                    'expires_at': time.time() + self.local_ttl
                }
            return long_url

        return None

    async def set(self, short_code: str, long_url: str):
        await self.redis.setex(f"url:{short_code}", self.redis_ttl, long_url)
        # Pre-populate L1
        self.local_cache[short_code] = {
            'url': long_url,
            'expires_at': time.time() + self.local_ttl
        }

    async def invalidate(self, short_code: str):
        self.local_cache.pop(short_code, None)
        await self.redis.delete(f"url:{short_code}")

Cache-Aside Pattern with Write-Through

async def redirect_url(short_code: str) -> Optional[str]:
    # Try cache
    long_url = await cache.get(short_code)
    if long_url:
        # Async analytics — don't block the redirect
        asyncio.create_task(track_click(short_code))
        return long_url

    # Cache miss — query database
    row = await db.fetchrow(
        "SELECT long_url FROM urls WHERE short_code = $1 AND is_active = true",
        short_code
    )
    if not row:
        return None

    # Populate cache asynchronously
    asyncio.create_task(cache.set(short_code, row['long_url']))
    asyncio.create_task(track_click(short_code))
    return row['long_url']

Rate Limiting and Abuse Prevention

Token Bucket Implementation

import time
from collections import defaultdict

class TokenBucket:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: int, capacity: int):
        self.rate = rate           # Tokens per second
        self.capacity = capacity   # Max burst
        self.tokens = defaultdict(float)
        self.last_refill = defaultdict(float)

    def allow(self, key: str, tokens: int = 1) -> bool:
        now = time.time()
        # Refill tokens
        elapsed = now - self.last_refill[key]
        self.tokens[key] = min(
            self.capacity,
            self.tokens[key] + elapsed * self.rate
        )
        self.last_refill[key] = now

        if self.tokens[key] >= tokens:
            self.tokens[key] -= tokens
            return True
        return False

# Usage
limiter = TokenBucket(rate=10, capacity=20)  # 10 requests/sec, burst 20

@app.get("/{short_code}")
async def redirect(short_code: str, request: Request):
    client_ip = request.client.host

    if not limiter.allow(f"ip:{client_ip}"):
        raise HTTPException(status_code=429, detail="Too many requests")

    # Process redirect...

Tiered Rate Limits

Tier	Rate Limit	Features
Anonymous	10 req/min	Basic redirect
Free API	100 req/min	Analytics dashboard
Pro API	1000 req/min	Custom domains, advanced analytics
Enterprise	Custom	SLA guarantees, dedicated infrastructure

CAP Theorem and Consistency Considerations

A URL shortener is typically an AP system (Availability + Partition tolerance):

Availability: Redirects must always work — a 503 on a redirect breaks user experience
Partition tolerance: Must handle network partitions gracefully
Consistency trade-off: Eventual consistency for analytics is acceptable. However, short code creation must be strongly consistent to prevent duplicates

Use a strongly consistent database (PostgreSQL, CockroachDB) for URL storage and eventually consistent stores (Redis, Cassandra) for analytics and caching.

Monitoring and Observability

Key Metrics to Track

# Prometheus metrics for URL shortener
METRICS = {
    "redirect_requests_total": "Counter of all redirect requests",
    "redirect_latency_seconds": "Histogram of redirect latency",
    "cache_hit_ratio": "Ratio of cache hits to total lookups",
    "shorten_requests_total": "Counter of URL shortening requests",
    "shorten_latency_seconds": "Histogram of shortening latency",
    "active_urls_count": "Gauge of active shortened URLs",
    "error_rate_by_type": "Counter of errors by type (not_found, rate_limited, etc.)"
}

# Alert thresholds
ALERTS = {
    "high_error_rate": "Error rate > 1% over 5 minutes",
    "high_latency_p99": "p99 latency > 100ms over 5 minutes",
    "low_cache_hit_ratio": "Cache hit ratio < 80% over 10 minutes",
    "high_rate_limiting": "Rate limited requests > 5% over 5 minutes"
}

Distributed Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def handle_redirect(short_code: str):
    with tracer.start_as_current_span("handle_redirect") as span:
        span.set_attribute("short_code", short_code)

        # Cache lookup
        with tracer.start_as_current_span("cache_lookup") as cache_span:
            long_url = await cache.get(short_code)
            cache_span.set_attribute("cache_hit", long_url is not None)

        if not long_url:
            # Database query
            with tracer.start_as_current_span("db_lookup") as db_span:
                long_url = await db.fetch_long_url(short_code)
                db_span.set_attribute("found", long_url is not None)

        if not long_url:
            span.set_attribute("error", "not_found")
            return None

        # Track analytics asynchronously
        with tracer.start_as_current_span("track_analytics") as analytics_span:
            asyncio.create_task(track_visit(short_code))

        span.set_attribute("long_url", long_url)
        return long_url

Interview Tips

Clarify requirements - Ask questions
High-level first - Then deep dive
Think trade-offs - Nothing is perfect
Scale gradually - Start simple
Show you care - Discuss monitoring

Conclusion

URL shortener design tests many skills: API design, database, caching, and scaling. Focus on core functionality first, then address optimizations and edge cases.

Introduction

Requirements

Functional Requirements

Non-Functional Requirements

High-Level Design

Components

API Design

Database Design

Schema

Storage

Short Code Generation

Methods

1. Hash-Based

2. Random Generation

3. Counter-Based

Caching

Redis Implementation

Cache Strategy

Scaling

Read-Heavy Optimization

Write-Heavy Optimization

Sharding

Deep Dive Questions

How would you handle 10x traffic?

How to prevent abuse?

Analytics implementation?

Custom aliases?

Implementation Example

Detailed Database Design

Schema with Indexes

Partitioning Strategy

Storage Estimation

Short Code Generation in Depth

Base62 Encoding

Capacity of Different Code Lengths

Distributed ID Generation (Snowflake)

Advanced Caching Strategy

Multi-Layer Cache

Cache-Aside Pattern with Write-Through

Rate Limiting and Abuse Prevention

Token Bucket Implementation

Tiered Rate Limits

CAP Theorem and Consistency Considerations

Monitoring and Observability

Key Metrics to Track

Distributed Tracing

Interview Tips

Conclusion

Resources

Comments

Share this article

👍 Was this article helpful?