Caching Strategies 2026: Redis, Valkey, CDN, and Application Caching

Published: February 27, 2026 Updated: May 24, 2026 Larry Qu 18 min read

Caching is often the single most impactful performance optimization available. A well-designed caching strategy can cut latency by 10-100x and reduce load on origin systems by 90% or more. This guide covers caching at every layer, from CDN to application to database.

The Caching Pyramid

┌─────────────────────────────────────┐
│        CDN / Edge Cache             │  ← Milliseconds, global
├─────────────────────────────────────┤
│        Application Cache            │  ← Milliseconds
├─────────────────────────────────────┤
│        Database Query Cache         │  ← Milliseconds
├─────────────────────────────────────┤
│        Primary Database             │  ← Milliseconds to seconds
└─────────────────────────────────────┘

Each layer caches different data with different characteristics.
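To see how the layers compound, the sketch below estimates what fraction of requests falls through every layer and what the expected latency is. The hit ratios and latencies are made-up inputs for illustration, not measurements, and the function names are hypothetical.

```python
def origin_fraction(hit_ratios):
    """Fraction of requests that miss every cache layer and reach the origin."""
    fraction = 1.0
    for hit_ratio in hit_ratios:
        if not 0.0 <= hit_ratio <= 1.0:
            raise ValueError("hit ratio must be between 0 and 1")
        fraction *= 1.0 - hit_ratio
    return fraction


def average_latency_ms(layers, origin_latency_ms):
    """Expected latency when layers are tried in order, outermost first.

    `layers` is a list of (hit_ratio, latency_ms) tuples. A request pays
    the latency of every layer it visits before it is answered.
    """
    expected = 0.0
    reach = 1.0  # probability that a request gets as far as this layer
    for hit_ratio, latency_ms in layers:
        expected += reach * latency_ms
        reach *= 1.0 - hit_ratio
    return expected + reach * origin_latency_ms


# Hypothetical numbers: CDN (60% hits, 5 ms), application cache (50%, 0.1 ms),
# query cache (80%, 1 ms), primary database at 50 ms
layers = [(0.60, 5.0), (0.50, 0.1), (0.80, 1.0)]
print(origin_fraction([hit for hit, _ in layers]))   # roughly 0.04, i.e. 4% reach the database
print(average_latency_ms(layers, origin_latency_ms=50.0))
```

Even modest per-layer hit ratios multiply: in this example only about one request in 25 reaches the database.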

The Cache Ecosystem in 2026

The caching landscape has shifted significantly since 2024. The most consequential change is the Redis license fork that created Valkey — a Linux Foundation–governed, BSD-3-Clause fork of Redis OSS. Two years of independent development have produced measurable differences.

Metric | Valkey 8.1 | Redis 8.2
License | BSD-3-Clause | Tri-licensed: RSALv2, SSPLv1, or AGPLv3
Throughput (1KB SET, r6g.large) | 1.20M ops/sec | 1.11M ops/sec
P99 latency (mixed workload) | 1.8 ms | 2.3 ms
Memory overhead vs Redis 7.2 | -20% | Baseline
I/O threading | Multi-threaded + TLS offload | Single-threaded main
Vector search | Module (Valkey-Search) | Native Vector Sets (HNSW)
Default in Linux distros | Fedora 42+, Ubuntu 26.04+, Debian 13 | None
AWS ElastiCache cost (t4g.medium) | $0.052/hr | $0.065/hr

Redis 8.x has shipped aggressive feature releases — Vector Sets with HNSW indexing, hash field TTL, JSON Path queries, Functions 2.0 with WebAssembly — but the AGPL license continues to constrain commercial adoption. Snap Inc. cut caching infrastructure costs 60% by migrating 70% of clusters to ElastiCache Valkey, dropping from $2.1M/year to $840K/year while serving 5 billion daily requests.

Dragonfly has also emerged as a modern, multi-threaded, Redis-compatible alternative, claiming up to 25x the throughput of Redis on multi-core hardware. It uses a shared-nothing, thread-per-core architecture.

For greenfield projects in 2026, Valkey is the pragmatic default for caching, session stores, and pub/sub. Stick with Redis 8 if you need native Vector Sets, JSON Path, or active-active CRDB geo-replication (Redis Enterprise). Both engines share the RESP3 wire protocol, so clients and tooling are interchangeable.

In-Memory Data Stores

Redis Data Structures

import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

# Strings - Simple values
r.set('user:123:name', 'John Doe')
r.set('user:123:email', 'john.doe@example.com')
r.setex('session:abc123', 3600, json.dumps({'user_id': 123}))

# Hashes - Objects
r.hset('user:123', mapping={
    'name': 'John Doe',
    'email': 'john.doe@example.com',
    'created_at': '2024-01-15'
})
user = r.hgetall('user:123')

# Lists - Ordered collections
r.lpush('recent_searches:user123', 'shoes')
r.ltrim('recent_searches:user123', 0, 9)  # Keep only 10

# Sets - Unique collections
r.sadd('product:views:2024-01-15', 'product1', 'product2', 'product3')
unique_viewers = r.scard('product:views:2024-01-15')

# Sorted Sets - Leaderboards
r.zadd('leaderboard:score', {'player1': 100, 'player2': 95, 'player3': 90})
top_players = r.zrevrange('leaderboard:score', 0, 9, withscores=True)

Caching Patterns

Cache-Aside (Lazy Loading):

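# Standard cache-aside pattern
# Note: in this and later snippets, `redis` stands for a connected client
# (such as `r` above), not the imported module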
def get_user(user_id):
    # Try cache first
    cached = redis.get(f'user:{user_id}')
    if cached:
        return json.loads(cached)
    
    # Cache miss - fetch from database
    user = db.query('SELECT * FROM users WHERE id = ?', user_id)
    
    if user:
        # Store in cache with TTL
        redis.setex(f'user:{user_id}', 3600, json.dumps(user))
    
    return user

def update_user(user_id, data):
    # Update database first
    db.execute('UPDATE users SET ... WHERE id = ?', user_id, data)
    
    # Invalidate cache
    redis.delete(f'user:{user_id}')

Write-Through:

# Write-through - cache updated on every write
def create_order(order_data):
    # Write to database
    order_id = db.insert('orders', order_data)
    
    # Write to cache immediately
    cache_key = f'order:{order_id}'
    redis.setex(cache_key, 3600, json.dumps({**order_data, 'id': order_id}))
    
    return order_id

def update_order(order_id, data):
    db.update('orders', data, 'id = ?', order_id)
    
    # Update cache
    cache_key = f'order:{order_id}'
    cached = redis.get(cache_key)
    if cached:
        order = json.loads(cached)
        order.update(data)
        redis.setex(cache_key, 3600, json.dumps(order))

Write-Behind:

# Write-behind - async database writes
import asyncio
from collections import deque

write_queue = deque()

async def update_user_async(user_id, data):
    # Update cache immediately
    redis.setex(f'user:{user_id}', 3600, json.dumps(data))
    
    # Queue for async database write
    write_queue.append({
        'table': 'users',
        'id': user_id,
        'data': data
    })

async def flush_writes():
    while True:
        if write_queue:
            batch = []
            while write_queue and len(batch) < 100:
                batch.append(write_queue.popleft())
            
            # Bulk write to database
            db.batch_update('users', batch)
        
        await asyncio.sleep(1)

Redis Cluster Patterns

# Redis Cluster connection
from redis.cluster import RedisCluster, ClusterNode

# redis-py expects ClusterNode objects here, not plain dicts
nodes = [
    ClusterNode('redis-1', 6379),
    ClusterNode('redis-2', 6379),
    ClusterNode('redis-3', 6379),
]

rc = RedisCluster(startup_nodes=nodes, decode_responses=True)

# Automatic key distribution
rc.set('user:1:name', 'Alice')
rc.set('user:2:name', 'Bob')
# Keys automatically sharded across nodes

Redis Pub/Sub for Cache Invalidation

# Publisher - notify other services of changes
def invalidate_user_cache(user_id):
    # Local cache delete
    redis.delete(f'user:{user_id}')
    
    # Notify other instances
    redis.publish('cache:invalidate', json.dumps({
        'key': f'user:{user_id}',
        'pattern': f'user:{user_id}:*'
    }))

# Subscriber - listen for invalidations
def listen_for_invalidation():
    pubsub = redis.pubsub()
    pubsub.subscribe('cache:invalidate')
    
    for message in pubsub.listen():
        if message['type'] == 'message':
            data = json.loads(message['data'])
            # Always drop the exact key
            redis.delete(data['key'])
            # Then drop matching sub-keys. SCAN iterates incrementally,
            # whereas KEYS blocks the server while it walks the keyspace.
            pattern = data.get('pattern')
            if pattern:
                for key in redis.scan_iter(match=pattern):
                    redis.delete(key)

Valkey Configuration for Production

Valkey configuration is nearly identical to Redis, with a few additions for I/O threading:

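# /etc/valkey/valkey.conf
io-threads 4
# Deprecated in Valkey 8.x, where it has no effect; only older versions read it
io-threads-do-reads yes
maxmemory 4gb
maxmemory-policy allkeys-lfu
save 900 1
save 300 10
appendonly yes
appendfsync everysec
# Disable the plaintext listener so the TLS listener can bind to 6379
port 0
tls-port 6379
tls-cert-file /etc/valkey/valkey.crt
tls-key-file /etc/valkey/valkey.key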

Verify the installation:

$ valkey-cli INFO server | grep valkey_version
valkey_version:8.1.7

Migration from Redis 7.2 to Valkey

Migrating requires zero application code changes because Valkey speaks the identical RESP2 and RESP3 protocols:

# On AWS ElastiCache — one-click upgrade
aws elasticache modify-cache-cluster \
  --cache-cluster-id my-redis-cluster \
  --engine valkey \
  --engine-version 8.1

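# Self-hosted: export an RDB snapshot and load it at startup
redis-cli --rdb dump.rdb
# Valkey loads dump.rdb from its data directory (the path below is an example).
# With appendonly enabled the AOF takes precedence, so keep it off for the import.
cp dump.rdb /var/lib/valkey/dump.rdb
valkey-server --dir /var/lib/valkey --dbfilename dump.rdb --appendonly no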

The primary operational risks are Redis Inc. proprietary modules (RediSearch, RedisJSON, RedisGraph, RedisTimeSeries, RedisBloom). Valkey ships community-module equivalents that lag by one to two releases.
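Before migrating, it can help to check whether any of these modules are actually loaded. The sketch below inspects the result of the MODULE LIST command (redis-py exposes it as `module_list()`). The module-name mapping is an assumption based on the names these modules commonly report, so confirm it against your own server's output; `migration_blockers` is a hypothetical helper.

```python
# Names as commonly reported by MODULE LIST (assumed; verify on your server)
REDIS_INC_MODULES = {
    "search": "RediSearch",
    "rejson": "RedisJSON",
    "graph": "RedisGraph",
    "timeseries": "RedisTimeSeries",
    "bf": "RedisBloom",
}


def _text(value):
    """Decode bytes to str so both decoded and raw responses work."""
    return value.decode() if isinstance(value, bytes) else value


def migration_blockers(loaded_modules):
    """Return the Redis Inc. modules present in a MODULE LIST result."""
    blockers = []
    for module in loaded_modules:
        # Keys may be str or bytes depending on decode_responses
        fields = {_text(k): v for k, v in module.items()}
        name = _text(fields.get("name", ""))
        product = REDIS_INC_MODULES.get(name.lower())
        if product:
            blockers.append(product)
    return sorted(blockers)


# Against a live server: migration_blockers(r.module_list())
sample = [{"name": "ReJSON", "ver": 20609}, {b"name": b"search", b"ver": 20812}]
print(migration_blockers(sample))  # ['RediSearch', 'RedisJSON']
```

An empty list suggests the data set uses only core data types, which is the straightforward migration case.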

Application-Level Caching

In-Memory Caching

from functools import lru_cache
from threading import Lock
import time

# Simple in-memory cache with TTL
class Cache:
    def __init__(self, ttl=300):
        self._cache = {}
        self._timestamps = {}
        self._ttl = ttl
        self._lock = Lock()
    
    def get(self, key):
        with self._lock:
            if key in self._cache:
                if time.time() - self._timestamps[key] < self._ttl:
                    return self._cache[key]
                del self._cache[key]
                del self._timestamps[key]
        return None
    
    def set(self, key, value):
        with self._lock:
            self._cache[key] = value
            self._timestamps[key] = time.time()
    
    def delete(self, key):
        with self._lock:
            self._cache.pop(key, None)
            self._timestamps.pop(key, None)
    
    def clear(self):
        with self._lock:
            self._cache.clear()
            self._timestamps.clear()

# Usage with LRU cache
@lru_cache(maxsize=1000)
def get_product_category(product_id):
    # Expensive database query
    return db.query('SELECT category FROM products WHERE id = ?', product_id)

# Manual cache management
from cachetools import TTLCache, cached

product_cache = TTLCache(maxsize=10000, ttl=300)

@cached(cache=product_cache)
def get_product(product_id):
    return db.query('SELECT * FROM products WHERE id = ?', product_id)

Distributed Caching Patterns

# Multi-layer cache with local + Redis
class MultiLayerCache:
    def __init__(self, local_ttl=60, remote_ttl=3600):
        self.local = {}  # Local dict
        self.redis = redis.Redis()
        self.local_ttl = local_ttl
        self.remote_ttl = remote_ttl
    
    def get(self, key):
        # Check local cache first
        if key in self.local:
            value, timestamp = self.local[key]
            if time.time() - timestamp < self.local_ttl:
                return value
            del self.local[key]
        
        # Check Redis
        raw = self.redis.get(key)
        if raw:
            value = json.loads(raw)
            # Store the decoded value locally, matching what set() stores
            self.local[key] = (value, time.time())
            return value
        
        return None
    
    def set(self, key, value):
        # Write to both layers
        self.redis.setex(key, self.remote_ttl, json.dumps(value))
        self.local[key] = (value, time.time())

CDN and Edge Caching

HTTP Cache Headers — The Foundation

Cache-Control is the primary mechanism for controlling how CDNs, proxies, and browsers cache your content:

Directive | Meaning
public | Both CDN and browser can cache
private | Browser only (user-specific content)
no-cache | Cacheable but must revalidate with origin before each use
no-store | Never cached anywhere (sensitive data)
max-age=N | Freshness lifetime in seconds for any cache (browsers, and CDNs unless s-maxage is set)
s-maxage=N | CDN/shared cache lifetime (overrides max-age)
must-revalidate | After expiry, a cache must not serve the stale copy until the origin confirms freshness
immutable | Content never changes (CSS/JS with content hash in filename)
stale-while-revalidate=N | Serve stale content for N seconds while refreshing in background
stale-if-error=N | On origin error, serve stale content for up to N seconds

Apply these patterns based on content type:

# Static assets with content hash in filename — cache forever
def get_hashed_asset(filename):
    response = send_file(f'static/{filename}')
    response.headers['Cache-Control'] = 'public, max-age=31536000, immutable'
    return response

# HTML pages — CDN caches briefly, stale-while-revalidate for resilience
def get_page(slug):
    response = render_page(slug)
    response.headers['Cache-Control'] = 'public, max-age=0, s-maxage=300, stale-while-revalidate=86400'
    return response

# API responses — short TTL with background refresh
def get_api_data(request):
    if request.user:
        response = jsonify(fetch_user_data(request.user))
        response.headers['Cache-Control'] = 'private, max-age=60, must-revalidate'
    else:
        response = jsonify(fetch_public_data())
        response.headers['Cache-Control'] = 'public, max-age=60, s-maxage=300, stale-while-revalidate=86400'
    return response

# Sensitive data
def get_user_dashboard(request):
    response = jsonify(request.user.dashboard)
    response.headers['Cache-Control'] = 'private, no-store'
    return response

Important: no-cache does NOT mean “don’t cache”. It means the content can be cached but the cache must revalidate with the origin (via If-None-Match/ETag) before serving. For true “never cache”, use no-store.
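To make the interaction between these directives concrete, here is a minimal sketch that parses a Cache-Control value and reports how long a shared cache and a browser may serve the response without contacting the origin. It is a simplification under stated assumptions: it ignores heuristic freshness, Expires, and quoted values, and the function names are hypothetical.

```python
def parse_cache_control(header):
    """Parse a Cache-Control value into a dict of directive -> value."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Numeric arguments become ints; bare directives map to True
        directives[name] = int(value) if value.isdigit() else (value or True)
    return directives


def shared_cache_ttl(header):
    """Seconds a CDN may serve the response without revalidating."""
    d = parse_cache_control(header)
    if "no-store" in d or "private" in d or "no-cache" in d:
        return 0
    if "s-maxage" in d:
        return d["s-maxage"]  # s-maxage overrides max-age for shared caches
    return d.get("max-age", 0)


def browser_ttl(header):
    """Seconds a browser may serve the response without revalidating."""
    d = parse_cache_control(header)
    if "no-store" in d or "no-cache" in d:
        return 0
    return d.get("max-age", 0)


html = "public, max-age=0, s-maxage=300, stale-while-revalidate=86400"
print(shared_cache_ttl(html), browser_ttl(html))  # 300 0
```

Note that `no-cache` yields a TTL of 0 here even though the response may still be stored: it can be reused only after a successful revalidation.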

ETag and Last-Modified — Conditional Requests

ETags let the server validate cached content efficiently without transferring the full response:

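from hashlib import md5
from flask import Response, jsonify, request  # assumes a Flask app

def get_user_profile(user_id):
    profile = db.query('SELECT * FROM users WHERE id = ?', user_id)
    # sort_keys gives a stable serialization, so equal data yields an equal ETag
    profile_json = json.dumps(profile, sort_keys=True)
    # ETag values are quoted strings
    etag = f'"{md5(profile_json.encode()).hexdigest()}"'
    
    # Client sends If-None-Match header
    if request.headers.get('If-None-Match') == etag:
        # Not Modified: no body, but repeat the validator
        return Response(status=304, headers={'ETag': etag})
    
    response = jsonify(profile)
    response.headers['ETag'] = etag
    response.headers['Cache-Control'] = 'public, max-age=3600'
    return response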

When the content hasn’t changed, the server returns a 304 Not Modified with an empty body — saving bandwidth while keeping the cached response valid.

Vary Header — Cache Key Differentiation

response.headers['Vary'] = 'Accept-Encoding, Accept-Language'

This tells the CDN to cache separate entries for different Accept-Encoding (gzip vs br) and Accept-Language (en vs ko) values. Never use Vary: User-Agent — it creates virtually infinite cache variants and destroys the hit ratio.
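One way to picture what Vary does is as extra inputs to the cache key. The sketch below is a simplified model of that idea, not how any particular CDN computes its keys; `cache_key` is a hypothetical helper.

```python
import hashlib


def cache_key(method, url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    parts = [method.upper(), url]
    # Sort the header names so the order inside the Vary value does not matter
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


vary = "Accept-Encoding, Accept-Language"
english = cache_key("GET", "/pricing", {"Accept-Language": "en", "User-Agent": "A"}, vary)
korean = cache_key("GET", "/pricing", {"Accept-Language": "ko", "User-Agent": "A"}, vary)
other_ua = cache_key("GET", "/pricing", {"Accept-Language": "en", "User-Agent": "B"}, vary)

print(english != korean)    # True: a separate entry per language
print(english == other_ua)  # True: User-Agent is not part of the key
```

Every header added to Vary multiplies the number of possible keys, which is why a high-cardinality header such as User-Agent fragments the cache.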

CDN Cache Invalidation — Push and Tag-Based

import boto3
import time

# CloudFront invalidation
def invalidate_cdn(paths):
    cloudfront = boto3.client('cloudfront')
    response = cloudfront.create_invalidation(
        DistributionId='E1234567890ABC',
        InvalidationBatch={
            'CallerReference': f'invalidation-{int(time.time())}',
            'Paths': {
                'Quantity': len(paths),
                'Items': paths
            }
        }
    )
    return response['Invalidation']['Id']

invalidate_cdn(['/api/products/*', '/static/images/*'])

// Cloudflare cache purge by URL or tag
async function purgeCache(zoneId, apiToken, urls) {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiToken}`,
        'Content-Type': 'application/json'
      },
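      // Purge by exact URL. Tag, host, and prefix purges use the same endpoint
      // with `tags`, `hosts`, or `prefixes` (host-qualified, e.g.
      // 'www.example.com/api') in place of `files`.
      body: JSON.stringify({ files: urls })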
    }
  );
  return response.json();
}

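Cache tags (supported by Fastly, Cloudflare Enterprise, and Varnish) enable bulk invalidation: the origin labels each response, and one purge call removes every object carrying a given label. Cloudflare reads the labels from a Cache-Tag header; Fastly calls them surrogate keys and reads them from a Surrogate-Key header.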

HTTP/1.1 200 OK
Cache-Tag: user-123, post-456, blog-list

When a user updates their profile, invalidate every page tagged with their user ID:

curl -X POST -H "Fastly-Key: $FASTLY_API_TOKEN" https://api.fastly.com/service/{id}/purge/user-123
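The mechanism behind tag-based purging is a reverse index from tag to cached keys. The following in-memory sketch illustrates the idea; it is a toy model, not how any CDN implements it, and `TaggedCache` is a hypothetical class.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache that supports purging every entry carrying a tag."""

    def __init__(self):
        self._store = {}
        self._tags = defaultdict(set)  # tag -> keys labelled with that tag

    def set(self, key, value, tags=()):
        self._store[key] = value
        for tag in tags:
            self._tags[tag].add(key)

    def get(self, key):
        return self._store.get(key)

    def purge_tag(self, tag):
        """Remove all entries labelled with `tag`; return how many were removed."""
        removed = 0
        for key in self._tags.pop(tag, set()):
            if key in self._store:
                del self._store[key]
                removed += 1
        return removed


cache = TaggedCache()
cache.set("/users/123", "<profile>", tags=["user-123"])
cache.set("/posts/456", "<post>", tags=["user-123", "post-456"])
cache.set("/blog", "<list>", tags=["blog-list"])

print(cache.purge_tag("user-123"))  # 2
print(cache.get("/blog"))           # <list>
```

The application never needs to know which URLs embed a user's data; it only needs to tag responses consistently when they are generated.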

CDN Provider Comparison

Feature | Cloudflare | Fastly | CloudFront | Akamai
PoPs | 300+ | 80+ | 600+ | 4,000+
Edge compute | Workers (V8) | Compute@Edge (Wasm) | Lambda@Edge | EdgeWorkers
Cache invalidation | Instant | 150ms | Minutes | Minutes
Cache tags | Enterprise | Yes | No | Yes
Free tier | Generous | No | No | No
Price | Very cheap | Expensive | Moderate | Expensive
DDoS protection | Excellent | Good | Good | Excellent

Cloudflare is the price/performance champion for most use cases. Fastly excels at instant cache purges for media and news. CloudFront is the default for deep AWS integrations.

Stale-While-Revalidate — Perceived 100% Hit Ratio

This is the most impactful caching pattern to emerge in recent years:

Traditional caching:
  [cache expires] → [origin request] → [wait] → [response]
                   ↑ user waits

Stale-While-Revalidate:
  [cache expires] → [serve stale instantly] → [background refresh]
                   ↑ user gets instant response

The sequence of events:

User 1 at 60s: [cache hit, fresh] — instant
User 2 at 61s: [cache hit, stale] — instant, background refresh starts
User 3 at 62s: [cache hit, fresh] — instant (just refreshed from bg)

Every user gets an instant response. Next.js Incremental Static Regeneration (ISR) uses this pattern internally. The stale-if-error extension provides resilience — even if the origin is down, stale content can be served for the defined window:

Cache-Control: max-age=60, stale-while-revalidate=86400, stale-if-error=86400
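The decision a cache makes for each request under these directives can be sketched as a small function of the entry's age. This is a simplified reading of the semantics (windows measured from the moment the entry goes stale), and `cache_decision` is a hypothetical name.

```python
def cache_decision(age, max_age, swr=0, sie=0, origin_ok=True):
    """Decide how to answer a request for an entry that is `age` seconds old.

    max_age: freshness lifetime
    swr:     stale-while-revalidate window, counted after max_age
    sie:     stale-if-error window, counted after max_age
    """
    if age < max_age:
        return "serve-fresh"
    if age < max_age + swr:
        return "serve-stale-and-refresh"  # instant response, background fetch
    if not origin_ok and age < max_age + sie:
        return "serve-stale-on-error"     # origin failed, fall back to stale copy
    return "fetch-from-origin"            # the client waits for the origin


# Cache-Control: max-age=60, stale-while-revalidate=86400, stale-if-error=86400
for age in (30, 61, 90000):
    print(age, cache_decision(age, max_age=60, swr=86400, sie=86400))
```

With a long stale-while-revalidate window, only entries that go unrequested for more than a day ever make a user wait on the origin.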

Cache Invalidation Strategies

Time-Based Invalidation

# Fixed TTL
CACHE_TTL = {
    'user_profile': 3600,      # 1 hour
    'product_list': 300,        # 5 minutes
    'config': 86400,           # 24 hours
    'analytics': 60,            # 1 minute
}

def get_cached(key, ttl_key, fetcher):
    cached = redis.get(key)
    if cached:
        return json.loads(cached)
    
    value = fetcher()
    redis.setex(key, CACHE_TTL[ttl_key], json.dumps(value))
    return value

Event-Based Invalidation

# Invalidate on data changes
def on_product_updated(product_id):
    # Invalidate specific cache
    redis.delete(f'product:{product_id}')
    
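    # Invalidate list caches (SCAN avoids blocking the server the way KEYS does)
    for pattern in ['products:list:*', 'products:category:*']:
        for key in redis.scan_iter(match=pattern):
            redis.delete(key)
    
    # Notify other services; the payload must be serialized before publishing
    redis.publish('cache:invalidate', json.dumps({
        'type': 'product',
        'id': product_id
    }))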

Stale-While-Revalidate

# Serve stale content while refreshing in background
import threading

def get_product_with_revalidation(product_id):
    cache_key = f'product:{product_id}'
    
    # Try to get from cache
    cached = redis.get(cache_key)
    
    if cached:
        product = json.loads(cached)
        
        # Treat the entry as stale during its last 60 seconds of TTL
        is_stale = redis.ttl(cache_key) < 60
        
        if is_stale:
            # Refresh on a background thread so this request is not blocked
            threading.Thread(
                target=refresh_product_cache, args=(product_id,), daemon=True
            ).start()
        
        return product
    
    # Cache miss - fetch synchronously
    return refresh_product_cache(product_id)

def refresh_product_cache(product_id):
    product = db.get_product(product_id)
    
    # Update cache
    redis.setex(f'product:{product_id}', 3600, json.dumps(product))
    
    return product

Probabilistic Early Expiration

Instead of refreshing at a fixed TTL threshold, some requests probabilistically refresh early — preventing a stampede when all requests detect staleness simultaneously:

import random
import math
import threading

def should_refresh_early(ttl_remaining, total_ttl):
    if ttl_remaining <= 0:
        return True
    # Probability rises toward 1 as the TTL approaches zero. The 0.05 scale keeps
    # it negligible for most of the key's life (about 13.5% at 10% TTL remaining).
    prob = math.exp(-ttl_remaining / (total_ttl * 0.05))
    return random.random() < prob

def refresh(key, ttl, fetch_fn):
    redis.setex(key, ttl, json.dumps(fetch_fn()))

def get_with_probabilistic_refresh(key, ttl=3600, fetch_fn=None):
    cached = redis.get(key)
    if cached:
        ttl_remaining = redis.ttl(key)
        if should_refresh_early(ttl_remaining, ttl):
            # Only some requests refresh, preventing stampede.
            # A background thread keeps this request from blocking.
            threading.Thread(target=refresh, args=(key, ttl, fetch_fn), daemon=True).start()
        return json.loads(cached)
    
    value = fetch_fn()
    redis.setex(key, ttl, json.dumps(value))
    return value

Reddit and Facebook use similar probabilistic approaches to protect hot keys.
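A well-known formulation of probabilistic early expiration is often called XFetch. Each request draws an exponentially distributed "head start" scaled by how long the value takes to recompute, and refreshes if that head start reaches past the expiry time. The sketch below is a minimal version under stated assumptions: `delta` is the measured recompute time in seconds, `beta` tunes eagerness, and the `now` and `rng` parameters exist only to make the function testable.

```python
import math
import random
import time


def xfetch_should_refresh(expiry, delta, beta=1.0, now=None, rng=random.random):
    """Return True if this request should recompute the value early.

    expiry: absolute time (seconds) at which the cached value expires
    delta:  how long the last recomputation took, in seconds
    beta:   > 1 refreshes earlier, < 1 refreshes later
    """
    now = time.time() if now is None else now
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and the log is defined.
    # -log(u) is exponentially distributed with mean 1.
    head_start = -delta * beta * math.log(1.0 - rng())
    return now + head_start >= expiry


# Simulate 10,000 requests arriving 5 s and 0.5 s before expiry,
# for a value that takes 1 s to recompute
random.seed(42)
for remaining in (5.0, 0.5):
    refreshes = sum(
        xfetch_should_refresh(expiry=1000.0, delta=1.0, now=1000.0 - remaining)
        for _ in range(10_000)
    )
    print(f"{remaining}s before expiry: {refreshes} of 10000 requests refresh")
```

Far from expiry almost no request refreshes; close to expiry a growing fraction does, so the recomputation usually happens once, before the key actually disappears.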

Cache Stampede Prevention

A cache stampede (also called thundering herd) occurs when a popular key expires and thousands of requests simultaneously hit the origin. Three proven defenses:

1. Distributed Lock / Mutex

Only one request fetches from origin; others wait or read stale:

def get_with_lock(key, fetch_fn, ttl=3600):
    cached = redis.get(key)
    if cached:
        return json.loads(cached)
    
    # Acquire a distributed lock — only one request proceeds
    lock_key = f'lock:{key}'
    lock = redis.lock(lock_key, timeout=10, blocking_timeout=5)
    
    if lock.acquire():
        try:
            # Double-check — another request may have populated it
            cached = redis.get(key)
            if cached:
                return json.loads(cached)
            
            value = fetch_fn()
            redis.setex(key, ttl, json.dumps(value))
            return value
        finally:
            lock.release()
    else:
        # Another request is fetching — wait briefly, then retry
        time.sleep(0.05)
        return get_with_lock(key, fetch_fn, ttl)

2. Single-Flight Pattern

Only allow one in-flight request per key at any time:

import asyncio

class SingleFlightCache:
    def __init__(self):
        self._in_flight = {}  # key -> Future shared by all waiters
    
    async def get(self, key, fetch_fn, ttl=3600):
        cached = redis.get(key)
        if cached:
            return json.loads(cached)
        
        # If another request is already fetching, wait for its result
        if key in self._in_flight:
            return await self._in_flight[key]
        
        # We are the designated fetcher
        future = asyncio.get_running_loop().create_future()
        self._in_flight[key] = future
        try:
            # Run the blocking fetch off the event loop so other requests can join
            value = await asyncio.to_thread(fetch_fn)
            redis.setex(key, ttl, json.dumps(value))
            future.set_result(value)
            return value
        except Exception as exc:
            # Propagate the failure so waiters do not hang forever
            future.set_exception(exc)
            raise
        finally:
            del self._in_flight[key]

3. Probabilistic Early Expiration (covered above)

AI and Vector Caching

AI workloads introduce new caching challenges and opportunities. LLM inference is expensive — caching model outputs can reduce latency by 10-100x and cut API costs by 90%.

Semantic Caching

Unlike traditional caching (which requires exact key matches), semantic caching reuses answers for semantically equivalent questions:

import numpy as np
from redis.commands.search.query import Query

# Store embeddings in Redis for semantic matching
def semantic_cache_get(query_embedding, threshold=0.92):
    # Redis vector search for semantically similar cached queries
    result = redis.ft('cache_index').search(
        Query('*=>[KNN 1 @embedding $vec AS score]')
        .sort_by('score')
        .return_fields('response', 'score')
        .dialect(2),
        {'vec': np.array(query_embedding, dtype=np.float32).tobytes()}
    )
    
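    # With a COSINE index the score is a distance, so similarity = 1 - distance
    if result.total > 0 and (1 - float(result.docs[0].score)) >= threshold:
        return result.docs[0].response
    return None

import hashlib

def semantic_cache_set(query, response, embedding, ttl=3600):
    # Use a stable digest: built-in hash() is randomized per process for strings.
    # The same key is used for HSET and EXPIRE so the TTL actually applies.
    key = f'semantic:{hashlib.sha256(query.encode()).hexdigest()}'
    redis.hset(key, mapping={
        'query': query,
        'response': response,
        'embedding': np.array(embedding, dtype=np.float32).tobytes()
    })
    redis.expire(key, ttl)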

Questions like “How do I reset my password?” and “Can I change my login credentials?” map to the same cache entry. Companies using semantic caching report up to 15x faster responses and 90% lower LLM costs.
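The matching step itself is just a nearest-neighbor lookup with a similarity threshold. The dependency-free sketch below shows that logic with a linear scan and toy two-dimensional vectors; real embeddings have hundreds of dimensions and would use an index such as the vector search above. `InMemorySemanticCache` is a hypothetical class.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemorySemanticCache:
    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response)

    def set(self, embedding, response):
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the response of the closest entry if it clears the threshold."""
        best_score, best_response = -1.0, None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = InMemorySemanticCache(threshold=0.92)
cache.set([1.0, 0.0], "Use the password reset link on the login page.")

print(cache.get([0.99, 0.05]))  # near-identical direction: cache hit
print(cache.get([0.0, 1.0]))    # unrelated direction: None
```

The threshold is the main tuning knob: set it too low and users receive answers to questions they did not ask; too high and the hit rate collapses toward exact matching.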

Vector Cache for RAG Pipelines

In Retrieval-Augmented Generation, embedding vectors are retrieved from a vector store for each query. Caching these results avoids redundant similarity searches:

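# Cache vector search results for common queries
def get_rag_context(query, top_k=5):
    # Embed once and reuse the vector for both lookup and store
    embedding = query_embedding(query)
    
    # Check semantic cache first
    cached = semantic_cache_get(embedding)
    if cached:
        return json.loads(cached)
    
    # Perform vector search
    results = vector_store.similarity_search(query, k=top_k)
    
    # Cache for reuse. Hash fields hold strings, so results must be
    # JSON-serializable (e.g. plain dicts of text and metadata).
    semantic_cache_set(query, json.dumps(results), embedding)
    return results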

Redis 8.x ships native Vector Sets with HNSW indexing, achieving 12,400 QPS on 1M vector benchmarks. Valkey’s community module (Valkey-Search) runs at roughly 9,800 QPS — a gap that Valkey 9.0 targets to close with SIMD acceleration in late 2026.

Model Inference Caching

For deterministic model outputs (factual Q&A, code generation with fixed inputs), cache the exact input-output pair:

import hashlib

def llm_inference_cached(prompt, model="gpt-4"):
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
    cache_key = f'llm:{model}:{prompt_hash}'
    
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Expensive inference call
    response = call_llm_api(prompt, model)
    
    # Cache for 1 hour — adjust TTL based on prompt volatility
    redis.setex(cache_key, 3600, json.dumps(response))
    return response

For time-sensitive queries (“What’s the weather today?”), use short TTLs or skip caching entirely. Use exact-match caching paired with semantic caching for maximum coverage.

Token and KV Cache Reuse

LLM serving frameworks (vLLM, TGI) maintain a KV cache: the attention key and value tensors for tokens already processed. It is typically managed internally, but some providers now expose prompt caching, which reuses that state for a repeated prompt prefix:

# OpenAI prompt caching: the processed prompt prefix is reused across requests
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input}
    ],
    # System prompt is cached between requests
)

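Structure prompts to maximize reuse: put stable content (system prompt, tool definitions) first and the variable user input last, since providers match on the prompt prefix, and mark the prefix as cacheable where the provider requires it.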

Proactive Cache Refresh and Pre-Warming

Instead of waiting for a cache miss, proactively populate the cache before traffic arrives:

# Cache warming at deployment time
def warm_cache_for_deployment():
    hot_keys = [
        'config:feature_flags',
        'products:popular',
        'categories:top_level',
    ]
    for key in hot_keys:
        value = fetch_from_database(key)
        redis.setex(key, 3600, json.dumps(value))

# Periodic refresh-ahead for critical data
def refresh_ahead_loop():
    while True:
        for key in ['config:feature_flags', 'pricing:tiers']:
            ttl = redis.ttl(key)
            if ttl < 300:  # Refresh within 5 minutes of expiry
                value = fetch_from_database(key)
                redis.setex(key, 3600, json.dumps(value))
        time.sleep(60)

Redis Data Integration (RDI) takes this further by subscribing to database change events and synchronizing Redis in near real-time, which removes most TTL-related staleness.

Cache Monitoring

Key Metrics

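# Prometheus alerting rules (metrics from redis_exporter)
groups:
- name: redis
  rules:
    - alert: CacheHitRateLow
      # rate() measures the recent hit ratio; raw counters would give the
      # lifetime average and hide a sudden drop
      expr: |
        rate(redis_keyspace_hits_total[5m]) /
        (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.8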
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Cache hit rate below 80%"
        
    - alert: RedisMemoryHigh
      expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Redis memory usage above 90%"
        
    - alert: RedisEvictionsHigh
      expr: rate(redis_evicted_keys_total[5m]) > 10
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High number of key evictions"

Cache Analytics

# Cache hit/miss tracking
class CacheMetrics:
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.errors = 0
    
    def record_hit(self):
        self.hits += 1
    
    def record_miss(self):
        self.misses += 1
    
    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0

metrics = CacheMetrics()

def get_cached(key):
    try:
        value = redis.get(key)
        if value:
            metrics.record_hit()
            return json.loads(value)
        metrics.record_miss()
        return None
    except Exception:
        metrics.errors += 1
        return None

Cache Technology Comparison

Feature | Valkey 8.1 | Redis 8.2 | Memcached
License | BSD-3-Clause | RSALv2, SSPLv1, or AGPLv3 | BSD
Data structures | Strings, Hashes, Lists, Sets, Sorted Sets, Streams, Bitmaps | Same, plus Vector Sets, JSON, TimeSeries (modules) | Strings only
Persistence | RDB + AOF | RDB + AOF | No
Clustering | Valkey Cluster (16,384 slots) | Redis Cluster | Client-side hashing
I/O threading | Multi-threaded + TLS offload | Single-threaded + I/O threads | Multi-threaded
Pub/Sub | Yes | Yes | No
Lua scripts | Yes | Yes + WebAssembly (2.0) | No
Vector search | Module (9,800 QPS) | Native HNSW (12,400 QPS) | No
Memory efficiency | Best (-20% vs Redis 7.2) | Moderate | Higher
Performance | 1.20M ops/sec | 1.11M ops/sec | Extremely fast (simple)
Eviction policies | LRU, LFU, TTL, Random, etc. | LRU, LFU, TTL, Random, etc. | LRU only
AWS cost (t4g.medium) | $0.052/hr | $0.065/hr | $0.035/hr
Use case | General caching, session stores, pub/sub | Rich data, AI vectors, JSON, enterprise | Simple key-value, high throughput

# Memcached is simpler - no complex data types
from pymemcache.client.base import Client

mc = Client(('localhost', 11211))

# Simple string caching only
mc.set('key', 'value', expire=3600)
value = mc.get('key')

Best Practices

Cache Security

Caches often hold sensitive data. Apply these security measures:

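from cryptography.fernet import Fernet

# SECRET_KEY must be a Fernet key, e.g. generated once with Fernet.generate_key()
fernet = Fernet(SECRET_KEY)

# Encrypt sensitive values before caching
def cache_sensitive(key, value, ttl=3600):
    encrypted = fernet.encrypt(json.dumps(value).encode())
    redis.setex(key, ttl, encrypted)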

# Use Redis 6+ ACLs for fine-grained access
# ACL: user caching on ~cache:* +@read +@write -@admin >secure_password

Enable Redis AUTH, use TLS for in-transit encryption (Redis 6+), and isolate tenant data with key prefixes or separate Redis databases.

Cache Anti-Patterns

Don’t cache too much data:

# Bad: Caching entire database rows
user = cache.get(f'user:{user_id}')  # Contains password hash!

# Good: Cache only what's needed
user_summary = cache.get(f'user:{user_id}:summary')

Don’t use indefinite TTLs:

# Bad: No expiration
redis.set('config', json.dumps(config))

# Good: Reasonable TTL with refresh
redis.setex('config', 3600, json.dumps(config))

Don’t over-cache fast queries:

# Bad: Caching a simple primary-key lookup (already fast)
product = cache.get(f'product:{pid}')  # DB query was 2ms

# Good: Cache expensive joins and aggregations
sales_report = cache.get('sales:q2:report')  # DB query was 500ms

Avoid hot keys: a single key receiving disproportionate traffic creates a bottleneck on whichever node owns it. Spread the load by sharding the data across multiple keys:

# Bad: Single key for all active users
redis.zadd('active_users', {user_id: time.time()})

# Good: Shard by a stable hash of user_id (e.g., 16 shards)
# zlib.crc32 is stable across processes; built-in hash() is randomized for strings
import zlib
shard = zlib.crc32(str(user_id).encode()) % 16
redis.zadd(f'active_users:{shard}', {user_id: time.time()})
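Sharding a key moves some work to the read path, because a reader now has to query every shard and merge the results. The sketch below shows one way to do that under stated assumptions: the helper names are hypothetical, and the per-shard data is passed in as plain lists of (member, score) pairs such as `ZRANGEBYSCORE ... WITHSCORES` would return.

```python
import zlib


def shard_key(base_key, member, shards=16):
    """Pick the shard for a member with a hash that is stable across processes."""
    shard = zlib.crc32(str(member).encode()) % shards
    return f"{base_key}:{shard}"


def all_shard_keys(base_key, shards=16):
    """Every key a reader must query to see the full data set."""
    return [f"{base_key}:{i}" for i in range(shards)]


def merge_active_users(per_shard_results, since):
    """Merge (member, last_seen) pairs from all shards, keeping recent members."""
    merged = {}
    for pairs in per_shard_results:
        for member, last_seen in pairs:
            if last_seen >= since:
                merged[member] = max(last_seen, merged.get(member, last_seen))
    return merged


# Writers touch one shard; readers fan out over all of them
print(shard_key("active_users", 123) in all_shard_keys("active_users"))  # True

results = [[("u1", 100.0), ("u2", 40.0)], [("u3", 95.0)], []]
print(merge_active_users(results, since=90.0))  # {'u1': 100.0, 'u3': 95.0}
```

The trade-off is deliberate: writes to the hot structure are spread over 16 keys, while the less frequent reads pay for a fan-out.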

Handle cache misses gracefully with request coalescing:

# Bad: Cache stampede
for i in range(100):  # 100 simultaneous requests
    user = get_user(123)  # All hit DB simultaneously

# Good: Request coalescing
lock = redis.lock('user:123:lock', timeout=5)
if lock.acquire():
    try:
        user = get_user(123)
    finally:
        lock.release()
else:
    # Wait for other request to populate cache
    time.sleep(0.1)
    user = get_user(123)

Set memory alerts early — configure alerts at 70% memory usage, not 90%. Memory pressure causes latency spikes before visible degradation:

- alert: RedisMemoryWarning
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.7
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory usage above 70%"

Eviction Policy Selection Guide

Access Pattern | Recommended Policy | Rationale
Temporal bursts | allkeys-lru | Recent items accessed often, then not
Stable hot set | allkeys-lfu | Frequently accessed items stay cached
Mixed (TTL + permanent) | volatile-lru | Only evicts keys with TTL set
Critical data (no eviction) | noeviction | Cache treated as database; writes fail instead
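The difference between LRU and LFU shows up when a burst of one-off keys passes through a cache that holds a stable hot set. The toy implementations below use exact bookkeeping to make that visible; Redis and Valkey use sampled approximations of both policies, so treat this as an illustration of the principle only.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the oldest entry


class LFUCache:
    """Evicts the least frequently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.freq = Counter()

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.freq[k])
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] += 1


for cache in (LRUCache(2), LFUCache(2)):
    cache.put("hot", "H")
    for _ in range(5):
        cache.get("hot")      # a frequently read key
    cache.put("scan-a", "A")  # followed by a burst of one-off keys
    cache.put("scan-b", "B")
    print(type(cache).__name__, "kept hot key:", cache.get("hot") is not None)
```

LRU evicts the hot key because the scan keys were touched more recently, while LFU keeps it because its access count is higher, which is the reasoning behind the "stable hot set" row.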

Run redis-cli --latency-history periodically to detect memory-induced latency spikes before they affect users.

Conclusion

Effective caching requires strategy layered across the entire stack:

  • Edge: CDN for all static content and stale-while-revalidate for dynamic APIs
  • In-memory: Valkey/Redis for session stores, query caching, and pub/sub
  • Application: Local in-memory cache for hot data with short TTLs
  • Invalidation: Event-driven invalidation, TTL-based expiry, and cache tags
  • Monitoring: Track hit rates per layer, eviction rates, and memory pressure

Start with CDN and cache-aside. Add Valkey for session and query caching when you outgrow a single server. Layer in write-through for critical data, semantic caching for AI workloads, and stale-while-revalidate for resilience. Monitor everything, set alerts early, and adjust TTLs based on real usage patterns.
