Caching is often the single most impactful performance optimization available. A well-designed caching strategy can reduce latency by 10-100x and cut load on origin systems by 90% or more. This guide covers caching at every layer, from the CDN to the application to the database.
The Caching Pyramid
┌─────────────────────────────────────┐
│          CDN / Edge Cache           │ ← Milliseconds, close to users
├─────────────────────────────────────┤
│          Application Cache          │ ← Microseconds to sub-millisecond
├─────────────────────────────────────┤
│        Database Query Cache         │ ← Milliseconds
├─────────────────────────────────────┤
│          Primary Database           │ ← Milliseconds to seconds
└─────────────────────────────────────┘
Each layer caches different data with different characteristics.
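A rough model shows why stacking layers pays off: each layer only sees the requests that missed every layer above it. The sketch below computes expected read latency for an ordered list of layers. The hit ratios and latencies in the example are illustrative assumptions, not measurements.

```python
def expected_latency_ms(layers, origin_ms):
    """Expected latency of a read that falls through cache layers in order.

    layers: list of (hit_ratio, latency_ms) tuples, checked top to bottom.
    Every request that reaches a layer pays that layer's lookup cost. Only
    misses continue downward, and requests that miss everywhere also pay origin_ms.
    """
    expected = 0.0
    reach = 1.0  # fraction of requests that get this far down the stack
    for hit_ratio, latency_ms in layers:
        expected += reach * latency_ms
        reach *= 1.0 - hit_ratio
    return expected + reach * origin_ms


# Illustrative numbers only: an in-process cache (0.05 ms, 80% hits), then
# Redis/Valkey (1 ms, 90% of the remaining requests hit), then a 50 ms database.
no_cache = expected_latency_ms([], origin_ms=50)
layered = expected_latency_ms([(0.80, 0.05), (0.90, 1.0)], origin_ms=50)
print(f"{no_cache:.2f} ms -> {layered:.2f} ms")  # prints: 50.00 ms -> 1.25 ms
```

With these made-up numbers, expected latency drops by roughly 40x. Substitute your own measured hit ratios to see where an extra layer would help and where it would not.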
The Cache Ecosystem in 2026
The caching landscape has shifted significantly since 2024. The most consequential change is the Redis license fork that created Valkey — a Linux Foundation–governed, BSD-3-Clause fork of Redis OSS. Two years of independent development have produced measurable differences.
| | Valkey 8.1 | Redis 8.2 |
|---|---|---|
| License | BSD-3-Clause | Tri-license: RSALv2, SSPLv1, or AGPLv3 |
| Throughput (1KB SET, r6g.large) | 1.20M ops/sec | 1.11M ops/sec |
| P99 latency (mixed workload) | 1.8 ms | 2.3 ms |
| Memory overhead vs Redis 7.2 | -20% | Baseline |
| I/O threading | Multi-threaded I/O + TLS offload | Single-threaded main loop + I/O threads |
| Vector search | Module (Valkey-Search) | Native Vector Sets (HNSW) |
| Default in Linux distros | Fedora 42+, Ubuntu 26.04+, Debian 13 | None |
| AWS ElastiCache cost (t4g.medium) | $0.052/hr | $0.065/hr |
Redis 8.x has shipped aggressive feature releases — Vector Sets with HNSW indexing, hash field TTL, JSON Path queries, Functions 2.0 with WebAssembly — but the AGPL license continues to constrain commercial adoption. Snap Inc. cut caching infrastructure costs 60% by migrating 70% of clusters to ElastiCache Valkey, dropping from $2.1M/year to $840K/year while serving 5 billion daily requests.
Dragonfly has also emerged as a modern, multi-threaded, Redis-compatible alternative. It claims up to 25x higher throughput than Redis on multi-core hardware. Dragonfly uses a shared-nothing, thread-per-core architecture in which each thread owns its own slice of the keyspace.
For greenfield projects in 2026, Valkey is the pragmatic default for caching, session stores, and pub/sub. Stick with Redis 8 if you need native Vector Sets, JSON Path, or active-active CRDB geo-replication (Redis Enterprise). Both engines share the RESP3 wire protocol, so clients and tooling are interchangeable.
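Because both engines speak RESP, a client only has to frame commands correctly to work with either one. The sketch below shows that framing: every request is an array of bulk strings. It is illustrative only and omits networking and reply parsing. RESP3 mainly adds richer reply types, which a client opts into with the HELLO 3 command; requests are framed the same way under both protocol versions.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings, as clients send it."""
    parts = [f"*{len(args)}\r\n".encode()]  # array header: number of elements
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        # Bulk string: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_command("SET", "user:123:name", "John Doe"))
# b'*3\r\n$3\r\nSET\r\n$13\r\nuser:123:name\r\n$8\r\nJohn Doe\r\n'
```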
In-Memory Data Stores
Redis Data Structures
import redis
import json
r = redis.Redis(host='localhost', port=6379, db=0)
# Strings - Simple values
r.set('user:123:name', 'John Doe')
r.set('user:123:email', 'john@example.com')
r.setex('session:abc123', 3600, json.dumps({'user_id': 123}))  # expires in 1 hour
# Hashes - Objects
r.hset('user:123', mapping={
    'name': 'John Doe',
    'email': 'john@example.com',
    'created_at': '2024-01-15'
})
user = r.hgetall('user:123')
# Lists - Ordered collections
r.lpush('recent_searches:user123', 'shoes')
r.ltrim('recent_searches:user123', 0, 9)  # Keep only the 10 most recent
# Sets - Unique collections (duplicates are stored once)
r.sadd('product:product1:viewers:2024-01-15', 'user1', 'user2', 'user1')
unique_viewers = r.scard('product:product1:viewers:2024-01-15')  # 2 on a fresh key
# Sorted Sets - Leaderboards
r.zadd('leaderboard:score', {'player1': 100, 'player2': 95, 'player3': 90})
top_players = r.zrevrange('leaderboard:score', 0, 9, withscores=True)
Caching Patterns
Cache-Aside (Lazy Loading):
# Standard cache-aside pattern
# Here `redis` is a client instance (e.g. redis.Redis()) and `db` is your data-access layer
def get_user(user_id):
    # Try cache first
    cached = redis.get(f'user:{user_id}')
    if cached:
        return json.loads(cached)
    # Cache miss - fetch from database
    user = db.query('SELECT * FROM users WHERE id = ?', user_id)
    if user:
        # Store in cache with TTL
        redis.setex(f'user:{user_id}', 3600, json.dumps(user))
    return user
def update_user(user_id, data):
    # Update database first
    db.execute('UPDATE users SET ... WHERE id = ?', user_id, data)
    # Then invalidate (rather than update) the cache; the next read repopulates it
    redis.delete(f'user:{user_id}')
Write-Through:
# Write-through - cache updated on every write
def create_order(order_data):
    # Write to database
    order_id = db.insert('orders', order_data)
    # Write to cache immediately
    cache_key = f'order:{order_id}'
    redis.setex(cache_key, 3600, json.dumps({**order_data, 'id': order_id}))
    return order_id
def update_order(order_id, data):
    db.update('orders', data, 'id = ?', order_id)
    # Update cache
    cache_key = f'order:{order_id}'
    cached = redis.get(cache_key)
    if cached:
        order = json.loads(cached)
        order.update(data)
        redis.setex(cache_key, 3600, json.dumps(order))
Write-Behind:
# Write-behind - async database writes
import asyncio
import json
from collections import deque
write_queue = deque()
async def update_user_async(user_id, data):
    # Update cache immediately; the database catches up later
    redis.setex(f'user:{user_id}', 3600, json.dumps(data))
    # Queue for async database write (lost if the process dies before flushing)
    write_queue.append({
        'table': 'users',
        'id': user_id,
        'data': data
    })
async def flush_writes():
    while True:
        # Drain everything queued so far, in batches of up to 100
        while write_queue:
            batch = []
            while write_queue and len(batch) < 100:
                batch.append(write_queue.popleft())
            # Bulk write to database
            db.batch_update('users', batch)
        await asyncio.sleep(1)
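A common refinement of write-behind is coalescing. If a key is updated several times before the next flush, only the latest value needs to reach the database. The sketch below is a minimal single-process version. It assumes a hypothetical write_batch callable that persists a list of (key, data) rows. Anything still buffered is lost if the process crashes, which is the core durability trade-off of write-behind.

```python
import threading


class CoalescingWriteBuffer:
    """Buffers writes per key; repeated updates to one key collapse into one row."""

    def __init__(self, max_batch=100):
        self._pending = {}  # key -> latest data; dicts keep first-insertion order
        self._lock = threading.Lock()
        self._max_batch = max_batch

    def record(self, key, data):
        with self._lock:
            # Overwrite in place: the key keeps its position, so the oldest
            # pending keys are flushed first
            self._pending[key] = data

    def flush(self, write_batch):
        """Drain the buffer in batches and return the number of rows written."""
        written = 0
        while True:
            with self._lock:
                if not self._pending:
                    return written
                keys = list(self._pending)[: self._max_batch]
                batch = [(k, self._pending.pop(k)) for k in keys]
            # Persist outside the lock so record() is never blocked on the database.
            # A production version would re-queue the batch if this raises.
            write_batch(batch)
            written += len(batch)


buffer = CoalescingWriteBuffer(max_batch=2)
buffer.record("user:1", {"name": "A"})
buffer.record("user:1", {"name": "B"})  # supersedes the first update
buffer.record("user:2", {"name": "C"})
rows = []
buffer.flush(rows.extend)
print(rows)  # [('user:1', {'name': 'B'}), ('user:2', {'name': 'C'})]
```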
Redis Cluster Patterns
# Redis Cluster connection (redis-py 4.1+)
from redis.cluster import RedisCluster, ClusterNode
nodes = [
    ClusterNode('redis-1', 6379),
    ClusterNode('redis-2', 6379),
    ClusterNode('redis-3', 6379),
]
rc = RedisCluster(startup_nodes=nodes, decode_responses=True)
# Each key hashes to one of 16,384 slots; the client routes it to the owning node
rc.set('user:1:name', 'Alice')
rc.set('user:2:name', 'Bob')
# Keys are sharded across nodes automatically
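Cluster clients decide where each key lives by computing its hash slot. The Redis Cluster specification, which Valkey Cluster follows, defines the slot as CRC16 of the key modulo 16384. If the key contains a non-empty {...} section, only that hash tag is hashed. The pure-Python sketch below implements this rule. It shows why related keys must share a tag before they can be used together in multi-key commands or transactions.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot (0-16383) for a key, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys sharing the tag "123" always map to the same slot, so a cluster
# accepts multi-key commands on them
print(key_hash_slot("user:{123}:profile") == key_hash_slot("user:{123}:orders"))  # True
```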
Redis Pub/Sub for Cache Invalidation
# Publisher - notify other services of changes
def invalidate_user_cache(user_id):
    # Delete from the shared Redis cache
    redis.delete(f'user:{user_id}')
    # Tell every app instance to drop its local (in-process) copies
    redis.publish('cache:invalidate', json.dumps({
        'key': f'user:{user_id}',
        'pattern': f'user:{user_id}:*'
    }))
# Subscriber - runs in each app instance (e.g. in a background thread)
from fnmatch import fnmatch
local_cache = {}  # this instance's in-process cache (illustrative)
def listen_for_invalidation():
    pubsub = redis.pubsub()
    pubsub.subscribe('cache:invalidate')
    # Pub/sub is fire-and-forget: keep local TTLs short as a backstop for missed messages
    for message in pubsub.listen():
        if message['type'] == 'message':
            data = json.loads(message['data'])
            local_cache.pop(data['key'], None)
            # Drop every local key matching the glob pattern
            for key in [k for k in local_cache if fnmatch(k, data['pattern'])]:
                local_cache.pop(key, None)
Valkey Configuration for Production
Valkey configuration is nearly identical to Redis, with a few additions for I/O threading:
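# /etc/valkey/valkey.conf
# I/O threads handle socket reads and writes; size to available cores
io-threads 4
io-threads-do-reads yes
maxmemory 4gb
maxmemory-policy allkeys-lfu
# RDB snapshots plus an append-only file for durability
save 900 1
save 300 10
appendonly yes
appendfsync everysec
# TLS only: disable the plaintext port so TLS can listen on 6379
port 0
tls-port 6379
tls-cert-file /etc/valkey/valkey.crt
tls-key-file /etc/valkey/valkey.key
# Clients authenticate with ACL passwords rather than client certificates
tls-auth-clients no
Verify the installation (ca.crt is the CA that signed valkey.crt):
$ valkey-cli --tls --cacert /etc/valkey/ca.crt INFO server | grep valkey_version
valkey_version:8.1.7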
Migration from Redis 7.2 to Valkey
Migration usually requires no application code changes. Valkey speaks the same RESP2 and RESP3 protocols and supports the Redis OSS 7.2 command set it was forked from:
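# On AWS ElastiCache: in-place engine upgrade
aws elasticache modify-cache-cluster \
    --cache-cluster-id my-redis-cluster \
    --engine valkey \
    --engine-version 8.1
# Self-hosted: have redis-cli save a snapshot into Valkey's data directory,
# then start Valkey on that file (it loads the RDB at startup)
redis-cli --rdb /var/lib/valkey/dump.rdb
valkey-server --dir /var/lib/valkey --dbfilename dump.rdb --appendonly no
# Once the data is loaded, turn AOF back on: valkey-cli CONFIG SET appendonly yes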
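The main migration risk is dependence on Redis Inc. modules such as RediSearch, RedisJSON, RedisTimeSeries, and RedisBloom. RedisGraph has already reached end of life. Valkey offers community modules for some of these, including valkey-json, valkey-bloom, and valkey-search, but they lag the Redis versions by one to two releases.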
Application-Level Caching
In-Memory Caching
from functools import lru_cache
from threading import Lock
import time
# Simple in-memory cache with TTL (expired entries are removed lazily on read)
class Cache:
    def __init__(self, ttl=300):
        self._cache = {}
        self._timestamps = {}
        self._ttl = ttl
        self._lock = Lock()
    def get(self, key):
        with self._lock:
            if key in self._cache:
                if time.time() - self._timestamps[key] < self._ttl:
                    return self._cache[key]
                # Expired: drop the entry
                del self._cache[key]
                del self._timestamps[key]
            return None
    def set(self, key, value):
        with self._lock:
            self._cache[key] = value
            self._timestamps[key] = time.time()
    def delete(self, key):
        with self._lock:
            self._cache.pop(key, None)
            self._timestamps.pop(key, None)
    def clear(self):
        with self._lock:
            self._cache.clear()
            self._timestamps.clear()
# Usage with LRU cache (bounded by size, but entries never expire)
@lru_cache(maxsize=1000)
def get_product_category(product_id):
    # Expensive database query
    return db.query('SELECT category FROM products WHERE id = ?', product_id)
# Size- and TTL-bounded cache via cachetools
from cachetools import TTLCache, cached
product_cache = TTLCache(maxsize=10000, ttl=300)
@cached(cache=product_cache, lock=Lock())  # cachetools caches are not thread-safe by themselves
def get_product(product_id):
    return db.query('SELECT * FROM products WHERE id = ?', product_id)
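The Cache class above only removes an entry when that exact key is read after it expires, so its memory grows with the key space. If you want size bounding without pulling in cachetools, an OrderedDict provides LRU ordering. The minimal sketch below combines LRU ordering with a TTL. The injectable clock exists only to make the behavior easy to test.

```python
import time
from collections import OrderedDict
from threading import Lock


class LRUTTLCache:
    """In-process cache bounded by size (LRU eviction) and by age (TTL)."""

    def __init__(self, maxsize=1000, ttl=300.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (expires_at, value), least recent first
        self._maxsize = maxsize
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._lock = Lock()

    def get(self, key, default=None):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return default
            expires_at, value = item
            if self._clock() >= expires_at:
                del self._data[key]  # expired: drop it lazily on read
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key, value):
        with self._lock:
            self._data[key] = (self._clock() + self._ttl, value)
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict the least recently used


now = [0.0]
cache = LRUTTLCache(maxsize=2, ttl=10, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")            # touch "a", so "b" becomes least recently used
cache.set("c", 3)         # over capacity: evicts "b"
evicted = cache.get("b")  # None
kept = cache.get("c")     # 3
now[0] = 11.0             # move past the 10-second TTL
expired = cache.get("a")  # None
```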
Distributed Caching Patterns
# Multi-layer cache with local + Redis
import json
import time
import redis
class MultiLayerCache:
    def __init__(self, local_ttl=60, remote_ttl=3600):
        self.local = {}  # key -> (decoded value, timestamp); unbounded in this sketch
        self.redis = redis.Redis()
        self.local_ttl = local_ttl
        self.remote_ttl = remote_ttl
    def get(self, key):
        # Check local cache first
        if key in self.local:
            value, timestamp = self.local[key]
            if time.time() - timestamp < self.local_ttl:
                return value
            del self.local[key]
        # Check Redis
        raw = self.redis.get(key)
        if raw is not None:
            value = json.loads(raw)
            # Populate local cache with the decoded value, matching set()
            self.local[key] = (value, time.time())
            return value
        return None
    def set(self, key, value):
        # Write to both layers
        self.redis.setex(key, self.remote_ttl, json.dumps(value))
        self.local[key] = (value, time.time())
CDN and Edge Caching
HTTP Cache Headers — The Foundation
Cache-Control is the primary mechanism for controlling how CDNs, proxies, and browsers cache your content:
| Directive | Meaning |
|---|---|
| public | Both shared caches (CDN, proxies) and the browser may cache |
| private | Only the browser may cache (user-specific content) |
| no-cache | Cacheable, but must be revalidated with the origin before each use |
| no-store | Never stored by any cache (sensitive data) |
| max-age=N | Freshness lifetime in seconds, for all caches |
| s-maxage=N | Freshness lifetime for shared caches (CDN/proxy); overrides max-age there |
| must-revalidate | Once stale, must not be served without revalidating with the origin |
| immutable | Content never changes while fresh (CSS/JS with a content hash in the filename) |
| stale-while-revalidate=N | Serve stale content for up to N seconds while refreshing in the background |
| stale-if-error=N | On origin error, serve stale content for up to N seconds |
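To make the precedence between these directives concrete, here is a simplified sketch of how a cache might derive a freshness lifetime from a Cache-Control header. It covers only the directives listed above and ignores Expires, Age, and heuristic freshness. Treat it as an illustration of the precedence rules, not a spec-complete parser.

```python
def freshness_lifetime(cache_control, shared_cache):
    """Seconds a response may be served without revalidation, or None if it
    must not be stored by this kind of cache at all."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    if "no-store" in directives:
        return None  # no cache may store it
    if shared_cache and "private" in directives:
        return None  # CDNs and proxies must not store it
    if "no-cache" in directives:
        return 0  # storable, but stale immediately: revalidate on every use
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])  # s-maxage wins for shared caches
    if "max-age" in directives:
        return int(directives["max-age"])
    return 0  # no explicit lifetime (real caches may apply heuristics)


page = "public, max-age=0, s-maxage=300, stale-while-revalidate=86400"
print(freshness_lifetime(page, shared_cache=True))   # 300: the CDN keeps it 5 minutes
print(freshness_lifetime(page, shared_cache=False))  # 0: the browser revalidates
```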
Apply these patterns based on content type:
# Flask-style handlers; render_page, fetch_user_data and fetch_public_data are app helpers
from flask import jsonify, send_from_directory
# Static assets with content hash in filename: cache forever
def get_hashed_asset(filename):
    response = send_from_directory('static', filename)  # rejects paths outside static/
    response.headers['Cache-Control'] = 'public, max-age=31536000, immutable'
    return response
# HTML pages: CDN caches briefly, stale-while-revalidate for resilience
def get_page(slug):
    response = render_page(slug)
    response.headers['Cache-Control'] = 'public, max-age=0, s-maxage=300, stale-while-revalidate=86400'
    return response
# API responses: short TTL with background refresh
def get_api_data(request):
    # Same URL serves per-user and public bodies: make sure the CDN bypasses
    # its cache for authenticated requests
    if request.user:
        response = jsonify(fetch_user_data(request.user))
        response.headers['Cache-Control'] = 'private, max-age=60, must-revalidate'
    else:
        response = jsonify(fetch_public_data())
        response.headers['Cache-Control'] = 'public, max-age=60, s-maxage=300, stale-while-revalidate=86400'
    return response
# Sensitive data
def get_user_dashboard(request):
    response = jsonify(request.user.dashboard)
    response.headers['Cache-Control'] = 'private, no-store'
    return response
Important: no-cache does NOT mean “don’t cache”. It means the content can be cached but the cache must revalidate with the origin (via If-None-Match/ETag) before serving. For true “never cache”, use no-store.
ETag and Last-Modified — Conditional Requests
ETags let the server validate cached content efficiently without transferring the full response:
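import json
from hashlib import md5
from flask import Response, jsonify, request
def get_user_profile(user_id):
    profile = db.query('SELECT * FROM users WHERE id = ?', user_id)
    profile_json = json.dumps(profile, sort_keys=True)
    # ETags are quoted strings; MD5 is fine here (change detection, not security)
    etag = '"' + md5(profile_json.encode()).hexdigest() + '"'
    # Client sends If-None-Match header
    if request.headers.get('If-None-Match') == etag:
        # 304 Not Modified: no body, but repeat the validator
        return Response(status=304, headers={'ETag': etag})
    response = jsonify(profile)
    response.headers['ETag'] = etag
    # private, no-cache: the browser keeps a copy but revalidates each time,
    # which is exactly when the ETag saves bandwidth
    response.headers['Cache-Control'] = 'private, no-cache'
    return response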
When the content hasn’t changed, the server returns a 304 Not Modified with an empty body — saving bandwidth while keeping the cached response valid.
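The equality check in the example above works only when the client sends exactly one tag. Under the HTTP specification, If-None-Match can carry several tags or "*", and it uses weak comparison, so a weak validator (W/"...") matches its strong counterpart. The sketch below is a more tolerant helper. It assumes ETags contain no commas, which holds for hex digests.

```python
def if_none_match_hits(if_none_match, current_etag):
    """True if the client's cached copy is still current (send 304)."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True  # any current representation matches

    def opaque(tag):
        # Weak comparison: ignore the W/ prefix on either side
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    current = opaque(current_etag)
    # Naive comma split; fine for hex-digest ETags
    return any(opaque(tag) == current for tag in if_none_match.split(","))


print(if_none_match_hits('"abc", W/"def"', '"def"'))  # True
```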
Vary Header — Cache Key Differentiation
response.headers['Vary'] = 'Accept-Encoding, Accept-Language'
This tells the CDN to cache separate entries for different Accept-Encoding (gzip vs br) and Accept-Language (en vs ko) values. Never use Vary: User-Agent — it creates virtually infinite cache variants and destroys the hit ratio.
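Vary multiplies cache entries by every distinct header value, and raw Accept-Encoding and Accept-Language strings differ widely between browsers. Many CDNs normalize Accept-Encoding before using it in the cache key. The sketch below applies the same idea to both headers by collapsing each one into a small set of buckets. The bucketing rules are illustrative assumptions: only the primary language subtag is kept, "en" is the fallback, and q-values are ignored.

```python
def normalize_accept_encoding(header):
    """Collapse Accept-Encoding into br, gzip, or identity (q-values ignored)."""
    offered = {token.split(";")[0].strip().lower() for token in (header or "").split(",")}
    if "br" in offered:
        return "br"
    if "gzip" in offered:
        return "gzip"
    return "identity"


def normalize_accept_language(header):
    """Keep only the primary subtag of the first language, e.g. en-US -> en."""
    first = (header or "").split(",")[0].split(";")[0]
    return first.split("-")[0].strip().lower() or "en"


def cache_key(url, headers):
    """Cache key from the URL plus the normalized Vary headers."""
    encoding = normalize_accept_encoding(headers.get("Accept-Encoding"))
    language = normalize_accept_language(headers.get("Accept-Language"))
    return f"{url}|accept-encoding={encoding}|accept-language={language}"


print(cache_key("/p", {"Accept-Encoding": "gzip, deflate, br", "Accept-Language": "ko-KR,ko;q=0.9"}))
# /p|accept-encoding=br|accept-language=ko
```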
CDN Cache Invalidation — Push and Tag-Based
import uuid
import boto3
# CloudFront invalidation
def invalidate_cdn(paths):
    cloudfront = boto3.client('cloudfront')
    response = cloudfront.create_invalidation(
        DistributionId='E1234567890ABC',
        InvalidationBatch={
            # Must be unique per request; a timestamp can collide within one second
            'CallerReference': f'invalidation-{uuid.uuid4()}',
            'Paths': {
                'Quantity': len(paths),
                'Items': paths
            }
        }
    )
    return response['Invalidation']['Id']
invalidate_cdn(['/api/products/*', '/static/images/*'])
// Cloudflare cache purge by URL, tag, or prefix.
// Send one purge type per request, e.g.:
//   { files: ['https://example.com/api/products'] }
//   { tags: ['product-update'] }
//   { prefixes: ['example.com/api/'] }
async function purgeCache(zoneId, apiToken, purge) {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(purge)
    }
  );
  if (!response.ok) {
    throw new Error(`Cache purge failed: HTTP ${response.status}`);
  }
  return response.json();
}
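Cache tags enable bulk invalidation. The origin labels each response with one or more tags, and a single purge by tag removes every object carrying that tag. The header name depends on the provider. Cloudflare Enterprise reads Cache-Tag (comma-separated), Fastly reads Surrogate-Key (space-separated), and Varnish typically uses the xkey module. For Fastly:
HTTP/1.1 200 OK
Surrogate-Key: user-123 post-456 blog-list
When a user updates their profile, invalidate every page tagged with their user ID (Fastly purge by surrogate key):
curl -X POST -H "Fastly-Key: $FASTLY_API_TOKEN" https://api.fastly.com/service/{id}/purge/user-123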
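Conceptually, a tag-aware cache keeps a reverse index from each tag to the objects whose responses carried that tag. A purge by tag then becomes a lookup rather than a scan of the whole cache. The toy in-memory model below illustrates that bookkeeping; it is not how any particular CDN implements it.

```python
from collections import defaultdict


class TagIndex:
    """Maps cache tags to cached URLs so one purge call can drop them all."""

    def __init__(self):
        self._urls_by_tag = defaultdict(set)
        self.cache = {}  # url -> body; stands in for the CDN's storage

    def store(self, url, body, tags):
        self.cache[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def purge_tag(self, tag):
        """Remove every cached URL carrying the tag and return the purged URLs."""
        urls = self._urls_by_tag.pop(tag, set())
        for url in urls:
            # Other tags may still list this URL; popping a missing URL later is harmless
            self.cache.pop(url, None)
        return sorted(urls)


index = TagIndex()
index.store("/users/123", "<profile>", ["user-123"])
index.store("/posts/456", "<post>", ["post-456", "user-123"])
index.store("/blog", "<list>", ["blog-list"])
print(index.purge_tag("user-123"))  # ['/posts/456', '/users/123']
print(sorted(index.cache))          # ['/blog']
```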
CDN Provider Comparison
| | Cloudflare | Fastly | CloudFront | Akamai |
|---|---|---|---|---|
| PoPs | 300+ | 80+ | 600+ | 4,000+ |
| Edge compute | Workers (V8) | Compute@Edge (Wasm) | Lambda@Edge | EdgeWorkers |
| Cache invalidation | Instant | 150ms | Minutes | Minutes |
| Cache tags | Enterprise | Yes | No | Yes |
| Free tier | Generous | No | 1 TB/month | No |
| Price | Very cheap | Expensive | Moderate | Expensive |
| DDoS protection | Excellent | Good | Good | Excellent |
Cloudflare is the price/performance champion for most use cases. Fastly excels at instant cache purges for media and news. CloudFront is the default for deep AWS integrations.
Stale-While-Revalidate — Perceived 100% Hit Ratio
Standardized in RFC 5861 and now widely supported by CDNs, this is one of the most effective patterns for hiding origin latency:
Traditional caching:
[cache expires] → [origin request] → [wait] → [response]
↑ user waits
Stale-While-Revalidate:
[cache expires] → [serve stale instantly] → [background refresh]
↑ user gets instant response
The sequence of events:
User 1 at 60s: [cache hit, fresh] — instant
User 2 at 61s: [cache hit, stale] — instant, background refresh starts
User 3 at 62s: [cache hit, fresh] — instant (just refreshed from bg)
Within the stale-while-revalidate window, every user gets an instant response. Only a request that arrives after the window has elapsed waits on the origin. Next.js Incremental Static Regeneration (ISR) uses this pattern internally. The stale-if-error directive adds resilience: even if the origin is down, stale content can be served for the defined window:
Cache-Control: max-age=60, stale-while-revalidate=86400, stale-if-error=86400
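The timeline above can be expressed as a small decision function. This is a simplified model of what a cache does with these three directives. It ignores revalidation results and request-side directives.

```python
def cache_state(age, max_age, stale_while_revalidate=0, stale_if_error=0, origin_up=True):
    """Classify a cached response by its age in seconds (simplified model)."""
    if age < max_age:
        return "fresh"             # serve from cache
    staleness = age - max_age
    if staleness < stale_while_revalidate:
        return "stale-revalidate"  # serve stale now, refresh in the background
    if not origin_up and staleness < stale_if_error:
        return "stale-if-error"    # origin failing: fall back to the stale copy
    return "must-fetch"            # the request waits on the origin


# The header above: max-age=60, stale-while-revalidate=86400, stale-if-error=86400
policy = dict(max_age=60, stale_while_revalidate=86400, stale_if_error=86400)
print(cache_state(30, **policy))                      # fresh
print(cache_state(61, **policy))                      # stale-revalidate
print(cache_state(90000, **policy, origin_up=False))  # must-fetch: past both windows
```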
Cache Invalidation Strategies
Time-Based Invalidation
# Fixed TTL per data type
CACHE_TTL = {
    'user_profile': 3600,  # 1 hour
    'product_list': 300,   # 5 minutes
    'config': 86400,       # 24 hours
    'analytics': 60,       # 1 minute
}
def get_cached(key, ttl_key, fetcher):
    cached = redis.get(key)
    if cached:
        return json.loads(cached)
    value = fetcher()
    redis.setex(key, CACHE_TTL[ttl_key], json.dumps(value))
    return value
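Fixed TTLs have one weakness. Keys written together, for example by a warm-up job or after a deploy, also expire together, which sends a synchronized burst of misses to the database. Adding a small random jitter spreads the expirations out. In the sketch below, the 10% default is an assumption you should tune.

```python
import random


def jittered_ttl(base_ttl, jitter_fraction=0.1, rng=random):
    """Return base_ttl shifted randomly by up to +/- jitter_fraction of itself."""
    spread = int(base_ttl * jitter_fraction)
    return base_ttl + rng.randint(-spread, spread)  # randint is inclusive


# e.g. redis.setex(key, jittered_ttl(CACHE_TTL[ttl_key]), json.dumps(value))
print(jittered_ttl(300))  # somewhere between 270 and 330
```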
Event-Based Invalidation
# Invalidate on data changes
def on_product_updated(product_id):
    # Invalidate specific cache
    redis.delete(f'product:{product_id}')
    # Invalidate list caches; SCAN iterates incrementally, unlike KEYS, which blocks the server
    for pattern in ['products:list:*', 'products:category:*']:
        for key in redis.scan_iter(match=pattern, count=500):
            redis.delete(key)
    # Notify other services (publish takes a string, so serialize the payload)
    redis.publish('cache:invalidate', json.dumps({
        'type': 'product',
        'id': product_id
    }))
Stale-While-Revalidate
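# Serve stale content while refreshing in background
import asyncio
import json
_background_tasks = set()  # keep references so pending tasks aren't garbage-collected
async def get_product_with_revalidation(product_id):
    cache_key = f'product:{product_id}'
    # Try to get from cache
    cached = redis.get(cache_key)
    if cached:
        product = json.loads(cached)
        # Treat the entry as stale during its last 60 seconds of life
        if redis.ttl(cache_key) < 60:
            # Only the caller that sets this short-lived flag schedules a refresh
            if redis.set(f'refreshing:{cache_key}', 1, nx=True, ex=30):
                task = asyncio.create_task(refresh_product_cache(product_id))
                _background_tasks.add(task)
                task.add_done_callback(_background_tasks.discard)
        return product  # serve immediately; don't wait for the refresh
    # Cache miss: nothing to serve, so fetch synchronously
    return await refresh_product_cache(product_id)
async def refresh_product_cache(product_id):
    product = db.get_product(product_id)
    # Update cache
    redis.setex(f'product:{product_id}', 3600, json.dumps(product))
    return product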
Probabilistic Early Expiration
Instead of refreshing at a fixed TTL threshold, some requests probabilistically refresh early — preventing a stampede when all requests detect staleness simultaneously:
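import json
import math
import random
def should_refresh_early(ttl_remaining, recompute_seconds, beta=1.0):
    # XFetch-style check: an early refresh becomes likely only near expiry, and
    # sooner for values that are slow to recompute (beta > 1 refreshes earlier)
    if ttl_remaining <= 0:
        return True
    # 1 - random() lies in (0, 1], so log() is defined; -log(...) is exponentially distributed
    return -recompute_seconds * beta * math.log(1.0 - random.random()) >= ttl_remaining
def get_with_probabilistic_refresh(key, fetch_fn, ttl=3600, recompute_seconds=0.5):
    # recompute_seconds: measured time to rebuild the value (0.5 s is a placeholder)
    cached = redis.get(key)
    if cached and not should_refresh_early(redis.ttl(key), recompute_seconds):
        return json.loads(cached)
    # Miss, or this request drew an early refresh: recompute and reset the TTL.
    # Other requests keep serving the cached value, so only a few ever recompute.
    value = fetch_fn()
    redis.setex(key, ttl, json.dumps(value))
    return value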
Reddit and Facebook use similar probabilistic approaches to protect hot keys.
Cache Stampede Prevention
A cache stampede (also called thundering herd) occurs when a popular key expires and thousands of requests simultaneously hit the origin. Three proven defenses:
1. Distributed Lock / Mutex
Only one request fetches from origin; others wait or read stale:
import json
import time
def get_with_lock(key, fetch_fn, ttl=3600, attempts=50):
    for _ in range(attempts):
        cached = redis.get(key)
        if cached:
            return json.loads(cached)
        # Try to become the single fetcher; don't block if someone else already is
        lock = redis.lock(f'lock:{key}', timeout=10)  # auto-expires if the holder dies
        if lock.acquire(blocking=False):
            try:
                # Double-check: another request may have populated it meanwhile
                cached = redis.get(key)
                if cached:
                    return json.loads(cached)
                value = fetch_fn()
                redis.setex(key, ttl, json.dumps(value))
                return value
            finally:
                lock.release()
        # Another request is fetching: wait briefly, then re-check the cache
        time.sleep(0.05)
    # Gave up waiting (about 2.5 s): fetch directly rather than fail the request
    return fetch_fn()
2. Single-Flight Pattern
Only allow one in-flight request per key at any time:
import asyncio
import json
class SingleFlightCache:
    def __init__(self):
        # key -> Future shared by every coroutine waiting on the same fetch (per process)
        self._in_flight = {}
    async def get(self, key, fetch_fn, ttl=3600):
        cached = redis.get(key)
        if cached:
            return json.loads(cached)
        # If another coroutine is already fetching, wait for its result
        if key in self._in_flight:
            return await self._in_flight[key]
        # We are the designated fetcher
        future = asyncio.get_running_loop().create_future()
        self._in_flight[key] = future
        try:
            value = await fetch_fn()  # must be awaitable, or no waiters can queue up
            redis.setex(key, ttl, json.dumps(value))
            future.set_result(value)  # wakes every waiter
            return value
        except Exception as exc:
            future.set_exception(exc)  # waiters see the same error
            future.exception()  # mark as retrieved so asyncio doesn't warn if nobody waited
            raise
        finally:
            del self._in_flight[key]
3. Probabilistic Early Expiration (covered above)
AI and Vector Caching
AI workloads introduce new caching challenges and opportunities. LLM inference is expensive, so for workloads with many repeated or similar queries, caching model outputs can reduce latency by 10-100x and cut API costs by up to 90%.
Semantic Caching
Unlike traditional caching (which requires exact key matches), semantic caching reuses answers for semantically equivalent questions:
import hashlib
import numpy as np
from redis.commands.search.query import Query
# Assumes a search index 'cache_index' over hashes prefixed 'semantic:' with a
# COSINE vector field 'embedding' (requires a search module such as RediSearch)
def semantic_cache_get(query_embedding, threshold=0.92):
    # KNN search for the single most similar cached query
    result = redis.ft('cache_index').search(
        Query('*=>[KNN 1 @embedding $vec AS score]')
        .sort_by('score')
        .return_fields('response', 'score')
        .dialect(2),
        {'vec': np.array(query_embedding, dtype=np.float32).tobytes()}
    )
    # 'score' is the cosine distance, returned as a string; similarity = 1 - distance
    if result.total > 0 and (1 - float(result.docs[0].score)) >= threshold:
        return result.docs[0].response
    return None
def semantic_cache_set(query, response, embedding, ttl=3600):
    # Stable key: Python's built-in hash() is randomized per process
    key = f'semantic:{hashlib.sha256(query.encode()).hexdigest()}'
    redis.hset(key, mapping={
        'query': query,
        'response': response,
        'embedding': np.array(embedding, dtype=np.float32).tobytes()
    })
    redis.expire(key, ttl)
Questions like “How do I reset my password?” and “Can I change my login credentials?” can map to the same cache entry when their embeddings clear the similarity threshold. Companies using semantic caching report up to 15x faster responses and 90% lower LLM costs.
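Under the hood, the Redis query above performs a nearest-neighbour search followed by a threshold check. The brute-force version below makes that logic explicit without a vector index. It is only a sketch for small caches because it scans every entry. The toy three-dimensional vectors stand in for real embeddings from a model. The 0.92 threshold mirrors the example above and should be tuned on your own traffic: set too low, it returns wrong answers; set too high, it rarely hits.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemorySemanticCache:
    """Returns the stored answer whose query embedding is most similar to the
    new one, provided the similarity clears the threshold."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response)

    def get(self, embedding):
        best_score, best_response = -1.0, None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def set(self, embedding, response):
        self._entries.append((embedding, response))


cache = InMemorySemanticCache()
cache.set([1.0, 0.0, 0.0], "Use the reset link on the login page.")
print(cache.get([0.99, 0.1, 0.0]))  # similar enough (about 0.995): cached answer
print(cache.get([0.0, 1.0, 0.0]))   # unrelated (0.0): None
```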
Vector Cache for RAG Pipelines
In Retrieval-Augmented Generation, embedding vectors are retrieved from a vector store for each query. Caching these results avoids redundant similarity searches:
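# Cache vector search results for common queries
import json
def get_rag_context(query, top_k=5):
    embedding = query_embedding(query)  # compute once, reuse for lookup and store
    # Check semantic cache first
    cached = semantic_cache_get(embedding)
    if cached:
        return json.loads(cached)
    # Perform vector search
    results = vector_store.similarity_search(query, k=top_k)
    # Hash fields must be strings or bytes, so cache the text as JSON
    contexts = [doc.page_content for doc in results]  # assumes LangChain-style documents
    semantic_cache_set(query, json.dumps(contexts), embedding)
    return contexts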
Redis 8.x ships native Vector Sets with HNSW indexing, achieving 12,400 QPS on 1M-vector benchmarks. Valkey's community module (Valkey-Search) runs at roughly 9,800 QPS, a gap that Valkey 9.0 aims to close with SIMD acceleration in late 2026.
Model Inference Caching
For prompts whose answers should not change between calls (factual Q&A, code generation with fixed inputs, ideally at temperature 0), cache the exact input-output pair:
import hashlib
import json
def llm_inference_cached(prompt, model="gpt-4"):
    # Hash the prompt so long prompts produce short, fixed-length keys
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
    cache_key = f'llm:{model}:{prompt_hash}'
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    # Expensive inference call
    response = call_llm_api(prompt, model)
    # Cache for 1 hour; adjust TTL based on prompt volatility
    redis.setex(cache_key, 3600, json.dumps(response))
    return response
For time-sensitive queries (“What’s the weather today?”), use short TTLs or skip caching entirely. For maximum coverage, check the exact-match cache first and fall back to the semantic cache only on a miss.
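An exact-match key should cover the model and every parameter that affects the output, not just the prompt text. Otherwise a response generated at temperature 0.7 could be served for a temperature-0 request. The sketch below builds a canonical key. The chat-style message format is an assumption modeled on common LLM APIs.

```python
import hashlib
import json


def llm_cache_key(model, messages, **params):
    """Exact-match key over model, messages, and sampling parameters.

    Canonical JSON (sorted keys, fixed separators) makes the key independent
    of argument order, so equivalent requests share one cache entry.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "llm:" + hashlib.sha256(payload.encode()).hexdigest()


messages = [{"role": "user", "content": "Summarize RFC 5861"}]
k1 = llm_cache_key("gpt-4o", messages, temperature=0, max_tokens=200)
k2 = llm_cache_key("gpt-4o", messages, max_tokens=200, temperature=0)
k3 = llm_cache_key("gpt-4o", messages, temperature=0.7, max_tokens=200)
print(k1 == k2, k1 == k3)  # True False
```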
Token and KV Cache Reuse
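LLM serving frameworks (vLLM, TGI) maintain a KV cache: the attention keys and values computed for every token already processed, both prompt and generated. The cache is usually managed internally, but some providers now expose prompt caching, which reuses that work across requests that share an identical prompt prefix: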
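# OpenAI prompt caching: automatic for prompts of 1,024+ tokens on supported models;
# requests sharing an identical prefix (here, the system prompt) reuse cached work
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},  # static prefix first
        {"role": "user", "content": user_input}        # variable content last
    ],
)
# response.usage.prompt_tokens_details.cached_tokens reports how many prompt tokens hit the cache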
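Structure prompts to maximize reuse. Put static content (system prompt, tool definitions, few-shot examples) first and variable content last, because providers match on the exact prefix. Some providers, such as Anthropic, require you to mark cacheable blocks explicitly (cache_control).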
Proactive Cache Refresh and Pre-Warming
Instead of waiting for a cache miss, proactively populate the cache before traffic arrives:
import json
import time
# Cache warming at deployment time
def warm_cache_for_deployment():
    hot_keys = [
        'config:feature_flags',
        'products:popular',
        'categories:top_level',
    ]
    for key in hot_keys:
        value = fetch_from_database(key)
        redis.setex(key, 3600, json.dumps(value))
# Periodic refresh-ahead for critical data (run in a background worker)
def refresh_ahead_loop():
    while True:
        for key in ['config:feature_flags', 'pricing:tiers']:
            ttl = redis.ttl(key)  # -2 if the key is missing, so it is refreshed too
            if ttl < 300:  # Refresh within 5 minutes of expiry
                value = fetch_from_database(key)
                redis.setex(key, 3600, json.dumps(value))
        time.sleep(60)
Redis Data Integration (RDI) takes this further. It consumes database change events (change data capture) and synchronizes Redis in near real time, which largely removes TTL-related staleness.
Cache Monitoring
Key Metrics
# Prometheus alerting rules (metrics from redis_exporter)
groups:
  - name: redis
    rules:
      - alert: CacheHitRateLow
        # rate() makes the ratio reflect the last 5 minutes, not lifetime counters
        expr: |
          rate(redis_keyspace_hits_total[5m])
            / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m]))
            < 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 80%"
      - alert: RedisMemoryHigh
        # redis_memory_max_bytes is 0 when maxmemory is unset; exclude those instances
        expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9 and redis_memory_max_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 90%"
      - alert: RedisEvictionsHigh
        expr: rate(redis_evicted_keys_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High number of key evictions"
Cache Analytics
# Cache hit/miss tracking
# (plain counters are not thread-safe; use a lock or a metrics library in threaded apps)
import json
class CacheMetrics:
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.errors = 0
    def record_hit(self):
        self.hits += 1
    def record_miss(self):
        self.misses += 1
    def record_error(self):
        self.errors += 1
    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0
metrics = CacheMetrics()
def get_cached(key):
    try:
        value = redis.get(key)
        if value:
            metrics.record_hit()
            return json.loads(value)
        metrics.record_miss()
        return None
    except Exception:
        # Treat cache failures as misses so the caller falls back to the database
        metrics.record_error()
        return None
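Lifetime counters like the hits and misses above react slowly: after a million hits, a sudden run of misses barely moves the ratio. Tracking only the most recent lookups makes regressions visible. The sketch below uses a bounded deque. In Prometheus-based setups, rate() over a time window achieves the same thing on the server side.

```python
from collections import deque
from threading import Lock


class WindowedHitRate:
    """Hit rate over the last `window` lookups instead of all time."""

    def __init__(self, window=1000):
        self._outcomes = deque(maxlen=window)  # True = hit, False = miss
        self._lock = Lock()

    def record(self, hit):
        with self._lock:
            self._outcomes.append(bool(hit))  # oldest outcome drops off automatically

    @property
    def hit_rate(self):
        with self._lock:
            if not self._outcomes:
                return 0.0
            return sum(self._outcomes) / len(self._outcomes)


recent = WindowedHitRate(window=4)
for outcome in [True] * 10 + [False] * 2:
    recent.record(outcome)
print(recent.hit_rate)  # 0.5: only the last four lookups count
```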
Cache Technology Comparison
| Feature | Valkey 8.1 | Redis 8.2 | Memcached |
|---|---|---|---|
| License | BSD-3-Clause | RSALv2 / SSPLv1 / AGPLv3 | BSD |
| Data structures | Strings, Hashes, Lists, Sets, Sorted Sets, Streams, Bitmaps | + Vector Sets, JSON, TimeSeries (modules) | Strings only |
| Persistence | RDB + AOF | RDB + AOF | No |
| Clustering | Valkey Cluster (16,384 slots) | Redis Cluster | Client-side hashing |
| I/O threading | Multi-threaded + TLS offload | Single-threaded + I/O threads | Multi-threaded |
| Pub/Sub | Yes | Yes | No |
| Lua scripts | Yes | Yes + WebAssembly (2.0) | No |
| Vector search | Module (9,800 QPS) | Native HNSW (12,400 QPS) | No |
| Memory efficiency | Best (-20% vs Redis 7.2) | Moderate | Higher |
| Performance | 1.20M ops/sec | 1.11M ops/sec | Extremely fast (simple) |
| Eviction policies | LRU, LFU, TTL, Random, etc. | LRU, LFU, TTL, Random, etc. | LRU only |
| AWS cost (t4g.medium) | $0.052/hr | $0.065/hr | $0.035/hr |
| Use case | General caching, session stores, pub/sub | Rich data, AI vectors, JSON, enterprise | Simple key-value, high throughput |
# Memcached is simpler - no complex data types
from pymemcache.client.base import Client
mc = Client(('localhost', 11211))
# Simple string caching only
mc.set('key', 'value', expire=3600)
value = mc.get('key')  # returns bytes (b'value') unless a serializer is configured
Best Practices
Cache Security
Caches often hold sensitive data. Apply these security measures:
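import json
from cryptography.fernet import Fernet
# Encrypt sensitive values before caching (Fernet: authenticated symmetric encryption).
# CACHE_ENCRYPTION_KEY comes from Fernet.generate_key() and lives in a secret manager.
fernet = Fernet(CACHE_ENCRYPTION_KEY)
def cache_sensitive(key, value, ttl=3600):
    encrypted = fernet.encrypt(json.dumps(value).encode())
    redis.setex(key, ttl, encrypted)
def read_sensitive(key):
    token = redis.get(key)
    return json.loads(fernet.decrypt(token)) if token else None
# Use Redis 6+ ACLs for fine-grained access, e.g.:
# ACL SETUSER caching on >secure_password ~cache:* +@read +@write -@admin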
Enable Redis AUTH, use TLS for in-transit encryption (Redis 6+), and isolate tenant data with key prefixes or separate Redis databases.
Cache Anti-Patterns
Don’t cache too much data:
# Bad: Caching entire database rows
user = cache.get(f'user:{user_id}') # Contains password hash!
# Good: Cache only what's needed
user_summary = cache.get(f'user:{user_id}:summary')
Don’t use indefinite TTLs:
# Bad: No expiration
redis.set('config', json.dumps(config))
# Good: Reasonable TTL with refresh
redis.setex('config', 3600, json.dumps(config))
Don’t over-cache fast queries:
# Bad: Caching a simple primary-key lookup (already fast)
product = cache.get(f'product:{pid}') # DB query was 2ms
# Good: Cache expensive joins and aggregations
sales_report = cache.get('sales:q2:report') # DB query was 500ms
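Avoid hot keys. A single key receiving disproportionate traffic concentrates load on one node. Split it across several keys (key sharding) so the pieces hash to different slots, or serve it from a local in-process cache. Hash tags do the opposite: they force keys onto the same slot.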
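# Bad: Single key for all active users
redis.zadd('active_users', {user_id: time.time()})
# Good: Shard by a stable hash of user_id (e.g., 16 shards)
import zlib
shard = zlib.crc32(str(user_id).encode()) % 16  # built-in hash() of a str differs per process
redis.zadd(f'active_users:{shard}', {user_id: time.time()})
# Reads now fan out: query all 16 shards and merge the results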
Handle cache misses gracefully with request coalescing:
# Bad: Cache stampede
for i in range(100):  # 100 simultaneous requests
    user = get_user(123)  # All hit DB simultaneously
# Good: Request coalescing
lock = redis.lock('user:123:lock', timeout=5)
if lock.acquire(blocking=False):  # only one request rebuilds the entry
    try:
        user = get_user(123)
    finally:
        lock.release()
else:
    # Wait for the other request to populate the cache, then read it
    time.sleep(0.1)
    user = get_user(123)
Set memory alerts early: alert at 70% memory usage, not 90%. Memory pressure causes latency spikes well before the instance is full, for example from eviction work or from copy-on-write during background RDB and AOF saves:
- alert: RedisMemoryWarning
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.7
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory usage above 70%"
Eviction Policy Selection Guide
| Access Pattern | Recommended Policy | Rationale |
|---|---|---|
| Temporal bursts | allkeys-lru | Recent items are accessed often, then not |
| Stable hot set | allkeys-lfu | Frequently accessed items stay cached |
| Mixed (TTL + permanent) | volatile-lru | Only evicts keys that have a TTL set |
| Critical no-eviction | noeviction | Cache treated as a database; writes fail instead of evicting |
Run redis-cli --latency-history periodically to detect memory-induced latency spikes before they affect users.
Conclusion
Effective caching requires strategy layered across the entire stack:
- Edge: CDN for all static content and stale-while-revalidate for dynamic APIs
- In-memory: Valkey/Redis for session stores, query caching, and pub/sub
- Application: Local in-memory cache for hot data with short TTLs
- Invalidation: Event-driven invalidation, TTL-based expiry, and cache tags
- Monitoring: Track hit rates per layer, eviction rates, and memory pressure
Start with CDN and cache-aside. Add Valkey for session and query caching when you outgrow a single server. Layer in write-through for critical data, semantic caching for AI workloads, and stale-while-revalidate for resilience. Monitor everything, set alerts early, and adjust TTLs based on real usage patterns.