Materialized Views precompute expensive queries and store their results, dramatically improving read performance. They’re essential for complex aggregations, reporting, and building read-optimized models in CQRS architectures.
Understanding Materialized Views
The Problem: Complex Queries
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Slow Complex Queries Without Materialized Views โ
โ โ
โ Query: "Get monthly revenue by product category" โ
โ โ
โ SELECT โ
โ p.category, โ
โ DATE_TRUNC('month', o.created_at) as month, โ
โ SUM(oi.quantity * oi.price) as revenue โ
โ FROM orders o โ
โ JOIN order_items oi ON o.id = oi.order_id โ
โ JOIN products p ON oi.product_id = p.id โ
โ WHERE o.status = 'completed' โ
โ AND o.created_at >= '2025-01-01' โ
โ GROUP BY p.category, DATE_TRUNC('month', o.created_at) โ
โ ORDER BY month, revenue DESC โ
โ โ
โ Performance Issues: โ
โ โ Joins 3+ tables โ
โ โ Scans millions of rows โ
โ โ Aggregates on-the-fly โ
โ โ Takes 10+ seconds โ
โ โ Can't use indexes effectively โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
With Materialized Views
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Fast Queries With Materialized Views โ
โ โ
โ Materialized View: monthly_revenue_by_category โ
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ monthly_revenue_by_category โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ category โ month โ revenue โ order_count โ โ
โ โ โโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโ โ โ
โ โ Electronics โ 2026-01-01 โ 150000.00 โ 1200 โ โ
โ โ Clothing โ 2026-01-01 โ 85000.00 โ 950 โ โ
โ โ Books โ 2026-01-01 โ 42000.00 โ 600 โ โ
โ โ ... โ ... โ ... โ ... โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ Query becomes: โ
โ SELECT * FROM monthly_revenue_by_category โ
โ WHERE month >= '2025-01-01' โ
โ ORDER BY month, revenue DESC โ
โ โ
โ Performance: โ
โ โ Single table scan โ
โ โ Indexed on category, month โ
โ โ Returns in milliseconds โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Database-Specific Implementations
PostgreSQL Materialized Views
-- Create materialized view
CREATE MATERIALIZED VIEW monthly_revenue_by_category AS
SELECT
p.category,
DATE_TRUNC('month', o.created_at) AS month,
SUM(oi.quantity * oi.price) AS revenue,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.quantity) AS units_sold
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.status = 'completed'
GROUP BY p.category, DATE_TRUNC('month', o.created_at)
WITH DATA;
-- Create indexes for fast queries
CREATE INDEX idx_mv_category ON monthly_revenue_by_category(category);
CREATE INDEX idx_mv_month ON monthly_revenue_by_category(month);
CREATE UNIQUE INDEX idx_mv_category_month
ON monthly_revenue_by_category(category, month);
-- Refresh materialized view (locks reads during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue_by_category;
-- Schedule refresh (cron job or internal scheduler)
-- 0 2 * * * psql -c "REFRESH MATERIALIZED VIEW monthly_revenue_by_category" mydb
MySQL Equivalent (Using Tables)
-- MySQL doesn't have native materialized views, simulate with tables
CREATE TABLE monthly_revenue_by_category (
id INT AUTO_INCREMENT PRIMARY KEY,
category VARCHAR(100) NOT NULL,
month DATE NOT NULL,
revenue DECIMAL(15, 2) NOT NULL DEFAULT 0,
order_count INT NOT NULL DEFAULT 0,
units_sold INT NOT NULL DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY uk_category_month (category, month),
INDEX idx_month (month),
INDEX idx_category (category)
);
-- Stored procedure to refresh
DELIMITER //
CREATE PROCEDURE refresh_monthly_revenue()
BEGIN
TRUNCATE TABLE monthly_revenue_by_category;
INSERT INTO monthly_revenue_by_category (category, month, revenue, order_count, units_sold)
SELECT
p.category,
DATE_FORMAT(o.created_at, '%Y-%m-01') AS month,
SUM(oi.quantity * oi.price) AS revenue,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.quantity) AS units_sold
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.status = 'completed'
GROUP BY p.category, DATE_FORMAT(o.created_at, '%Y-%m-01');
END //
DELIMITER ;
-- Schedule with event scheduler
CREATE EVENT refresh_monthly_revenue_event
ON SCHEDULE EVERY 1 DAY
DO
CALL refresh_monthly_revenue();
MongoDB Aggregation Pipelines
// Create materialized view using $merge or $out
// Sales aggregation pipeline
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $unwind: "$items" },
{
$group: {
_id: {
category: "$items.category",
month: { $dateToString: { format: "%Y-%m", date: "$createdAt" } }
},
revenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } },
order_count: { $addToSet: "$_id" },
units_sold: { $sum: "$items.quantity" }
}
},
{ $project: {
category: "$_id.category",
month: "$_id.month",
revenue: 1,
order_count: { $size: "$order_count" },
units_sold: 1,
_id: 0
}},
{ $merge: {
into: "monthly_revenue_by_category",
whenMatched: "replace",
whenNotMatched: "insert"
}}
])
// Query the materialized view
db.monthly_revenue_by_category.find({
category: "Electronics"
}).sort({ month: -1 })
Application-Level Materialized Views
Building Views from Events
class MaterializedViewBuilder:
def __init__(self, read_db, event_store):
self.read_db = read_db
self.events = event_store
async def rebuild_order_summary_view(self):
await self.read_db.execute("TRUNCATE TABLE order_summary")
orders = await self.events.get_all_orders()
summary = defaultdict(lambda: {
"total_orders": 0,
"total_revenue": 0,
"total_items": 0,
"categories": set()
})
for order in orders:
customer_id = order["customer_id"]
summary[customer_id]["total_orders"] += 1
summary[customer_id]["total_revenue"] += order["total_amount"]
summary[customer_id]["total_items"] += len(order["items"])
for item in order["items"]:
summary[customer_id]["categories"].add(item["category"])
for customer_id, data in summary.items():
await self.read_db.execute("""
INSERT INTO order_summary
(customer_id, total_orders, total_revenue, total_items, categories)
VALUES ($1, $2, $3, $4, $5)
""", customer_id, data["total_orders"], data["total_revenue"],
data["total_items"], list(data["categories"]))
class IncrementalViewUpdater:
def __init__(self, read_db):
self.read_db = read_db
async def handle_order_created(self, event: Event):
await self.read_db.execute("""
INSERT INTO order_summary
(customer_id, total_orders, total_revenue, total_items)
VALUES ($1, 1, $2, $3)
ON CONFLICT (customer_id) DO UPDATE SET
total_orders = order_summary.total_orders + 1,
total_revenue = order_summary.total_revenue + $2,
total_items = order_summary.total_items + $3
""", event.payload["customer_id"],
event.payload["total_amount"],
len(event.payload["items"]))
async def handle_order_cancelled(self, event: Event):
await self.read_db.execute("""
UPDATE order_summary SET
total_orders = total_orders - 1,
total_revenue = total_revenue - $2,
total_items = total_items - $3
WHERE customer_id = $1
""", event.payload["customer_id"],
event.payload["total_amount"],
len(event.payload["items"]))
Redis-Based Cached Views
import json
from redis import asyncio as aioredis
class RedisMaterializedView:
def __init__(self, redis: aioredis.Redis, ttl: int = 3600):
self.redis = redis
self.ttl = ttl
async def get_user_dashboard(self, user_id: str) -> dict:
cache_key = f"dashboard:user:{user_id}"
cached = await self.redis.get(cache_key)
if cached:
return json.loads(cached)
data = await self._compute_dashboard(user_id)
await self.redis.setex(
cache_key,
self.ttl,
json.dumps(data)
)
return data
async def _compute_dashboard(self, user_id: str) -> dict:
async with db_pool.acquire() as conn:
orders = await conn.fetch("""
SELECT COUNT(*), SUM(total_amount)
FROM orders
WHERE customer_id = $1
""", user_id)
items = await conn.fetch("""
SELECT COUNT(*), SUM(quantity)
FROM order_items oi
JOIN orders o ON oi.order_id = o.id
WHERE o.customer_id = $1
""", user_id)
return {
"total_orders": orders[0]["count"] or 0,
"total_spent": float(orders[0]["sum"] or 0),
"total_items": items[0]["count"] or 0,
"total_quantity": items[0]["sum"] or 0
}
async def invalidate_user_dashboard(self, user_id: str):
cache_key = f"dashboard:user:{user_id}"
await self.redis.delete(cache_key)
Refresh Strategies
Full Refresh
class FullRefreshStrategy:
def __init__(self, view_name: str, query: str, db_pool):
self.view_name = view_name
self.query = query
self.pool = db_pool
async def refresh(self):
async with self.pool.acquire() as conn:
async with conn.transaction():
await conn.execute(f"TRUNCATE TABLE {self.view_name}")
await conn.execute(f"INSERT INTO {self.view_name} {self.query}")
async def refresh_concurrently(self):
async with self.pool.acquire() as conn:
await conn.execute(
f"REFRESH MATERIALIZED VIEW CONCURRENTLY {self.view_name}"
)
Incremental Refresh
class IncrementalRefreshStrategy:
def __init__(self, view_name: str, db_pool, checkpoint_store):
self.view_name = view_name
self.pool = db_pool
self.checkpoints = checkpoint_store
async def refresh(self, source_table: str):
last_checkpoint = await self.checkpoints.get(
f"checkpoint_{self.view_name}"
) or "1970-01-01"
async with self.pool.acquire() as conn:
new_data = await conn.fetch(f"""
SELECT * FROM {source_table}
WHERE updated_at > $1
""", last_checkpoint)
for row in new_data:
await self._apply_change(row)
if new_data:
latest = max(row["updated_at"] for row in new_data)
await self.checkpoints.set(
f"checkpoint_{self.view_name}",
latest.isoformat()
)
async def _apply_change(self, row: dict):
if row.get("deleted"):
await self._delete_row(row)
else:
await self._upsert_row(row)
async def _upsert_row(self, row: dict):
async with self.pool.acquire() as conn:
await conn.execute(f"""
INSERT INTO {self.view_name} (...)
VALUES (...)
ON CONFLICT (...) DO UPDATE SET ...
""")
async def _delete_row(self, row: dict):
async with self.pool.acquire() as conn:
await conn.execute(f"""
DELETE FROM {self.view_name} WHERE ...
""")
Scheduled Refresh
import asyncio
from apscheduler.schedulers.asyncio import AsyncIOScheduler
class ScheduledViewRefresh:
def __init__(self, refresh_strategies: list):
self.scheduler = AsyncIOScheduler()
self.strategies = refresh_strategies
def schedule(self):
self.scheduler.add_job(
self._run_all_refreshes,
"cron",
hour=2,
minute=0
)
self.scheduler.add_job(
self._run_all_refreshes,
"interval",
hours=1
)
async def _run_all_refreshes(self):
for strategy in self.strategies:
try:
await strategy.refresh()
except Exception as e:
logger.error(f"Failed to refresh {strategy.view_name}: {e}")
Query Patterns
Time-Series Views
-- Daily active users materialized view
CREATE MATERIALIZED VIEW daily_active_users AS
SELECT
DATE_TRUNC('day', event_time) AS day,
COUNT(DISTINCT user_id) AS active_users,
COUNT(*) AS total_events
FROM user_events
GROUP BY DATE_TRUNC('day', event_time)
WITH DATA;
-- Query trends
SELECT
day,
active_users,
LAG(active_users, 7) OVER (ORDER BY day) AS prev_week_users,
(active_users - LAG(active_users, 7) OVER (ORDER BY day)) /
NULLIF(LAG(active_users, 7) OVER (ORDER BY day), 0) * 100 AS growth_pct
FROM daily_active_users
ORDER BY day DESC
LIMIT 30;
Top-N Queries
-- Top products by revenue per category
CREATE MATERIALIZED VIEW top_products_by_category AS
SELECT
p.category,
p.id AS product_id,
p.name AS product_name,
SUM(oi.quantity * oi.price) AS revenue,
RANK() OVER (PARTITION BY p.category ORDER BY SUM(oi.quantity * oi.price) DESC) AS rank
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY p.id, p.name, p.category
WITH DATA;
-- Query top 10 in each category
SELECT * FROM top_products_by_category
WHERE rank <= 10
ORDER BY category, rank;
User Profile Aggregations
-- User purchase profile
CREATE MATERIALIZED VIEW user_purchase_profile AS
SELECT
u.id AS user_id,
u.email,
COUNT(o.id) AS total_orders,
SUM(o.total_amount) AS lifetime_value,
AVG(o.total_amount) AS avg_order_value,
MAX(o.created_at) AS last_order_date,
COUNT(DISTINCT DATE_TRUNC('month', o.created_at)) AS months_active,
ARRAY_AGG(DISTINCT p.category ORDER BY p.category) AS categories_purchased
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id AND o.status = 'completed'
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.id
GROUP BY u.id, u.email
WITH DATA;
CREATE INDEX idx_user_profile_ltv ON user_purchase_profile(lifetime_value DESC);
CREATE INDEX idx_user_profile_last_order ON user_purchase_profile(last_order_date DESC);
Monitoring and Maintenance
View Health Monitoring
class MaterializedViewMonitor:
def __init__(self, db_pool):
self.pool = db_pool
async def get_view_stats(self, view_name: str) -> dict:
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT
pg_size_pretty(pg_total_relation_size($1)) AS size,
pg_total_relation_size($1) AS size_bytes
""", view_name)
return dict(row)
async def check_staleness(self, view_name: str, max_age_hours: int = 24) -> bool:
async with self.pool.acquire() as conn:
row = await conn.fetchrow("""
SELECT MAX(updated_at) AS last_refresh
FROM {0}
""".format(view_name))
if not row["last_refresh"]:
return True
age = datetime.utcnow() - row["last_refresh"]
return age.total_seconds() > max_age_hours * 3600
Maintenance Tasks
-- Vacuum to reclaim space
VACUUM FULL materialized_view_name;
-- Analyze for query planner
ANALYZE materialized_view_name;
-- Check for unused indexes
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'materialized_view_name'
AND idx_scan = 0;
Best Practices
GOOD_PATTERNS = {
"index_appropriately": """
# Create indexes on materialized views
โ
Good:
CREATE INDEX mv_sales_date ON sales_by_date(sale_date);
CREATE INDEX mv_sales_region ON sales_by_region(region);
โ Bad:
# No indexes = slow reads on large views
""",
"use_concurrent_refresh": """
# Refresh without blocking reads
โ
Good:
REFRESH MATERIALIZED VIEW CONCURRENTLY my_view;
# Requires UNIQUE index on view!
โ Bad:
REFRESH MATERIALIZED VIEW my_view;
# Locks the view during refresh
""",
"schedule_appropriately": """
# Refresh based on business needs
โ
Good:
- Real-time dashboards: refresh every 5-15 min
- Daily reports: refresh once per day
- Historical analysis: refresh weekly
โ Bad:
- Refresh too frequently (overhead)
- Refresh too rarely (stale data)
"""
}
BAD_PATTERNS = {
"over_materialized": """
โ Bad:
# Materialize everything
CREATE MATERIALIZED VIEW all_orders AS SELECT * FROM orders;
# Loses benefits of relational design
# Hard to maintain
โ
Good:
# Only materialize expensive aggregations
CREATE MATERIALIZED VIEW order_stats AS
SELECT category, COUNT(*), SUM(amount) FROM orders GROUP BY category
""",
"no_refresh_strategy": """
โ Bad:
# Create view but never refresh
CREATE MATERIALIZED VIEW ...;
# Data becomes permanently stale
โ
Good:
# Always have refresh strategy
REFRESH MATERIALIZED VIEW ...;
# Schedule or trigger-based
""",
"ignore_consistency": """
โ Bad:
# Assume materialized view is always consistent
โ
Good:
# Make staleness visible
ALTER TABLE monthly_sales
ADD COLUMN refreshed_at TIMESTAMP;
UPDATE monthly_sales
SET refreshed_at = NOW();
-- Show to users: "Data as of [refreshed_at]"
"""
}
Summary
Materialized Views dramatically improve query performance:
- Precomputation - Expensive aggregations computed once, read many times
- Index Support - Can add indexes to materialized views
- Various Refresh Strategies - Full, incremental, scheduled, or triggered
- Multiple Implementations - Database-native, application-level, or Rediscached
Key considerations:
- Balance between freshness and performance
- Choose right refresh strategy for your use case
- Monitor view size and query performance
- Make staleness visible to users when needed
Materialized Views are essential for reporting, dashboards, and any read-heavy workload with complex queries.
Comments