Introduction
Database performance is critical for application responsiveness. Slow queries frustrate users and limit scalability. This guide covers optimization techniques for MySQL and PostgreSQL, two of the most widely used open-source relational databases.
Performance Fundamentals
Key Metrics
- Query latency: time to return a result
- Throughput: queries executed per second
- Connections: number of active connections
- CPU usage: processing load
- I/O wait: time spent waiting on disk (often the real bottleneck)
- Memory usage: buffer pool / cache hit efficiency
Performance Methodology
- Measure: Identify slow queries
- Analyze: Understand bottlenecks
- Optimize: Apply fixes
- Monitor: Verify improvements
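The measure-then-verify loop above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 as a stand-in for a real MySQL/PostgreSQL connection; table and index names are made up for the example:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO orders (user_id) VALUES (?)",
                 [(i % 100,) for i in range(10_000)])

def time_query(sql, params=()):
    """Measure: run a query and record how long it took."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - start, rows

# Measure before optimizing
before_t, rows_before = time_query("SELECT * FROM orders WHERE user_id = ?", (42,))
# Optimize: add an index on the filtered column
conn.execute("CREATE INDEX idx_orders_user_id ON orders(user_id)")
# Monitor: re-run the same query and compare
after_t, rows_after = time_query("SELECT * FROM orders WHERE user_id = ?", (42,))
```

In production the "measure" step would come from the slow query log or pg_stat_statements rather than hand timing, but the loop is the same.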
Query Optimization
EXPLAIN Analysis
MySQL
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id;
PostgreSQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE user_id = 123
AND status = 'completed';
Identifying Slow Queries
MySQL Slow Query Log
-- Enable slow query log (SET GLOBAL changes do not survive a restart;
-- persist them in my.cnf as well)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;  -- seconds
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
PostgreSQL
-- Log statements slower than 1 second (value is in milliseconds)
ALTER SYSTEM SET log_min_duration_statement = 1000;
-- Note: log_statement = 'all' logs every statement and is usually too
-- verbose for production; prefer the duration threshold above.
-- Reload the configuration for the change to take effect:
SELECT pg_reload_conf();
Indexing Strategies
When to Index
Index columns that are:
- Used in WHERE clauses
- Part of JOIN conditions
- In ORDER BY
- In GROUP BY
- In DISTINCT
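To see why indexing WHERE-clause columns matters, here is a small demonstration using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL/PostgreSQL EXPLAIN (the table and index names are illustrative): the same query goes from a full table scan to an index search once the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable detail in the last column
    return " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM users WHERE email = '[email protected]'"
without_index = plan(q)  # full table scan: "SCAN users"
conn.execute("CREATE INDEX idx_users_email ON users(email)")
with_index = plan(q)     # index search using idx_users_email
```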
Index Types
B-Tree (Default)
-- MySQL
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- PostgreSQL
CREATE INDEX idx_user_email ON users USING btree(email);
Composite Indexes
-- Composite index - column order matters
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
-- Both MySQL and PostgreSQL follow the leftmost-prefix rule: this index
-- serves queries filtering on (status) or (status, created_at), but not
-- queries filtering on created_at alone.
Partial Indexes (PostgreSQL)
CREATE INDEX idx_orders_active ON orders(user_id)
WHERE status = 'active';
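SQLite also supports partial indexes, so the idea can be demonstrated locally (a sketch with illustrative names): only rows matching the index's WHERE clause are indexed, and the planner uses the index when the query's predicate implies that clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)""")

# Partial index: only 'active' rows are indexed, keeping the index
# small and write overhead low for the other statuses.
conn.execute("""CREATE INDEX idx_orders_active ON orders(user_id)
                WHERE status = 'active'""")

detail = " | ".join(r[-1] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE user_id = 5 AND status = 'active'"))
```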
Covering Indexes
-- MySQL (no INCLUDE clause): make the composite index itself cover the query
CREATE INDEX idx_users_covering ON users(email, name, phone);
-- PostgreSQL (11+): INCLUDE stores non-key columns in the index leaf pages
CREATE INDEX idx_users_covering ON users(email)
INCLUDE (name, phone);
Index Anti-Patterns
- Too many indexes (slows writes)
- Low selectivity indexes
- Indexing columns in functions
- Not considering query patterns
MySQL Optimization
Configuration Tuning
[mysqld]
# Buffer pool - set to 70-80% of RAM
innodb_buffer_pool_size = 4G
# Log files
innodb_log_file_size = 1G
# 2 = write to the OS on each commit, fsync once per second; faster, but
# can lose up to ~1s of transactions on a crash (use 1 for full durability)
innodb_flush_log_at_trx_commit = 2
# Connections
max_connections = 200
# Query cache (MySQL 8.0 removed this)
# Use in 5.7: query_cache_type = 1
# Temp tables
tmp_table_size = 256M
max_heap_table_size = 256M
# Slow query log
slow_query_log = 1
long_query_time = 2
InnoDB Best Practices
-- REPEATABLE READ (InnoDB's default) provides consistent reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Proper primary keys
-- Always use auto_increment or meaningful composite keys
-- Avoid SELECT *
SELECT id, name FROM users WHERE id = 1;
Query Patterns
-- Good: Use indexed columns first
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';
-- Bad: Function on indexed column (can't use index)
SELECT * FROM orders WHERE YEAR(created_at) = 2025;
-- Better: Range query
SELECT * FROM orders WHERE created_at >= '2025-01-01'
AND created_at < '2026-01-01';
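The function-vs-range difference above can be reproduced with SQLite (used here only as a runnable stand-in; strftime plays the role of MySQL's YEAR()): wrapping the column in a function forces a scan, while a range predicate on the bare column uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_created ON orders(created_at)")

def plan(sql):
    return " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the column: the planner cannot use the index
func_plan = plan("SELECT * FROM orders "
                 "WHERE strftime('%Y', created_at) = '2025'")

# Range predicate on the bare column: index range scan
range_plan = plan("SELECT * FROM orders WHERE created_at >= '2025-01-01' "
                  "AND created_at < '2026-01-01'")
```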
PostgreSQL Optimization
Configuration Tuning
# Memory
shared_buffers = 2GB # 25% of RAM
effective_cache_size = 6GB # 75% of RAM
work_mem = 64MB
maintenance_work_mem = 512MB
# WAL
wal_buffers = 16MB
checkpoint_completion_target = 0.9
# Connections
max_connections = 200
# Query planner
random_page_cost = 1.1 # For SSDs
effective_io_concurrency = 200
# Logging
log_min_duration_statement = 1000
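The memory rules of thumb in the config above (25% of RAM for shared_buffers, 75% for effective_cache_size) can be captured in a tiny helper. This is a hypothetical sketch, not official PostgreSQL guidance for every workload; the percentages are taken from the comments above.

```python
def pg_memory_settings(ram_gb: int) -> dict:
    """Suggest memory settings from total RAM, per the rules of thumb above."""
    return {
        "shared_buffers": f"{ram_gb * 25 // 100}GB",       # ~25% of RAM
        "effective_cache_size": f"{ram_gb * 75 // 100}GB", # ~75% of RAM
    }
```

For an 8 GB server this reproduces the 2GB / 6GB values shown in the config snippet.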
Query Optimization
-- Refresh table statistics so the planner has accurate estimates
ANALYZE;
-- Hint a specific index (requires the pg_hint_plan extension;
-- core PostgreSQL ignores hint comments)
SELECT /*+ IndexScan(orders idx_orders_user_id) */ *
FROM orders WHERE user_id = 123;
-- Use CTEs for readability (since PostgreSQL 12 the planner can inline
-- them instead of always materializing)
WITH recent_orders AS (
    SELECT * FROM orders
    WHERE created_at > '2025-01-01'
)
SELECT u.name, COALESCE(ro.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) AS order_count
    FROM recent_orders
    GROUP BY user_id
) ro ON u.id = ro.user_id;
Advanced Features
Materialized Views
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
    date_trunc('month', created_at) AS month,
    SUM(total) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY date_trunc('month', created_at);
-- CONCURRENTLY avoids blocking readers, but requires a unique index on the view
CREATE UNIQUE INDEX idx_monthly_sales_month ON monthly_sales(month);
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
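What a materialized view buys you is essentially a precomputed aggregate stored as a table and refreshed on demand. A minimal sketch of that idea, using SQLite as a stand-in (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, month TEXT, total REAL)")
conn.executemany("INSERT INTO orders (month, total) VALUES (?, ?)",
                 [("2025-01", 10.0), ("2025-01", 20.0), ("2025-02", 5.0)])

# The "materialized view": a plain table holding precomputed aggregates
conn.execute("""CREATE TABLE monthly_sales
                (month TEXT PRIMARY KEY, revenue REAL, order_count INTEGER)""")

def refresh_monthly_sales():
    # Equivalent of REFRESH MATERIALIZED VIEW: recompute and replace
    conn.execute("DELETE FROM monthly_sales")
    conn.execute("""INSERT INTO monthly_sales
                    SELECT month, SUM(total), COUNT(*)
                    FROM orders GROUP BY month""")

refresh_monthly_sales()
rows = conn.execute("SELECT * FROM monthly_sales ORDER BY month").fetchall()
```

Reads hit the small precomputed table; the cost of aggregation is paid only at refresh time.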
Partitioning
CREATE TABLE orders (
    id BIGSERIAL,
    user_id BIGINT,
    total DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
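Conceptually, range partitioning routes each row to a child table based on the partition key. PostgreSQL does this routing automatically once the partitions are declared; the sketch below hand-rolls the same idea in SQLite purely to illustrate the mechanism (table names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One child table per year, mirroring the PARTITION OF declarations above
for year in (2024, 2025):
    conn.execute(f"CREATE TABLE orders_{year} "
                 "(id INTEGER PRIMARY KEY, total REAL, created_at TEXT)")

def insert_order(total, created_at):
    # Route the row to its partition based on the year of created_at
    year = int(created_at[:4])
    conn.execute(f"INSERT INTO orders_{year} (total, created_at) VALUES (?, ?)",
                 (total, created_at))

insert_order(9.99, "2025-03-15")
insert_order(4.50, "2024-11-02")
```

Queries that filter on the partition key only need to touch the relevant child table, which is the source of the performance win.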
Caching Strategies
Application-Level Cache
import json
import redis

redis_client = redis.Redis()

def get_user(user_id):
    cache_key = f"user:{user_id}"
    # Try cache first
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    # Fetch from database
    user = db.query(User).get(user_id)
    # Cache for 5 minutes
    if user:
        redis_client.setex(
            cache_key,
            300,
            json.dumps(user.to_dict())
        )
    return user
Database Query Cache
MySQL Query Cache (Removed in 8.0)
Use application caching instead.
PostgreSQL
-- Prepared statements cache parsing and planning work (not query results)
PREPARE user_query (int) AS
SELECT * FROM users WHERE id = $1;
EXECUTE user_query(123);
Redis Caching Patterns
# Cache-aside pattern (assumes a configured Redis client and ORM session)
import redis

redis_client = redis.Redis()

def get_product(product_id):
    cache_key = f"product:{product_id}"
    # 1. Check cache
    cached = redis_client.get(cache_key)
    if cached:
        return Product.from_json(cached)
    # 2. Fetch from database
    product = db.get(Product, product_id)
    # 3. Store in cache (1 hour TTL)
    if product:
        redis_client.setex(cache_key, 3600, product.to_json())
    return product
Monitoring Tools
MySQL
- MySQL Enterprise Monitor
- Percona Monitoring and Management
- pt-query-digest
PostgreSQL
- pgAdmin
- pg_stat_statements
- EXPLAIN ANALYZE
- pgcenter
Universal Tools
- Datadog
- New Relic
- Prometheus + Grafana
Common Performance Issues
N+1 Query Problem
# Bad: N+1 queries
users = db.query(User).all()
for user in users:
    print(user.orders.count())  # Each user triggers a query

# Good: Eager loading (SQLAlchemy)
from sqlalchemy.orm import selectinload

users = db.query(User).options(
    selectinload(User.orders)
).all()
for user in users:
    print(len(user.orders))
Missing Index
-- Find missing indexes with EXPLAIN
EXPLAIN SELECT * FROM orders WHERE user_email = '[email protected]';
-- Add index
CREATE INDEX idx_orders_user_email ON orders(user_email);
Connection Pool Exhaustion
# Use connection pooling (SQLAlchemy)
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=10,         # persistent connections kept open
    max_overflow=20,      # extra connections allowed under load
    pool_pre_ping=True    # validate connections before handing them out
)
Conclusion
Database optimization is ongoing. Start with query analysis, add proper indexes, tune configuration, and implement caching. Monitor continuously and optimize based on real usage patterns.