
Database Performance Optimization: MySQL and PostgreSQL

Introduction

Database performance is critical for application responsiveness. Slow databases frustrate users and limit scalability. This guide covers practical optimization techniques for MySQL and PostgreSQL, two of the most widely used open-source relational databases.

Performance Fundamentals

Key Metrics

  • Query latency: Response time
  • Throughput: Queries per second
  • Connections: Active connections
  • CPU usage: Processing load
  • I/O wait: Disk bottleneck
  • Memory usage: Buffer pool efficiency
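Latency in particular is best tracked as a percentile rather than a mean, since a handful of slow outliers can hide behind a healthy average. A minimal sketch (plain Python; the sample values are illustrative) of computing p50/p95 from collected latency samples:

```python
def percentile(latencies_ms, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    # Nearest-rank method: ceil(pct/100 * n), 1-indexed
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[int(rank) - 1]

samples = [12, 15, 11, 240, 14, 13, 16, 18, 900, 17]
p50 = percentile(samples, 50)   # typical query: 15 ms
p95 = percentile(samples, 95)   # tail query: 900 ms
```

The gap between p50 and p95 is often the first hint that a specific query class, not the database as a whole, needs attention.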

Performance Methodology

  1. Measure: Identify slow queries
  2. Analyze: Understand bottlenecks
  3. Optimize: Apply fixes
  4. Monitor: Verify improvements

Query Optimization

EXPLAIN Analysis

MySQL

EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id;

PostgreSQL

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE user_id = 123
AND status = 'completed';
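When comparing EXPLAIN ANALYZE output across many queries, it helps to extract the top-level actual time programmatically rather than eyeball it. A small sketch (plain Python; the regex and sample plan line follow the usual `actual time=start..end` format, and the sample text is illustrative):

```python
import re

def total_actual_time_ms(explain_text):
    """Return the ending 'actual time' of the first (top) plan node, in ms."""
    # Top plan line carries the total: "(actual time=0.031..1.204 rows=5 loops=1)"
    m = re.search(r"actual time=[\d.]+\.\.([\d.]+)", explain_text)
    return float(m.group(1)) if m else None

plan = ("Index Scan using idx_orders_user on orders  "
        "(cost=0.42..8.44 rows=1 width=72) "
        "(actual time=0.031..1.204 rows=5 loops=1)")
total_actual_time_ms(plan)  # 1.204
```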

Identifying Slow Queries

MySQL Slow Query Log

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

PostgreSQL

-- Log statements slower than 1 second (value is in milliseconds)
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();  -- apply without a restart

-- Avoid log_statement = 'all' in production; it logs every statement

Indexing Strategies

When to Index

Index columns that are:

  • Used in WHERE clauses
  • Part of JOIN conditions
  • In ORDER BY
  • In GROUP BY
  • In DISTINCT

Index Types

B-Tree (Default)

-- MySQL
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- PostgreSQL
CREATE INDEX idx_user_email ON users USING btree(email);

Composite Indexes

-- Composite index - order matters
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Both MySQL and PostgreSQL follow the leftmost-prefix rule:
-- this index serves filters on (status) or (status, created_at),
-- but not on created_at alone
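The leftmost-prefix rule can be stated mechanically: a composite index helps only when the filtered columns cover a contiguous prefix of its column list. A conceptual sketch (plain Python, not an engine implementation):

```python
def index_usable(index_columns, filter_columns):
    """Leftmost-prefix rule: count how many leading index columns
    the query filters on; the index helps only if that count is > 0."""
    prefix_len = 0
    for col in index_columns:
        if col in filter_columns:
            prefix_len += 1
        else:
            break  # a gap in the prefix stops index usage
    return prefix_len > 0

idx = ("status", "created_at")
index_usable(idx, {"status"})                # True: leading column
index_usable(idx, {"status", "created_at"})  # True: full prefix
index_usable(idx, {"created_at"})            # False: skips the leading column
```

This is why column order in CREATE INDEX should follow the most common filter patterns, with equality-filtered columns first.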

Partial Indexes (PostgreSQL)

CREATE INDEX idx_orders_active ON orders(user_id)
WHERE status = 'active';

Covering Indexes

-- MySQL has no INCLUDE clause; a covering index simply lists
-- every column the query needs in the key
CREATE INDEX idx_users_covering ON users(email, name, created_at);

-- PostgreSQL 11+: INCLUDE adds non-key payload columns
CREATE INDEX idx_users_covering ON users(email)
INCLUDE (name, phone);

Index Anti-Patterns

  • Too many indexes (slows writes)
  • Low selectivity indexes
  • Indexing columns in functions
  • Not considering query patterns

MySQL Optimization

Configuration Tuning

[mysqld]
# Buffer pool - set to 70-80% of RAM
innodb_buffer_pool_size = 4G

# Log files
innodb_log_file_size = 1G
# 2 = flush the log once per second; faster, but a crash can lose ~1s of commits
innodb_flush_log_at_trx_commit = 2

# Connections
max_connections = 200

# Query cache (MySQL 8.0 removed this)
# Use in 5.7: query_cache_type = 1

# Temp tables
tmp_table_size = 256M
max_heap_table_size = 256M

# Slow query log
slow_query_log = 1
long_query_time = 2

InnoDB Best Practices

-- Use consistent reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Proper primary keys
-- Always use auto_increment or meaningful composite keys

-- Avoid SELECT *
SELECT id, name FROM users WHERE id = 1;

Query Patterns

-- Good: Use indexed columns first
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';

-- Bad: Function on indexed column (can't use index)
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

-- Better: Range query
SELECT * FROM orders WHERE created_at >= '2025-01-01' 
AND created_at < '2026-01-01';
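The rewrite above generalizes: instead of wrapping the column in a function, compute the range boundaries in the application and pass them as parameters, so the planner can use an index on created_at. A small helper sketch (plain Python; `%s` placeholders as in common DB-API drivers):

```python
from datetime import date

def year_range(year):
    """Half-open [start, end) boundaries for an index-friendly year filter."""
    return date(year, 1, 1), date(year + 1, 1, 1)

start, end = year_range(2025)
sql = "SELECT * FROM orders WHERE created_at >= %s AND created_at < %s"
# Execute with parameters (start, end); no function wraps the indexed column.
```

Half-open ranges also avoid off-by-one errors around midnight on the last day of the year.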

PostgreSQL Optimization

Configuration Tuning

# Memory
shared_buffers = 2GB          # 25% of RAM
effective_cache_size = 6GB    # 75% of RAM
work_mem = 64MB
maintenance_work_mem = 512MB

# WAL
wal_buffers = 16MB
checkpoint_completion_target = 0.9

# Connections
max_connections = 200

# Query planner
random_page_cost = 1.1         # For SSDs
effective_io_concurrency = 200

# Logging
log_min_duration_statement = 1000

Query Optimization

-- Refresh planner statistics
ANALYZE;

-- PostgreSQL has no built-in optimizer hints; /*+ ... */ syntax requires
-- the pg_hint_plan extension. For ad-hoc testing, steer the planner with
-- cost settings instead:
SET enable_seqscan = off;
SELECT * FROM orders WHERE user_id = 123;
RESET enable_seqscan;

-- Use CTEs for readability (PostgreSQL 12+ inlines them by default)
WITH recent_orders AS (
    SELECT * FROM orders 
    WHERE created_at > '2025-01-01'
)
SELECT u.name, ro.order_count
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) as order_count
    FROM recent_orders
    GROUP BY user_id
) ro ON u.id = ro.user_id;

Advanced Features

Materialized Views

CREATE MATERIALIZED VIEW monthly_sales AS
SELECT 
    date_trunc('month', created_at) as month,
    SUM(total) as revenue,
    COUNT(*) as order_count
FROM orders
GROUP BY date_trunc('month', created_at);

-- CONCURRENTLY avoids blocking readers, but requires a unique index on the view
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;

Partitioning

CREATE TABLE orders (
    id BIGSERIAL,
    user_id BIGINT,
    total DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
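Range partitions must exist before rows arrive, so teams usually script the DDL ahead of time. A sketch (plain Python; the table and naming convention are illustrative) that emits one month's partition statement with half-open boundaries:

```python
from datetime import date

def monthly_partition_ddl(table, year, month):
    """DDL for one monthly range partition, half-open [start, next_month)."""
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)  # roll over December
    name = f"{table}_{start:%Y_%m}"
    return (
        f"CREATE TABLE {name} PARTITION OF {table} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )

monthly_partition_ddl("orders", 2025, 12)
# "CREATE TABLE orders_2025_12 PARTITION OF orders
#  FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');"
```

A cron job or migration can call this for the next few months so inserts never land without a matching partition.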

Caching Strategies

Application-Level Cache

import json
import redis

redis_client = redis.Redis()

def get_user(user_id):
    cache_key = f"user:{user_id}"
    
    # Try cache first
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Fetch from database
    user = db.query(User).get(user_id)
    
    # Cache for 5 minutes
    if user:
        redis_client.setex(
            cache_key, 
            300, 
            json.dumps(user.to_dict())
        )
    
    return user
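The same cache-aside shape works without an external store. A minimal in-process TTL cache sketch (plain Python, illustrative only; unlike Redis it is per-process and has no memory-pressure eviction):

```python
import time

class TTLCache:
    """Tiny in-process cache with per-key expiry, mirroring SETEX semantics."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return None
        return value

cache = TTLCache()
cache.setex("user:1", 300, {"name": "Ada"})
cache.get("user:1")  # returns the dict until the TTL elapses
```

For a single-instance app this can be enough; the external store earns its keep once multiple processes need a shared view of the cache.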

Database Query Cache

MySQL Query Cache (Removed in 8.0)

Use application caching instead.

PostgreSQL

-- Use prepared statements
PREPARE user_query (int) AS 
SELECT * FROM users WHERE id = $1;

EXECUTE user_query(123);

Redis Caching Patterns

# Cache-aside pattern
import redis

redis_client = redis.Redis()

def get_product(product_id):
    cache_key = f"product:{product_id}"
    
    # 1. Check cache
    cached = redis_client.get(cache_key)
    if cached:
        return Product.from_json(cached)
    
    # 2. Fetch from database
    product = db.get(Product, product_id)
    
    # 3. Store in cache
    if product:
        redis_client.setex(cache_key, 3600, product.to_json())
    
    return product

Monitoring Tools

MySQL

  • MySQL Enterprise Monitor
  • Percona Monitoring and Management
  • pt-query-digest

PostgreSQL

  • pgAdmin
  • pg_stat_statements
  • EXPLAIN ANALYZE
  • pgcenter

Universal Tools

  • Datadog
  • New Relic
  • Prometheus + Grafana

Common Performance Issues

N+1 Query Problem

# Bad: N+1 queries
users = db.query(User).all()
for user in users:
    print(user.orders.count())  # Each user triggers a query

# Good: Eager loading (one extra query total)
from sqlalchemy.orm import selectinload

users = db.query(User).options(
    selectinload(User.orders)
).all()

for user in users:
    print(len(user.orders))
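The eager-loading fix is really a batching pattern: fetch all child rows in one query keyed by parent id, then group them in memory. A framework-free sketch (plain Python; the dict rows stand in for query results):

```python
from collections import defaultdict

def group_orders_by_user(order_rows):
    """Rows from one 'WHERE user_id IN (...)' query -> {user_id: [orders]},
    replacing N per-user queries with a single grouped pass."""
    by_user = defaultdict(list)
    for row in order_rows:
        by_user[row["user_id"]].append(row)
    return by_user

rows = [
    {"id": 1, "user_id": 7},
    {"id": 2, "user_id": 7},
    {"id": 3, "user_id": 9},
]
counts = {uid: len(orders) for uid, orders in group_orders_by_user(rows).items()}
# counts == {7: 2, 9: 1}
```

Two round trips (parents, then all children) replace N+1, regardless of ORM.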

Missing Index

-- Find missing indexes with EXPLAIN
EXPLAIN SELECT * FROM orders WHERE user_email = '[email protected]';

-- Add index
CREATE INDEX idx_orders_user_email ON orders(user_email);

Connection Pool Exhaustion

# Use connection pooling
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True
)
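Pool limits have to be budgeted against the server's max_connections across every application instance, or the pools themselves cause the exhaustion they are meant to prevent. A back-of-envelope sketch (plain Python; the reserved headroom figure is an assumption for superuser and maintenance sessions):

```python
def per_instance_pool_limit(server_max_connections, app_instances, reserved=10):
    """Largest (pool_size + max_overflow) each app instance may use without
    the fleet collectively exceeding the server's max_connections."""
    budget = server_max_connections - reserved  # keep headroom for admin sessions
    return budget // app_instances

per_instance_pool_limit(200, 6)  # with max_connections = 200: (200 - 10) // 6 = 31
```

With the engine above (pool_size=10, max_overflow=20 → 30 per instance), six instances would sit just under that budget.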

Conclusion

Database optimization is ongoing. Start with query analysis, add proper indexes, tune configuration, and implement caching. Monitor continuously and optimize based on real usage patterns.

