Skip to main content
⚡ Calmops

Database Indexing Strategies: A Complete Guide

Introduction

n an unindexed table can return in milliseconds with the right index. But indexes also have costs — they slow down writes and consume disk space. This guide covers when and how to use each index type effectively.

How Indexes Work

Without an index, the database performs a full table scan — reading every row to find matches. With an index, it uses a data structure (usually a B-tree) to jump directly to matching rows.

-- Without index: scans all 10 million rows
SELECT * FROM orders WHERE user_id = 42;
-- Execution time: 8.3 seconds

-- With index on user_id: jumps directly to matching rows
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT * FROM orders WHERE user_id = 42;
-- Execution time: 0.8 milliseconds

B-Tree Index (Default)

B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, and most databases. It supports equality, range, prefix, and sort operations.

-- Basic B-tree index
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at DESC);

-- Verify with EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- Should show "Index Scan" not "Seq Scan"

What B-Tree Supports

-- Equality
WHERE email = '[email protected]'

-- Range
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'
WHERE age > 18

-- Prefix (LIKE with leading constant)
WHERE name LIKE 'Ali%'    -- uses index
WHERE name LIKE '%ice'    -- does NOT use index

-- Sorting (if index matches ORDER BY)
ORD -- uses index if indexed

Composite Index

An index on multiple columns. Column order is critical — the index can only be used from the leftmost column.

-- Composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
in.html)
h: equality only, O(1)
CREATE INDEX idx_name ON table USING hash(column);

-- Drop index
DROP INDEX idx_name;

-- Analyze query
EXPLAIN ANALYZE SELECT ...;

Resources

– Composite: multi-column queries CREATE INDEX idx_name ON table(col1, col2, col3);

– Covering (PostgreSQL): include extra columns CREATE INDEX idx_name ON table(col1) INCLUDE (col2, col3);

– Partial: subset of rows CREATE INDEX idx_name ON table(column) WHERE condition;

– Unique: enforce uniqueness CREATE UNIQUE INDEX idx_name ON table(column);

– Full-text (PostgreSQL) CREATE INDEX idx_name ON table USING gin(to_tsvector(’english’, column));

– Hasty columns first** in composite indexes (columns with many distinct values) 3. Use covering indexes for frequently-run queries to eliminate table lookups 4. Use partial indexes when you only query a subset of rows 5. Don’t over-index — each index slows INSERT/UPDATE/DELETE and uses disk space 6. Monitor and drop unused indexes — they’re pure overhead 7. Test with realistic data volumes — index effectiveness changes with data size

Quick Reference

-- B-tree (default): equality, nt ON orders(tenant_id, user_id, created_at);

-- All queries filter by tenant_id first
WHERE tenant_id = 'acme' AND user_id = 42

Soft Deletes

-- Partial index excludes deleted rows
CREATE INDEX idx_users_email ON users(email)
WHERE deleted_at IS NULL;

-- Queries automatically use this smaller index
WHERE email = '[email protected]' AND deleted_at IS NULL;

Index Design Principles

  1. Index columns used in WHERE, JOIN, and ORDER BY — these are the most impactful
  2. **Put high-cardinali00;

– Use keyset pagination (fast regardless of page): SELECT * FROM posts WHERE id < :last_seen_id ORDER BY id DESC LIMIT 20;


### Time-Series Queries

```sql
-- Index for recent data queries
CREATE INDEX idx_events_time ON events(created_at DESC);

-- Partial index for recent data only
CREATE INDEX idx_events_recent ON events(user_id, created_at)
WHERE created_at > NOW() - INTERVAL '30 days';

Multi-Tenant Applications

-- Always include tenant_id as the first column
CREATE INDEX idx_orders_tenaexes (PostgreSQL)

```sql
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE 'pg_%'
ORDER BY schemaname, tablename;

Unused indexes waste disk space and slow down writes — drop them.

Indexing Patterns for Common Queries

Pagination

-- Efficient pagination with keyset (cursor-based)
CREATE INDEX idx_posts_id ON posts(id DESC);

-- Instead of OFFSET (slow for large offsets):
SELECT * FROM posts ORDER BY id DESC LIMIT 20 OFFSET 100quential Scans (PostgreSQL)

```sql
SELECT schemaname, tablename, seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

Index Maintenance

Rebuild Fragmented Indexes

-- PostgreSQL: REINDEX
REINDEX INDEX idx_orders_user_id;
REINDEX TABLE orders;  -- rebuild all indexes on table

-- MySQL: OPTIMIZE TABLE
OPTIMIZE TABLE orders;

-- PostgreSQL: VACUUM ANALYZE (updates statistics)
VACUUM ANALYZE orders;

Find Unused Inddth=100)

Index Cond: (user_id = 42)


## Finding Missing Indexes

### Slow Query Log (MySQL)

```sql
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';

pg_stat_statements (PostgreSQL)

-- Find slowest queries
SELECT query, calls, total_exec_time / calls AS avg_ms,
       rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Check for SeALYZE, FORMAT JSON) SELECT * FROM orders WHERE user_id = 42;


### Reading EXPLAIN Output

Seq Scan on orders (cost=0.00..45000.00 rows=1000 width=100) Filter: (user_id = 42)


- `Seq Scan` = full table scan (bad for large tables)
- `Index Scan` = using an index (good)
- `Index Only Scan` = covering index (best)
- `cost=start..total` = estimated cost units
- `rows=N` = estimated rows returned

```sql
-- After adding index:
Index Scan using idx_orders_user_id on orders
  (cost=0.43..8.45 rows=10 wi users USING hash(email);

-- Good for: exact equality only
WHERE email = '[email protected]'

-- Does NOT support:
WHERE email LIKE 'alice%'
WHERE email > '[email protected]'
ORDER BY email

Analyzing Index Usage with EXPLAIN

EXPLAIN ANALYZE is your primary tool for understanding query performance:

-- Basic EXPLAIN
EXPLAIN SELECT * FROM orders WHERE user_id = 42;

-- With actual execution stats
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

-- JSON format (more detail)
EXPLAIN (ANy('english', 'machine learning')) AS rank
FROM posts
WHERE to_tsvector('english', title || ' ' || body)
      @@ plainto_tsquery('english', 'machine learning')
ORDER BY rank DESC;

-- MySQL: FULLTEXT index
CREATE FULLTEXT INDEX idx_posts_ft ON posts(title, body);

SELECT * FROM posts
WHERE MATCH(title, body) AGAINST('machine learning' IN BOOLEAN MODE);

Hash Index

O(1) equality lookups — faster than B-tree for exact matches, but no range support:

-- PostgreSQL
CREATE INDEX idx_users_email_hash ONqueness while providing fast lookups:

```sql
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Or as a constraint (equivalent)
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

Full-Text Index

For searching within text content:

-- PostgreSQL: GIN index with tsvector
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || body));

-- Search query
SELECT title, ts_rank(to_tsvector('english', title || ' ' || body),
                      plainto_tsquered:

```sql
-- Index only active users (not deleted ones)
CREATE INDEX idx_users_active ON users(email)
WHERE deleted_at IS NULL;

-- Index only recent orders
CREATE INDEX idx_orders_recent ON orders(user_id, created_at)
WHERE created_at > '2025-01-01';

-- Index only published posts
CREATE INDEX idx_posts_published ON posts(created_at)
WHERE status = 'published';

Partial indexes are much smaller than full indexes and can be faster for queries that match the partial condition.

Unique Index

Enforces unidatabase answers the query entirely from the index without touching the table:

-- PostgreSQL: INCLUDE adds non-key columns to the index
CREATE INDEX idx_orders_cover ON orders(user_id, created_at)
INCLUDE (total, status);

-- This query is answered entirely from the index
SELECT user_id, created_at, total, status
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC;
-- EXPLAIN shows "Index Only Scan" — no table access!

Partial Index

Index only a subset of rows — smaller, faster, and more target – Uses index (leftmost prefix) WHERE user_id = 42 WHERE user_id = 42 AND created_at > ‘2026-01-01’

– Does NOT use index (skips user_id) WHERE created_at > ‘2026-01-01’


### Covering Index

A covering index includes all columns needed by a query — the 

Comments