Introduction
n an unindexed table can return in milliseconds with the right index. But indexes also have costs — they slow down writes and consume disk space. This guide covers when and how to use each index type effectively.
How Indexes Work
Without an index, the database performs a full table scan — reading every row to find matches. With an index, it uses a data structure (usually a B-tree) to jump directly to matching rows.
-- Without index: scans all 10 million rows
SELECT * FROM orders WHERE user_id = 42;
-- Execution time: 8.3 seconds
-- With index on user_id: jumps directly to matching rows
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT * FROM orders WHERE user_id = 42;
-- Execution time: 0.8 milliseconds
B-Tree Index (Default)
B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, and most databases. It supports equality, range, prefix, and sort operations.
-- Basic B-tree index
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at DESC);
-- Verify with EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- Should show "Index Scan" not "Seq Scan"
What B-Tree Supports
-- Equality
WHERE email = '[email protected]'
-- Range
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'
WHERE age > 18
-- Prefix (LIKE with leading constant)
WHERE name LIKE 'Ali%' -- uses index
WHERE name LIKE '%ice' -- does NOT use index
-- Sorting (if index matches ORDER BY)
ORD -- uses index if indexed
Composite Index
An index on multiple columns. Column order is critical — the index can only be used from the leftmost column.
-- Composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
in.html)
h: equality only, O(1)
CREATE INDEX idx_name ON table USING hash(column);
-- Drop index
DROP INDEX idx_name;
-- Analyze query
EXPLAIN ANALYZE SELECT ...;
Resources
- Use The Index, Luke — free book on SQL indexing
- PostgreSQL Index Types
- MySQL Index Optimization
- [PostgreSQL EXPLAIN](https://www.postgresql.org/docs/current/sql-explarange, sort CREATE INDEX idx_name ON table(column);
– Composite: multi-column queries CREATE INDEX idx_name ON table(col1, col2, col3);
– Covering (PostgreSQL): include extra columns CREATE INDEX idx_name ON table(col1) INCLUDE (col2, col3);
– Partial: subset of rows CREATE INDEX idx_name ON table(column) WHERE condition;
– Unique: enforce uniqueness CREATE UNIQUE INDEX idx_name ON table(column);
– Full-text (PostgreSQL) CREATE INDEX idx_name ON table USING gin(to_tsvector(’english’, column));
– Hasty columns first** in composite indexes (columns with many distinct values) 3. Use covering indexes for frequently-run queries to eliminate table lookups 4. Use partial indexes when you only query a subset of rows 5. Don’t over-index — each index slows INSERT/UPDATE/DELETE and uses disk space 6. Monitor and drop unused indexes — they’re pure overhead 7. Test with realistic data volumes — index effectiveness changes with data size
Quick Reference
-- B-tree (default): equality, nt ON orders(tenant_id, user_id, created_at);
-- All queries filter by tenant_id first
WHERE tenant_id = 'acme' AND user_id = 42
Soft Deletes
-- Partial index excludes deleted rows
CREATE INDEX idx_users_email ON users(email)
WHERE deleted_at IS NULL;
-- Queries automatically use this smaller index
WHERE email = '[email protected]' AND deleted_at IS NULL;
Index Design Principles
- Index columns used in WHERE, JOIN, and ORDER BY — these are the most impactful
- **Put high-cardinali00;
– Use keyset pagination (fast regardless of page): SELECT * FROM posts WHERE id < :last_seen_id ORDER BY id DESC LIMIT 20;
### Time-Series Queries
– Index for recent data queries CREATE INDEX idx_events_time ON events(created_at DESC);
– Partial index for recent data only CREATE INDEX idx_events_recent ON events(user_id, created_at) WHERE created_at > NOW() - INTERVAL ‘30 days’;
### Multi-Tenant Applications
– Always include tenant_id as the first column CREATE INDEX idx_orders_tenaexes (PostgreSQL)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE 'pg_%'
ORDER BY schemaname, tablename;
Unused indexes waste disk space and slow down writes — drop them.
Indexing Patterns for Common Queries
Pagination
-- Efficient pagination with keyset (cursor-based)
CREATE INDEX idx_posts_id ON posts(id DESC);
-- Instead of OFFSET (slow for large offsets):
SELECT * FROM posts ORDER BY id DESC LIMIT 20 OFFSET 100quential Scans (PostgreSQL)
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC;
## Index Maintenance
### Rebuild Fragmented Indexes
– PostgreSQL: REINDEX REINDEX INDEX idx_orders_user_id; REINDEX TABLE orders; – rebuild all indexes on table
– MySQL: OPTIMIZE TABLE OPTIMIZE TABLE orders;
– PostgreSQL: VACUUM ANALYZE (updates statistics) VACUUM ANALYZE orders;
### Find Unused Inddth=100)
Index Cond: (user_id = 42)
Finding Missing Indexes
Slow Query Log (MySQL)
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';
pg_stat_statements (PostgreSQL)
-- Find slowest queries
SELECT query, calls, total_exec_time / calls AS avg_ms,
rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Check for SeALYZE, FORMAT JSON) SELECT * FROM orders WHERE user_id = 42;
### Reading EXPLAIN Output
Seq Scan on orders (cost=0.00..45000.00 rows=1000 width=100) Filter: (user_id = 42)
- `Seq Scan` = full table scan (bad for large tables)
- `Index Scan` = using an index (good)
- `Index Only Scan` = covering index (best)
- `cost=start..total` = estimated cost units
- `rows=N` = estimated rows returned
– After adding index: Index Scan using idx_orders_user_id on orders (cost=0.43..8.45 rows=10 wi users USING hash(email);
– Good for: exact equality only WHERE email = ‘[email protected]’
– Does NOT support: WHERE email LIKE ‘alice%’ WHERE email > ‘[email protected]’ ORDER BY email
## Analyzing Index Usage with EXPLAIN
`EXPLAIN ANALYZE` is your primary tool for understanding query performance:
– Basic EXPLAIN EXPLAIN SELECT * FROM orders WHERE user_id = 42;
– With actual execution stats EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
– JSON format (more detail) EXPLAIN (ANy(’english’, ‘machine learning’)) AS rank FROM posts WHERE to_tsvector(’english’, title || ’ ’ || body) @@ plainto_tsquery(’english’, ‘machine learning’) ORDER BY rank DESC;
– MySQL: FULLTEXT index CREATE FULLTEXT INDEX idx_posts_ft ON posts(title, body);
SELECT * FROM posts WHERE MATCH(title, body) AGAINST(‘machine learning’ IN BOOLEAN MODE);
## Hash Index
O(1) equality lookups — faster than B-tree for exact matches, but no range support:
– PostgreSQL CREATE INDEX idx_users_email_hash ONqueness while providing fast lookups:
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Or as a constraint (equivalent)
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);
Full-Text Index
For searching within text content:
-- PostgreSQL: GIN index with tsvector
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || body));
-- Search query
SELECT title, ts_rank(to_tsvector('english', title || ' ' || body),
plainto_tsquered:
– Index only active users (not deleted ones) CREATE INDEX idx_users_active ON users(email) WHERE deleted_at IS NULL;
– Index only recent orders CREATE INDEX idx_orders_recent ON orders(user_id, created_at) WHERE created_at > ‘2025-01-01’;
– Index only published posts CREATE INDEX idx_posts_published ON posts(created_at) WHERE status = ‘published’;
Partial indexes are much smaller than full indexes and can be faster for queries that match the partial condition.
## Unique Index
Enforces unidatabase answers the query entirely from the index without touching the table:
– PostgreSQL: INCLUDE adds non-key columns to the index CREATE INDEX idx_orders_cover ON orders(user_id, created_at) INCLUDE (total, status);
– This query is answered entirely from the index SELECT user_id, created_at, total, status FROM orders WHERE user_id = 42 ORDER BY created_at DESC; – EXPLAIN shows “Index Only Scan” — no table access!
## Partial Index
Index only a subset of rows — smaller, faster, and more target
-- Uses index (leftmost prefix)
WHERE user_id = 42
WHERE user_id = 42 AND created_at > '2026-01-01'
-- Does NOT use index (skips user_id)
WHERE created_at > '2026-01-01'
Covering Index
A covering index includes all columns needed by a query — the
Comments