Database Indexing Strategies: A Complete Guide

Introduction

n an unindexed table can return in milliseconds with the right index. But indexes also have costs — they slow down writes and consume disk space. This guide covers when and how to use each index type effectively.

How Indexes Work

Without an index, the database performs a full table scan — reading every row to find matches. With an index, it uses a data structure (usually a B-tree) to jump directly to matching rows.

-- Without index: scans all 10 million rows
SELECT * FROM orders WHERE user_id = 42;
-- Execution time: 8.3 seconds

-- With index on user_id: jumps directly to matching rows
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT * FROM orders WHERE user_id = 42;
-- Execution time: 0.8 milliseconds

B-Tree Index (Default)

B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, and most databases. It supports equality, range, prefix, and sort operations.

-- Basic B-tree index
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at DESC);

-- Verify with EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- Should show "Index Scan" not "Seq Scan"

What B-Tree Supports

-- Equality
WHERE email = '[email protected]'

-- Range
WHERE created_at BETWEEN '2026-01-01' AND '2026-03-31'
WHERE age > 18

-- Prefix (LIKE with leading constant)
WHERE name LIKE 'Ali%'    -- uses index
WHERE name LIKE '%ice'    -- does NOT use index

-- Sorting (if index matches ORDER BY)
ORD -- uses index if indexed

Composite Index

An index on multiple columns. Column order is critical — the index can only be used from the leftmost column.

-- Composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
in.html)
h: equality only, O(1)
CREATE INDEX idx_name ON table USING hash(column);

-- Drop index
DROP INDEX idx_name;

-- Analyze query
EXPLAIN ANALYZE SELECT ...;

Resources

Use The Index, Luke — free book on SQL indexing
PostgreSQL Index Types
MySQL Index Optimization
[PostgreSQL EXPLAIN](https://www.postgresql.org/docs/current/sql-explarange, sort CREATE INDEX idx_name ON table(column);

– Composite: multi-column queries CREATE INDEX idx_name ON table(col1, col2, col3);

– Covering (PostgreSQL): include extra columns CREATE INDEX idx_name ON table(col1) INCLUDE (col2, col3);

– Partial: subset of rows CREATE INDEX idx_name ON table(column) WHERE condition;

– Unique: enforce uniqueness CREATE UNIQUE INDEX idx_name ON table(column);

– Full-text (PostgreSQL) CREATE INDEX idx_name ON table USING gin(to_tsvector(’english’, column));

– Hasty columns first** in composite indexes (columns with many distinct values) 3. Use covering indexes for frequently-run queries to eliminate table lookups 4. Use partial indexes when you only query a subset of rows 5. Don’t over-index — each index slows INSERT/UPDATE/DELETE and uses disk space 6. Monitor and drop unused indexes — they’re pure overhead 7. Test with realistic data volumes — index effectiveness changes with data size

Quick Reference

-- B-tree (default): equality, nt ON orders(tenant_id, user_id, created_at);

-- All queries filter by tenant_id first
WHERE tenant_id = 'acme' AND user_id = 42

Soft Deletes

-- Partial index excludes deleted rows
CREATE INDEX idx_users_email ON users(email)
WHERE deleted_at IS NULL;

-- Queries automatically use this smaller index
WHERE email = '[email protected]' AND deleted_at IS NULL;

Index Design Principles

Index columns used in WHERE, JOIN, and ORDER BY — these are the most impactful
**Put high-cardinali00;

– Use keyset pagination (fast regardless of page): SELECT * FROM posts WHERE id < :last_seen_id ORDER BY id DESC LIMIT 20;


### Time-Series Queries

– Index for recent data queries CREATE INDEX idx_events_time ON events(created_at DESC);

– Partial index for recent data only CREATE INDEX idx_events_recent ON events(user_id, created_at) WHERE created_at > NOW() - INTERVAL ‘30 days’;


### Multi-Tenant Applications

– Always include tenant_id as the first column CREATE INDEX idx_orders_tenaexes (PostgreSQL)

SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE 'pg_%'
ORDER BY schemaname, tablename;

Unused indexes waste disk space and slow down writes — drop them.

Indexing Patterns for Common Queries

Pagination

-- Efficient pagination with keyset (cursor-based)
CREATE INDEX idx_posts_id ON posts(id DESC);

-- Instead of OFFSET (slow for large offsets):
SELECT * FROM posts ORDER BY id DESC LIMIT 20 OFFSET 100quential Scans (PostgreSQL)

SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC;


## Index Maintenance

### Rebuild Fragmented Indexes

– PostgreSQL: REINDEX REINDEX INDEX idx_orders_user_id; REINDEX TABLE orders; – rebuild all indexes on table

– MySQL: OPTIMIZE TABLE OPTIMIZE TABLE orders;

– PostgreSQL: VACUUM ANALYZE (updates statistics) VACUUM ANALYZE orders;


### Find Unused Inddth=100)
  Index Cond: (user_id = 42)

Finding Missing Indexes

Slow Query Log (MySQL)

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';

pg_stat_statements (PostgreSQL)

-- Find slowest queries
SELECT query, calls, total_exec_time / calls AS avg_ms,
       rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Check for SeALYZE, FORMAT JSON) SELECT * FROM orders WHERE user_id = 42;


### Reading EXPLAIN Output

Seq Scan on orders (cost=0.00..45000.00 rows=1000 width=100) Filter: (user_id = 42)


- `Seq Scan` = full table scan (bad for large tables)
- `Index Scan` = using an index (good)
- `Index Only Scan` = covering index (best)
- `cost=start..total` = estimated cost units
- `rows=N` = estimated rows returned

– After adding index: Index Scan using idx_orders_user_id on orders (cost=0.43..8.45 rows=10 wi users USING hash(email);

– Good for: exact equality only WHERE email = ‘[email protected]’

– Does NOT support: WHERE email LIKE ‘alice%’ WHERE email > ‘[email protected]’ ORDER BY email


## Analyzing Index Usage with EXPLAIN

`EXPLAIN ANALYZE` is your primary tool for understanding query performance:

– Basic EXPLAIN EXPLAIN SELECT * FROM orders WHERE user_id = 42;

– With actual execution stats EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

– JSON format (more detail) EXPLAIN (ANy(’english’, ‘machine learning’)) AS rank FROM posts WHERE to_tsvector(’english’, title || ’ ’ || body) @@ plainto_tsquery(’english’, ‘machine learning’) ORDER BY rank DESC;

– MySQL: FULLTEXT index CREATE FULLTEXT INDEX idx_posts_ft ON posts(title, body);

SELECT * FROM posts WHERE MATCH(title, body) AGAINST(‘machine learning’ IN BOOLEAN MODE);


## Hash Index

O(1) equality lookups — faster than B-tree for exact matches, but no range support:

– PostgreSQL CREATE INDEX idx_users_email_hash ONqueness while providing fast lookups:

CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Or as a constraint (equivalent)
ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

Full-Text Index

For searching within text content:

-- PostgreSQL: GIN index with tsvector
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || body));

-- Search query
SELECT title, ts_rank(to_tsvector('english', title || ' ' || body),
                      plainto_tsquered:

– Index only active users (not deleted ones) CREATE INDEX idx_users_active ON users(email) WHERE deleted_at IS NULL;

– Index only recent orders CREATE INDEX idx_orders_recent ON orders(user_id, created_at) WHERE created_at > ‘2025-01-01’;

– Index only published posts CREATE INDEX idx_posts_published ON posts(created_at) WHERE status = ‘published’;

Partial indexes are much smaller than full indexes and can be faster for queries that match the partial condition.

## Unique Index

Enforces unidatabase answers the query entirely from the index without touching the table:

– PostgreSQL: INCLUDE adds non-key columns to the index CREATE INDEX idx_orders_cover ON orders(user_id, created_at) INCLUDE (total, status);

– This query is answered entirely from the index SELECT user_id, created_at, total, status FROM orders WHERE user_id = 42 ORDER BY created_at DESC; – EXPLAIN shows “Index Only Scan” — no table access!

## Partial Index

Index only a subset of rows — smaller, faster, and more target
-- Uses index (leftmost prefix)
WHERE user_id = 42
WHERE user_id = 42 AND created_at > '2026-01-01'

-- Does NOT use index (skips user_id)
WHERE created_at > '2026-01-01'

Covering Index

A covering index includes all columns needed by a query — the

Database Indexing Strategies: A Complete Guide

Introduction

How Indexes Work

B-Tree Index (Default)

What B-Tree Supports

Composite Index

Resources

Quick Reference

Soft Deletes

Index Design Principles

Indexing Patterns for Common Queries

Finding Missing Indexes

Slow Query Log (MySQL)

pg_stat_statements (PostgreSQL)

Check for SeALYZE, FORMAT JSON) SELECT * FROM orders WHERE user_id = 42;

Full-Text Index

Covering Index

Comments

Share this article

👍 Was this article helpful?