Database Indexing Strategies: Query Optimization Techniques

Introduction

Database performance is the backbone of application responsiveness. When users experience slow queries, it cascades into poor user experience, lost revenue, and frustrated developers. Database indexing is the most powerful tool in your optimization toolkit, yet many developers only scratch the surface of what’s possible.

In this comprehensive guide, we’ll explore indexing strategies that can transform query performance from seconds to milliseconds. We’ll cover fundamental concepts, advanced techniques, and practical patterns you can apply immediately to your projects.

Understanding How Databases Read Data

The Table Scan Problem

Before understanding indexes, you must understand the problem they solve. When you execute a query without an index, the database performs a table scan—reading every single row in the table to find matching records.

Consider a table with 10 million user records:

SELECT * FROM users WHERE email = '[email protected]';

Without an index, the database reads all 10 million rows sequentially. If each row is 500 bytes and the disk can read at 100 MB/s, this takes approximately 50 seconds. This is unacceptable for production systems.

How Indexes Solve This

An index is a separate data structure that organizes a subset of your data in a way that makes lookups fast—typically O(log n) instead of O(n). Think of it like the index at the back of a book: instead of reading every page to find a topic, you consult the index and jump directly to the relevant pages.

Most databases use B-tree (balanced tree) indexes by default, which maintain sorted order and allow fast lookups, range queries, and ordering operations.

Types of Database Indexes

B-Tree Indexes

B-tree (balanced tree) is the most common index type in relational databases. It maintains data in a sorted tree structure, enabling efficient lookups, range queries, and ordered access.

-- Create a B-tree index on email column
CREATE INDEX idx_users_email ON users(email);

-- B-tree indexes support these operations efficiently
SELECT * FROM users WHERE email = '[email protected]';  -- Exact match
SELECT * FROM users WHERE age >= 18 AND age <= 35;    -- Range query
SELECT * FROM users ORDER BY created_at DESC;         -- Ordered results

B-tree indexes work best for:

Columns with high cardinality (many unique values)
Equality queries (=, !=)
Range queries (<, >, BETWEEN, LIKE ‘prefix%’)
ORDER BY clauses

Hash Indexes

Hash indexes use a hash function to map values to bucket locations. They provide O(1) lookup time for exact matches but cannot support range queries.

-- Create a hash index (MySQL/MariaDB)
CREATE INDEX idx_users_email_hash ON users(email) USING HASH;

-- PostgreSQL uses hash indexes differently
CREATE INDEX idx_users_email_hash ON users USING hash(email);

Use hash indexes when:

You only need exact match queries
The index fits entirely in memory
The column has many duplicate values (low cardinality)

GIN Indexes (Generalized Inverted Index)

GIN indexes are designed for composite data types like arrays, JSONB, and full-text search.

-- PostgreSQL: Index an array column
CREATE INDEX idx_products_tags ON products USING GIN(tags);

-- Query array contents efficiently
SELECT * FROM products WHERE tags @> '{"electronics"}';

-- Index JSONB data
CREATE INDEX idx_orders_metadata ON orders USING GIN(metadata);

-- Full-text search index
CREATE INDEX idx_articles_content 
ON articles USING GIN(to_tsvector('english', content));

GiST Indexes (Generalized Search Tree)

GiST indexes are flexible, allowing you to implement custom indexing strategies for geographical data, range types, and more.

-- Index geographical coordinates
CREATE INDEX idx_locations_geo ON locations USING GIST(coordinates);

-- Find locations within a bounding box
SELECT * FROM locations 
WHERE coordinates && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0);

-- Range types
CREATE INDEX idx_reservations_dates ON reservations USING GIST(daterange);

Bitmap Indexes

Bitmap indexes use bit arrays to represent values, excellent for columns with low cardinality in data warehouses.

-- Oracle bitmap index
CREATE BITMAP INDEX idx_sales_region ON sales(region);

-- Efficient for OLAP workloads with repeated values
SELECT COUNT(*) FROM sales WHERE region = 'US-West';

Composite Indexes

Creating Multi-Column Indexes

Composite indexes combine multiple columns into a single index structure, powerful when querying multiple columns together.

CREATE INDEX idx_orders_customer_date 
ON orders(customer_id, order_date DESC);

Understanding Index Column Order

Column order in composite indexes is critical. The leftmost principle states that the index can only be used effectively for conditions on the leftmost columns.

Given the index (a, b, c):

Query	Uses Index?
WHERE a = 1	Yes (full)
WHERE a = 1 AND b = 2	Yes (full)
WHERE a = 1 AND b = 2 AND c = 3	Yes (full)
WHERE b = 2	No
WHERE b = 2 AND c = 3	No
WHERE a = 1 AND c = 3	Partial (only a)

Covering Indexes

A covering index includes all columns needed by a query, allowing the database to satisfy the query entirely from the index without accessing the table.

-- Without covering index: 2 index lookups
SELECT name, email FROM users WHERE user_id = 123;

-- With covering index: 1 index lookup
CREATE INDEX idx_users_covering ON users(user_id) INCLUDE (name, email);

Indexing Strategies by Use Case

High-Cardinality String Columns

For columns with many unique values (like email, UUIDs), B-tree indexes work well:

-- Email is high cardinality
CREATE INDEX idx_users_email ON users(email);

-- Use hash for very long strings in PostgreSQL
CREATE INDEX idx_users_email_hash ON users USING hash(email);

Low-Cardinality Boolean Columns

Boolean columns have only two values—indexing them rarely helps:

-- Don't do this
CREATE INDEX idx_users_active ON users(is_active);

-- Instead, use partial indexes
CREATE INDEX idx_users_active ON users(id) WHERE is_active = true;

Time-Series Data

For timestamps, consider time-based partitioning and indexes:

-- Composite index with timestamp
CREATE INDEX idx_events_time_type 
ON events(event_time DESC, event_type);

-- Partial index for recent data (hot path)
CREATE INDEX idx_events_recent 
ON events(event_time DESC) 
WHERE event_time > NOW() - INTERVAL '30 days';

JSON Data

Modern applications often store JSON. Index it properly:

-- PostgreSQL JSONB indexing
CREATE INDEX idx_orders_json ON orders USING GIN(data);

-- Query specific JSON fields
SELECT * FROM orders WHERE data->>'customer_id' = '123';

-- More efficient: expression index
CREATE INDEX idx_orders_customer_id 
ON orders((data->>'customer_id'));

SELECT * FROM orders WHERE data->>'customer_id' = '123';

Query Optimization Patterns

Analyzing Query Execution

Always examine how your queries use indexes:

-- PostgreSQL: Explain query plan
EXPLAIN ANALYZE 
SELECT * FROM users WHERE email = '[email protected]';

-- MySQL: Show execution plan
EXPLAIN FORMAT=JSON 
SELECT * FROM users WHERE email = '[email protected]';

Look for these indicators:

Seq Scan on table: Table scan, no index used
Index Scan using idx_xxx: Good, index is being used
Index Only Scan: Best, covering index
Bitmap Index Scan + Bitmap Heap: Multiple index usage

Partial Indexes

Create indexes for a subset of rows:

-- Only index active users (likely 90% of queries)
CREATE INDEX idx_users_active_email 
ON users(email) 
WHERE status = 'active';

-- Index only large orders
CREATE INDEX idx_orders_large 
ON orders(order_date DESC) 
WHERE total_amount > 10000;

Functional Indexes

Index the result of functions or expressions:

-- Case-insensitive search
CREATE INDEX idx_users_email_lower 
ON users(LOWER(email));

SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- Computed columns
CREATE INDEX idx_orders_year 
ON orders(EXTRACT(YEAR FROM order_date));

-- Domain-specific: full name search
CREATE INDEX idx_users_name_search 
ON users((first_name || ' ' || last_name));

Advanced Indexing Techniques

Index-Organized Tables

Store the entire table in the index structure:

-- Oracle IOT
CREATE TABLE users (
    user_id PRIMARY KEY,
    email,
    name
) ORGANIZATION INDEX;

-- MySQL InnoDB primary key is always an IOT

Reverse Key Indexes

Useful for sequence-generated keys that insert sequentially:

-- Oracle reverse key index
CREATE INDEX idx_orders_id_rev ON orders(order_id) REVERSE;

Prevents index contention at the rightmost block when inserting sequential IDs.

Composite Index Design Patterns

Pattern 1: Equality + Range

-- Queries: WHERE status = 'active' AND created_at > '2024-01-01'
CREATE INDEX idx_orders_status_date 
ON orders(status, created_at DESC);

Pattern 2: Multiple Equality Conditions

-- Queries: WHERE category = 'electronics' AND status = 'published'
CREATE INDEX idx_products_cat_status 
ON products(category, status);

Pattern 3: Sort-Heavy Queries

-- Queries: WHERE published = true ORDER BY created_at DESC
CREATE INDEX idx_articles_published_date 
ON articles(published, created_at DESC);

Clustered vs Non-Clustered Indexes

-- SQL Server: Clustered index determines physical order
CREATE CLUSTERED INDEX idx_users_email ON users(email);

-- Only one clustered index per table
-- Choose wisely based on your most common query pattern

Common Indexing Mistakes

Mistake 1: Indexing Every Column

-- Don't do this
CREATE INDEX idx_users_all ON users(id, name, email, phone, address, created_at);

-- Instead, create targeted indexes based on actual query patterns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_phone ON users(phone);

Mistake 2: Ignoring Query Patterns

-- If you frequently query:
WHERE LOWER(email) = '[email protected]'

-- Regular index won't help
CREATE INDEX idx_users_email ON users(email);

-- You need a functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Mistake 3: Not Considering Write Overhead

Every index slows down INSERT, UPDATE, and DELETE operations:

-- 5 indexes = 5x write overhead
-- Balance read performance vs write performance

-- Consider: Do you really need all these indexes?
-- Monitor write latency after adding indexes

Mistake 4: Using VARCHAR as Primary Key

-- Bad: String primary key
CREATE TABLE users (
    email VARCHAR(255) PRIMARY KEY,  -- Slow inserts, large index
    ...
);

-- Good: Integer surrogate key
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE,  -- Separate unique constraint
    ...
);

Index Maintenance

Monitoring Index Usage

-- PostgreSQL: Find unused indexes
SELECT 
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%pkey%';

Identifying Bloat

-- PostgreSQL: Check index bloat
SELECT 
    schemaname || '.' || tablename AS table_name,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    (100 * idx_scan / (seq_scan + idx_scan + 1))::INT AS hit_ratio
FROM pg_stat_user_indexes
WHERE idx_scan > 0
ORDER BY hit_ratio ASC;

Rebuilding Indexes

-- PostgreSQL: Rebuilder (concurrent, no lock)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE users;

-- SQL Server: Rebuild index
ALTER INDEX idx_users_email ON users REBUILD;

Database-Specific Indexing

PostgreSQL

-- Partial index
CREATE INDEX idx_users_active ON users(email) WHERE status = 'active';

-- JSON path index
CREATE INDEX idx_orders_customers ON orders((data->>'customer_id'));

-- BRIN index for time-series (much smaller)
CREATE INDEX idx_events_time_brin ON events USING BRIN(event_time);

MySQL

-- Prefix indexes for long strings
CREATE INDEX idx_users_name ON users(name(10));

-- Composite index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Invisible index (for testing)
CREATE INDEX idx_users_email ON users(email) INVISIBLE;

MongoDB

// Single field index
db.users.createIndex({ email: 1 });

// Compound index
db.orders.createIndex({ customerId: 1, createdAt: -1 });

// Multikey index (arrays)
db.products.createIndex({ tags: 1 });

// Text index
db.articles.createIndex({ content: "text" });

Performance Benchmarks

Impact of Indexing

Query Type	Without Index	With Index
Point lookup (1M rows)	~500ms	~2ms
Range query	~800ms	~15ms
JOIN with indexed FK	~2000ms	~50ms
ORDER BY	~600ms	~5ms

Index Size Estimation

-- PostgreSQL: Index size
SELECT pg_size_pretty(pg_relation_size('idx_users_email'));

-- MySQL: Index length
SHOW INDEX FROM users;

-- Rule of thumb: index is 10-30% of table size for B-tree

Best Practices Summary

Analyze Query Patterns: Use EXPLAIN to understand actual usage
Start with Primary Keys: Ensure foreign keys are indexed
Composite Index Column Order: Put equality conditions first, range conditions last
Use Covering Indexes: Include all SELECTed columns
Consider Partial Indexes: Index only frequently accessed subsets
Monitor and Maintain: Remove unused indexes regularly
Balance Read/Write: More indexes = faster reads, slower writes
Test in Staging: Benchmark before production deployment

Conclusion

Database indexing is both an art and a science. The right index can reduce query time from seconds to milliseconds, while the wrong index wastes storage and slows down writes.

Start by understanding your actual query patterns, use EXPLAIN to verify index usage, and iterate based on real performance metrics. Remember that indexing is not a one-time task—continuous monitoring and refinement are essential for maintaining optimal database performance as your application evolves.

With these strategies in your toolkit, you’re well-equipped to tackle even the most demanding query optimization challenges.