
Database Query Optimization: Indexes, EXPLAIN, and N+1 Queries

Introduction

Slow database queries are among the most common causes of poor application performance. A query that takes 30 seconds on an unindexed table can return in milliseconds with the right index. This guide covers the techniques that matter most: indexing, reading query plans, fixing N+1 queries, and efficient pagination.

Prerequisites: Basic SQL knowledge, access to PostgreSQL or MySQL.

Step 1: Find Slow Queries

Before optimizing, identify what’s actually slow:

-- PostgreSQL: enable slow query logging
-- In postgresql.conf:
log_min_duration_statement = 1000  -- log queries > 1 second

-- Or use pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries
-- (total_exec_time is the PostgreSQL 13+ column name; older versions use total_time)
SELECT
    query,
    calls,
    total_exec_time / calls AS avg_ms,
    rows / calls AS avg_rows,
    total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
// Node.js: log slow queries with Prisma
const prisma = new PrismaClient({
    log: [
        { emit: 'event', level: 'query' },
    ],
});

prisma.$on('query', (e) => {
    if (e.duration > 100) {  // log queries > 100ms
        console.warn(`Slow query (${e.duration}ms):`, e.query);
    }
});
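
If you're using a raw driver instead of an ORM, the same idea works as a small wrapper around any promise-returning query function. A minimal sketch (the wrapper name and the 100 ms threshold are illustrative assumptions, not a real library API):

```javascript
// Wrap a query function and warn whenever a call exceeds a threshold.
// `queryFn` is whatever your driver exposes, e.g. client.query in node-postgres.
function withSlowQueryLog(queryFn, thresholdMs = 100) {
    return async function (sql, params) {
        const start = Date.now();
        try {
            return await queryFn(sql, params);
        } finally {
            const duration = Date.now() - start;
            if (duration > thresholdMs) {
                console.warn(`Slow query (${duration}ms): ${sql}`);
            }
        }
    };
}
```

Because the timing happens in `finally`, failed queries get logged too.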

Step 2: Read EXPLAIN ANALYZE

EXPLAIN ANALYZE shows exactly how PostgreSQL executes a query:

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;

Sample output:

Limit  (cost=1234.56..1234.58 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  -> Sort  (cost=1234.56..1259.56 rows=10000 width=40) (actual time=45.120..45.121 rows=10 loops=1)
       Sort Key: (count(o.id)) DESC
       -> Hash Left Join  (cost=500.00..900.00 rows=10000 width=40) (actual time=12.345..40.123 rows=10000 loops=1)
             Hash Cond: (o.user_id = u.id)
             -> Seq Scan on orders o  (cost=0.00..300.00 rows=20000 width=8) (actual time=0.012..8.234 rows=20000 loops=1)
             -> Hash  (cost=400.00..400.00 rows=8000 width=36) (actual time=10.123..10.123 rows=8000 loops=1)
                   -> Seq Scan on users u  (cost=0.00..400.00 rows=8000 width=36) (actual time=0.010..8.000 rows=8000 loops=1)
                         Filter: (created_at > '2026-01-01'::date)
                         Rows Removed by Filter: 2000
Planning Time: 0.456 ms
Execution Time: 45.234 ms

What to look for:

Node Type        | Meaning                   | Good/Bad
-----------------+---------------------------+-------------------------------
Index Scan       | Uses an index             | Good
Index Only Scan  | Answers from index alone  | Best
Seq Scan         | Reads entire table        | Bad for large tables
Bitmap Heap Scan | Index + table lookup      | OK
Hash Join        | Joins using hash table    | Usually good
Nested Loop      | Row-by-row join           | Bad for large, unindexed joins

Red flags:

  • Seq Scan on a large table with a filter → needs an index
  • Rows Removed by Filter: 50000 → an index would help
  • High actual time relative to cost → statistics may be stale (run ANALYZE)
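
For automated checks, EXPLAIN (FORMAT JSON) emits the same plan as a tree of nodes (Node Type, Plans, and Relation Name are keys in PostgreSQL's JSON plan format). A minimal sketch that walks that tree and collects sequential scans; the function name is an assumption:

```javascript
// Recursively walk a plan tree from EXPLAIN (FORMAT JSON)
// and collect the tables read with a full sequential scan.
function findSeqScans(node, found = []) {
    if (node['Node Type'] === 'Seq Scan') {
        found.push(node['Relation Name']);
    }
    for (const child of node.Plans || []) {
        findSeqScans(child, found);
    }
    return found;
}
```

Feed it the Plan object from the query's JSON output; a non-empty result on a large table is the same red flag as spotting Seq Scan by eye.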

Step 3: Add the Right Indexes

Basic Index

-- Index on a frequently filtered column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Verify it's being used
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
-- Should show: Index Scan using idx_users_email

Composite Index

Column order matters: put the most selective column first, or the one used in equality conditions:

-- For queries like: WHERE user_id = ? AND status = ? ORDER BY created_at DESC
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at DESC);

-- This index supports:
-- WHERE user_id = 42
-- WHERE user_id = 42 AND status = 'pending'
-- WHERE user_id = 42 AND status = 'pending' ORDER BY created_at DESC

-- This index does NOT support:
-- WHERE status = 'pending'  (skips user_id: leftmost prefix rule)

Partial Index

Index only the rows you actually query:

-- Only index active users (much smaller, faster)
CREATE INDEX idx_users_email_active ON users(email)
WHERE deleted_at IS NULL;

-- Only index recent orders
-- Note: the predicate must use only IMMUTABLE functions, so NOW() is not
-- allowed here; use a fixed date and rebuild the index on a schedule
CREATE INDEX idx_orders_recent ON orders(user_id, created_at)
WHERE created_at > '2026-01-01';

Covering Index (PostgreSQL INCLUDE)

Include extra columns so the query never touches the table:

-- Query: SELECT id, name, email FROM users WHERE email = ?
-- Without INCLUDE: index lookup → table fetch for name
-- With INCLUDE: answered entirely from index
CREATE INDEX idx_users_email_cover ON users(email)
INCLUDE (id, name);

-- EXPLAIN should show: Index Only Scan

Find Missing Indexes

-- Tables with many sequential scans (candidates for indexing)
SELECT
    schemaname,
    tablename,
    seq_scan,
    seq_tup_read,
    idx_scan,
    n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND n_live_tup > 10000
ORDER BY seq_tup_read DESC;

-- Unused indexes (wasting write performance)
-- Before dropping one, confirm it doesn't back a PK/unique constraint
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE 'pg_%'
ORDER BY schemaname, tablename;

Step 4: Fix N+1 Queries

The N+1 problem: fetching N records then making N additional queries for related data.

// BAD: N+1 pattern: 1 query for users + N queries for posts
const users = await User.findAll();  // 1 query
for (const user of users) {
    user.posts = await Post.findAll({ where: { userId: user.id } });  // N queries
}
// Total: 1 + N queries

// GOOD: eager loading
const users = await User.findAll({
    include: [{ model: Post }]  // Sequelize eager loading (single JOIN query)
});
// Note: Sequelize builds one JOIN query here; Prisma and some other ORMs
// instead issue 2 queries: parents, then children with an IN clause
// Prisma: use include for eager loading
const users = await prisma.user.findMany({
    include: {
        posts: {
            select: { id: true, title: true, createdAt: true }
        }
    }
});

// TypeORM: use relations
const users = await userRepository.find({
    relations: ['posts'],
});
-- Raw SQL alternative: one query with a JOIN (here aggregating a post count per user)
SELECT
    u.id,
    u.name,
    u.email,
    COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.name, u.email;
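
The two-query strategy some ORMs use can also be reproduced by hand: fetch the parents, fetch all children with one IN/ANY query, then group in memory. The grouping half is a small pure function (attachChildren and the db.query calls in the comments are illustrative, not a real API):

```javascript
// Attach child rows to their parents by foreign key: the in-memory half
// of the "fetch parents, then fetch all children at once" strategy.
function attachChildren(parents, children, foreignKey, as) {
    const byParent = new Map();
    for (const child of children) {
        const key = child[foreignKey];
        if (!byParent.has(key)) byParent.set(key, []);
        byParent.get(key).push(child);
    }
    return parents.map((p) => ({ ...p, [as]: byParent.get(p.id) || [] }));
}

// const users = (await db.query('SELECT * FROM users')).rows;
// const posts = (await db.query(
//     'SELECT * FROM posts WHERE user_id = ANY($1)',
//     [users.map((u) => u.id)]
// )).rows;
// const usersWithPosts = attachChildren(users, posts, 'user_id', 'posts');
```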

Detect N+1 in Development

// Log all queries and look for patterns
// With Prisma:
const prisma = new PrismaClient({
    log: ['query'],
});

// With Sequelize:
const sequelize = new Sequelize(url, {
    logging: (sql) => console.log(sql),
});

// Look for repeated queries like:
// SELECT * FROM posts WHERE user_id = 1
// SELECT * FROM posts WHERE user_id = 2
// SELECT * FROM posts WHERE user_id = 3
// → This is N+1

Step 5: Efficient Pagination

Offset Pagination (Simple but Slow at Scale)

-- Page 1
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 0;

-- Page 100 (scans 2000 rows to skip them)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1980;

Problem: OFFSET 1980 forces PostgreSQL to fetch and discard 1,980 rows before returning 20, so latency grows linearly with the page number.

Cursor Pagination (Fast at Any Page)

-- First page
SELECT id, title, created_at
FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page: use last item's values as cursor
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < ('2026-03-01 10:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
// API implementation
async function getPosts(cursor, limit = 20) {
    const where = cursor
        ? {
            OR: [
                { createdAt: { lt: cursor.createdAt } },
                { createdAt: cursor.createdAt, id: { lt: cursor.id } }
            ]
          }
        : {};

    const posts = await prisma.post.findMany({
        where,
        orderBy: [{ createdAt: 'desc' }, { id: 'desc' }],
        take: limit + 1,  // fetch one extra to check if there's a next page
    });

    const hasNextPage = posts.length > limit;
    const items = hasNextPage ? posts.slice(0, -1) : posts;
    const nextCursor = hasNextPage
        ? { createdAt: items[items.length - 1].createdAt, id: items[items.length - 1].id }
        : null;

    return { items, nextCursor, hasNextPage };
}
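
In an HTTP API the cursor is usually serialized to an opaque token so clients can't depend on its internals. A base64-encoded JSON object is a common minimal choice (nothing here is Prisma-specific; the function names are assumptions):

```javascript
// Encode/decode a cursor object as an opaque base64url string for API responses.
function encodeCursor(cursor) {
    return Buffer.from(JSON.stringify(cursor)).toString('base64url');
}

function decodeCursor(token) {
    try {
        return JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));
    } catch {
        return null;  // treat malformed tokens as "no cursor"
    }
}
```

Return the encoded token in the response and decode incoming tokens before querying; returning null for a malformed token simply restarts from the first page.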

Query Optimization Patterns

Select Only What You Need

-- BAD: fetches all columns including large text/blob fields
SELECT * FROM articles WHERE id = 42;

-- GOOD: fetch only needed columns
SELECT id, title, excerpt, author_id, created_at
FROM articles
WHERE id = 42;

Use Transactions for Multi-Step Operations

// BAD: separate queries: inconsistent state if one fails
await db.query('UPDATE accounts SET balance = balance - 100 WHERE id = 1');
await db.query('UPDATE accounts SET balance = balance + 100 WHERE id = 2');

// GOOD: atomic transaction
await db.transaction(async (trx) => {
    await trx('accounts').where('id', 1).decrement('balance', 100);
    await trx('accounts').where('id', 2).increment('balance', 100);
});
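
With a raw driver, the same guarantee is BEGIN/COMMIT with a ROLLBACK on any error. A minimal sketch, assuming a client with a node-postgres-style query(sql) method:

```javascript
// Run `fn` inside a transaction: COMMIT on success, ROLLBACK on any error.
async function withTransaction(client, fn) {
    await client.query('BEGIN');
    try {
        const result = await fn(client);
        await client.query('COMMIT');
        return result;
    } catch (err) {
        await client.query('ROLLBACK');
        throw err;  // rethrow so callers still see the failure
    }
}
```

Note that with a connection pool, BEGIN and COMMIT must run on the same checked-out client, not on the pool itself.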

Batch Inserts

-- BAD: N separate INSERT statements
INSERT INTO tags (name) VALUES ('javascript');
INSERT INTO tags (name) VALUES ('typescript');
INSERT INTO tags (name) VALUES ('react');

-- GOOD: single batch INSERT
INSERT INTO tags (name) VALUES
    ('javascript'),
    ('typescript'),
    ('react');
// Prisma batch insert
await prisma.tag.createMany({
    data: [
        { name: 'javascript' },
        { name: 'typescript' },
        { name: 'react' },
    ],
    skipDuplicates: true,
});
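
Very large batches are worth splitting into chunks: PostgreSQL's wire protocol caps a single statement at 65,535 bind parameters, and other databases have similar limits. A chunking helper (the 1,000-row chunk size in the comment is an arbitrary assumption; tune it per schema):

```javascript
// Split rows into fixed-size chunks so each INSERT stays under driver limits.
function chunk(rows, size) {
    const out = [];
    for (let i = 0; i < rows.length; i += size) {
        out.push(rows.slice(i, i + size));
    }
    return out;
}

// for (const batch of chunk(tags, 1000)) {
//     await prisma.tag.createMany({ data: batch, skipDuplicates: true });
// }
```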

MongoDB Query Optimization

// Create indexes
db.users.createIndex({ email: 1 }, { unique: true });
db.posts.createIndex({ userId: 1, createdAt: -1 });
db.posts.createIndex({ tags: 1 });  // multikey index for arrays

// Explain a query
db.posts.find({ userId: ObjectId("...") }).explain("executionStats");
// Look for: IXSCAN (good) vs COLLSCAN (bad)

// Projection: only fetch needed fields
db.posts.find(
    { userId: ObjectId("...") },
    { title: 1, createdAt: 1, _id: 0 }  // include title, createdAt; exclude _id
);

// Aggregation pipeline optimization
db.orders.aggregate([
    { $match: { status: "pending" } },  // filter early (uses index)
    { $sort: { createdAt: -1 } },       // sort before limit
    { $limit: 20 },                      // limit before expensive operations
    { $lookup: { ... } },               // join after reducing dataset
]);
