
Database Query Optimization in Rust

Master query performance, indexing, connection pooling, and profiling techniques

Database performance is often a bottleneck in production Rust applications. Slow queries compound at scale: what works fine for 100 concurrent users becomes unusable with 10,000. This article covers query optimization, indexing strategies, connection pooling, and profiling techniques specific to Rust.


Why Database Performance Matters

Database queries typically account for 60-80% of total response time in web applications. Optimizing them yields massive returns:

  • 1ms saved per query × 1000 queries/sec = 1 second of cumulative latency saved every second
  • Reduced server load enables handling more concurrent users
  • Lower cloud costs through resource efficiency
  • Better user experience through responsive applications

Understanding Query Performance

Execution Time Breakdown

Total Time = Network Latency + Query Execution + Result Serialization
           ≈ 1-2ms          + 50-500ms        + 1-5ms

The query execution is where optimization matters most.

The Query Execution Plan

Databases use query planners to decide how to execute SQL:

// filepath: src/query_planning.rs
use sqlx::postgres::PgPool;

pub async fn show_query_plan(pool: &PgPool) -> Result<(), sqlx::Error> {
    // EXPLAIN shows how the database executes the query
    let plan = sqlx::query_scalar::<_, String>(
        "EXPLAIN (FORMAT JSON) SELECT * FROM users WHERE email = $1"
    )
    .bind("[email protected]")
    .fetch_one(pool)
    .await?;

    println!("Query Plan: {}", plan);
    Ok(())
}

// With ANALYZE to show actual execution stats
pub async fn show_actual_plan(pool: &PgPool) -> Result<(), sqlx::Error> {
    let plan = sqlx::query_scalar::<_, String>(
        "EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM users WHERE email = $1"
    )
    .bind("[email protected]")
    .fetch_one(pool)
    .await?;

    println!("Actual Query Plan: {}", plan);
    Ok(())
}
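Whether the planner actually chose an index can be checked mechanically from the plan text. A std-only sketch (no database required) that classifies a plan by the node names PostgreSQL prints; the plan strings here are illustrative, not captured from a real server:

```rust
// Classify an EXPLAIN output by its top-level plan node name.
// PostgreSQL prints nodes like "Seq Scan", "Index Scan", "Index Only Scan".
fn scan_type(plan: &str) -> &'static str {
    if plan.contains("Index Only Scan") {
        "index-only scan"
    } else if plan.contains("Index Scan") {
        "index scan"
    } else if plan.contains("Seq Scan") {
        "sequential scan"
    } else {
        "other"
    }
}

fn main() {
    // Illustrative plan fragments, shaped like EXPLAIN's text output
    let good = "Index Scan using idx_users_email on users  (cost=0.29..8.31)";
    let bad = "Seq Scan on users  (cost=0.00..155.00)";
    assert_eq!(scan_type(good), "index scan");
    assert_eq!(scan_type(bad), "sequential scan");
    println!("good: {}, bad: {}", scan_type(good), scan_type(bad));
}
```

In practice you would feed this the string returned by show_query_plan above, or parse the JSON form properly with a JSON library.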

Indexing Strategies

1. Single Column Index

The most common optimization for WHERE clauses:

// filepath: src/indexing.rs

// In your migration:
// CREATE INDEX idx_users_email ON users(email);

pub async fn find_user_by_email(pool: &PgPool, email: &str) 
    -> Result<User, sqlx::Error> 
{
    sqlx::query_as::<_, User>(
        "SELECT * FROM users WHERE email = $1"
    )
    .bind(email)
    .fetch_one(pool)
    .await
}

// Without index: O(n) - scans all rows
// With index: O(log n) - uses B-tree lookup
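The complexity gap is easy to see in miniature with std collections: a Vec scan stands in for the unindexed table, and a BTreeMap (the same structure family behind PostgreSQL's default indexes) stands in for the index. An illustration of the asymptotics, not a database benchmark:

```rust
use std::collections::BTreeMap;

// Linear scan, like a query with no usable index: O(n)
fn find_by_scan(rows: &[(i32, String)], email: &str) -> Option<i32> {
    rows.iter().find(|(_, e)| e.as_str() == email).map(|(id, _)| *id)
}

// Sorted-tree lookup, like a B-tree index on the email column: O(log n)
fn find_by_index(index: &BTreeMap<String, i32>, email: &str) -> Option<i32> {
    index.get(email).copied()
}

fn main() {
    // Simulated users table: (id, email)
    let rows: Vec<(i32, String)> = (0..10_000)
        .map(|i| (i, format!("user{}@example.com", i)))
        .collect();
    // Building the "index" costs time and space up front...
    let index: BTreeMap<String, i32> =
        rows.iter().map(|(id, e)| (e.clone(), *id)).collect();

    // ...but every subsequent lookup touches ~log2(10000) ≈ 14 nodes, not 10000 rows
    assert_eq!(find_by_scan(&rows, "user9999@example.com"), Some(9999));
    assert_eq!(find_by_index(&index, "user9999@example.com"), Some(9999));
    println!("both strategies agree");
}
```

The same trade-off applies in the database: indexes cost write time and disk space, which is why you index the columns queries actually filter on.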

2. Composite Index for Multi-Column Queries

// In migration:
// CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

pub async fn get_user_orders(
    pool: &PgPool,
    user_id: i32,
    limit: i64,
) -> Result<Vec<Order>, sqlx::Error> {
    sqlx::query_as::<_, Order>(
        "SELECT * FROM orders 
         WHERE user_id = $1 
         ORDER BY created_at DESC 
         LIMIT $2"
    )
    .bind(user_id)
    .bind(limit)
    .fetch_all(pool)
    .await
}

Key insight: Index columns should match query order. (user_id, created_at) helps queries filtering by user_id first, then ordering by created_at.
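Conceptually, the composite index keeps rows sorted by (user_id, created_at). A small sketch with a sorted Vec and partition_point shows why filtering on the leftmost column yields one contiguous, already-ordered range (hypothetical data; timestamps are plain integers):

```rust
// A composite index behaves like rows kept sorted by (user_id, created_at).
// partition_point finds the range boundaries in O(log n), like an index range scan.
fn user_order_range(index: &[(i32, i64)], user_id: i32) -> &[(i32, i64)] {
    let start = index.partition_point(|&(u, _)| u < user_id);
    let end = index.partition_point(|&(u, _)| u <= user_id);
    &index[start..end]
}

fn main() {
    // Sorted by user_id, then created_at
    let index = vec![(1, 100), (1, 200), (2, 150), (2, 300), (3, 50)];

    // All of user 2's orders form one contiguous, pre-sorted range:
    assert_eq!(user_order_range(&index, 2), &[(2, 150), (2, 300)][..]);

    // A filter on created_at alone can't use this ordering: matching rows
    // are scattered - hence the leftmost column must appear in the query.
    println!("user 2 range: {:?}", user_order_range(&index, 2));
}
```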

3. Partial Indexes for Filtered Data

// In migration:
// CREATE INDEX idx_active_subscriptions ON subscriptions(user_id) 
//   WHERE status = 'active';

pub async fn get_active_subscriptions(pool: &PgPool, user_id: i32)
    -> Result<Vec<Subscription>, sqlx::Error>
{
    sqlx::query_as::<_, Subscription>(
        "SELECT * FROM subscriptions 
         WHERE user_id = $1 AND status = 'active'"
    )
    .bind(user_id)
    .fetch_all(pool)
    .await
}

// Partial index only stores active rows - smaller and faster

4. Full-Text Search Index

// In migration:
// CREATE INDEX idx_articles_search ON articles USING GIN(to_tsvector('english', content));

pub async fn search_articles(pool: &PgPool, query: &str)
    -> Result<Vec<Article>, sqlx::Error>
{
    sqlx::query_as::<_, Article>(
        "SELECT * FROM articles 
         WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
         ORDER BY ts_rank(to_tsvector('english', content), 
                  plainto_tsquery('english', $1)) DESC"
    )
    .bind(query)
    .fetch_all(pool)
    .await
}

Query Optimization Techniques

1. Select Only Needed Columns

// โŒ Bad: Selects all columns including large BLOB
pub async fn get_user_summary(pool: &PgPool, id: i32)
    -> Result<User, sqlx::Error>
{
    sqlx::query_as::<_, User>("SELECT * FROM users WHERE id = $1")
        .bind(id)
        .fetch_one(pool)
        .await
}

// ✅ Good: Select only needed columns
#[derive(sqlx::FromRow)]
pub struct UserSummary {
    pub id: i32,
    pub name: String,
    pub email: String,
}

pub async fn get_user_summary(pool: &PgPool, id: i32)
    -> Result<UserSummary, sqlx::Error>
{
    sqlx::query_as::<_, UserSummary>(
        "SELECT id, name, email FROM users WHERE id = $1"
    )
    .bind(id)
    .fetch_one(pool)
    .await
}

2. Keyset Pagination Instead of OFFSET

// โŒ Bad: Inefficient with large offsets
pub async fn get_users_page_bad(pool: &PgPool, page: i32)
    -> Result<Vec<User>, sqlx::Error>
{
    let offset = (page - 1) * 20;
    sqlx::query_as::<_, User>(
        "SELECT * FROM users LIMIT 20 OFFSET $1"
    )
    .bind(offset)
    .fetch_all(pool)
    .await
}

// ✅ Good: Seek-based (keyset) pagination
pub async fn get_users_page_good(
    pool: &PgPool,
    last_id: Option<i32>,
    limit: i64, // Postgres LIMIT expects bigint, so bind an i64
) -> Result<Vec<User>, sqlx::Error> {
    let query = if let Some(id) = last_id {
        sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id > $1 ORDER BY id LIMIT $2"
        )
        .bind(id)
    } else {
        sqlx::query_as::<_, User>(
            "SELECT * FROM users ORDER BY id LIMIT $1"
        )
    };
    
    query.bind(limit).fetch_all(pool).await
}
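The seek logic itself is easy to exercise without a database. A sketch of the cursor handling over an in-memory sorted id list (illustrative data):

```rust
// Keyset pagination over sorted ids: each page resumes strictly after the
// last id of the previous page, never re-scanning the rows it skipped.
fn fetch_page(ids: &[i32], last_id: Option<i32>, limit: usize) -> Vec<i32> {
    ids.iter()
        .copied()
        .filter(|&id| last_id.map_or(true, |last| id > last))
        .take(limit)
        .collect()
}

fn main() {
    let ids: Vec<i32> = (1..=7).collect();

    let page1 = fetch_page(&ids, None, 3);
    assert_eq!(page1, vec![1, 2, 3]);

    // The cursor for the next page is simply the last id we saw
    let page2 = fetch_page(&ids, page1.last().copied(), 3);
    assert_eq!(page2, vec![4, 5, 6]);

    println!("pages: {:?} then {:?}", page1, page2);
}
```

The trade-off: keyset pagination cannot jump to an arbitrary page number, only to "next", which is exactly the access pattern of infinite scroll and API cursors.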

3. Batch Operations Instead of N+1

// โŒ Bad: N+1 query problem
pub async fn get_users_with_posts_bad(pool: &PgPool, user_ids: Vec<i32>)
    -> Result<Vec<(User, Vec<Post>)>, sqlx::Error>
{
    let mut results = Vec::new();
    
    for user_id in user_ids {
        let user = sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id = $1"
        )
        .bind(user_id)
        .fetch_one(pool)
        .await?;
        
        let posts = sqlx::query_as::<_, Post>(
            "SELECT * FROM posts WHERE user_id = $1"
        )
        .bind(user_id)
        .fetch_all(pool)
        .await?;
        
        results.push((user, posts));
    }
    
    Ok(results)
}

// ✅ Good: Single query with JOIN
#[derive(sqlx::FromRow)]
pub struct UserWithPosts {
    pub user_id: i32,
    pub user_name: String,
    pub post_id: Option<i32>,       // NULL when the user has no posts (LEFT JOIN)
    pub post_title: Option<String>,
}

pub async fn get_users_with_posts(pool: &PgPool, user_ids: Vec<i32>)
    -> Result<Vec<UserWithPosts>, sqlx::Error>
{
    // One round trip instead of 2N; group the flat rows in application code
    sqlx::query_as::<_, UserWithPosts>(
        "SELECT u.id as user_id, u.name as user_name, 
                p.id as post_id, p.title as post_title
         FROM users u
         LEFT JOIN posts p ON u.id = p.user_id
         WHERE u.id = ANY($1)"
    )
    .bind(&user_ids)
    .fetch_all(pool)
    .await
}
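The transformation back into per-user groups can then happen in application code. A std-only sketch of that grouping step, using tuples that mirror the UserWithPosts columns (illustrative data; post fields are None for users with no posts):

```rust
use std::collections::BTreeMap;

// Group flat (user, post) rows from a LEFT JOIN into one entry per user.
// A user with no posts still appears once, with an empty post list.
fn group_rows(
    rows: &[(i32, &str, Option<i32>, Option<&str>)], // (user_id, user_name, post_id, post_title)
) -> BTreeMap<i32, (String, Vec<String>)> {
    let mut grouped: BTreeMap<i32, (String, Vec<String>)> = BTreeMap::new();
    for &(user_id, user_name, _post_id, post_title) in rows {
        let entry = grouped
            .entry(user_id)
            .or_insert_with(|| (user_name.to_string(), Vec::new()));
        if let Some(title) = post_title {
            entry.1.push(title.to_string());
        }
    }
    grouped
}

fn main() {
    let rows = [
        (1, "alice", Some(10), Some("intro")),
        (1, "alice", Some(11), Some("followup")),
        (2, "bob", None, None), // LEFT JOIN row for a user with no posts
    ];
    let grouped = group_rows(&rows);
    assert_eq!(grouped[&1].1, vec!["intro", "followup"]);
    assert!(grouped[&2].1.is_empty());
    println!("{:?}", grouped);
}
```

A BTreeMap keeps users in id order; a HashMap works equally well if ordering does not matter.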

4. Connection Pooling

// filepath: src/pool.rs
use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

pub async fn create_pool(database_url: &str) -> Result<sqlx::PgPool, sqlx::Error> {
    PgPoolOptions::new()
        // Max connections to maintain
        .max_connections(20)
        // Min connections to keep warm
        .min_connections(5)
        // How long to wait for a connection from the pool
        // (connect_timeout was renamed to acquire_timeout in sqlx 0.6)
        .acquire_timeout(Duration::from_secs(5))
        // Idle timeout before closing a connection
        .idle_timeout(Some(Duration::from_secs(600)))
        // Max lifetime of a connection before it is recycled
        .max_lifetime(Some(Duration::from_secs(3600)))
        .connect(database_url)
        .await
}

#[tokio::main]
async fn main() {
    let pool = create_pool("postgresql://user:pass@localhost/db")
        .await
        .expect("Failed to create pool");
    
    // Pool handles connection reuse automatically
    for _ in 0..100 {
        let _user = sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id = 1"
        )
        .fetch_one(&pool)
        .await
        .expect("Failed to fetch user");
    }
    
    // Connections returned to pool automatically
}
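How large should max_connections be? A commonly cited starting point, popularized by the HikariCP project's pool-sizing guidance, is cores × 2 plus the effective spindle count. Treat it as a heuristic to tune under load, not a rule:

```rust
// Heuristic pool sizing (from the HikariCP pool-sizing guidance):
// connections = cores * 2 + effective_spindles.
// SSDs count as roughly one effective spindle; tune from here under load.
fn suggested_pool_size(cores: u32, effective_spindles: u32) -> u32 {
    cores * 2 + effective_spindles
}

fn main() {
    // e.g. an 8-core server on SSD storage
    let size = suggested_pool_size(8, 1);
    assert_eq!(size, 17);
    println!("suggested max_connections: {}", size);
}
```

More connections is not always better: beyond the database's capacity, extra connections only add context-switching and lock contention.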

Query Profiling and Monitoring

Measuring Query Performance

// filepath: src/profiling.rs
use std::time::Instant;
use tracing::{info, warn};

pub async fn profile_query<T, F>(name: &str, query: F) -> T
where
    F: std::future::Future<Output = T>,
{
    let start = Instant::now();
    let result = query.await;
    let duration_ms = start.elapsed().as_millis() as u64; // tracing fields take u64, not u128

    if duration_ms > 100 {
        warn!(query = name, duration_ms, "Slow query detected");
    } else {
        info!(query = name, duration_ms, "Query completed");
    }

    result
}

// Usage
pub async fn get_user(pool: &PgPool, id: i32) -> Result<User, sqlx::Error> {
    profile_query("get_user", async {
        sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id = $1"
        )
        .bind(id)
        .fetch_one(pool)
        .await
    }).await
}

PostgreSQL Query Statistics

// Top slow queries (requires the pg_stat_statements extension to be enabled)
pub async fn get_slow_queries(pool: &PgPool) -> Result<Vec<String>, sqlx::Error> {
    sqlx::query_scalar::<_, String>(
        "SELECT query FROM pg_stat_statements 
         WHERE mean_exec_time > 100 
         ORDER BY total_exec_time DESC 
         LIMIT 10"
    )
    .fetch_all(pool)
    .await
}

// Table statistics
pub async fn get_table_stats(pool: &PgPool, table: &str) 
    -> Result<TableStats, sqlx::Error> 
{
    sqlx::query_as::<_, TableStats>(
        "SELECT schemaname, tablename, seq_scan, seq_tup_read, 
                idx_scan, idx_tup_fetch
         FROM pg_stat_user_tables 
         WHERE tablename = $1"
    )
    .bind(table)
    .fetch_one(pool)
    .await
}

Common Pitfalls and Best Practices

โŒ Pitfall 1: Not Using Query Compilation

// โŒ Bad: Query recompiled every time
pub async fn get_user(pool: &PgPool, id: i32) -> Result<User, sqlx::Error> {
    sqlx::query_as::<_, User>(
        "SELECT * FROM users WHERE id = $1"  // String, not compiled
    )
    .bind(id)
    .fetch_one(pool)
    .await
}

// ✅ Good: Use the sqlx query macros for compile-time checking
// Requires DATABASE_URL at build time, or offline mode via `cargo sqlx prepare`
#[derive(sqlx::FromRow)]
pub struct User {
    pub id: i32,
    pub name: String,
}

pub async fn get_user(pool: &PgPool, id: i32) -> Result<User, sqlx::Error> {
    sqlx::query_as!(User, "SELECT id, name FROM users WHERE id = $1", id)
        .fetch_one(pool)
        .await
}

โŒ Pitfall 2: Ignoring Index Usage

// Check if your index is being used
pub async fn verify_index(pool: &PgPool) -> Result<(), sqlx::Error> {
    let plan = sqlx::query_scalar::<_, String>(
        "EXPLAIN (FORMAT JSON) 
         SELECT * FROM users WHERE email = $1"
    )
    .bind("[email protected]")
    .fetch_one(pool)
    .await?;
    
    // A "Node Type" of "Index Scan" means the index is used;
    // "Seq Scan" means the whole table is read
    println!("Using index: {}", plan.contains("Index Scan"));
    Ok(())
}

✅ Best Practice 1: Use Parameterized Queries

// Always use parameterized queries to prevent SQL injection
// and enable query plan caching

pub async fn search_users(pool: &PgPool, search_term: &str)
    -> Result<Vec<User>, sqlx::Error>
{
    sqlx::query_as::<_, User>(
        "SELECT * FROM users 
         WHERE name ILIKE $1 OR email ILIKE $1"  // Parameterized
    )
    .bind(format!("%{}%", search_term))  // Parameters are separate
    .fetch_all(pool)
    .await
}
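One subtlety with ILIKE: binding prevents injection, but % and _ inside the user's search term still act as wildcards, so a term like "50%" matches far more than intended. A small sketch of escaping them before building the pattern (pair it with an ESCAPE '\' clause if your server's default escape character differs):

```rust
// Escape LIKE/ILIKE metacharacters in user input so they match literally.
// '%' and '_' are wildcards; '\' is the (default) escape character itself.
fn escape_like(term: &str) -> String {
    term.chars()
        .flat_map(|c| match c {
            '\\' | '%' | '_' => vec!['\\', c],
            _ => vec![c],
        })
        .collect()
}

fn main() {
    assert_eq!(escape_like("50%_off"), "50\\%\\_off");

    // The bound pattern then wraps the escaped term, not the raw input:
    let pattern = format!("%{}%", escape_like("50%_off"));
    assert_eq!(pattern, "%50\\%\\_off%");
    println!("{}", pattern);
}
```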

✅ Best Practice 2: Monitor Pool Exhaustion

use tokio::time::{interval, Duration};

pub async fn monitor_pool(pool: &sqlx::PgPool) {
    let mut ticker = interval(Duration::from_secs(60));
    
    loop {
        ticker.tick().await;
        
        println!(
            "Pool stats - connections: {}, idle: {}",
            pool.size(),     // total open connections (u32)
            pool.num_idle()  // currently idle connections (usize)
        );
        
        if pool.size() as usize == pool.num_idle() {
            eprintln!("Warning: All pool connections are idle");
        }
    }
}

✅ Best Practice 3: Use Transactions for Consistency

pub async fn transfer_funds(
    pool: &sqlx::PgPool,
    from_id: i32,
    to_id: i32,
    amount_cents: i64, // money as integer cents; floats accumulate rounding error
) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;
    
    sqlx::query("UPDATE accounts SET balance = balance - $1 WHERE id = $2")
        .bind(amount_cents)
        .bind(from_id)
        .execute(&mut *tx)
        .await?;
    
    sqlx::query("UPDATE accounts SET balance = balance + $1 WHERE id = $2")
        .bind(amount_cents)
        .bind(to_id)
        .execute(&mut *tx)
        .await?;
    
    tx.commit().await?;
    Ok(())
}
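A note on the money type: binary floating point cannot represent most decimal amounts exactly, so balances drift under repeated arithmetic. A common pattern is integer cents (or PostgreSQL NUMERIC paired with a decimal crate such as rust_decimal). A minimal sketch:

```rust
// 0.1 + 0.2 != 0.3 in f64, so repeated float arithmetic on balances drifts.
// Integer cents stay exact under addition and subtraction.
fn to_cents(dollars: i64, cents: i64) -> i64 {
    dollars * 100 + cents
}

fn main() {
    assert!(0.1f64 + 0.2f64 != 0.3f64); // the classic floating-point surprise
    let balance = to_cents(10, 50);        // $10.50
    let after = balance - to_cents(0, 75); // subtract $0.75
    assert_eq!(after, 975);                // $9.75, exactly
    println!("balance after transfer: {} cents", after);
}
```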

Performance Comparison

Technique             | Before                   | After               | Improvement
----------------------|--------------------------|---------------------|------------------
Add index             | 500ms (full scan)        | 10ms (index lookup) | 50x faster
N+1 → JOIN            | 1s (100 queries)         | 5ms (1 query)       | 200x faster
Connection pool       | New connection per query | Reused connections  | 10-50x faster
Seek pagination       | 800ms (offset 10000)     | 5ms                 | 160x faster
Select needed columns | 50MB transferred         | 1MB transferred     | 50x less network

Further Resources

Monitoring Tools

  • pg_stat_statements: PostgreSQL query statistics
  • EXPLAIN: Query execution plans
  • pgBadger: PostgreSQL log analyzer
  • Grafana: Dashboard for database metrics

Query Optimization Checklist

  • Index frequently queried columns
  • Use composite indexes for multi-column queries
  • Analyze slow query logs with EXPLAIN
  • Avoid SELECT *; select needed columns only
  • Use pagination for large result sets
  • Fix N+1 query problems with JOINs
  • Configure connection pool appropriately
  • Monitor pool utilization
  • Use transactions for multi-step operations
  • Implement query profiling and logging
  • Monitor query execution plans
  • Test with realistic data volumes

Conclusion

Database optimization in Rust involves:

  1. Strategic indexing - Dramatically improve query speed
  2. Query design - Use JOINs, avoid N+1, select needed columns
  3. Connection pooling - Reduce connection overhead
  4. Profiling and monitoring - Identify bottlenecks
  5. Transactions - Ensure consistency and safety

By applying these techniques, you can build Rust applications that handle massive database workloads efficiently.



Query fast! ⚡
