
Database Query Optimization in Rust

Master query performance, indexing, connection pooling, and profiling techniques

Database performance is often a bottleneck in production Rust applications. Slow queries compound at scale: what works fine for 100 concurrent users becomes unusable with 10,000. This article covers query optimization, indexing strategies, connection pooling, and profiling techniques specific to Rust.


Why Database Performance Matters

Database queries typically account for 60-80% of total response time in web applications. Optimizing them yields massive returns:

  • 1ms saved per query × 1000 queries/sec = 1 second of cumulative latency saved every second
  • Reduced server load enables handling more concurrent users
  • Lower cloud costs through resource efficiency
  • Better user experience through responsive applications

Understanding Query Performance

Execution Time Breakdown

Total Time = Network Latency + Query Execution + Result Serialization
           ≈ 1-2ms          + 50-500ms        + 1-5ms

The query execution is where optimization matters most.

The Query Execution Plan

Databases use query planners to decide how to execute SQL:

// filepath: src/query_planning.rs
use sqlx::postgres::PgPool;

pub async fn show_query_plan(pool: &PgPool) -> Result<(), sqlx::Error> {
    // EXPLAIN shows how the database executes the query
    let plan = sqlx::query_scalar::<_, String>(
        "EXPLAIN (FORMAT JSON) SELECT * FROM users WHERE email = $1"
    )
    .bind("[email protected]")
    .fetch_one(pool)
    .await?;

    println!("Query Plan: {}", plan);
    Ok(())
}

// With ANALYZE to show actual execution stats
pub async fn show_actual_plan(pool: &PgPool) -> Result<(), sqlx::Error> {
    let plan = sqlx::query_scalar::<_, String>(
        "EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM users WHERE email = $1"
    )
    .bind("[email protected]")
    .fetch_one(pool)
    .await?;

    println!("Actual Query Plan: {}", plan);
    Ok(())
}
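Whether the planner actually chose an index can be checked mechanically from the plan text. A std-only sketch (no database required) that classifies a plan by the node names PostgreSQL prints; the plan strings here are illustrative, not captured from a real server:

```rust
// Classify an EXPLAIN output by its top-level plan node name.
// PostgreSQL prints nodes like "Seq Scan", "Index Scan", "Index Only Scan".
fn scan_type(plan: &str) -> &'static str {
    if plan.contains("Index Only Scan") {
        "index-only scan"
    } else if plan.contains("Index Scan") {
        "index scan"
    } else if plan.contains("Seq Scan") {
        "sequential scan"
    } else {
        "other"
    }
}

fn main() {
    // Illustrative plan fragments, shaped like EXPLAIN's text output
    let good = "Index Scan using idx_users_email on users  (cost=0.29..8.31)";
    let bad = "Seq Scan on users  (cost=0.00..155.00)";
    assert_eq!(scan_type(good), "index scan");
    assert_eq!(scan_type(bad), "sequential scan");
    println!("good: {}, bad: {}", scan_type(good), scan_type(bad));
}
```

In practice you would feed this the string returned by show_query_plan above, or parse the JSON form properly with a JSON library.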

Indexing Strategies

1. Single Column Index

The most common optimization for WHERE clauses:

// filepath: src/indexing.rs

// In your migration:
// CREATE INDEX idx_users_email ON users(email);

pub async fn find_user_by_email(pool: &PgPool, email: &str) 
    -> Result<User, sqlx::Error> 
{
    sqlx::query_as::<_, User>(
        "SELECT * FROM users WHERE email = $1"
    )
    .bind(email)
    .fetch_one(pool)
    .await
}

// Without index: O(n) - scans all rows
// With index: O(log n) - uses B-tree lookup
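The complexity gap is easy to see in miniature with std collections: a Vec scan stands in for the unindexed table, and a BTreeMap (the same structure family behind PostgreSQL's default indexes) stands in for the index. An illustration of the asymptotics, not a database benchmark:

```rust
use std::collections::BTreeMap;

// Linear scan, like a query with no usable index: O(n)
fn find_by_scan(rows: &[(i32, String)], email: &str) -> Option<i32> {
    rows.iter().find(|(_, e)| e.as_str() == email).map(|(id, _)| *id)
}

// Sorted-tree lookup, like a B-tree index on the email column: O(log n)
fn find_by_index(index: &BTreeMap<String, i32>, email: &str) -> Option<i32> {
    index.get(email).copied()
}

fn main() {
    // Simulated users table: (id, email)
    let rows: Vec<(i32, String)> = (0..10_000)
        .map(|i| (i, format!("user{}@example.com", i)))
        .collect();
    // Building the "index" costs time and space up front...
    let index: BTreeMap<String, i32> =
        rows.iter().map(|(id, e)| (e.clone(), *id)).collect();

    // ...but every subsequent lookup touches ~log2(10000) ≈ 14 nodes, not 10000 rows
    assert_eq!(find_by_scan(&rows, "user9999@example.com"), Some(9999));
    assert_eq!(find_by_index(&index, "user9999@example.com"), Some(9999));
    println!("both strategies agree");
}
```

The same trade-off applies in the database: indexes cost write time and disk space, which is why you index the columns queries actually filter on.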

2. Composite Index for Multi-Column Queries

// In migration:
// CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

pub async fn get_user_orders(
    pool: &PgPool,
    user_id: i32,
    limit: i64,
) -> Result<Vec<Order>, sqlx::Error> {
    sqlx::query_as::<_, Order>(
        "SELECT * FROM orders 
         WHERE user_id = $1 
         ORDER BY created_at DESC 
         LIMIT $2"
    )
    .bind(user_id)
    .bind(limit)
    .fetch_all(pool)
    .await
}

Key insight: Index columns should match query order. (user_id, created_at) helps queries filtering by user_id first, then ordering by created_at.
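Conceptually, the composite index keeps rows sorted by (user_id, created_at). A small sketch with a sorted Vec and partition_point shows why filtering on the leftmost column yields one contiguous, already-ordered range (hypothetical data; timestamps are plain integers):

```rust
// A composite index behaves like rows kept sorted by (user_id, created_at).
// partition_point finds the range boundaries in O(log n), like an index range scan.
fn user_order_range(index: &[(i32, i64)], user_id: i32) -> &[(i32, i64)] {
    let start = index.partition_point(|&(u, _)| u < user_id);
    let end = index.partition_point(|&(u, _)| u <= user_id);
    &index[start..end]
}

fn main() {
    // Sorted by user_id, then created_at
    let index = vec![(1, 100), (1, 200), (2, 150), (2, 300), (3, 50)];

    // All of user 2's orders form one contiguous, pre-sorted range:
    assert_eq!(user_order_range(&index, 2), &[(2, 150), (2, 300)][..]);

    // A filter on created_at alone can't use this ordering: matching rows
    // are scattered - hence the leftmost column must appear in the query.
    println!("user 2 range: {:?}", user_order_range(&index, 2));
}
```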

3. Partial Indexes for Filtered Data

// In migration:
// CREATE INDEX idx_active_subscriptions ON subscriptions(user_id) 
//   WHERE status = 'active';

pub async fn get_active_subscriptions(pool: &PgPool, user_id: i32)
    -> Result<Vec<Subscription>, sqlx::Error>
{
    sqlx::query_as::<_, Subscription>(
        "SELECT * FROM subscriptions 
         WHERE user_id = $1 AND status = 'active'"
    )
    .bind(user_id)
    .fetch_all(pool)
    .await
}

// Partial index only stores active rows - smaller and faster

4. Full-Text Search Index

// In migration:
// CREATE INDEX idx_articles_search ON articles USING GIN(to_tsvector('english', content));

pub async fn search_articles(pool: &PgPool, query: &str)
    -> Result<Vec<Article>, sqlx::Error>
{
    sqlx::query_as::<_, Article>(
        "SELECT * FROM articles 
         WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
         ORDER BY ts_rank(to_tsvector('english', content), 
                  plainto_tsquery('english', $1)) DESC"
    )
    .bind(query)
    .fetch_all(pool)
    .await
}

Query Optimization Techniques

1. Select Only Needed Columns

// โŒ Bad: Selects all columns including large BLOB
pub async fn get_user_summary(pool: &PgPool, id: i32)
    -> Result<User, sqlx::Error>
{
    sqlx::query_as::<_, User>("SELECT * FROM users WHERE id = $1")
        .bind(id)
        .fetch_one(pool)
        .await
}

// ✅ Good: Select only needed columns
#[derive(sqlx::FromRow)]
pub struct UserSummary {
    pub id: i32,
    pub name: String,
    pub email: String,
}

pub async fn get_user_summary(pool: &PgPool, id: i32)
    -> Result<UserSummary, sqlx::Error>
{
    sqlx::query_as::<_, UserSummary>(
        "SELECT id, name, email FROM users WHERE id = $1"
    )
    .bind(id)
    .fetch_one(pool)
    .await
}

2. Keyset Pagination Instead of OFFSET

// โŒ Bad: Inefficient with large offsets
pub async fn get_users_page_bad(pool: &PgPool, page: i32)
    -> Result<Vec<User>, sqlx::Error>
{
    let offset = (page - 1) * 20;
    sqlx::query_as::<_, User>(
        "SELECT * FROM users LIMIT 20 OFFSET $1"
    )
    .bind(offset)
    .fetch_all(pool)
    .await
}

// ✅ Good: Seek-based (keyset) pagination
pub async fn get_users_page_good(
    pool: &PgPool,
    last_id: Option<i32>,
    limit: i64, // Postgres LIMIT expects bigint, so bind an i64
) -> Result<Vec<User>, sqlx::Error> {
    let query = if let Some(id) = last_id {
        sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id > $1 ORDER BY id LIMIT $2"
        )
        .bind(id)
    } else {
        sqlx::query_as::<_, User>(
            "SELECT * FROM users ORDER BY id LIMIT $1"
        )
    };
    
    query.bind(limit).fetch_all(pool).await
}
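The seek logic itself is easy to exercise without a database. A sketch of the cursor handling over an in-memory sorted id list (illustrative data):

```rust
// Keyset pagination over sorted ids: each page resumes strictly after the
// last id of the previous page, never re-scanning the rows it skipped.
fn fetch_page(ids: &[i32], last_id: Option<i32>, limit: usize) -> Vec<i32> {
    ids.iter()
        .copied()
        .filter(|&id| last_id.map_or(true, |last| id > last))
        .take(limit)
        .collect()
}

fn main() {
    let ids: Vec<i32> = (1..=7).collect();

    let page1 = fetch_page(&ids, None, 3);
    assert_eq!(page1, vec![1, 2, 3]);

    // The cursor for the next page is simply the last id we saw
    let page2 = fetch_page(&ids, page1.last().copied(), 3);
    assert_eq!(page2, vec![4, 5, 6]);

    println!("pages: {:?} then {:?}", page1, page2);
}
```

The trade-off: keyset pagination cannot jump to an arbitrary page number, only to "next", which is exactly the access pattern of infinite scroll and API cursors.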

3. Batch Operations Instead of N+1

// โŒ Bad: N+1 query problem
pub async fn get_users_with_posts_bad(pool: &PgPool, user_ids: Vec<i32>)
    -> Result<Vec<(User, Vec<Post>)>, sqlx::Error>
{
    let mut results = Vec::new();
    
    for user_id in user_ids {
        let user = sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id = $1"
        )
        .bind(user_id)
        .fetch_one(pool)
        .await?;
        
        let posts = sqlx::query_as::<_, Post>(
            "SELECT * FROM posts WHERE user_id = $1"
        )
        .bind(user_id)
        .fetch_all(pool)
        .await?;
        
        results.push((user, posts));
    }
    
    Ok(results)
}

// ✅ Good: Single query with JOIN
#[derive(sqlx::FromRow)]
pub struct UserWithPosts {
    pub user_id: i32,
    pub user_name: String,
    pub post_id: Option<i32>,       // NULL when the user has no posts (LEFT JOIN)
    pub post_title: Option<String>,
}

pub async fn get_users_with_posts(pool: &PgPool, user_ids: Vec<i32>)
    -> Result<Vec<UserWithPosts>, sqlx::Error>
{
    // One round trip instead of 2N; group the flat rows in application code
    sqlx::query_as::<_, UserWithPosts>(
        "SELECT u.id as user_id, u.name as user_name, 
                p.id as post_id, p.title as post_title
         FROM users u
         LEFT JOIN posts p ON u.id = p.user_id
         WHERE u.id = ANY($1)"
    )
    .bind(&user_ids)
    .fetch_all(pool)
    .await
}
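The transformation back into per-user groups can then happen in application code. A std-only sketch of that grouping step, using tuples that mirror the UserWithPosts columns (illustrative data; post fields are None for users with no posts):

```rust
use std::collections::BTreeMap;

// Group flat (user, post) rows from a LEFT JOIN into one entry per user.
// A user with no posts still appears once, with an empty post list.
fn group_rows(
    rows: &[(i32, &str, Option<i32>, Option<&str>)], // (user_id, user_name, post_id, post_title)
) -> BTreeMap<i32, (String, Vec<String>)> {
    let mut grouped: BTreeMap<i32, (String, Vec<String>)> = BTreeMap::new();
    for &(user_id, user_name, _post_id, post_title) in rows {
        let entry = grouped
            .entry(user_id)
            .or_insert_with(|| (user_name.to_string(), Vec::new()));
        if let Some(title) = post_title {
            entry.1.push(title.to_string());
        }
    }
    grouped
}

fn main() {
    let rows = [
        (1, "alice", Some(10), Some("intro")),
        (1, "alice", Some(11), Some("followup")),
        (2, "bob", None, None), // LEFT JOIN row for a user with no posts
    ];
    let grouped = group_rows(&rows);
    assert_eq!(grouped[&1].1, vec!["intro", "followup"]);
    assert!(grouped[&2].1.is_empty());
    println!("{:?}", grouped);
}
```

A BTreeMap keeps users in id order; a HashMap works equally well if ordering does not matter.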

4. Connection Pooling

// filepath: src/pool.rs
use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

pub async fn create_pool(database_url: &str) -> Result<sqlx::PgPool, sqlx::Error> {
    PgPoolOptions::new()
        // Max connections to maintain
        .max_connections(20)
        // Min connections to keep warm
        .min_connections(5)
        // How long to wait for a connection from the pool
        // (connect_timeout was renamed to acquire_timeout in sqlx 0.6)
        .acquire_timeout(Duration::from_secs(5))
        // Idle timeout before closing a connection
        .idle_timeout(Some(Duration::from_secs(600)))
        // Max lifetime of a connection before it is recycled
        .max_lifetime(Some(Duration::from_secs(3600)))
        .connect(database_url)
        .await
}

#[tokio::main]
async fn main() {
    let pool = create_pool("postgresql://user:pass@localhost/db")
        .await
        .expect("Failed to create pool");
    
    // Pool handles connection reuse automatically
    for _ in 0..100 {
        let _user = sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id = 1"
        )
        .fetch_one(&pool)
        .await
        .expect("Failed to fetch user");
    }
    
    // Connections returned to pool automatically
}
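How large should max_connections be? A commonly cited starting point, popularized by the HikariCP project's pool-sizing guidance, is cores × 2 plus the effective spindle count. Treat it as a heuristic to tune under load, not a rule:

```rust
// Heuristic pool sizing (from the HikariCP pool-sizing guidance):
// connections = cores * 2 + effective_spindles.
// SSDs count as roughly one effective spindle; tune from here under load.
fn suggested_pool_size(cores: u32, effective_spindles: u32) -> u32 {
    cores * 2 + effective_spindles
}

fn main() {
    // e.g. an 8-core server on SSD storage
    let size = suggested_pool_size(8, 1);
    assert_eq!(size, 17);
    println!("suggested max_connections: {}", size);
}
```

More connections is not always better: beyond the database's capacity, extra connections only add context-switching and lock contention.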

Query Profiling and Monitoring

Measuring Query Performance

// filepath: src/profiling.rs
use std::time::Instant;
use tracing::{info, warn};

pub async fn profile_query<T, F>(name: &str, query: F) -> T
where
    F: std::future::Future<Output = T>,
{
    let start = Instant::now();
    let result = query.await;
    let duration_ms = start.elapsed().as_millis() as u64; // tracing fields take u64, not u128

    if duration_ms > 100 {
        warn!(query = name, duration_ms, "Slow query detected");
    } else {
        info!(query = name, duration_ms, "Query completed");
    }

    result
}

// Usage
pub async fn get_user(pool: &PgPool, id: i32) -> Result<User, sqlx::Error> {
    profile_query("get_user", async {
        sqlx::query_as::<_, User>(
            "SELECT * FROM users WHERE id = $1"
        )
        .bind(id)
        .fetch_one(pool)
        .await
    }).await
}

PostgreSQL Query Statistics

// Top slow queries (requires the pg_stat_statements extension to be enabled)
pub async fn get_slow_queries(pool: &PgPool) -> Result<Vec<String>, sqlx::Error> {
    sqlx::query_scalar::<_, String>(
        "SELECT query FROM pg_stat_statements 
         WHERE mean_exec_time > 100 
         ORDER BY total_exec_time DESC 
         LIMIT 10"
    )
    .fetch_all(pool)
    .await
}

// Table statistics
pub async fn get_table_stats(pool: &PgPool, table: &str) 
    -> Result<TableStats, sqlx::Error> 
{
    sqlx::query_as::<_, TableStats>(
        "SELECT schemaname, tablename, seq_scan, seq_tup_read, 
                idx_scan, idx_tup_fetch
         FROM pg_stat_user_tables 
         WHERE tablename = $1"
    )
    .bind(table)
    .fetch_one(pool)
    .await
}

Common Pitfalls and Best Practices

โŒ Pitfall 1: Not Using Query Compilation

// โŒ Bad: Query recompiled every time
pub async fn get_user(pool: &PgPool, id: i32) -> Result<User, sqlx::Error> {
    sqlx::query_as::<_, User>(
        "SELECT * FROM users WHERE id = $1"  // String, not compiled
    )
    .bind(id)
    .fetch_one(pool)
    .await
}

// ✅ Good: Use the sqlx query macros for compile-time checking
// Requires DATABASE_URL at build time, or offline mode via `cargo sqlx prepare`
#[derive(sqlx::FromRow)]
pub struct User {
    pub id: i32,
    pub name: String,
}

pub async fn get_user(pool: &PgPool, id: i32) -> Result<User, sqlx::Error> {
    sqlx::query_as!(User, "SELECT id, name FROM users WHERE id = $1", id)
        .fetch_one(pool)
        .await
}

โŒ Pitfall 2: Ignoring Index Usage

// Check if your index is being used
pub async fn verify_index(pool: &PgPool) -> Result<(), sqlx::Error> {
    let plan = sqlx::query_scalar::<_, String>(
        "EXPLAIN (FORMAT JSON) 
         SELECT * FROM users WHERE email = $1"
    )
    .bind("[email protected]")
    .fetch_one(pool)
    .await?;
    
    // A "Node Type" of "Index Scan" means the index is used;
    // "Seq Scan" means the whole table is read
    println!("Using index: {}", plan.contains("Index Scan"));
    Ok(())
}

✅ Best Practice 1: Use Parameterized Queries

// Always use parameterized queries to prevent SQL injection
// and enable query plan caching

pub async fn search_users(pool: &PgPool, search_term: &str)
    -> Result<Vec<User>, sqlx::Error>
{
    sqlx::query_as::<_, User>(
        "SELECT * FROM users 
         WHERE name ILIKE $1 OR email ILIKE $1"  // Parameterized
    )
    .bind(format!("%{}%", search_term))  // Parameters are separate
    .fetch_all(pool)
    .await
}
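One subtlety with ILIKE: binding prevents injection, but % and _ inside the user's search term still act as wildcards, so a term like "50%" matches far more than intended. A small sketch of escaping them before building the pattern (pair it with an ESCAPE '\' clause if your server's default escape character differs):

```rust
// Escape LIKE/ILIKE metacharacters in user input so they match literally.
// '%' and '_' are wildcards; '\' is the (default) escape character itself.
fn escape_like(term: &str) -> String {
    term.chars()
        .flat_map(|c| match c {
            '\\' | '%' | '_' => vec!['\\', c],
            _ => vec![c],
        })
        .collect()
}

fn main() {
    assert_eq!(escape_like("50%_off"), "50\\%\\_off");

    // The bound pattern then wraps the escaped term, not the raw input:
    let pattern = format!("%{}%", escape_like("50%_off"));
    assert_eq!(pattern, "%50\\%\\_off%");
    println!("{}", pattern);
}
```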

✅ Best Practice 2: Monitor Pool Exhaustion

use tokio::time::{interval, Duration};

pub async fn monitor_pool(pool: &sqlx::PgPool) {
    let mut ticker = interval(Duration::from_secs(60));
    
    loop {
        ticker.tick().await;
        
        println!(
            "Pool stats - connections: {}, idle: {}",
            pool.size(),     // total open connections (u32)
            pool.num_idle()  // currently idle connections (usize)
        );
        
        if pool.size() as usize == pool.num_idle() {
            eprintln!("Warning: All pool connections are idle");
        }
    }
}

✅ Best Practice 3: Use Transactions for Consistency

pub async fn transfer_funds(
    pool: &sqlx::PgPool,
    from_id: i32,
    to_id: i32,
    amount_cents: i64, // money as integer cents; floats accumulate rounding error
) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;
    
    sqlx::query("UPDATE accounts SET balance = balance - $1 WHERE id = $2")
        .bind(amount_cents)
        .bind(from_id)
        .execute(&mut *tx)
        .await?;
    
    sqlx::query("UPDATE accounts SET balance = balance + $1 WHERE id = $2")
        .bind(amount_cents)
        .bind(to_id)
        .execute(&mut *tx)
        .await?;
    
    tx.commit().await?;
    Ok(())
}
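A note on the money type: binary floating point cannot represent most decimal amounts exactly, so balances drift under repeated arithmetic. A common pattern is integer cents (or PostgreSQL NUMERIC paired with a decimal crate such as rust_decimal). A minimal sketch:

```rust
// 0.1 + 0.2 != 0.3 in f64, so repeated float arithmetic on balances drifts.
// Integer cents stay exact under addition and subtraction.
fn to_cents(dollars: i64, cents: i64) -> i64 {
    dollars * 100 + cents
}

fn main() {
    assert!(0.1f64 + 0.2f64 != 0.3f64); // the classic floating-point surprise
    let balance = to_cents(10, 50);        // $10.50
    let after = balance - to_cents(0, 75); // subtract $0.75
    assert_eq!(after, 975);                // $9.75, exactly
    println!("balance after transfer: {} cents", after);
}
```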

Performance Comparison

Technique             | Before                   | After               | Improvement
----------------------|--------------------------|---------------------|------------------
Add index             | 500ms (full scan)        | 10ms (index lookup) | 50x faster
N+1 → JOIN            | 1s (100 queries)         | 5ms (1 query)       | 200x faster
Connection pool       | New connection per query | Reused connections  | 10-50x faster
Seek pagination       | 800ms (offset 10000)     | 5ms                 | 160x faster
Select needed columns | 50MB transferred         | 1MB transferred     | 50x less network

Further Resources

Monitoring Tools

  • pg_stat_statements: PostgreSQL query statistics
  • EXPLAIN: Query execution plans
  • pgBadger: PostgreSQL log analyzer
  • Grafana: Dashboard for database metrics

Query Optimization Checklist

  • Index frequently queried columns
  • Use composite indexes for multi-column queries
  • Analyze slow query logs with EXPLAIN
  • Avoid SELECT *; select needed columns only
  • Use pagination for large result sets
  • Fix N+1 query problems with JOINs
  • Configure connection pool appropriately
  • Monitor pool utilization
  • Use transactions for multi-step operations
  • Implement query profiling and logging
  • Monitor query execution plans
  • Test with realistic data volumes

Conclusion

Database optimization in Rust involves:

  1. Strategic indexing - Dramatically improve query speed
  2. Query design - Use JOINs, avoid N+1, select needed columns
  3. Connection pooling - Reduce connection overhead
  4. Profiling and monitoring - Identify bottlenecks
  5. Transactions - Ensure consistency and safety

By applying these techniques, you can build Rust applications that handle massive database workloads efficiently.



Query fast! ⚡
