Build Optimization and Binary Size Reduction in Rust

Created: February 17, 2026 CalmOps 6 min read

TL;DR: This guide covers optimizing Rust binaries for production deployments. You’ll learn release profile configuration, link-time optimization (LTO), binary size reduction techniques, and special considerations for Lambda/serverless and embedded targets.


Introduction

Rust is known for generating optimized binaries, but default release builds aren’t always optimal. This article explores:

  • Release profile optimization techniques
  • Link-time optimization (LTO)
  • Binary size reduction for constrained environments
  • Lambda/serverless optimization
  • Cross-compilation for various targets

Cargo Release Profiles

Default Release Profile

[profile.release]
opt-level = 3
lto = false
codegen-units = 16
strip = false

Optimized Profile

[profile.release]
opt-level = 3          # Maximum optimization
lto = "fat"           # Full link-time optimization
codegen-units = 1     # Better optimization, slower compile
strip = true          # Strip symbols
panic = "abort"       # Smaller binary, no unwinding

Comparison

Setting       | Default | Optimized | Impact
opt-level     | 3       | 3         | Same optimization
lto           | false   | "fat"     | 10-20% faster, longer compile
codegen-units | 16      | 1         | Better optimization
strip         | false   | true      | Smaller binary
panic         | unwind  | "abort"   | 5-10% smaller
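
To check which of these settings a binary was actually compiled with, the program can inspect compile-time cfg flags. The following is a minimal sketch; `build_summary` is a hypothetical helper name, not something Cargo or the standard library provides, and `debug_assertions` is used as a proxy for the profile because it is on by default in dev builds and off by default in release builds.

```rust
/// Hypothetical helper: describes how the current binary was compiled.
pub fn build_summary() -> String {
    // `debug_assertions` is enabled in the dev profile and disabled in
    // release unless the profile overrides it.
    let profile = if cfg!(debug_assertions) {
        "dev-like"
    } else {
        "release-like"
    };

    // `cfg!(panic = "abort")` is true only when the crate was built with
    // panic = "abort"; otherwise panics unwind.
    let panic = if cfg!(panic = "abort") { "abort" } else { "unwind" };

    format!("profile={profile} panic={panic}")
}
```

Printing this string at startup (or behind a --version flag) is one way to confirm that a deployed artifact came from the intended profile.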

Link-time optimization (LTO) allows the compiler to optimize across crate boundaries, for example by inlining functions defined in dependencies.

Thin LTO (Faster Builds, Good Optimization)

[profile.release]
opt-level = 3
lto = "thin"
codegen-units = 16

Fat LTO (Best Optimization, Slower Builds)

[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1

Runtime Performance Comparison

// benchmark.rs
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let iterations: u32 = 10_000_000;

    // black_box keeps the optimizer from constant-folding the call
    // or deleting the loop because its result is unused
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(compute(black_box(42), black_box(100)));
    }
    let duration = start.elapsed();

    println!("Total time: {:?}", duration);
    println!("Per iteration: {:?}", duration / iterations);
}

fn compute(a: i32, b: i32) -> i32 {
    (a * b + a - b) / (b + 1)
}

Typical results (illustrative; actual timings depend on hardware and workload):

  • Without LTO: ~120ms
  • Thin LTO: ~95ms (21% faster)
  • Fat LTO: ~88ms (27% faster)
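
The percentages above are relative to the baseline time. As a small worked example of that arithmetic, the sketch below uses a hypothetical helper, `percent_faster`, that is not part of any benchmarking library.

```rust
/// Hypothetical helper: how much faster `optimized_ms` is than
/// `baseline_ms`, expressed as a percentage of the baseline.
pub fn percent_faster(baseline_ms: f64, optimized_ms: f64) -> f64 {
    (baseline_ms - optimized_ms) / baseline_ms * 100.0
}
```

With the figures listed, `percent_faster(120.0, 95.0)` is about 20.8 and `percent_faster(120.0, 88.0)` is about 26.7, which round to the 21% and 27% quoted.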

Binary Size Optimization

Size-Z Profile (Maximum Size Reduction)

[profile.release]
opt-level = "z"       # Optimize for size
lto = "fat"
codegen-units = 1
strip = true
panic = "abort"

Size Optimization Options

opt-level | Effect
0         | No optimization
1         | Basic optimization
2         | More optimization
3         | Maximum speed
"s"       | Optimize for size
"z"       | Maximum size reduction (also disables loop vectorization)

Profile-Guided Optimization

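Cargo has no pgo profile key, and a build.rs cfg flag does not enable PGO. It is driven through rustc's -Cprofile-generate and -Cprofile-use flags, plus llvm-profdata (for example from the llvm-tools rustup component).

# 1. Build an instrumented binary
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run it on a representative workload to collect profiles
./target/release/myapp

# 3. Merge the raw profiles
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild using the merged profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release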

Reducing Dependency Size

Use Smaller Crate Alternatives

# Instead of tokio (full), use specific features
tokio = { version = "1", default-features = false, features = ["rt-multi-thread", "macros"] }

# serde itself is format-agnostic: enable only the features you need
# and add only the format crates you actually use
serde = { version = "1", default-features = false, features = ["derive"] }
serde_json = "1"

# For logging, consider log + minimal implementation
log = "0.4"
env_logger = "0.10"  # Or use tracing with minimal features

Feature Minimization

# Default: pulls in many dependencies
axum = "0.7"

# Minimal: only what's needed (add "tokio" and "http1" back if you use axum::serve)
axum = { version = "0.7", default-features = false, features = ["macros", "tower-log"] }

Bill of Materials Comparison

# Show the full dependency tree
cargo tree

# Show crates that appear in more than one version
cargo tree --duplicates

# Show how much each crate contributes to the binary
# (requires: cargo install cargo-bloat)
cargo bloat --release --crates
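
To turn dependency output into a number you can track over time, a few lines of Rust can count the distinct crates in it. This is a sketch under stated assumptions: the input is the text printed by `cargo tree --prefix none`, where each dependency line starts with `name vX.Y.Z`, and `unique_crates` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collects distinct (name, version) pairs from the
/// text printed by `cargo tree --prefix none`.
pub fn unique_crates(tree_output: &str) -> BTreeSet<(String, String)> {
    tree_output
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let name = parts.next()?;
            // Lines without a `vX.Y.Z` token (blank lines, section headers
            // such as `[build-dependencies]`) are skipped.
            let version = parts.next()?.strip_prefix('v')?;
            Some((name.to_string(), version.to_string()))
        })
        .collect()
}
```

The set's length is the number of distinct crate versions in the build. Comparing it before and after trimming features gives a rough measure of how much the dependency graph shrank.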

Lambda and Serverless Optimization

Lambda-Specific Profile

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true
panic = "abort"

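# Lambda's custom runtime runs an executable named "bootstrap",
# so build a binary target with that name rather than a cdylib
[[bin]]
name = "bootstrap"
path = "src/main.rs"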

Lambda Handler

use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct Request {
    name: String,
}

#[derive(Serialize)]
struct Response {
    message: String,
}

async fn function_handler(
    event: LambdaEvent<Request>,
) -> Result<Response, Error> {
    let name = event.payload.name;
    
    Ok(Response {
        message: format!("Hello, {}!", name),
    })
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(function_handler)).await
}

Docker Multi-Stage Build for Lambda

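# Build stage
FROM rust:1.75 AS builder

WORKDIR /app

# A scratch image has no libc, so build a statically linked musl binary
RUN rustup target add x86_64-unknown-linux-musl

COPY Cargo.toml Cargo.lock ./
COPY src ./src

# Assumes the binary target is named "bootstrap"
RUN cargo build --release --locked --target x86_64-unknown-linux-musl

# Runtime stage
FROM scratch

COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/bootstrap /var/task/bootstrap

ENTRYPOINT ["/var/task/bootstrap"]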

Binary Size Comparison

Build Type               | Binary Size
Debug                    | ~25 MB
Release (default)        | ~4 MB
Release + LTO            | ~2.5 MB
Release + size opt ("z") | ~1.2 MB
Lambda optimized         | ~800 KB

Cross-Compilation

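ARM64 Linux (Raspberry Pi, AWS Graviton)

# Install target (native macOS builds on Apple Silicon use aarch64-apple-darwin instead)
rustup target add aarch64-unknown-linux-gnu

# Build (on a non-ARM64 host this needs a cross linker such as aarch64-linux-gnu-gcc)
cargo build --release --target aarch64-unknown-linux-gnu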

WASM (WebAssembly)

# Install WASM target
rustup target add wasm32-unknown-unknown

# Build
cargo build --release --target wasm32-unknown-unknown

# Generate JavaScript bindings (requires wasm-bindgen-cli)
wasm-bindgen --out-dir ./out/ --target web ./target/wasm32-unknown-unknown/release/myapp.wasm

ARMv7 (Raspberry Pi 2/3 running a 32-bit OS)

rustup target add armv7-unknown-linux-gnueabihf

# Requires a cross-compilation toolchain (for example arm-linux-gnueabihf-gcc as the linker)
cargo build --release --target armv7-unknown-linux-gnueabihf

Stripping and Debug Symbols

Strip Symbols

[profile.release]
strip = true

Manual Strip

# Strip all symbols (the default behavior of GNU strip)
strip target/release/myapp

# Strip debug info only, keeping the symbol table for backtraces and profiling
strip --strip-debug target/release/myapp

# Remove the symbol table entirely (the same result, stated explicitly)
strip --strip-all target/release/myapp

Analyze Binary Contents

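# Show section sizes (rust-size comes from cargo-binutils: cargo install cargo-binutils)
rust-size target/release/myapp

# Show dynamic library dependencies
ldd target/release/myapp

# Show symbols (empty if the binary has been stripped)
nm target/release/myapp | head -20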

Incremental Compilation and Cache

Release Build Cache

# Use sccache for faster builds
cargo install sccache

# Set environment
export RUSTC_WRAPPER=sccache
# Only on GitHub Actions: use the Actions cache as the storage backend
export SCCACHE_GHA_ENABLED=true

# Build
cargo build --release

Build Script Optimization

// build.rs
fn main() {
    // Re-run this build script only when one of these files changes
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=src/config.rs");
    
    // Don't generate code at build time if not needed
    // Use compile-time macros instead
}
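
As an illustration of replacing build-time code generation with compile-time evaluation, a lookup table can be computed by a const fn instead of being written out by build.rs. This is a minimal sketch; the table contents (squares of 0 to 15) and the names `build_squares` and `SQUARES` are hypothetical and chosen only to keep the example short.

```rust
/// Computes the table during compilation. `while` is used because `for`
/// loops are not allowed in const fn.
const fn build_squares() -> [u32; 16] {
    let mut table = [0u32; 16];
    let mut i = 0;
    while i < table.len() {
        table[i] = (i as u32) * (i as u32);
        i += 1;
    }
    table
}

/// Evaluated by the compiler; no build script or generated file is involved.
pub const SQUARES: [u32; 16] = build_squares();
```

Because the table is a const, it is embedded in the binary as data, and changing the function only recompiles the crate that defines it rather than re-running a build script.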

WebAssembly Optimization

WASM Size Optimization

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1

# Read only by wasm-pack; Cargo itself ignores package.metadata
[package.metadata.wasm-pack.profile.release]
wasm-opt = ["-Oz"]

WASM Binaryen Optimization

# Install wasm-opt (part of Binaryen; also available through the wasm-opt crate)
cargo install wasm-opt

# Run binaryen optimization
wasm-opt -Oz -o output.wasm input.wasm

Example WASM Crate

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => {
            let mut a = 0u32;
            let mut b = 1u32;
            for _ in 2..=n {
                let temp = a + b;
                a = b;
                b = temp;
            }
            b
        }
    }
}
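
One detail worth knowing about the function above: release builds disable integer overflow checks by default, so for n of 48 or more the u32 addition wraps silently instead of panicking. A hedged alternative is to make the overflow explicit with checked arithmetic. The sketch below is plain Rust without the wasm_bindgen attribute, and `fibonacci_checked` is a hypothetical name.

```rust
/// Returns None when the result does not fit in a u32 (n >= 48).
pub fn fibonacci_checked(n: u32) -> Option<u32> {
    match n {
        0 => Some(0),
        1 => Some(1),
        _ => {
            let mut a = 0u32;
            let mut b = 1u32;
            for _ in 2..=n {
                // checked_add yields None on overflow instead of wrapping
                let next = a.checked_add(b)?;
                a = b;
                b = next;
            }
            Some(b)
        }
    }
}
```

The largest Fibonacci number that fits in a u32 is the 47th, so the function returns a value up to n = 47 and None from n = 48 onward.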

Embedded Systems Optimization

No Standard Library

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
pub extern "C" fn main() {
    // Embedded application
}

Embedded Profile

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true

# The section below belongs in .cargo/config.toml, not Cargo.toml
[target.thumbv7em-none-eabihf]
rustflags = [
    "-C", "link-arg=-Tlink.x",
    "-C", "target-cpu=cortex-m4",
]

Production Build Script

#!/bin/bash
set -e

echo "Building for production..."

# Clean previous builds
cargo clean

# Build with optimizations
cargo build --release \
    --lib \
    --bin myapp \
    --locked

# Strip symbols
strip target/release/myapp

# Show final size
echo "Final binary size:"
ls -lh target/release/myapp

# Show section sizes
rust-size target/release/myapp
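
A size check can also be automated so that a build fails when the binary grows past a limit. The following is a sketch of that idea in Rust, suitable for a small helper binary or test; `check_size_budget` and the budget value are hypothetical and would need to be adapted to your project.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: returns the file size in bytes, or an error message
/// if the file is missing or larger than `max_bytes`.
pub fn check_size_budget(path: &Path, max_bytes: u64) -> Result<u64, String> {
    let size = fs::metadata(path)
        .map_err(|e| format!("cannot stat {}: {e}", path.display()))?
        .len();

    if size > max_bytes {
        Err(format!(
            "{} is {size} bytes, over the {max_bytes}-byte budget",
            path.display()
        ))
    } else {
        Ok(size)
    }
}
```

For example, calling it with `Path::new("target/release/myapp")` and a budget of `2 * 1024 * 1024` would report an error once the binary exceeds 2 MiB.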

Performance Benchmark

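use std::hint::black_box;
use std::time::Instant;

fn main() {
    // Fibonacci benchmark
    let iterations = 1000;

    // A binary is compiled with a single profile, so run this program twice
    // (`cargo run` and `cargo run --release`) and compare the printed timings
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };

    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the unused result
        black_box(fibonacci(black_box(30)));
    }
    println!("{}: {:?}", profile, start.elapsed());
}

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}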

Typical output (one run per build profile; absolute timings depend on hardware):

  • debug: ~800ms
  • release (default): ~15ms
  • release + LTO: ~10ms

Conclusion

Optimizing Rust binaries involves:

  1. Release Profiles - Configure opt-level, LTO, codegen-units
  2. Size Optimization - Use opt-level = “z” for smallest binaries
  3. Dependency Management - Minimize features, use smaller crates
  4. Lambda/Serverless - Profile for size, ship a single bootstrap executable
  5. Cross-Compilation - Target specific architectures
  6. Strip Symbols - Remove debug info for production

The right configuration depends on your deployment target: Lambda benefits most from a small binary, while server applications should prioritize runtime performance.

