Build Optimization and Binary Size Reduction in Rust

TL;DR: This guide covers optimizing Rust binaries for production deployments. You’ll learn release profile configuration, link-time optimization (LTO), binary size reduction techniques, and special considerations for Lambda/serverless and embedded targets.

Introduction

Rust is known for generating optimized binaries, but default release builds aren’t always optimal. This article explores:

Release profile optimization techniques
Link-time optimization (LTO)
Binary size reduction for constrained environments
Lambda/serverless optimization
Cross-compilation for various targets

Cargo Release Profiles

Default Release Profile

[profile.release]
opt-level = 3
lto = false
codegen-units = 16
strip = false
panic = "unwind"

Optimized Profile

[profile.release]
opt-level = 3          # Maximum optimization
lto = "fat"           # Full link-time optimization
codegen-units = 1     # Better optimization, slower compile
strip = true          # Strip symbols
panic = "abort"       # Smaller binary, no unwinding

Comparison

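Setting	Default	Optimized	Impact
opt-level	3	3	No change
lto	false	"fat"	Up to roughly 10-20% faster depending on workload; longer compile
codegen-units	16	1	More cross-function optimization; less build parallelism
strip	false	true	Smaller binary; no symbols left for backtraces
panic	unwind	"abort"	Roughly 5-10% smaller; panics can no longer be caught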
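One consequence of panic = "abort" is easy to miss: code that relies on std::panic::catch_unwind stops working, because there is no unwinding left to catch. The sketch below illustrates this under the default panic = "unwind"; the function name run_isolated is hypothetical and not part of the original article.

```rust
use std::panic;

/// Runs `job` and converts a panic into `None`.
/// This only works when the crate is built with panic = "unwind" (the default).
/// Under panic = "abort" the process exits at the panic and never returns here.
fn run_isolated<F>(job: F) -> Option<i32>
where
    F: FnOnce() -> i32 + panic::UnwindSafe,
{
    panic::catch_unwind(job).ok()
}

fn main() {
    let ok = run_isolated(|| 21 * 2);
    let failed = run_isolated(|| {
        // black_box hides the zero from the compiler so the division
        // is checked at runtime instead of being rejected at compile time.
        let divisor = std::hint::black_box(0);
        100 / divisor // integer division by zero panics
    });
    println!("ok = {:?}, failed = {:?}", ok, failed);
}
```

If a service uses catch_unwind to isolate failing requests, keep panic = "unwind" for that binary and accept the larger size.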

Link-Time Optimization (LTO)

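LTO lets LLVM optimize across crate boundaries at link time, for example by inlining functions from dependencies. Even with lto = false, Cargo still applies thin LTO within the local crate across its codegen units.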

Thin LTO (Faster Builds, Good Optimization)

[profile.release]
opt-level = 3
lto = "thin"
codegen-units = 16

Fat LTO (Best Optimization, Slower Builds)

[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1

Runtime Performance Comparison

// benchmark.rs
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let iterations: u32 = 10_000_000;

    let start = Instant::now();
    for _ in 0..iterations {
        // black_box hides the inputs and the result from the optimizer, so the
        // call is neither constant-folded nor removed as dead code.
        black_box(compute(black_box(42), black_box(100)));
    }
    let duration = start.elapsed();

    println!("Total time: {:?}", duration);
    println!("Per iteration: {:?}", duration / iterations);
}

fn compute(a: i32, b: i32) -> i32 {
    (a * b + a - b) / (b + 1)
}

Illustrative results (actual numbers depend on hardware, toolchain, and workload):

Without LTO: ~120ms
Thin LTO: ~95ms (21% faster)
Fat LTO: ~88ms (27% faster)
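Differences of a few tens of milliseconds are easy to lose in run-to-run noise. A minimal sketch of one way to steady the measurement is to repeat the timed section and keep the fastest run; the helper name best_of and the run counts are assumptions chosen for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `work` over `runs` repetitions and returns the fastest run.
/// The minimum is less sensitive to scheduler noise than a single measurement.
fn best_of<F, T>(runs: u32, mut work: F) -> Duration
where
    F: FnMut() -> T,
{
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        // black_box keeps the result alive so the call is not optimized away
        black_box(work());
        best = best.min(start.elapsed());
    }
    best
}

fn compute(a: i32, b: i32) -> i32 {
    (a * b + a - b) / (b + 1)
}

fn main() {
    let fastest = best_of(5, || {
        let mut acc = 0i64;
        for i in 0..1_000_000 {
            acc += compute(black_box(42), black_box(100 + (i % 7))) as i64;
        }
        acc
    });
    println!("fastest of 5 runs: {:?}", fastest);
}
```

Build the same file once per profile and compare the reported minimums rather than single runs.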

Binary Size Optimization

Size-Z Profile (Maximum Size Reduction)

[profile.release]
opt-level = "z"       # Optimize for size
lto = "fat"
codegen-units = 1
strip = true
panic = "abort"

Size Optimization Options

opt-level	Effect
0	No optimization
1	Basic optimization
2	More optimization
3	Maximum speed
"s"	Optimize for size
"z"	Optimize for size more aggressively (also disables loop vectorization)

Profile-Guided Optimization

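# PGO is driven by rustc flags; Cargo has no `pgo` profile key.

# 1. Build an instrumented binary
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run it on a representative workload to collect raw profiles
./target/release/myapp

# 3. Merge the raw profiles (llvm-profdata comes with the llvm-tools rustup component or an LLVM install)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild using the merged profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release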

Reducing Dependency Size

Use Smaller Crate Alternatives

# Instead of tokio (full), use specific features
tokio = { version = "1", default-features = false, features = ["rt-multi-thread", "macros"] }

# serde itself contains no data formats; disable its default features and add only the format crates you need
serde = { version = "1", default-features = false, features = ["derive"] }
serde_json = "1"

# For logging, consider log + minimal implementation
log = "0.4"
env_logger = "0.10"  # Or use tracing with minimal features

Feature Minimization

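# Default: pulls in many dependencies
axum = "0.7"

# Minimal: only what's needed. With default features off, serving over HTTP/1
# with axum::serve also requires the "tokio" and "http1" features.
axum = { version = "0.7", default-features = false, features = ["macros", "tower-log", "tokio", "http1"] }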
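Disabling features shrinks the binary because code behind a disabled feature is never compiled at all. The sketch below shows how a crate typically gates code this way; the feature name "json" and the function render are hypothetical and would need a matching entry under [features] in Cargo.toml.

```rust
/// Compiled only when the crate is built with `--features json`
/// (a hypothetical feature used here for illustration).
#[cfg(feature = "json")]
pub fn render(name: &str) -> String {
    format!("{{\"name\":\"{}\"}}", name)
}

/// Fallback used when the `json` feature is disabled.
#[cfg(not(feature = "json"))]
pub fn render(name: &str) -> String {
    format!("name={}", name)
}

fn main() {
    // cfg! is evaluated at compile time; the disabled branch is removed entirely.
    let mode = if cfg!(feature = "json") { "json" } else { "plain" };
    println!("{} output: {}", mode, render("demo"));
}
```

When the feature is off, neither the gated function nor any optional dependency it would pull in reaches the final binary.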

Bill of Materials Comparison

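# Show crates that appear in the graph at more than one version
cargo tree --duplicates

# Show which crates contribute most to binary size (requires cargo-bloat)
cargo install cargo-bloat
cargo bloat --release --crates

# Count distinct packages in the dependency graph
cargo tree --prefix none --no-dedupe | sort -u | wc -l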

Lambda and Serverless Optimization

Lambda-Specific Profile

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true
panic = "abort"

# No [lib] crate-type override is needed: a custom-runtime Lambda runs an
# ordinary executable named bootstrap, not a shared library.

Lambda Handler

use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct Request {
    name: String,
}

#[derive(Serialize)]
struct Response {
    message: String,
}

async fn function_handler(
    event: LambdaEvent<Request>,
) -> Result<Response, Error> {
    let name = event.payload.name;
    
    Ok(Response {
        message: format!("Hello, {}!", name),
    })
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(function_handler)).await
}

Docker Multi-Stage Build for Lambda

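# Build stage
FROM rust:1.75 AS builder

WORKDIR /app

# Static musl target: the scratch runtime image below contains no libc
RUN rustup target add x86_64-unknown-linux-musl

COPY Cargo.toml Cargo.lock ./
COPY src ./src

# Build the handler as an executable (assumes the package's binary is named my_lambda)
RUN cargo build --release --locked --target x86_64-unknown-linux-musl

# Runtime stage
FROM scratch

# Copy the executable in as bootstrap, the name Lambda custom runtimes use
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/my_lambda /var/task/bootstrap

ENTRYPOINT ["/var/task/bootstrap"]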

Binary Size Comparison

Illustrative figures; actual sizes depend on the dependency set and the target.

Build Type	Binary Size
Debug	~25 MB
Release (default)	~4 MB
Release + LTO	~2.5 MB
Release + size opt ("z")	~1.2 MB
Lambda optimized	~800 KB
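To make the table easier to reason about, the relative savings can be computed directly. The helper below is a small sketch using the approximate sizes from the table; the function names are illustrative.

```rust
/// Percentage by which `after` is smaller than `before` (both in bytes).
fn reduction_percent(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) * 100.0 / before as f64
}

/// Formats a byte count using decimal units, matching the table's style.
fn human_size(bytes: u64) -> String {
    if bytes >= 1_000_000 {
        format!("{:.1} MB", bytes as f64 / 1_000_000.0)
    } else {
        format!("{} KB", bytes / 1_000)
    }
}

fn main() {
    // Approximate sizes taken from the comparison table.
    let steps = [
        ("Release (default)", 4_000_000u64),
        ("Release + LTO", 2_500_000),
        ("Release + size opt", 1_200_000),
        ("Lambda optimized", 800_000),
    ];
    let baseline = steps[0].1;
    for (label, bytes) in &steps[1..] {
        println!(
            "{:<20} {:>8}  {:.1}% smaller than default release",
            label,
            human_size(*bytes),
            reduction_percent(baseline, *bytes)
        );
    }
}
```

In a real project, replace the constants with std::fs::metadata(path)?.len() on the built artifacts.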

Cross-Compilation

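ARM64 Linux (Raspberry Pi with a 64-bit OS)

# Install target (Apple Silicon macOS uses aarch64-apple-darwin instead)
rustup target add aarch64-unknown-linux-gnu

# Build (needs an aarch64 cross-linker such as aarch64-linux-gnu-gcc configured for this target)
cargo build --release --target aarch64-unknown-linux-gnu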

WASM (WebAssembly)

# Install WASM target
rustup target add wasm32-unknown-unknown

# Build
cargo build --release --target wasm32-unknown-unknown

# Generate JavaScript bindings (requires wasm-bindgen-cli)
wasm-bindgen --out-dir ./out/ --target web ./target/wasm32-unknown-unknown/release/myapp.wasm

ARMv7 (Raspberry Pi 3 running a 32-bit OS)

rustup target add armv7-unknown-linux-gnueabihf

# Requires an ARM cross-linker such as arm-linux-gnueabihf-gcc
cargo build --release --target armv7-unknown-linux-gnueabihf
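When one codebase is built for several targets, it helps to confirm which target a given binary was compiled for and to branch on it at compile time. The following is a minimal sketch using only the standard library; the function names are illustrative.

```rust
/// Describes the target this binary was compiled for.
/// std::env::consts reflects the compile target, not the build host.
fn target_summary() -> String {
    format!("{}-{}", std::env::consts::ARCH, std::env::consts::OS)
}

/// True when the compile target uses 64-bit pointers
/// (for example aarch64, but not armv7 or wasm32).
fn is_64_bit() -> bool {
    cfg!(target_pointer_width = "64")
}

fn main() {
    println!("built for: {}", target_summary());
    println!("64-bit pointers: {}", is_64_bit());

    // Code inside a cfg block is compiled only for matching targets.
    #[cfg(target_arch = "arm")]
    println!("32-bit ARM build");
}
```

Because cfg! and #[cfg] are resolved at compile time, branches for other targets add nothing to the binary.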

Stripping and Debug Symbols

Strip Symbols

[profile.release]
strip = true

Manual Strip

# Strip all symbols (the default for GNU strip when no option is given)
strip target/release/myapp

# Strip debug information only, keeping the symbol table for backtraces
strip --strip-debug target/release/myapp

# Strip all symbols explicitly (same effect as the first command)
strip --strip-all target/release/myapp

Analyze Binary Contents

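# Show section sizes (rust-size comes from cargo-binutils)
rust-size target/release/myapp

# Show shared-library dependencies
ldd target/release/myapp

# Show the first 20 symbols (empty if the binary was stripped)
nm target/release/myapp | head -20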

Incremental Compilation and Cache

Release Build Cache

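# Use sccache to cache compilation results across builds
cargo install sccache

# Route rustc invocations through sccache
export RUSTC_WRAPPER=sccache
# Only on GitHub Actions: use the Actions cache as the storage backend
export SCCACHE_GHA_ENABLED=true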

# Build
cargo build --release

Build Script Optimization

// build.rs
fn main() {
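    // Re-run this script only when one of the listed files changes.
    // Without any rerun-if-* directive, Cargo re-runs it whenever any file in the package changes.
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=src/config.rs");

    // Avoid generating code at build time unless it is needed;
    // prefer macros where they can do the job.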
}
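When a build script has several inputs, the directives can be generated from a list so that none is forgotten. This is a sketch, not a required pattern: the helper rerun_directives and the environment variable APP_REGION are hypothetical, while rerun-if-changed and rerun-if-env-changed are the directives Cargo reads from a build script's standard output.

```rust
/// Builds the directives Cargo reads from a build script's stdout.
fn rerun_directives(files: &[&str], env_vars: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for file in files {
        // Re-run the script when this file's contents change.
        lines.push(format!("cargo:rerun-if-changed={}", file));
    }
    for var in env_vars {
        // Re-run the script when this environment variable changes.
        lines.push(format!("cargo:rerun-if-env-changed={}", var));
    }
    lines
}

fn main() {
    // APP_REGION is a hypothetical variable used only for illustration.
    for line in rerun_directives(&["build.rs", "src/config.rs"], &["APP_REGION"]) {
        println!("{}", line);
    }
}
```

Keeping the input list explicit avoids needless re-runs of the script, which matters most for slow release builds.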

WebAssembly Optimization

WASM Size Optimization

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1

# Read by wasm-pack only; plain cargo ignores package.metadata
[package.metadata.wasm-pack.profile.release]
wasm-opt = ["-Oz"]

WASM Binaryen Optimization

# Install wasm-opt (part of Binaryen; also available from system package managers)
cargo install wasm-opt

# Run Binaryen's size optimizer
wasm-opt -Oz -o output.wasm input.wasm

Example WASM Crate

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => {
            let mut a = 0u32;
            let mut b = 1u32;
            for _ in 2..=n {
                let temp = a + b;
                a = b;
                b = temp;
            }
            b
        }
    }
}
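The u32 version above overflows once n reaches 48. Release builds disable overflow checks by default, so the result wraps silently. A hedged variant that reports overflow instead is sketched below; the name fibonacci_checked is an assumption, and the #[wasm_bindgen] attribute is left out so the block compiles without external crates.

```rust
/// Iterative Fibonacci that reports overflow instead of wrapping.
/// Returns None once the result no longer fits in a u32 (n >= 48).
pub fn fibonacci_checked(n: u32) -> Option<u32> {
    if n == 0 {
        return Some(0);
    }
    let (mut a, mut b) = (0u32, 1u32);
    for _ in 2..=n {
        // checked_add returns None on overflow; `?` propagates it to the caller
        let next = a.checked_add(b)?;
        a = b;
        b = next;
    }
    Some(b)
}

fn main() {
    for n in [10, 47, 48] {
        println!("fibonacci_checked({}) = {:?}", n, fibonacci_checked(n));
    }
}
```

Returning Option makes the limit explicit to callers instead of handing back a wrapped number.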

Embedded Systems Optimization

No Standard Library

#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
pub extern "C" fn main() {
    // Embedded application
}
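Logic that sticks to core and avoids heap allocation can be shared between the firmware and a host-side test build. The sketch below shows that style with a fixed-capacity buffer and a checksum; the type and function names are illustrative, and main with println! is only there for running it on a host.

```rust
/// Fixed-capacity byte buffer that needs no heap allocation.
pub struct FixedBuf<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    pub const fn new() -> Self {
        FixedBuf { data: [0; N], len: 0 }
    }

    /// Appends a byte, or hands it back as Err when the buffer is full.
    pub fn push(&mut self, byte: u8) -> Result<(), u8> {
        if self.len == N {
            return Err(byte);
        }
        self.data[self.len] = byte;
        self.len += 1;
        Ok(())
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

/// Wrapping 8-bit sum; uses only operations available in core.
pub fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

// Host-side harness; on the embedded target this would be the firmware entry point.
fn main() {
    let mut buf: FixedBuf<4> = FixedBuf::new();
    for byte in [0x10u8, 0x20, 0x30] {
        buf.push(byte).expect("buffer has room");
    }
    println!("checksum = {:#04x}", checksum(buf.as_slice()));
}
```

Avoiding the allocator and formatting machinery in the shared code also keeps the firmware image small.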

Embedded Profile

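# Cargo.toml
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true

# .cargo/config.toml (rustflags are not read from Cargo.toml)
[target.thumbv7em-none-eabihf]
rustflags = [
    "-C", "link-arg=-Tlink.x",   # linker script provided by cortex-m-rt
    "-C", "target-cpu=cortex-m4",
]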

Production Build Script

#!/bin/bash
set -e

echo "Building for production..."

# Clean previous builds
cargo clean

# Build with optimizations
cargo build --release \
    --bin myapp \
    --locked

# Strip symbols (redundant if the profile already sets strip = true)
strip target/release/myapp

# Show final size
echo "Final binary size:"
ls -lh target/release/myapp

# Show section sizes
rust-size target/release/myapp

Performance Benchmark

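use std::hint::black_box;
use std::time::Instant;

fn main() {
    // Fibonacci benchmark
    let iterations = 1000;

    // A binary is built with a single profile, so run this program once with
    // `cargo run` and once with `cargo run --release`, then compare the output.
    // debug_assertions is on in the dev profile and off in release by default.
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };

    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from folding or discarding the call
        black_box(fibonacci(black_box(30)));
    }
    println!("{}: {:?}", profile, start.elapsed());
}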

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

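Illustrative output, one run per build configuration (absolute times depend heavily on hardware and iteration count):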

debug: ~800ms
release (default): ~15ms
release + LTO: ~10ms

Conclusion

Optimizing Rust binaries involves:

Release Profiles - Configure opt-level, LTO, codegen-units
Size Optimization - Use opt-level = “z” for smallest binaries
Dependency Management - Minimize features, use smaller crates
Lambda/Serverless - Profile for size, ship a single bootstrap executable
Cross-Compilation - Target specific architectures
Strip Symbols - Remove debug info for production

The right configuration depends on your deployment target: Lambda benefits most from a small binary, while server applications should prioritize runtime performance.
