Build Optimization and Binary Size Reduction in Rust
TL;DR: This guide covers optimizing Rust binaries for production deployments. You’ll learn release profile configuration, link-time optimization (LTO), binary size reduction techniques, and special considerations for Lambda/serverless, WebAssembly, and embedded targets.
Introduction
Rust is known for generating optimized binaries, but default release builds aren’t always optimal. This article explores:
- Release profile optimization techniques
- Link-time optimization (LTO)
- Binary size reduction for constrained environments
- Lambda/serverless optimization
- Cross-compilation for various targets
Cargo Release Profiles
Default Release Profile
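[profile.release]
opt-level = 3
debug = false
lto = false          # Still performs "thin local" LTO within each crate
codegen-units = 16
panic = "unwind"
strip = false        # Since Cargo 1.77, debuginfo is stripped by default when debug = false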
Optimized Profile
[profile.release]
opt-level = 3 # Maximum optimization
lto = "fat" # Full link-time optimization
codegen-units = 1 # Better optimization, slower compile
strip = true # Strip symbols
panic = "abort" # Smaller binary, no unwinding
Comparison
| Setting | Default | Optimized | Impact |
|---|---|---|---|
| opt-level | 3 | 3 | Same optimization |
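| lto | false | "fat" | Can be 10-20% faster (workload-dependent); much longer link times |
| codegen-units | 16 | 1 | Better optimization; slower, less parallel compilation |
| strip | false | true | Smaller binary (symbols removed) |
| panic | "unwind" | "abort" | Often 5-10% smaller; panics can no longer be caught |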
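The `panic` row is the one setting in this table that changes program behavior, not just size or speed. A minimal sketch of what `panic = "abort"` gives up, using hypothetical `risky` and `try_risky` functions: `std::panic::catch_unwind` can turn a panic into a value only when the unwind strategy is in use.

```rust
use std::panic;

/// Hypothetical fallible operation used only for this demo.
fn risky(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input: {input}");
    }
    input * 2
}

/// Tries to recover from a panic inside `risky`.
/// With the default `panic = "unwind"`, a panic becomes `None`.
/// With `panic = "abort"`, the process terminates instead and this never returns.
fn try_risky(input: i32) -> Option<i32> {
    panic::catch_unwind(move || risky(input)).ok()
}

fn main() {
    println!("{:?}", try_risky(21)); // Some(42)
    println!("{:?}", try_risky(-1)); // None under unwind, after the panic message on stderr
}
```

Built with the default profile, `try_risky(-1)` returns `None` after the panic message is printed; with `panic = "abort"`, the same call ends the process. Before switching, check that nothing in your program or its dependencies relies on catching panics.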
Link-Time Optimization (LTO)
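LTO lets the compiler optimize across crate boundaries (for example, inlining a small function from a dependency into your code) at the cost of longer link times.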
Thin LTO (Faster Builds, Good Optimization)
[profile.release]
opt-level = 3
lto = "thin"
codegen-units = 16
Fat LTO (Best Optimization, Slower Builds)
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
Runtime Performance Comparison
// benchmark.rs
use std::hint::black_box;
use std::time::Instant;
fn main() {
    let iterations: u32 = 10_000_000;
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box stops the optimizer from constant-folding compute() or deleting the loop
        black_box(compute(black_box(42), black_box(100)));
    }
    let duration = start.elapsed();
    println!("Total time: {:?}", duration);
    println!("Per iteration: {:?}", duration / iterations);
}
fn compute(a: i32, b: i32) -> i32 {
    (a * b + a - b) / (b + 1)
}
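Example results (illustrative only; LTO gains depend on how much hot code crosses crate boundaries, so measure your own workload):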
- Without LTO: ~120ms
- Thin LTO: ~95ms (21% faster)
- Fat LTO: ~88ms (27% faster)
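Single timing runs are noisy, and an optimizing build is free to delete work whose result is never used. The sketch below repeats the measurement, wraps inputs and outputs in `std::hint::black_box`, and reports the median trial. The `median_time` helper is hypothetical.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

fn compute(a: i32, b: i32) -> i32 {
    (a * b + a - b) / (b + 1)
}

/// Runs `iterations` calls of `compute` per trial and returns the median trial time.
fn median_time(trials: usize, iterations: u32) -> Duration {
    assert!(trials > 0, "need at least one trial");
    let mut samples: Vec<Duration> = (0..trials)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iterations {
                // black_box hides inputs and result from the optimizer so the work is kept
                black_box(compute(black_box(42), black_box(100)));
            }
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[trials / 2]
}

fn main() {
    let median = median_time(11, 10_000_000);
    println!("median of 11 trials: {:?}", median);
}
```

For anything beyond a quick check, a benchmarking crate such as Criterion handles warm-up and statistical analysis for you.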
Binary Size Optimization
Size-Optimized Profile (Maximum Size Reduction)
[profile.release]
opt-level = "z" # Optimize for size
lto = "fat"
codegen-units = 1
strip = true
panic = "abort"
Size Optimization Options
| opt-level | Effect |
|---|---|
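| 0 | No optimization (dev profile default) |
| 1 | Basic optimizations |
| 2 | Some optimizations |
| 3 | All optimizations, tuned for speed (release profile default) |
| "s" | Optimize for binary size |
| "z" | Optimize for binary size more aggressively; also disables loop vectorization |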
Profile-Guided Optimization
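# PGO is driven by rustc flags, not by a Cargo profile key (see the rustc book)
# Step 1: build an instrumented binary (--target keeps RUSTFLAGS away from build scripts)
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
cargo build --release --target x86_64-unknown-linux-gnu
# Step 2: run it on representative workloads to produce .profraw files
./target/x86_64-unknown-linux-gnu/release/myapp
# Step 3: merge profiles with an llvm-profdata matching rustc's LLVM (rustup component add llvm-tools)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
# Step 4: rebuild using the merged profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \
cargo build --release --target x86_64-unknown-linux-gnu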
Reducing Dependency Size
Use Smaller Crate Alternatives
# Instead of tokio (full), use specific features
tokio = { version = "1", default-features = false, features = ["rt-multi-thread", "macros"] }
# serde is format-agnostic: enable derive, then add only the format crates you use
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# For logging, consider log + minimal implementation
log = "0.4"
env_logger = "0.10" # Or use tracing with minimal features
Feature Minimization
# Default: pulls in many dependencies
axum = "0.7"
# Minimal: only what's needed (axum::serve requires "tokio" plus "http1" or "http2")
axum = { version = "0.7", default-features = false, features = ["http1", "tokio"] }
Bill of Materials Comparison
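# Show crates that appear in more than one version
cargo tree --duplicates
# Show how much each crate contributes to the binary (third-party: cargo install cargo-bloat)
cargo bloat --release --crates
# Count unique runtime dependencies (a rough proxy for build cost; includes the root package)
cargo tree --edges normal --prefix none --no-dedupe | sort -u | wc -l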
Lambda and Serverless Optimization
Lambda-Specific Profile
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true
panic = "abort"
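# No [lib] cdylib needed: Lambda's custom runtime (provided.al2023) runs an
# executable named `bootstrap`, so build a normal binary and rename it when
# packaging (cargo-lambda's `cargo lambda build --release` does this for you)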
Lambda Handler
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde::{Deserialize, Serialize};
#[derive(Deserialize)]
struct Request {
name: String,
}
#[derive(Serialize)]
struct Response {
message: String,
}
async fn function_handler(
event: LambdaEvent<Request>,
) -> Result<Response, Error> {
let name = event.payload.name;
Ok(Response {
message: format!("Hello, {}!", name),
})
}
#[tokio::main]
async fn main() -> Result<(), Error> {
run(service_fn(function_handler)).await
}
Docker Multi-Stage Build for Lambda
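# Build stage: static musl binary so it can run in an empty image
FROM rust:1.75 AS builder
RUN apt-get update && apt-get install -y musl-tools && rustup target add x86_64-unknown-linux-musl
WORKDIR /app
COPY Cargo.toml Cargo.lock ./
COPY src ./src
RUN cargo build --release --locked --target x86_64-unknown-linux-musl
# Runtime stage: the lambda_runtime client in the binary talks to the Lambda Runtime API
FROM scratch
# Assumes the package (and therefore the binary) is named my_lambda
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/my_lambda /var/task/bootstrap
ENTRYPOINT ["/var/task/bootstrap"]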
Binary Size Comparison
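Illustrative sizes for a small handler (actual numbers depend heavily on dependencies, target, and toolchain version):
| Build Type | Binary Size |
|---|---|
| Debug | ~25 MB |
| Release (default) | ~4 MB |
| Release + LTO | ~2.5 MB |
| Release + size opt ("z") | ~1.2 MB |
| Lambda optimized | ~800 KB |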
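Because these sizes vary so much between projects, it is worth measuring your own builds. A small std-only sketch that prints the on-disk size of each path passed on the command line; the `human_size` helper and the tool name used below are illustrative.

```rust
use std::{env, fs};

/// Formats a byte count using binary units (KiB, MiB, GiB) with one decimal place.
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["B", "KiB", "MiB", "GiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

fn main() -> std::io::Result<()> {
    // Each command-line argument is a path to a built binary
    for path in env::args().skip(1) {
        let len = fs::metadata(&path)?.len();
        println!("{:>12}  {}", human_size(len), path);
    }
    Ok(())
}
```

Compiled as a separate small tool (say, `size-report`), it can compare builds side by side: `size-report target/debug/myapp target/release/myapp`.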
Cross-Compilation
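ARM64 Linux (AWS Graviton, Raspberry Pi with a 64-bit OS)
# Install target (Apple Silicon macOS is a different target: aarch64-apple-darwin)
rustup target add aarch64-unknown-linux-gnu
# Install a cross linker (Debian/Ubuntu) and tell Cargo to use it
sudo apt install gcc-aarch64-linux-gnu
export CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc
# Build
cargo build --release --target aarch64-unknown-linux-gnu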
WASM (WebAssembly)
# Install WASM target
rustup target add wasm32-unknown-unknown
# Build
cargo build --release --target wasm32-unknown-unknown
# Generate JavaScript bindings (cargo install wasm-bindgen-cli; its version must match the wasm-bindgen crate)
wasm-bindgen --out-dir ./out/ --target web ./target/wasm32-unknown-unknown/release/myapp.wasm
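ARMv7 (32-bit Raspberry Pi OS, e.g. Raspberry Pi 3)
rustup target add armv7-unknown-linux-gnueabihf
# Requires a cross linker (e.g. gcc-arm-linux-gnueabihf) set via CARGO_TARGET_ARMV7_UNKNOWN_LINUX_GNUEABIHF_LINKER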
cargo build --release --target armv7-unknown-linux-gnueabihf
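A quick way to confirm that a cross-compiled binary really targets what you intended is to have it report the compile-time constants `std::env::consts::ARCH` and `std::env::consts::OS`. A minimal sketch (the `target_description` name is hypothetical); run the ARM build on the device or under QEMU's user-mode emulation.

```rust
/// Describes the target this binary was compiled for, using constants fixed at compile time.
fn target_description() -> String {
    format!(
        "{}-{} ({}-bit pointers)",
        std::env::consts::ARCH,
        std::env::consts::OS,
        usize::BITS
    )
}

fn main() {
    // For aarch64-unknown-linux-gnu this prints "aarch64-linux (64-bit pointers)";
    // for armv7-unknown-linux-gnueabihf the architecture is reported as "arm" (32-bit)
    println!("{}", target_description());
}
```

On the build host, `file target/aarch64-unknown-linux-gnu/release/myapp` gives the same answer without running anything.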
Stripping and Debug Symbols
Strip Symbols
[profile.release]
strip = true   # Same as "symbols"; use "debuginfo" to keep symbol names for backtraces
Manual Strip
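# Remove all symbols (GNU strip's default when no option is given)
strip target/release/myapp
# Remove debug info only (keeps the symbol table, so backtraces still show function names)
strip --strip-debug target/release/myapp
# Explicitly remove the symbol table and debug info (same as the default above)
strip --strip-all target/release/myapp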
Analyze Binary Contents
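# Show per-section sizes (rust-size comes from cargo-binutils:
# cargo install cargo-binutils && rustup component add llvm-tools)
rust-size -A target/release/myapp
# Show dynamically linked libraries
ldd target/release/myapp
# Show symbols (reports "no symbols" after stripping)
nm target/release/myapp | head -20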
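The tools above must be installed separately. For a dependency-free check (for example in CI), here is a sketch that reads the ELF section header table directly and reports whether `.symtab` and `.debug_*` sections survived stripping. It assumes a 64-bit little-endian ELF file (Linux on x86_64 or aarch64), ignores extended section numbering, and is not a hardened parser; `elf_section_names` and the default path are illustrative.

```rust
use std::{env, fs};

/// Reads N bytes at `off`, returning None if the input is too short.
fn read_le<const N: usize>(b: &[u8], off: usize) -> Option<[u8; N]> {
    let mut buf = [0u8; N];
    buf.copy_from_slice(b.get(off..off.checked_add(N)?)?);
    Some(buf)
}

/// Returns the section names of a 64-bit little-endian ELF file, or None otherwise.
fn elf_section_names(b: &[u8]) -> Option<Vec<String>> {
    // e_ident: magic bytes, EI_CLASS = 2 (64-bit), EI_DATA = 1 (little-endian)
    if b.get(0..4)? != b"\x7fELF" || *b.get(4)? != 2 || *b.get(5)? != 1 {
        return None;
    }
    let shoff = u64::from_le_bytes(read_le(b, 0x28)?) as usize; // e_shoff
    let shentsize = u16::from_le_bytes(read_le(b, 0x3A)?) as usize; // e_shentsize
    let shnum = u16::from_le_bytes(read_le(b, 0x3C)?) as usize; // e_shnum
    let shstrndx = u16::from_le_bytes(read_le(b, 0x3E)?) as usize; // e_shstrndx

    // sh_offset (at +0x18 in a section header) of the section-name string table
    let strtab_hdr = shoff.checked_add(shstrndx * shentsize)?;
    let strtab_off = u64::from_le_bytes(read_le(b, strtab_hdr.checked_add(0x18)?)?) as usize;

    let mut names = Vec::with_capacity(shnum);
    for i in 0..shnum {
        let hdr = shoff.checked_add(i * shentsize)?;
        // sh_name (at +0x00): offset of this section's name in the string table
        let name_off = u32::from_le_bytes(read_le(b, hdr)?) as usize;
        let start = strtab_off.checked_add(name_off)?;
        let len = b.get(start..)?.iter().position(|&c| c == 0)?;
        names.push(String::from_utf8_lossy(&b[start..start + len]).into_owned());
    }
    Some(names)
}

fn main() {
    let path = env::args().nth(1).unwrap_or_else(|| "target/release/myapp".to_string());
    let bytes = match fs::read(&path) {
        Ok(bytes) => bytes,
        Err(e) => {
            eprintln!("cannot read {path}: {e}");
            return;
        }
    };
    match elf_section_names(&bytes) {
        Some(names) => {
            let symtab = names.iter().any(|n| n == ".symtab");
            let debug = names.iter().any(|n| n.starts_with(".debug_"));
            println!("{path}: {} sections, .symtab: {symtab}, .debug_*: {debug}", names.len());
        }
        None => println!("{path}: not a 64-bit little-endian ELF file"),
    }
}
```

After `strip = true` (or `strip --strip-all`), both flags should print `false`; with `strip = "debuginfo"`, `.symtab` remains.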
Incremental Compilation and Cache
Release Build Cache
# Use sccache for faster builds
cargo install sccache
# Set environment
export RUSTC_WRAPPER=sccache
export SCCACHE_GHA_ENABLED=true   # GitHub Actions only: store the cache in the Actions cache
# Build
cargo build --release
Build Script Optimization
// build.rs
fn main() {
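// Rerun only when these files change; with no rerun-if-changed lines at all,
// Cargo reruns the script whenever any file in the package changes
println!("cargo:rerun-if-changed=build.rs");
println!("cargo:rerun-if-changed=src/config.rs");
// Skip build-time code generation unless you need it;
// const fn and macros can often do the same work during normal compilation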
}
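When build-time generation is worth it, a build script can keep reruns cheap by declaring exactly one input. A sketch under stated assumptions: the `config/limits.txt` file, its `NAME=value` format, and the `render_consts` helper are all hypothetical. The crate would pull the output in with `include!(concat!(env!("OUT_DIR"), "/limits.rs"));`.

```rust
// build.rs (sketch): turn hypothetical NAME=value lines into Rust constants
use std::{env, fs, path::Path};

/// Renders `NAME=value` lines as `pub const NAME: u64 = value;`.
/// Blank lines and `#` comments are skipped; names are not validated as identifiers here.
fn render_consts(input: &str) -> Result<String, String> {
    let mut out = String::new();
    for (no, raw) in input.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (name, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected NAME=value", no + 1))?;
        let value: u64 = value.trim().parse().map_err(|e| format!("line {}: {e}", no + 1))?;
        out.push_str(&format!("pub const {}: u64 = {};\n", name.trim(), value));
    }
    Ok(out)
}

fn main() {
    // Cargo always sets OUT_DIR for build scripts; bail out if run any other way
    let Ok(out_dir) = env::var("OUT_DIR") else {
        eprintln!("OUT_DIR is not set; run this as a Cargo build script");
        return;
    };
    let input_path = "config/limits.txt";
    // Declare the only input, so Cargo reruns this script just when it changes
    println!("cargo:rerun-if-changed={input_path}");
    let input = fs::read_to_string(input_path).expect("config/limits.txt should exist");
    let code = render_consts(&input).unwrap_or_else(|e| panic!("invalid {input_path}: {e}"));
    fs::write(Path::new(&out_dir).join("limits.rs"), code).expect("failed to write limits.rs");
}
```

A panic in a build script fails the build, which is the desired outcome for a malformed config file.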
WebAssembly Optimization
WASM Size Optimization
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
# wasm-pack: run wasm-opt with -Oz on release builds
[package.metadata.wasm-pack.profile.release]
wasm-opt = ["-Oz"]
WASM Binaryen Optimization
# Install wasm-opt (part of Binaryen) from your package manager, or:
cargo install wasm-opt --locked
# Run binaryen optimization
wasm-opt -Oz -o output.wasm input.wasm
Example WASM Crate
use wasm_bindgen::prelude::*;
#[wasm_bindgen]
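pub fn fibonacci(n: u32) -> Option<u32> {
    // Iterative: O(n) time, constant stack space.
    // Returns None (undefined in JavaScript) when the result overflows u32, i.e. for n > 47.
    match n {
        0 => Some(0),
        1 => Some(1),
        _ => {
            let mut a = 0u32;
            let mut b = 1u32;
            for _ in 2..=n {
                let next = a.checked_add(b)?;
                a = b;
                b = next;
            }
            Some(b)
        }
    }
}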
Embedded Systems Optimization
No Standard Library
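#![no_std]
#![no_main]
use core::panic::PanicInfo;
// Assumes the cortex-m-rt crate, which supplies the reset handler and the
// link.x linker script passed via -Tlink.x (you provide a memory.x file)
use cortex_m_rt::entry;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
#[entry]
fn main() -> ! {
    // Embedded application
    loop {}
}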
Embedded Profile
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
debug = true   # Debug info stays in the ELF on the host; it is never flashed to the device
# Target rustflags belong in .cargo/config.toml; Cargo ignores them in Cargo.toml
[target.thumbv7em-none-eabihf]
rustflags = [
"-C", "link-arg=-Tlink.x",
"-C", "target-cpu=cortex-m4",
]
Production Build Script
#!/bin/bash
set -e
echo "Building for production..."
# Clean previous builds
cargo clean
# Build with optimizations
cargo build --release \
--bin myapp \
--locked
# Strip symbols
strip target/release/myapp
# Show final size
echo "Final binary size:"
ls -lh target/release/myapp
# Show section sizes (rust-size needs cargo-binutils)
rust-size -A target/release/myapp
Performance Benchmark
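use std::hint::black_box;
use std::time::Instant;
fn main() {
    // A running binary can't switch profiles: build and run it twice,
    // once with `cargo run` and once with `cargo run --release`
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };
    // Kept small: naive fibonacci(30) makes about 2.7 million calls per iteration
    let iterations = 10;
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from removing the unused calls
        black_box(fibonacci(black_box(30)));
    }
    println!("{}: {:?}", profile, start.elapsed());
}
fn fibonacci(n: u32) -> u32 {
    // Deliberately naive recursion, to give the optimizer real work
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}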
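Expect the release build to be many times faster than the debug build; exact timings depend on your machine and compiler version, so measure rather than rely on fixed numbers. LTO adds little here because all of the hot code lives in a single crate.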
Conclusion
Optimizing Rust binaries involves:
- Release Profiles - Configure opt-level, LTO, codegen-units
- Size Optimization - Use opt-level = "z" for smallest binaries
- Dependency Management - Minimize features, use smaller crates
- Lambda/Serverless - Profile for size, ship a `bootstrap` executable
- Cross-Compilation - Target specific architectures
- Strip Symbols - Remove debug info for production
The right configuration depends on your deployment target: Lambda benefits from minimal binary size, while server applications should prioritize runtime performance.
Related Articles
- Rust for Cloud Native and Kubernetes
- Production Deployment: Docker, CI/CD, Monitoring
- Optimize Rust Binary Size for Lambda
- WebAssembly with Rust