Introduction
Rust is making significant inroads into AI and machine learning development. Known for its memory safety and performance, Rust offers compelling advantages for AI workloads: blazing-fast inference, efficient resource utilization, and safe concurrent systems. While Python remains the dominant language for AI research, Rust is becoming the language of choice for production AI systems.
In 2026, Rust’s AI ecosystem has matured dramatically. This guide covers building AI applications with Rust, from PyO3 bindings to native ML frameworks, helping you leverage Rust’s performance for your AI projects.
Why Rust for AI?
The Case for Rust in AI
- Performance: Near-C performance with better safety
- Memory Safety: No garbage collection pauses
- Concurrency: Safe parallelism for batch processing
- Production Ready: Native binaries, easy deployment
- Python Integration: Seamless interop via PyO3
Where Rust Excels
- Edge Deployment: Resource-constrained devices
- Inference Serving: Low-latency predictions
- Model Optimization: Convert and optimize models
- Data Processing: ETL pipelines at scale
- Embedded AI: Microcontrollers and IoT
PyO3: Python + Rust
Getting Started
# Cargo.toml
[package]
name = "rust_ai"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.22", features = ["numpy"] }
numpy = "0.22"
[build-dependencies]
pyo3-build-config = "0.22"
// build.rs (only needed outside maturin-driven builds)
fn main() {
pyo3_build_config::add_extension_module_link_args();
}
Basic Module
// src/lib.rs
use pyo3::prelude::*;
#[pyfunction]
fn predict(input: Vec<f64>, weights: Vec<f64>) -> f64 {
input.iter()
.zip(weights.iter())
.map(|(i, w)| i * w)
.sum()
}
#[pyfunction]
fn sigmoid(x: f64) -> f64 {
1.0 / (1.0 + (-x).exp())
}
#[pyfunction]
fn relu(x: f64) -> f64 {
x.max(0.0)
}
#[pymodule]
fn rust_ai(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(predict, m)?)?;
m.add_function(wrap_pyfunction!(sigmoid, m)?)?;
m.add_function(wrap_pyfunction!(relu, m)?)?;
Ok(())
}
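Alongside sigmoid and relu, classification heads usually need a softmax. A minimal, numerically stable sketch in plain Rust (std-only, not yet wired into the module above):

```rust
// Numerically stable softmax: subtract the max logit before
// exponentiating so exp() never overflows for large inputs.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let probs = softmax(&[1.0, 2.0, 3.0]);
    // Probabilities are positive, ordered like the logits, and sum to 1
    println!("{:?}", probs);
    assert!((probs.iter().sum::<f64>() - 1.0).abs() < 1e-12);
}
```

Exposing it to Python is the same `#[pyfunction]` + `add_function` pattern as the functions above.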
Building
# Install maturin once, then build and install the extension in-place
pip install maturin
maturin develop
Using from Python
import rust_ai
# Use Rust functions
weights = [0.5, 0.3, 0.2]
input_data = [1.0, 2.0, 3.0]
prediction = rust_ai.predict(input_data, weights)
print(f"Prediction: {prediction}")
# Fast activation functions
print(rust_ai.sigmoid(0.5)) # 0.622...
print(rust_ai.relu(-0.5)) # 0.0
Neural Network Layer
use pyo3::prelude::*;
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
#[pyclass]
struct DenseLayer {
weights: Array2<f64>,
bias: Array2<f64>,
}
#[pymethods]
impl DenseLayer {
#[new]
fn new(input_size: usize, output_size: usize) -> Self {
// Random init via the ndarray-rand crate (add it to Cargo.toml)
let weights = Array2::random((output_size, input_size), Uniform::new(0.0, 1.0));
let bias = Array2::zeros((output_size, 1));
Self { weights, bias }
}
fn forward(&self, input: Vec<Vec<f64>>) -> Vec<Vec<f64>> {
let (n, d) = (input.len(), input[0].len());
let flat: Vec<f64> = input.into_iter().flatten().collect();
// Rows are samples: build (batch, input_size), then transpose to (input_size, batch)
let input_array = Array2::from_shape_vec((n, d), flat).unwrap().reversed_axes();
let output = self.weights.dot(&input_array) + &self.bias;
// One output Vec per sample
output.columns().into_iter()
.map(|col| col.to_vec())
.collect()
}
fn relu(&self, input: Vec<Vec<f64>>) -> Vec<Vec<f64>> {
self.forward(input).into_iter()
.map(|row| row.into_iter().map(|v| v.max(0.0)).collect())
.collect()
}
}
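Stripped of ndarray, the forward pass above is just y = W·x + b per sample; a dependency-free sketch of that arithmetic:

```rust
// Dense layer forward pass without ndarray: y = W·x + b,
// with weights stored row-major as [output_size][input_size].
fn dense_forward(weights: &[Vec<f64>], bias: &[f64], input: &[f64]) -> Vec<f64> {
    weights.iter().zip(bias.iter())
        .map(|(row, b)| {
            // Dot product of one weight row with the input, plus bias
            row.iter().zip(input.iter()).map(|(w, x)| w * x).sum::<f64>() + b
        })
        .collect()
}

fn main() {
    let w = vec![vec![1.0, 0.0], vec![0.0, 2.0]];
    let b = vec![0.5, -0.5];
    let y = dense_forward(&w, &b, &[3.0, 4.0]);
    println!("{:?}", y); // [3.5, 7.5]
}
```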
Candle: Native ML Framework
Introduction to Candle
Candle is Hugging Face’s minimalist native Rust ML framework:
# Cargo.toml
[dependencies]
candle-core = "0.8"
candle-nn = "0.8"
candle-transformers = "0.8"
serde = { version = "1.0", features = ["derive"] }
Basic Neural Network
use candle_core::{Device, Tensor};
use candle_nn::{Linear, Module, VarBuilder};
struct SimpleNet {
linear1: Linear,
linear2: Linear,
}
impl SimpleNet {
fn new(vb: VarBuilder) -> candle_core::Result<Self> {
let linear1 = candle_nn::linear(784, 128, vb.pp("linear1"))?;
let linear2 = candle_nn::linear(128, 10, vb.pp("linear2"))?;
Ok(Self { linear1, linear2 })
}
}
impl Module for SimpleNet {
fn forward(&self, xs: &Tensor) -> candle_core::Result<Tensor> {
let xs = self.linear1.forward(xs)?;
let xs = xs.relu()?;
self.linear2.forward(&xs)
}
}
fn main() -> candle_core::Result<()> {
let device = Device::Cpu;
let vb = VarBuilder::zeros(candle_core::DType::F32, &device);
let model = SimpleNet::new(vb)?;
// Create dummy input
let input = Tensor::zeros((1, 784), candle_core::DType::F32, &device)?;
// Forward pass
let output = model.forward(&input)?;
println!("Output shape: {:?}", output.shape());
Ok(())
}
Loading Models
use candle_core::{Device, DType, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::bert::{BertModel, Config};
use tokenizers::Tokenizer;
// Sketch assuming config.json / model.safetensors / tokenizer.json were
// downloaded beforehand (e.g. with the hf-hub crate); check the
// candle-transformers docs for the exact BertModel::forward signature.
fn load_bert_model(device: &Device) -> anyhow::Result<BertModel> {
let config: Config = serde_json::from_str(&std::fs::read_to_string("config.json")?)?;
let vb = unsafe { VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F32, device)? };
Ok(BertModel::load(vb, &config)?)
}
fn run_inference() -> anyhow::Result<()> {
let device = Device::Cpu;
let model = load_bert_model(&device)?;
let tokenizer = Tokenizer::from_file("tokenizer.json").map_err(anyhow::Error::msg)?;
let encoding = tokenizer.encode("Hello world", true).map_err(anyhow::Error::msg)?;
let input_ids = Tensor::new(encoding.get_ids(), &device)?.unsqueeze(0)?;
let token_type_ids = input_ids.zeros_like()?;
let output = model.forward(&input_ids, &token_type_ids, None)?;
println!("Model output shape: {:?}", output.shape());
Ok(())
}
Burn: Next-Gen ML Framework
Getting Started
# Cargo.toml
[dependencies]
burn = { version = "0.14", features = ["train", "ndarray"] }
# optional backends: "tch" (LibTorch), "wgpu" (GPU via WebGPU)
Building Models
use burn::prelude::*;
use burn::nn::{Linear, LinearConfig, Relu};
#[derive(Module, Debug)]
pub struct NeuralNetwork<B: Backend> {
linear1: Linear<B>,
linear2: Linear<B>,
relu: Relu,
}
impl<B: Backend> NeuralNetwork<B> {
pub fn new(device: &B::Device) -> Self {
let linear1 = LinearConfig::new(784, 128).init(device);
let linear2 = LinearConfig::new(128, 10).init(device);
Self { linear1, linear2, relu: Relu::new() }
}
pub fn forward(&self, x: Tensor<B, 2>) -> Tensor<B, 2> {
let x = self.linear1.forward(x);
let x = self.relu.forward(x);
self.linear2.forward(x)
}
}
Training Loop
use burn::optim::AdamConfig;
use burn::train::LearnerBuilder;
// Sketch of Burn's LearnerBuilder API; the model must implement
// TrainStep/ValidStep, and model/dataloader construction is elided —
// see the Burn book for a complete example.
fn train_model() {
let learner = LearnerBuilder::new("./checkpoints") // artifact directory
.num_epochs(10)
.build(model, AdamConfig::new().init(), 1e-3);
let trained = learner.fit(train_dataloader, valid_dataloader);
}
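Whatever the framework, a fit() call boils down to the same loop: forward pass, loss gradient, parameter update. A dependency-free sketch on a single linear neuron with MSE loss:

```rust
// Gradient descent on y = w*x + b with mean-squared-error loss —
// the structure any framework's training loop implements.
fn fit(xs: &[f64], ys: &[f64], lr: f64, epochs: usize) -> (f64, f64) {
    let (mut w, mut b) = (0.0, 0.0);
    let n = xs.len() as f64;
    for _ in 0..epochs {
        let (mut gw, mut gb) = (0.0, 0.0);
        for (&x, &y) in xs.iter().zip(ys) {
            let err = w * x + b - y; // prediction error
            gw += 2.0 * err * x;     // d(MSE)/dw
            gb += 2.0 * err;         // d(MSE)/db
        }
        w -= lr * gw / n;
        b -= lr * gb / n;
    }
    (w, b)
}

fn main() {
    // Data generated by the true model y = 2x + 1
    let (w, b) = fit(&[1.0, 2.0, 3.0, 4.0], &[3.0, 5.0, 7.0, 9.0], 0.05, 5000);
    println!("w = {w:.3}, b = {b:.3}"); // converges toward w = 2, b = 1
}
```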
Model Serving with Rust
High-Performance Inference Server
use candle_core::{Tensor, Device, DType};
use candle_nn::Linear;
use std::sync::Arc;
use tokio::sync::RwLock;
struct ModelServer {
model: Arc<RwLock<Option<MyModel>>>,
device: Device,
}
struct MyModel {
linear: Linear,
}
impl ModelServer {
async fn load_model(&self, path: &str) -> Result<(), Box<dyn std::error::Error>> {
// MyModel::load is a stand-in for your own weight-loading code
let model = MyModel::load(path)?;
let mut guard = self.model.write().await;
*guard = Some(model);
Ok(())
}
async fn predict(&self, input: Vec<f32>) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
let model_guard = self.model.read().await;
let model = model_guard.as_ref()
.ok_or("Model not loaded")?;
// Convert input to tensor
let input_tensor = Tensor::from_vec(
input,
(1, 784),
&self.device
)?;
// Run inference (MyModel::forward wraps the Linear layer, elided here)
let output = model.forward(&input_tensor)?;
// Extract output: shape is (1, out), so squeeze the batch dim first
let output_vec: Vec<f32> = output.squeeze(0)?.to_vec1::<f32>()?;
Ok(output_vec)
}
}
Using with Axum
use axum::{Router, Json, extract::State, routing::post, http::StatusCode};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
#[derive(Serialize, Deserialize)]
struct PredictionRequest {
input: Vec<f32>,
}
#[derive(Serialize)]
struct PredictionResponse {
prediction: Vec<f32>,
}
async fn predict(
State(server): State<Arc<ModelServer>>,
Json(req): Json<PredictionRequest>,
) -> Result<Json<PredictionResponse>, StatusCode> {
let result = server.predict(req.input).await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
Ok(Json(PredictionResponse { prediction: result }))
}
#[tokio::main]
async fn main() {
let server = Arc::new(ModelServer::new());
server.load_model("model.safetensors").await.unwrap();
let app = Router::new()
.route("/predict", post(predict))
.with_state(server);
// axum 0.7 style: bind a listener, then serve
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
axum::serve(listener, app).await.unwrap();
}
Model Conversion
From PyTorch to Candle
use candle_core::Device;
// Export weights from PyTorch with safetensors.torch.save_file, then:
fn load_pytorch_weights(path: &str) -> candle_core::Result<()> {
let tensors = candle_core::safetensors::load(path, &Device::Cpu)?;
for (name, tensor) in tensors.iter() {
println!("Loaded tensor: {} {:?}", name, tensor.shape());
}
Ok(())
}
ONNX Support
use candle_core::Tensor;
use std::collections::HashMap;
// Uses the candle-onnx crate; simple_eval interprets the graph on CPU
fn run_onnx_inference(path: &str, input: Tensor) -> candle_core::Result<Tensor> {
let model = candle_onnx::read_file(path)?;
let graph = model.graph.as_ref().expect("model has no graph");
// Find input and output names
let input_name = graph.input[0].name.clone();
let output_name = graph.output[0].name.clone();
let mut outputs = candle_onnx::simple_eval(&model, HashMap::from([(input_name, input)]))?;
Ok(outputs.remove(&output_name).expect("missing output"))
}
WebAssembly Deployment
Compiling for WASM
// src/lib.rs
// Burn's NdArray backend is pure Rust, so it compiles to wasm32 —
// there is no separate "burn_wasm" crate.
use burn::backend::NdArray;
use burn::tensor::Tensor;
use wasm_bindgen::prelude::*;
type B = NdArray<f32>;
#[wasm_bindgen]
pub fn predict(input: Vec<f32>) -> Vec<f32> {
let device = Default::default();
let n = input.len();
let input_tensor = Tensor::<B, 1>::from_floats(input.as_slice(), &device).reshape([1, n]);
// Run your model here; as a placeholder, return the tensor's data
input_tensor.into_data().to_vec::<f32>().unwrap()
}
Building
# Add wasm target
rustup target add wasm32-unknown-unknown
# Build
cargo build --release --target wasm32-unknown-unknown
# Create WASM binding
wasm-bindgen --out-dir ./pkg/ --target web ./target/wasm32-unknown-unknown/release/my_ai.wasm
Using in Browser
import init, { predict } from './pkg/my_ai.js';
async function run() {
await init();
const input = [0.1, 0.2, 0.3, /* ... */];
const output = predict(input);
console.log("Prediction:", output);
}
run();
Embedded AI with Rust
Microcontroller Setup
#![no_std]
#![no_main]
use cortex_m_rt::entry;
use panic_halt as _;
#[entry]
fn main() -> ! {
// Simple AI inference loop on a microcontroller; read_sensor(),
// simple_nn_inference(), and led_on() are board-specific stubs
loop {
// Read sensor
let sensor_data = read_sensor();
// Run simple inference
let prediction = simple_nn_inference(sensor_data);
// Act on prediction
if prediction > 0.5 {
led_on();
}
cortex_m::asm::delay(1_000_000);
}
}
TinyML
// Quantized model for embedded targets. `QuantizedModel` is a stand-in
// for your own fixed-point inference type, not a published crate.
struct EmbeddedAI {
model: QuantizedModel<i8, 4>, // i8 storage, 4-bit effective precision
}
impl EmbeddedAI {
fn new(weights: &[i8], input_size: usize, output_size: usize) -> Self {
Self {
model: QuantizedModel::new(weights, input_size, output_size),
}
}
fn infer(&self, input: &[i8]) -> i8 {
self.model.predict(input)
}
}
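The core quantization step can be shown without any crate: map f32 weights to i8 with a single scale factor (symmetric quantization) and dequantize on the way out. A minimal std-only sketch:

```rust
// Symmetric 8-bit quantization: one scale factor maps f32 weights
// into the full i8 range [-127, 127].
fn quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = max_abs / 127.0;
    let q = weights.iter().map(|w| (w / scale).round() as i8).collect();
    (q, scale)
}

// Recover approximate f32 values from the quantized weights
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.27, 0.01];
    let (q, scale) = quantize(&w);
    let back = dequantize(&q, scale);
    println!("quantized: {:?} (scale {})", q, scale);
    println!("recovered: {:?}", back);
}
```

On-device, inference then runs entirely in integer arithmetic against the quantized weights, applying the scale only at the output.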
Performance Optimization
Batching
async fn batch_inference(
model: &Arc<Model>,
inputs: Vec<Vec<f32>>,
) -> Vec<Vec<f32>> {
// Stack inputs into single batch
let batch_size = inputs.len();
let input_size = inputs[0].len();
let batched: Vec<f32> = inputs.into_iter()
.flatten()
.collect();
let batch_tensor = Tensor::from_vec(
batched,
(batch_size, input_size),
&model.device,
).unwrap();
// Single forward pass
let output = model.forward(&batch_tensor).unwrap();
// Split results back into per-request vectors
output.to_vec2::<f32>().unwrap()
}
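The flatten-and-split bookkeeping around the batched forward pass is easy to get wrong; a framework-free sketch that can be unit-tested on its own:

```rust
// Flatten N equally-sized inputs into one contiguous buffer,
// returning the per-sample length alongside it.
fn flatten(inputs: &[Vec<f32>]) -> (Vec<f32>, usize) {
    let input_size = inputs[0].len();
    (inputs.iter().flatten().copied().collect(), input_size)
}

// Split a flat batched output back into one Vec per request.
fn split(flat: Vec<f32>, output_size: usize) -> Vec<Vec<f32>> {
    flat.chunks(output_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let inputs = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    let (flat, size) = flatten(&inputs);
    assert_eq!(flat, vec![1.0, 2.0, 3.0, 4.0]);
    // Pretend the model echoed the batch back unchanged
    let outputs = split(flat, size);
    assert_eq!(outputs, inputs);
}
```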
GPU Acceleration
fn use_gpu() -> candle_core::Result<()> {
// Try CUDA, then Metal, then fall back to CPU
let device = Device::new_cuda(0)
.or_else(|_| Device::new_metal(0))
.unwrap_or(Device::Cpu);
println!("Using device: {:?}", device);
// Create tensors on GPU
let tensor = Tensor::from_vec(
vec![1.0f32; 1000],
(1, 1000),
&device,
)?;
// Operations automatically use GPU
let result = tensor.sum(1)?;
Ok(())
}
Best Practices
Error Handling
use thiserror::Error;
#[derive(Error, Debug)]
pub enum AIError {
#[error("Model not loaded")]
ModelNotLoaded,
#[error("Invalid input shape: {0}")]
InvalidShape(String),
#[error("Inference error: {0}")]
InferenceError(String),
#[error("IO error: {0}")]
IoError(#[from] std::io::Error),
}
pub type Result<T> = std::result::Result<T, AIError>;
Testing
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_activation_functions() {
// Test sigmoid
assert!((sigmoid(0.0) - 0.5).abs() < 0.001);
// Test relu
assert!(relu(-1.0).abs() < 0.001);
assert!((relu(5.0) - 5.0).abs() < 0.001);
}
#[test]
fn test_model_output_range() {
// Test output is in valid range
// for softmax outputs
}
}
Conclusion
Rust offers compelling advantages for AI development, particularly for production inference, edge deployment, and performance-critical components. The ecosystem has matured significantly with PyO3 for Python integration, Candle and Burn for native ML, and excellent WASM support.
Start with PyO3 to accelerate Python code, then explore native frameworks for full Rust AI development. The performance benefits are substantial, especially for high-throughput inference workloads.