Introduction
Rust is making significant inroads into AI and machine learning development. Known for its memory safety and performance, Rust offers compelling advantages for AI workloads: blazing-fast inference, efficient resource utilization, and safe concurrent systems. While Python remains the dominant language for AI research, Rust is becoming the language of choice for production AI systems.
In 2026, Rust’s AI ecosystem has matured dramatically. This guide covers building AI applications with Rust, from PyO3 bindings to native ML frameworks, helping you leverage Rust’s performance for your AI projects.
Why Rust for AI?
The Case for Rust in AI
- Performance: Near-C performance with better safety
- Memory Safety: No garbage collection pauses
- Concurrency: Safe parallelism for batch processing
- Production Ready: Native binaries, easy deployment
- Python Integration: Seamless interop via PyO3
Where Rust Excels
- Edge Deployment: Resource-constrained devices
- Inference Serving: Low-latency predictions
- Model Optimization: Convert and optimize models
- Data Processing: ETL pipelines at scale
- Embedded AI: Microcontrollers and IoT
PyO3: Python + Rust
Getting Started
# Cargo.toml
[package]
name = "rust_ai"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.22", features = ["numpy"] }
numpy = "0.22"
[build-dependencies]
pyo3-build-config = "0.22"
// build.rs (only needed outside maturin-driven builds)
fn main() {
pyo3_build_config::add_extension_module_link_args();
}
Basic Module
// src/lib.rs
use pyo3::prelude::*;
#[pyfunction]
fn predict(input: Vec<f64>, weights: Vec<f64>) -> f64 {
input.iter()
.zip(weights.iter())
.map(|(i, w)| i * w)
.sum()
}
#[pyfunction]
fn sigmoid(x: f64) -> f64 {
1.0 / (1.0 + (-x).exp())
}
#[pyfunction]
fn relu(x: f64) -> f64 {
x.max(0.0)
}
#[pymodule]
fn rust_ai(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(predict, m)?)?;
m.add_function(wrap_pyfunction!(sigmoid, m)?)?;
m.add_function(wrap_pyfunction!(relu, m)?)?;
Ok(())
}
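Alongside sigmoid and relu, classification heads usually need a softmax. A minimal, numerically stable sketch in plain Rust (std-only, not yet wired into the module above):

```rust
// Numerically stable softmax: subtract the max logit before
// exponentiating so exp() never overflows for large inputs.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let probs = softmax(&[1.0, 2.0, 3.0]);
    // Probabilities are positive, ordered like the logits, and sum to 1
    println!("{:?}", probs);
    assert!((probs.iter().sum::<f64>() - 1.0).abs() < 1e-12);
}
```

Exposing it to Python is the same `#[pyfunction]` + `add_function` pattern as the functions above.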
Building
# Install maturin once, then build and install the extension in-place
pip install maturin
maturin develop
Using from Python
import rust_ai
# Use Rust functions
weights = [0.5, 0.3, 0.2]
input_data = [1.0, 2.0, 3.0]
prediction = rust_ai.predict(input_data, weights)
print(f"Prediction: {prediction}")
# Fast activation functions
print(rust_ai.sigmoid(0.5)) # 0.622...
print(rust_ai.relu(-0.5)) # 0.0
Neural Network Layer
use pyo3::prelude::*;
use ndarray::Array2;
use ndarray_rand::RandomExt;
use ndarray_rand::rand_distr::Uniform;
#[pyclass]
struct DenseLayer {
weights: Array2<f64>,
bias: Array2<f64>,
}
#[pymethods]
impl DenseLayer {
#[new]
fn new(input_size: usize, output_size: usize) -> Self {
// Random init via the ndarray-rand crate (add it to Cargo.toml)
let weights = Array2::random((output_size, input_size), Uniform::new(0.0, 1.0));
let bias = Array2::zeros((output_size, 1));
Self { weights, bias }
}
fn forward(&self, input: Vec<Vec<f64>>) -> Vec<Vec<f64>> {
let (n, d) = (input.len(), input[0].len());
let flat: Vec<f64> = input.into_iter().flatten().collect();
// Rows are samples: build (batch, input_size), then transpose to (input_size, batch)
let input_array = Array2::from_shape_vec((n, d), flat).unwrap().reversed_axes();
let output = self.weights.dot(&input_array) + &self.bias;
// One output Vec per sample
output.columns().into_iter()
.map(|col| col.to_vec())
.collect()
}
fn relu(&self, input: Vec<Vec<f64>>) -> Vec<Vec<f64>> {
self.forward(input).into_iter()
.map(|row| row.into_iter().map(|v| v.max(0.0)).collect())
.collect()
}
}
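Stripped of ndarray, the forward pass above is just y = W·x + b per sample; a dependency-free sketch of that arithmetic:

```rust
// Dense layer forward pass without ndarray: y = W·x + b,
// with weights stored row-major as [output_size][input_size].
fn dense_forward(weights: &[Vec<f64>], bias: &[f64], input: &[f64]) -> Vec<f64> {
    weights.iter().zip(bias.iter())
        .map(|(row, b)| {
            // Dot product of one weight row with the input, plus bias
            row.iter().zip(input.iter()).map(|(w, x)| w * x).sum::<f64>() + b
        })
        .collect()
}

fn main() {
    let w = vec![vec![1.0, 0.0], vec![0.0, 2.0]];
    let b = vec![0.5, -0.5];
    let y = dense_forward(&w, &b, &[3.0, 4.0]);
    println!("{:?}", y); // [3.5, 7.5]
}
```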
Candle: Native ML Framework
Introduction to Candle
Candle is Hugging Face’s minimalist native Rust ML framework:
# Cargo.toml
[dependencies]
candle-core = "0.8"
candle-nn = "0.8"
candle-transformers = "0.8"
serde = { version = "1.0", features = ["derive"] }
Basic Neural Network
use candle_core::{Device, Tensor};
use candle_nn::{Linear, Module, VarBuilder};
struct SimpleNet {
linear1: Linear,
linear2: Linear,
}
impl SimpleNet {
fn new(vb: VarBuilder) -> candle_core::Result<Self> {
let linear1 = candle_nn::linear(784, 128, vb.pp("linear1"))?;
let linear2 = candle_nn::linear(128, 10, vb.pp("linear2"))?;
Ok(Self { linear1, linear2 })
}
}
impl Module for SimpleNet {
fn forward(&self, xs: &Tensor) -> candle_core::Result<Tensor> {
let xs = self.linear1.forward(xs)?;
let xs = xs.relu()?;
self.linear2.forward(&xs)
}
}
fn main() -> candle_core::Result<()> {
let device = Device::Cpu;
let vb = VarBuilder::zeros(candle_core::DType::F32, &device);
let model = SimpleNet::new(vb)?;
// Create dummy input
let input = Tensor::zeros((1, 784), candle_core::DType::F32, &device)?;
// Forward pass
let output = model.forward(&input)?;
println!("Output shape: {:?}", output.shape());
Ok(())
}
Loading Models
use candle_core::{Device, DType, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::bert::{BertModel, Config};
use tokenizers::Tokenizer;
// Sketch assuming config.json / model.safetensors / tokenizer.json were
// downloaded beforehand (e.g. with the hf-hub crate); check the
// candle-transformers docs for the exact BertModel::forward signature.
fn load_bert_model(device: &Device) -> anyhow::Result<BertModel> {
let config: Config = serde_json::from_str(&std::fs::read_to_string("config.json")?)?;
let vb = unsafe { VarBuilder::from_mmaped_safetensors(&["model.safetensors"], DType::F32, device)? };
Ok(BertModel::load(vb, &config)?)
}
fn run_inference() -> anyhow::Result<()> {
let device = Device::Cpu;
let model = load_bert_model(&device)?;
let tokenizer = Tokenizer::from_file("tokenizer.json").map_err(anyhow::Error::msg)?;
let encoding = tokenizer.encode("Hello world", true).map_err(anyhow::Error::msg)?;
let input_ids = Tensor::new(encoding.get_ids(), &device)?.unsqueeze(0)?;
let token_type_ids = input_ids.zeros_like()?;
let output = model.forward(&input_ids, &token_type_ids, None)?;
println!("Model output shape: {:?}", output.shape());
Ok(())
}
Burn: Next-Gen ML Framework
Getting Started
# Cargo.toml
[dependencies]
burn = { version = "0.14", features = ["train", "ndarray"] }
# optional backends: "tch" (LibTorch), "wgpu" (GPU via WebGPU)
Building Models
use burn::prelude::*;
use burn::nn::{Linear, LinearConfig, Relu};
#[derive(Module, Debug)]
pub struct NeuralNetwork<B: Backend> {
linear1: Linear<B>,
linear2: Linear<B>,
relu: Relu,
}
impl<B: Backend> NeuralNetwork<B> {
pub fn new(device: &B::Device) -> Self {
let linear1 = LinearConfig::new(784, 128).init(device);
let linear2 = LinearConfig::new(128, 10).init(device);
Self { linear1, linear2, relu: Relu::new() }
}
pub fn forward(&self, x: Tensor<B, 2>) -> Tensor<B, 2> {
let x = self.linear1.forward(x);
let x = self.relu.forward(x);
self.linear2.forward(x)
}
}
Training Loop
use burn::optim::AdamConfig;
use burn::train::LearnerBuilder;
// Sketch of Burn's LearnerBuilder API; the model must implement
// TrainStep/ValidStep, and model/dataloader construction is elided —
// see the Burn book for a complete example.
fn train_model() {
let learner = LearnerBuilder::new("./checkpoints") // artifact directory
.num_epochs(10)
.build(model, AdamConfig::new().init(), 1e-3);
let trained = learner.fit(train_dataloader, valid_dataloader);
}
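Whatever the framework, a fit() call boils down to the same loop: forward pass, loss gradient, parameter update. A dependency-free sketch on a single linear neuron with MSE loss:

```rust
// Gradient descent on y = w*x + b with mean-squared-error loss —
// the structure any framework's training loop implements.
fn fit(xs: &[f64], ys: &[f64], lr: f64, epochs: usize) -> (f64, f64) {
    let (mut w, mut b) = (0.0, 0.0);
    let n = xs.len() as f64;
    for _ in 0..epochs {
        let (mut gw, mut gb) = (0.0, 0.0);
        for (&x, &y) in xs.iter().zip(ys) {
            let err = w * x + b - y; // prediction error
            gw += 2.0 * err * x;     // d(MSE)/dw
            gb += 2.0 * err;         // d(MSE)/db
        }
        w -= lr * gw / n;
        b -= lr * gb / n;
    }
    (w, b)
}

fn main() {
    // Data generated by the true model y = 2x + 1
    let (w, b) = fit(&[1.0, 2.0, 3.0, 4.0], &[3.0, 5.0, 7.0, 9.0], 0.05, 5000);
    println!("w = {w:.3}, b = {b:.3}"); // converges toward w = 2, b = 1
}
```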
Model Serving with Rust
High-Performance Inference Server
use candle_core::{Tensor, Device, DType};
use candle_nn::Linear;
use std::sync::Arc;
use tokio::sync::RwLock;
struct ModelServer {
model: Arc<RwLock<Option<MyModel>>>,
device: Device,
}
struct MyModel {
linear: Linear,
}
impl ModelServer {
async fn load_model(&self, path: &str) -> Result<(), Box<dyn std::error::Error>> {
// MyModel::load is a stand-in for your own weight-loading code
let model = MyModel::load(path)?;
let mut guard = self.model.write().await;
*guard = Some(model);
Ok(())
}
async fn predict(&self, input: Vec<f32>) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
let model_guard = self.model.read().await;
let model = model_guard.as_ref()
.ok_or("Model not loaded")?;
// Convert input to tensor
let input_tensor = Tensor::from_vec(
input,
(1, 784),
&self.device
)?;
// Run inference (MyModel::forward wraps the Linear layer, elided here)
let output = model.forward(&input_tensor)?;
// Extract output: shape is (1, out), so squeeze the batch dim first
let output_vec: Vec<f32> = output.squeeze(0)?.to_vec1::<f32>()?;
Ok(output_vec)
}
}
Using with Axum
use axum::{Router, Json, extract::State, routing::post, http::StatusCode};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
#[derive(Serialize, Deserialize)]
struct PredictionRequest {
input: Vec<f32>,
}
#[derive(Serialize)]
struct PredictionResponse {
prediction: Vec<f32>,
}
async fn predict(
State(server): State<Arc<ModelServer>>,
Json(req): Json<PredictionRequest>,
) -> Result<Json<PredictionResponse>, StatusCode> {
let result = server.predict(req.input).await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
Ok(Json(PredictionResponse { prediction: result }))
}
#[tokio::main]
async fn main() {
let server = Arc::new(ModelServer::new());
server.load_model("model.safetensors").await.unwrap();
let app = Router::new()
.route("/predict", post(predict))
.with_state(server);
// axum 0.7 style: bind a listener, then serve
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
axum::serve(listener, app).await.unwrap();
}
Model Conversion
From PyTorch to Candle
use candle_core::Device;
// Export weights from PyTorch with safetensors.torch.save_file, then:
fn load_pytorch_weights(path: &str) -> candle_core::Result<()> {
let tensors = candle_core::safetensors::load(path, &Device::Cpu)?;
for (name, tensor) in tensors.iter() {
println!("Loaded tensor: {} {:?}", name, tensor.shape());
}
Ok(())
}
ONNX Support
use candle_core::Tensor;
use std::collections::HashMap;
// Uses the candle-onnx crate; simple_eval interprets the graph on CPU
fn run_onnx_inference(path: &str, input: Tensor) -> candle_core::Result<Tensor> {
let model = candle_onnx::read_file(path)?;
let graph = model.graph.as_ref().expect("model has no graph");
// Find input and output names
let input_name = graph.input[0].name.clone();
let output_name = graph.output[0].name.clone();
let mut outputs = candle_onnx::simple_eval(&model, HashMap::from([(input_name, input)]))?;
Ok(outputs.remove(&output_name).expect("missing output"))
}
WebAssembly Deployment
Compiling for WASM
// src/lib.rs
// Burn's NdArray backend is pure Rust, so it compiles to wasm32 —
// there is no separate "burn_wasm" crate.
use burn::backend::NdArray;
use burn::tensor::Tensor;
use wasm_bindgen::prelude::*;
type B = NdArray<f32>;
#[wasm_bindgen]
pub fn predict(input: Vec<f32>) -> Vec<f32> {
let device = Default::default();
let n = input.len();
let input_tensor = Tensor::<B, 1>::from_floats(input.as_slice(), &device).reshape([1, n]);
// Run your model here; as a placeholder, return the tensor's data
input_tensor.into_data().to_vec::<f32>().unwrap()
}
Building
# Add wasm target
rustup target add wasm32-unknown-unknown
# Build
cargo build --release --target wasm32-unknown-unknown
# Create WASM binding
wasm-bindgen --out-dir ./pkg/ --target web ./target/wasm32-unknown-unknown/release/my_ai.wasm
Using in Browser
import init, { predict } from './pkg/my_ai.js';
async function run() {
await init();
const input = [0.1, 0.2, 0.3, /* ... */];
const output = predict(input);
console.log("Prediction:", output);
}
run();
Embedded AI with Rust
Microcontroller Setup
#![no_std]
#![no_main]
use cortex_m_rt::entry;
use panic_halt as _;
#[entry]
fn main() -> ! {
// Simple AI inference loop on a microcontroller; read_sensor(),
// simple_nn_inference(), and led_on() are board-specific stubs
loop {
// Read sensor
let sensor_data = read_sensor();
// Run simple inference
let prediction = simple_nn_inference(sensor_data);
// Act on prediction
if prediction > 0.5 {
led_on();
}
cortex_m::asm::delay(1_000_000);
}
}
TinyML
// Quantized model for embedded targets. `QuantizedModel` is a stand-in
// for your own fixed-point inference type, not a published crate.
struct EmbeddedAI {
model: QuantizedModel<i8, 4>, // i8 storage, 4-bit effective precision
}
impl EmbeddedAI {
fn new(weights: &[i8], input_size: usize, output_size: usize) -> Self {
Self {
model: QuantizedModel::new(weights, input_size, output_size),
}
}
fn infer(&self, input: &[i8]) -> i8 {
self.model.predict(input)
}
}
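The core quantization step can be shown without any crate: map f32 weights to i8 with a single scale factor (symmetric quantization) and dequantize on the way out. A minimal std-only sketch:

```rust
// Symmetric 8-bit quantization: one scale factor maps f32 weights
// into the full i8 range [-127, 127].
fn quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = max_abs / 127.0;
    let q = weights.iter().map(|w| (w / scale).round() as i8).collect();
    (q, scale)
}

// Recover approximate f32 values from the quantized weights
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = [0.5f32, -1.27, 0.01];
    let (q, scale) = quantize(&w);
    let back = dequantize(&q, scale);
    println!("quantized: {:?} (scale {})", q, scale);
    println!("recovered: {:?}", back);
}
```

On-device, inference then runs entirely in integer arithmetic against the quantized weights, applying the scale only at the output.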
Performance Optimization
Batching
async fn batch_inference(
model: &Arc<Model>,
inputs: Vec<Vec<f32>>,
) -> Vec<Vec<f32>> {
// Stack inputs into single batch
let batch_size = inputs.len();
let input_size = inputs[0].len();
let batched: Vec<f32> = inputs.into_iter()
.flatten()
.collect();
let batch_tensor = Tensor::from_vec(
batched,
(batch_size, input_size),
&model.device,
).unwrap();
// Single forward pass
let output = model.forward(&batch_tensor).unwrap();
// Split results back into per-request vectors
output.to_vec2::<f32>().unwrap()
}
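The flatten-and-split bookkeeping around the batched forward pass is easy to get wrong; a framework-free sketch that can be unit-tested on its own:

```rust
// Flatten N equally-sized inputs into one contiguous buffer,
// returning the per-sample length alongside it.
fn flatten(inputs: &[Vec<f32>]) -> (Vec<f32>, usize) {
    let input_size = inputs[0].len();
    (inputs.iter().flatten().copied().collect(), input_size)
}

// Split a flat batched output back into one Vec per request.
fn split(flat: Vec<f32>, output_size: usize) -> Vec<Vec<f32>> {
    flat.chunks(output_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let inputs = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    let (flat, size) = flatten(&inputs);
    assert_eq!(flat, vec![1.0, 2.0, 3.0, 4.0]);
    // Pretend the model echoed the batch back unchanged
    let outputs = split(flat, size);
    assert_eq!(outputs, inputs);
}
```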
GPU Acceleration
fn use_gpu() -> candle_core::Result<()> {
// Try CUDA, then Metal, then fall back to CPU
let device = Device::new_cuda(0)
.or_else(|_| Device::new_metal(0))
.unwrap_or(Device::Cpu);
println!("Using device: {:?}", device);
// Create tensors on GPU
let tensor = Tensor::from_vec(
vec![1.0f32; 1000],
(1, 1000),
&device,
)?;
// Operations automatically use GPU
let result = tensor.sum(1)?;
Ok(())
}
Best Practices
Error Handling
use thiserror::Error;
#[derive(Error, Debug)]
pub enum AIError {
#[error("Model not loaded")]
ModelNotLoaded,
#[error("Invalid input shape: {0}")]
InvalidShape(String),
#[error("Inference error: {0}")]
InferenceError(String),
#[error("IO error: {0}")]
IoError(#[from] std::io::Error),
}
pub type Result<T> = std::result::Result<T, AIError>;
Testing
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_activation_functions() {
// Test sigmoid
assert!((sigmoid(0.0) - 0.5).abs() < 0.001);
// Test relu
assert!(relu(-1.0).abs() < 0.001);
assert!((relu(5.0) - 5.0).abs() < 0.001);
}
#[test]
fn test_model_output_range() {
// Test output is in valid range
// for softmax outputs
}
}
Conclusion
Rust offers compelling advantages for AI development, particularly for production inference, edge deployment, and performance-critical components. The ecosystem has matured significantly with PyO3 for Python integration, Candle and Burn for native ML, and excellent WASM support.
Start with PyO3 to accelerate Python code, then explore native frameworks for full Rust AI development. The performance benefits are substantial, especially for high-throughput inference workloads.