Introduction
Machine learning in Rust is experiencing rapid growth. The ecosystem has matured significantly, with libraries like Polars, ndarray, and nalgebra providing production-grade tools for data processing, numerical computing, and linear algebra. However, choosing the right library for your specific use case can be challenging, especially for developers transitioning from Python’s rich ML ecosystem.
This comprehensive guide compares three major Rust ML libraries, helping you make informed decisions about which tool best fits your project requirements. Whether you’re building data pipelines, implementing algorithms, or performing scientific computing, understanding the strengths and trade-offs of each library is essential for success.
Why Rust for Machine Learning?
Before diving into library comparisons, let’s understand why Rust is increasingly attractive for ML work:
- Memory Safety: Rust’s ownership system eliminates entire categories of bugs (null pointers, buffer overflows, data races) without garbage collection overhead
- Performance: Near-C/C++ speeds with zero-cost abstractions, making Rust ideal for computationally intensive ML workloads
- Fearless Concurrency: Built-in parallelism primitives enable efficient multi-threaded data processing
- Type Safety: Compile-time guarantees catch errors before they reach production
- Deployment: Single binary deployment with no runtime dependencies simplifies production ML systems
- Predictability: No GC pauses mean consistent latency for inference servers and real-time systems
The Three Libraries at a Glance
| Feature | Polars | ndarray | nalgebra |
|---|---|---|---|
| Primary Use | DataFrames & ETL | NumPy-like arrays | Linear algebra |
| Data Type | Tabular (rows/columns) | N-dimensional arrays | Matrices/vectors |
| Performance | Blazingly fast | Very good | Excellent |
| Ease of Use | High (pandas-like) | Medium (NumPy-like) | Medium (specialized) |
| Ecosystem | Growing rapidly | Mature | Mature |
| GPU Support | Limited (experimental) | Via external libs | Via external libs |
| Lazy Evaluation | Yes (query optimization) | No | No |
| Memory Efficiency | Excellent | Good | Good |
| Sparse Data | Good | Good | Limited |
| Best For | Data pipelines | General computing | Math-heavy work |
Polars: The DataFrame Powerhouse
Overview and Design Philosophy
Polars is a modern DataFrame library written entirely in Rust, designed from the ground up for speed and efficiency. Unlike pandas (which evolved from older code), Polars was built with performance as a first-class concern, leveraging Rust’s memory model and parallelism capabilities.
Key Design Principles:
- Lazy evaluation: Define computation graphs that are optimized before execution
- Query optimization: Automatically reorders operations for efficiency
- Parallel execution: Multi-threaded operations by default
- Memory efficiency: Zero-copy operations where possible
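The lazy-evaluation principle can be made concrete with a pure-std sketch. This is not how Polars is implemented — the `LazyFrame` type and methods below are illustrative stand-ins — but it shows the core idea: operations are only recorded into a plan, and nothing runs until `collect`.

```rust
// A minimal sketch of lazy evaluation (NOT Polars internals):
// each call appends a step to a plan; `collect` runs the whole plan once.
struct LazyFrame {
    data: Vec<i64>,
    plan: Vec<Box<dyn Fn(Vec<i64>) -> Vec<i64>>>,
}

impl LazyFrame {
    fn new(data: Vec<i64>) -> Self {
        LazyFrame { data, plan: Vec::new() }
    }

    // Record a filter; nothing is computed yet
    fn filter(mut self, pred: impl Fn(&i64) -> bool + 'static) -> Self {
        self.plan.push(Box::new(move |v: Vec<i64>| -> Vec<i64> {
            v.into_iter().filter(|x| pred(x)).collect()
        }));
        self
    }

    // Record an element-wise transformation; nothing is computed yet
    fn with_column(mut self, f: impl Fn(i64) -> i64 + 'static) -> Self {
        self.plan.push(Box::new(move |v: Vec<i64>| -> Vec<i64> {
            v.into_iter().map(&f).collect()
        }));
        self
    }

    // Execute the recorded plan, step by step, in one pass
    fn collect(self) -> Vec<i64> {
        self.plan.into_iter().fold(self.data, |acc, step| step(acc))
    }
}

fn main() {
    let result = LazyFrame::new(vec![1, 2, 3, 4, 5])
        .filter(|x| *x % 2 == 1) // keep odd values
        .with_column(|x| x * 10) // scale them
        .collect();              // only now does any work happen
    assert_eq!(result, vec![10, 30, 50]);
    println!("{:?}", result);
}
```

Because the real engine sees the whole plan before executing, it can reorder steps (e.g. push filters before expensive joins) — something an eager API cannot do.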
When to Use Polars
Polars excels in scenarios involving tabular data manipulation:
- Data loading and exploration: Read CSV, Parquet, JSON, or database files efficiently
- ETL pipelines: Transform, filter, aggregate, and join data at scale
- Feature engineering: Create and manipulate features for ML models
- Data analysis: Groupby operations, window functions, and complex aggregations
- Lazy evaluation: Define complex queries that are optimized before execution
- Production pipelines: Handle large datasets with minimal memory overhead
Practical Example: Feature Engineering Pipeline
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Read data (the CSV reader API has changed across Polars releases;
    // this uses the classic `CsvReader::from_path` form)
    let df = CsvReader::from_path("sales_data.csv")?
        .infer_schema(None)
        .finish()?;

    // Complex feature engineering with lazy evaluation
    let features = df
        .lazy()
        // Filter for recent data
        .filter(col("date").gt(lit("2024-01-01")))
        // Create new features
        .with_columns([
            (col("price") * col("quantity")).alias("total_value"),
            // Rolling expressions take an options struct, not a bare
            // window size (the type name varies by Polars version)
            col("price")
                .rolling_mean(RollingOptionsFixedWindow {
                    window_size: 7,
                    ..Default::default()
                })
                .alias("price_ma7"),
        ])
        // Group and aggregate (renamed `group_by` in newer Polars releases)
        .groupby([col("customer_id")])
        .agg([
            col("total_value").sum().alias("total_spent"),
            col("quantity").mean().alias("avg_quantity"),
            // `std` requires the delta degrees of freedom (ddof)
            col("price").std(1).alias("price_std"),
        ])
        // Sort by total spent
        .sort("total_spent", Default::default())
        .collect()?;

    println!("{}", features);
    Ok(())
}
Strengths
✅ Blazing Performance: Optimized query engine with parallel execution
✅ Lazy Evaluation: Only computes what you need, optimizing query plans
✅ Rich API: Comprehensive DataFrame operations (joins, groupby, window functions)
✅ Multiple Formats: Native support for CSV, Parquet, JSON, and database connectors
✅ Memory Efficient: Zero-copy operations and efficient memory management
✅ Familiar Syntax: Similar to pandas for easy transition from Python
Limitations
❌ Specialized for DataFrames: Not suitable for general N-dimensional array operations
❌ Limited Math Operations: Lacks advanced mathematical functions (use with ndarray/nalgebra)
❌ Smaller Ecosystem: Fewer third-party integrations compared to pandas
❌ Learning Curve: Lazy evaluation requires different thinking than eager evaluation
Performance Characteristics
Polars typically outperforms pandas by 5-50x depending on the operation:
- CSV loading: 10-20x faster
- Filtering: 5-15x faster
- Groupby operations: 10-30x faster
- Joins: 5-20x faster
ndarray: The NumPy Alternative
Overview and Design Philosophy
ndarray is Rust’s primary answer to NumPy, providing N-dimensional arrays with broadcasting, slicing, and mathematical operations. It’s designed to be familiar to NumPy users while leveraging Rust’s performance and safety guarantees.
Key Design Principles:
- NumPy compatibility: Familiar API for Python developers
- Broadcasting: Automatic shape alignment for operations
- Slicing and indexing: Powerful array manipulation
- Integration: Works seamlessly with other Rust libraries
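To make the broadcasting principle concrete, here is the row-wise alignment that an expression like `&matrix - &column_means` performs, written out as explicit loops over plain `Vec`s (a pure-std sketch of the semantics, not ndarray's implementation):

```rust
// Row-wise broadcasting by hand: subtract a length-C vector from every
// row of an R x C matrix. This mirrors what `&Array2 - &Array1` does in
// ndarray when the 1-D operand matches the trailing dimension.
fn broadcast_sub(matrix: &[Vec<f64>], row: &[f64]) -> Vec<Vec<f64>> {
    matrix
        .iter()
        .map(|r| {
            assert_eq!(r.len(), row.len(), "trailing dimensions must match");
            r.iter().zip(row).map(|(a, b)| a - b).collect()
        })
        .collect()
}

fn main() {
    let m = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let mean = vec![2.0, 3.0]; // e.g. a per-column mean
    let centered = broadcast_sub(&m, &mean);
    assert_eq!(centered, vec![vec![-1.0, -1.0], vec![1.0, 1.0]]);
    println!("{:?}", centered);
}
```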
When to Use ndarray
ndarray is ideal for numerical computing tasks:
- Numerical computing: Matrix operations, element-wise computations, reductions
- Algorithm implementation: Build custom ML algorithms from scratch
- Scientific computing: General-purpose numerical work
- Bridge from Python: Familiar API for NumPy/SciPy users
- Integration: Works well with Polars, nalgebra, and other Rust libraries
- Data transformations: Normalize, scale, and transform numerical data
Practical Example: Data Normalization and Transformation
use ndarray::{s, Array1, Array2, Axis};

fn normalize_features(data: &Array2<f64>) -> Array2<f64> {
    // Compute mean and standard deviation along axis 0 (columns)
    let mean: Array1<f64> = data.mean_axis(Axis(0)).unwrap();
    let std: Array1<f64> = data.std_axis(Axis(0), 0.0);
    // Normalize: (x - mean) / std, broadcast row-wise
    (data - &mean) / &std
}

fn main() {
    // Sample feature matrix (100 samples, 5 features) with non-constant
    // columns so the standard deviations are non-zero
    let data = Array2::from_shape_fn((100, 5), |(i, j)| (i * (j + 1)) as f64);

    // Normalize features
    let normalized = normalize_features(&data);

    // Compute correlation matrix
    let correlation = normalized.t().dot(&normalized) / 100.0;
    println!("Correlation matrix:\n{}", correlation);

    // Slicing and indexing
    let first_10_samples = normalized.slice(s![0..10, ..]);
    println!("First 10 samples:\n{}", first_10_samples);

    // Element-wise operations
    let squared = &normalized * &normalized;
    let sum_of_squares = squared.sum_axis(Axis(1));
    println!("Sum of squares per sample:\n{}", sum_of_squares);
}
Strengths
✅ NumPy-like Syntax: Familiar for data scientists transitioning from Python
✅ Rich Ecosystem: ndarray-linalg, ndarray-stats for advanced operations
✅ Broadcasting and Slicing: Intuitive array operations with powerful indexing
✅ Integration-Friendly: Works seamlessly with Polars, nalgebra, and other libraries
✅ Type Safety: Array dimensionality is tracked in the type system (concrete shapes are checked at runtime)
✅ Performance: Competitive with NumPy for most operations
Limitations
❌ Eager Evaluation: No lazy evaluation or query optimization
❌ Steeper Learning Curve: More complex than Polars for simple operations
❌ GPU Support: Requires external crates (tch-rs, burn) for GPU acceleration
❌ Memory Overhead: Larger memory footprint than specialized libraries for specific tasks
Performance Characteristics
ndarray performance is comparable to NumPy:
- Matrix multiplication: Similar to NumPy (BLAS-optimized)
- Element-wise operations: Slightly faster than NumPy due to Rust optimizations
- Reductions: Comparable to NumPy
- Memory usage: Similar to NumPy
nalgebra: The Linear Algebra Specialist
Overview and Design Philosophy
nalgebra is a pure-Rust, high-performance linear algebra library focused on correctness and efficiency. It provides both fixed-size matrices (dimensions known at compile time) and dynamic-size matrices, with comprehensive decomposition support.
Key Design Principles:
- Type Safety: Compile-time matrix dimension guarantees
- Performance: Optimized for linear algebra operations
- Flexibility: Both fixed and dynamic-size matrices
- Completeness: Comprehensive decomposition and solver support
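The compile-time dimension guarantee can be sketched with const generics — an illustration of the idea, not nalgebra's actual implementation. The hypothetical `Mat<R, C>` type below shares the inner dimension `K` across the two arguments of `matmul`, so mismatched shapes fail to compile rather than panic at runtime:

```rust
// Pure-std sketch of compile-time dimension checking. The inner
// dimension K appears in both argument types, so only compatible
// shapes type-check.
#[derive(Debug, PartialEq)]
struct Mat<const R: usize, const C: usize>([[f64; C]; R]);

fn matmul<const R: usize, const K: usize, const C: usize>(
    a: &Mat<R, K>,
    b: &Mat<K, C>,
) -> Mat<R, C> {
    let mut out = [[0.0; C]; R];
    for i in 0..R {
        for j in 0..C {
            for k in 0..K {
                out[i][j] += a.0[i][k] * b.0[k][j];
            }
        }
    }
    Mat(out)
}

fn main() {
    let a = Mat([[1.0, 2.0], [3.0, 4.0]]); // 2x2
    let b = Mat([[1.0], [1.0]]);           // 2x1
    let c = matmul(&a, &b);                // 2x1: each entry is a row sum
    assert_eq!(c, Mat([[3.0], [7.0]]));
    // matmul(&b, &a) would be rejected at compile time: 2x1 * 2x2.
    println!("{:?}", c);
}
```

nalgebra's fixed-size types (`Matrix2`, `Vector3`, etc.) give you this class of guarantee, while its `DMatrix`/`DVector` types trade it away for runtime-chosen dimensions.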
When to Use nalgebra
nalgebra is essential for linear algebra-heavy work:
- Matrix decompositions: SVD, QR, LU, Cholesky, Eigenvalue decompositions
- Linear system solving: Solve Ax = b efficiently
- Geometric computations: Rotations, translations, projections
- Scientific computing: Eigenvalue problems, least squares fitting
- Game development: Physics simulations and transformations
- Optimization: Gradient-based optimization algorithms
Practical Example: Linear Regression with nalgebra
use nalgebra::{DMatrix, DVector};

fn linear_regression(x: &DMatrix<f64>, y: &DVector<f64>) -> DVector<f64> {
    // Add bias term (column of ones); note that `insert_column`
    // consumes the matrix and returns a new, wider one
    let x_with_bias = x.clone().insert_column(0, 1.0);
    // Normal equations: solve (X^T X) beta = X^T y
    let xtx = x_with_bias.transpose() * &x_with_bias;
    let xty = x_with_bias.transpose() * y;
    // Solve using LU decomposition
    xtx.lu().solve(&xty).expect("X^T X is singular")
}

fn main() {
    // Toy data: 100 samples, 3 features (arbitrary non-degenerate values
    // so that X^T X is invertible)
    let x = DMatrix::from_fn(100, 3, |i, j| ((i * 31 + j * 17) % 13) as f64);
    let y = DVector::from_fn(100, |i, _| (i % 7) as f64);

    // Fit linear regression
    let coefficients = linear_regression(&x, &y);
    println!("Coefficients: {}", coefficients);

    // Predictions must use the same design matrix (bias column included)
    let x_with_bias = x.clone().insert_column(0, 1.0);
    let predictions = &x_with_bias * &coefficients;
    println!("Predictions shape: {}x{}", predictions.nrows(), predictions.ncols());

    // Compute residuals and mean squared error
    let residuals = &y - &predictions;
    let mse = residuals.norm_squared() / residuals.len() as f64;
    println!("Mean Squared Error: {}", mse);
}
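As a crate-free sanity check of the normal-equations approach, the single-feature case reduces to the familiar closed form `slope = cov(x, y) / var(x)`:

```rust
// Single-feature least squares: the normal equations collapse to
// slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
fn simple_ols(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let var: f64 = x.iter().map(|a| (a - mx).powi(2)).sum();
    let slope = cov / var;
    (my - slope * mx, slope) // (intercept, slope)
}

fn main() {
    // y = 2x + 1 exactly, so least squares must recover (1, 2)
    let x = [0.0, 1.0, 2.0, 3.0];
    let y = [1.0, 3.0, 5.0, 7.0];
    let (intercept, slope) = simple_ols(&x, &y);
    assert!((intercept - 1.0).abs() < 1e-12);
    assert!((slope - 2.0).abs() < 1e-12);
    println!("intercept = {}, slope = {}", intercept, slope);
}
```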
Strengths
✅ High-Performance Linear Algebra: Optimized for matrix operations
✅ Type-Safe Matrices: Compile-time dimension guarantees prevent shape errors
✅ Comprehensive Decompositions: SVD, QR, LU, Cholesky, Eigenvalue
✅ Fixed-Size Matrices: Stack-allocated matrices for small dimensions
✅ Geometric Operations: Rotations, translations, projections built-in
✅ Numerical Stability: Well-tested, numerically stable algorithms
Limitations
❌ Specialized for Linear Algebra: Not suitable for general data processing
❌ Limited Sparse Matrix Support: Better for dense matrices
❌ Smaller Feature Set: Focused on core linear algebra (no statistical functions)
❌ Learning Curve: Requires understanding of linear algebra concepts
Performance Characteristics
nalgebra performance is excellent for linear algebra:
- Matrix multiplication: Comparable to BLAS libraries
- Decompositions: Highly optimized implementations
- Fixed-size matrices: Often faster than dynamic due to compile-time optimization
- Memory usage: Efficient, especially for fixed-size matrices
Detailed Comparison and Decision Matrix
Use Case Comparison
| Use Case | Best Choice | Why |
|---|---|---|
| Data loading/ETL | Polars | Optimized for tabular data, lazy evaluation |
| Feature engineering | Polars + ndarray | Polars for data manipulation, ndarray for math |
| Algorithm implementation | ndarray | General-purpose numerical computing |
| Linear regression | nalgebra | Optimized for matrix operations |
| Data normalization | ndarray | Efficient element-wise operations |
| Complex queries | Polars | Query optimization and lazy evaluation |
| Geometric transforms | nalgebra | Built-in rotation/translation support |
| Statistical analysis | ndarray + ndarray-stats | Rich statistical functions |
Performance Comparison (Approximate)
| Operation | Polars | ndarray | nalgebra |
|---|---|---|---|
| CSV loading (1GB) | 1x (fastest) | 5-10x | N/A |
| Filtering | 1x (fastest) | 3-5x | N/A |
| Matrix multiply (1000x1000) | N/A | 1x | 1x (similar) |
| Groupby aggregation | 1x (fastest) | 5-10x | N/A |
| Memory usage | Low | Medium | Low |
Real-World Example: Complete ML Pipeline
Here’s how you’d combine all three libraries for a production ML workflow:
use polars::prelude::*;
use ndarray::{Array2, Axis};
use nalgebra::DMatrix;

fn main() -> PolarsResult<()> {
    // Step 1: Load and explore data with Polars
    let df = CsvReader::from_path("training_data.csv")?
        .infer_schema(None)
        .finish()?;

    // Step 2: Feature engineering with Polars
    let features_df = df
        .lazy()
        .filter(col("target").is_not_null())
        .with_columns([
            (col("price") * col("quantity")).alias("total_value"),
            col("price")
                .rolling_mean(RollingOptionsFixedWindow {
                    window_size: 7,
                    ..Default::default()
                })
                .alias("price_ma7"),
        ])
        .collect()?;

    // Step 3: Convert to ndarray for normalization
    // (eager `select` takes column names; `to_ndarray` requires the
    // `ndarray` feature of Polars)
    let feature_matrix: Array2<f64> = features_df
        .select(["total_value", "price_ma7"])?
        .to_ndarray::<Float64Type>(IndexOrder::C)?;

    // Normalize features
    let mean = feature_matrix.mean_axis(Axis(0)).unwrap();
    let std = feature_matrix.std_axis(Axis(0), 0.0);
    let normalized = (&feature_matrix - &mean) / &std;

    // Step 4: Hand the matrix to nalgebra for linear regression
    let x_nalgebra = DMatrix::from_row_slice(
        normalized.nrows(),
        normalized.ncols(),
        normalized.as_slice().expect("contiguous row-major layout"),
    );

    // Get target variable
    let targets = features_df.column("target")?;
    let y_vec: Vec<f64> = targets.f64()?.into_iter().flatten().collect();
    let y_nalgebra = nalgebra::DVector::from_vec(y_vec);

    // Fit model (see the nalgebra linear regression example above)
    // let coefficients = linear_regression(&x_nalgebra, &y_nalgebra);

    println!("Pipeline complete!");
    Ok(())
}
Integration with the Broader Rust ML Ecosystem
These three libraries integrate with other important tools:
- tch-rs: PyTorch bindings for deep learning (works with all three)
- Burn: Native Rust deep learning framework (integrates with ndarray)
- Candle: Hugging Face’s lightweight ML framework (alternative to tch-rs)
- Linfa: Scikit-learn-inspired ML algorithms (uses ndarray)
- ort: ONNX Runtime bindings for model inference
- Polars-SQL: SQL interface for Polars DataFrames
- Arrow: Apache Arrow for data interchange
Recommendations and Decision Framework
Choose Polars if:
✅ Your workflow involves tabular data manipulation
✅ You need fast data loading and aggregation
✅ You’re building ETL pipelines or feature engineering workflows
✅ You want lazy evaluation and query optimization
✅ You’re familiar with pandas and want a faster alternative
✅ You need to handle large datasets efficiently
Choose ndarray if:
✅ You’re implementing ML algorithms from scratch
✅ You need general-purpose numerical computing
✅ You want a NumPy-like interface
✅ You’re doing data transformations and normalizations
✅ You need to integrate with multiple libraries
✅ You’re familiar with NumPy and want a direct translation
Choose nalgebra if:
✅ You’re doing linear algebra-heavy work
✅ You need matrix decompositions or geometric transforms
✅ You value compile-time matrix dimension guarantees
✅ You’re building physics simulations or game engines
✅ You need high-performance linear system solving
✅ You want type-safe matrix operations
Hybrid Approach (Recommended for Production)
Most production ML systems benefit from using all three:
- Polars for data loading, cleaning, and feature engineering
- ndarray for data transformations and general numerical computing
- nalgebra for specialized linear algebra operations
This combination leverages each library’s strengths while maintaining clean separation of concerns.
Conclusion
Choosing the right Rust ML library depends on your specific use case, but the good news is that you don’t have to choose just one. The Rust ML ecosystem is designed for interoperability, allowing you to use Polars for data pipelines, ndarray for general computing, and nalgebra for specialized mathโall in the same project.
Key Takeaways
- Polars is your go-to for data processing and ETL workflows
- ndarray provides NumPy-like functionality for general numerical computing
- nalgebra excels at linear algebra and geometric computations
- These libraries integrate seamlessly for complete ML pipelines
- Rust’s type system and performance make it increasingly viable for production ML
As the Rust ML ecosystem continues to mature, expect even better integration, more pre-built models, and broader adoption in production systems. For now, understanding these three libraries positions you to build robust, high-performance ML systems in Rust.
Resources and Further Reading
Official Documentation
- Polars Documentation - Complete Polars guide
- ndarray Documentation - API reference and examples
- nalgebra Documentation - Linear algebra guide
- ndarray-linalg - Advanced linear algebra for ndarray
Learning Resources
- Linfa: Rust ML Algorithms - Scikit-learn-inspired algorithms
- Polars User Guide - Comprehensive tutorials
- ndarray-stats - Statistical functions
Community and Examples
- Rust ML Subreddit - Community discussions
- GitHub Examples - Real-world projects
- Awesome Rust - Curated ML resources