⚡ Calmops

Rust for Machine Learning: Polars vs. ndarray vs. nalgebra

Choosing the Right Data Processing and Linear Algebra Library for Your ML Workflows

Introduction

Machine learning in Rust is experiencing rapid growth. The ecosystem has matured significantly, with libraries like Polars, ndarray, and nalgebra providing production-grade tools for data processing, numerical computing, and linear algebra. However, choosing the right library for your specific use case can be challenging, especially for developers transitioning from Python’s rich ML ecosystem.

This comprehensive guide compares three major Rust ML libraries, helping you make informed decisions about which tool best fits your project requirements. Whether you’re building data pipelines, implementing algorithms, or performing scientific computing, understanding the strengths and trade-offs of each library is essential for success.

Why Rust for Machine Learning?

Before diving into library comparisons, let’s understand why Rust is increasingly attractive for ML work:

  • Memory Safety: Rust’s ownership system eliminates entire categories of bugs (null pointers, buffer overflows, data races) without garbage collection overhead
  • Performance: Near-C/C++ speeds with zero-cost abstractions, making Rust ideal for computationally intensive ML workloads
  • Fearless Concurrency: Built-in parallelism primitives enable efficient multi-threaded data processing
  • Type Safety: Compile-time guarantees catch errors before they reach production
  • Deployment: Single binary deployment with no runtime dependencies simplifies production ML systems
  • Predictability: No GC pauses mean consistent latency for inference servers and real-time systems

The Three Libraries at a Glance

| Feature           | Polars                   | ndarray              | nalgebra             |
|-------------------|--------------------------|----------------------|----------------------|
| Primary Use       | DataFrames & ETL         | NumPy-like arrays    | Linear algebra       |
| Data Type         | Tabular (rows/columns)   | N-dimensional arrays | Matrices/vectors     |
| Performance       | Blazingly fast           | Very good            | Excellent            |
| Ease of Use       | High (pandas-like)       | Medium (NumPy-like)  | Medium (specialized) |
| Ecosystem         | Growing rapidly          | Mature               | Mature               |
| GPU Support       | Limited (experimental)   | Via external libs    | Via external libs    |
| Lazy Evaluation   | Yes (query optimization) | No                   | No                   |
| Memory Efficiency | Excellent                | Good                 | Good                 |
| Sparse Data       | Good                     | Good                 | Limited              |
| Best For          | Data pipelines           | General computing    | Math-heavy work      |

Polars: The DataFrame Powerhouse

Overview and Design Philosophy

Polars is a modern DataFrame library written entirely in Rust, designed from the ground up for speed and efficiency. Unlike pandas (which evolved from older code), Polars was built with performance as a first-class concern, leveraging Rust’s memory model and parallelism capabilities.

Key Design Principles:

  • Lazy evaluation: Define computation graphs that are optimized before execution
  • Query optimization: Automatically reorders operations for efficiency
  • Parallel execution: Multi-threaded operations by default
  • Memory efficiency: Zero-copy operations where possible

When to Use Polars

Polars excels in scenarios involving tabular data manipulation:

  • Data loading and exploration: Read CSV, Parquet, JSON, or database files efficiently
  • ETL pipelines: Transform, filter, aggregate, and join data at scale
  • Feature engineering: Create and manipulate features for ML models
  • Data analysis: Groupby operations, window functions, and complex aggregations
  • Lazy evaluation: Define complex queries that are optimized before execution
  • Production pipelines: Handle large datasets with minimal memory overhead

Practical Example: Feature Engineering Pipeline

use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Note: the Polars Rust API changes between releases. This targets recent
    // versions, where `groupby` became `group_by` and rolling windows take an
    // options struct rather than a bare window size.
    let df = LazyCsvReader::new("sales_data.csv")
        .with_infer_schema_length(None)
        .finish()?;

    // Complex feature engineering with lazy evaluation
    let features = df
        // Filter for recent data
        .filter(col("date").gt(lit("2024-01-01")))
        // Create new features
        .with_columns([
            (col("price") * col("quantity")).alias("total_value"),
            col("price")
                .rolling_mean(RollingOptionsFixedWindow {
                    window_size: 7,
                    ..Default::default()
                })
                .alias("price_ma7"),
        ])
        // Group and aggregate
        .group_by([col("customer_id")])
        .agg([
            col("total_value").sum().alias("total_spent"),
            col("quantity").mean().alias("avg_quantity"),
            col("price").std(1).alias("price_std"),
        ])
        // Sort by total spent
        .sort(["total_spent"], Default::default())
        .collect()?;

    println!("{features}");
    Ok(())
}

Strengths

✅ Blazing Performance: Optimized query engine with parallel execution
✅ Lazy Evaluation: Only computes what you need, optimizing query plans
✅ Rich API: Comprehensive DataFrame operations (joins, groupby, window functions)
✅ Multiple Formats: Native support for CSV, Parquet, JSON, and database connectors
✅ Memory Efficient: Zero-copy operations and efficient memory management
✅ Familiar Syntax: Similar to pandas for easy transition from Python

Limitations

โŒ Specialized for DataFrames: Not suitable for general N-dimensional array operations
โŒ Limited Math Operations: Lacks advanced mathematical functions (use with ndarray/nalgebra)
โŒ Smaller Ecosystem: Fewer third-party integrations compared to pandas
โŒ Learning Curve: Lazy evaluation requires different thinking than eager evaluation

Performance Characteristics

In public benchmarks, Polars typically outperforms pandas by roughly 5-50x, depending on the operation and the number of available cores:

  • CSV loading: 10-20x faster
  • Filtering: 5-15x faster
  • Groupby operations: 10-30x faster
  • Joins: 5-20x faster

ndarray: The NumPy Alternative

Overview and Design Philosophy

ndarray is Rust’s primary answer to NumPy, providing N-dimensional arrays with broadcasting, slicing, and mathematical operations. It’s designed to be familiar to NumPy users while leveraging Rust’s performance and safety guarantees.

Key Design Principles:

  • NumPy compatibility: Familiar API for Python developers
  • Broadcasting: Automatic shape alignment for operations
  • Slicing and indexing: Powerful array manipulation
  • Integration: Works seamlessly with other Rust libraries

When to Use ndarray

ndarray is ideal for numerical computing tasks:

  • Numerical computing: Matrix operations, element-wise computations, reductions
  • Algorithm implementation: Build custom ML algorithms from scratch
  • Scientific computing: General-purpose numerical work
  • Bridge from Python: Familiar API for NumPy/SciPy users
  • Integration: Works well with Polars, nalgebra, and other Rust libraries
  • Data transformations: Normalize, scale, and transform numerical data

Practical Example: Data Normalization and Transformation

use ndarray::{s, Array2, Axis};

fn normalize_features(data: &Array2<f64>) -> Array2<f64> {
    // Compute mean and standard deviation along axis 0 (columns)
    let mean = data.mean_axis(Axis(0)).unwrap();
    let std = data.std_axis(Axis(0), 0.0);

    // Normalize: (x - mean) / std, broadcasting the 1-D statistics across rows
    (data - &mean) / &std
}

fn main() {
    // Create a sample feature matrix (100 samples, 5 features) with
    // non-constant columns so the standard deviations are non-zero
    let data = Array2::from_shape_fn((100, 5), |(i, j)| ((i + 1) * (j + 1)) as f64);

    // Normalize features
    let normalized = normalize_features(&data);

    // Compute the (biased) correlation matrix of the standardized features
    let correlation = normalized.t().dot(&normalized) / 100.0;
    println!("Correlation matrix:\n{}", correlation);

    // Slicing and indexing
    let first_10_samples = normalized.slice(s![0..10, ..]);
    println!("First 10 samples:\n{}", first_10_samples);

    // Element-wise operations
    let squared = &normalized * &normalized;
    let sum_of_squares = squared.sum_axis(Axis(1));
    println!("Sum of squares per sample:\n{}", sum_of_squares);
}

Strengths

✅ NumPy-like Syntax: Familiar for data scientists transitioning from Python
✅ Rich Ecosystem: ndarray-linalg, ndarray-stats for advanced operations
✅ Broadcasting and Slicing: Intuitive array operations with powerful indexing
✅ Integration-Friendly: Works seamlessly with Polars, nalgebra, and other libraries
✅ Type Safety: Compile-time dimension checking prevents shape errors
✅ Performance: Competitive with NumPy for most operations

Limitations

โŒ Eager Evaluation: No lazy evaluation or query optimization
โŒ Steeper Learning Curve: More complex than Polars for simple operations
โŒ GPU Support: Requires external crates (tch-rs, burn) for GPU acceleration
โŒ Memory Overhead: Larger memory footprint than specialized libraries for specific tasks

Performance Characteristics

ndarray performance is comparable to NumPy:

  • Matrix multiplication: Similar to NumPy (BLAS-optimized)
  • Element-wise operations: Comparable to NumPy, sometimes faster by avoiding Python interpreter overhead
  • Reductions: Comparable to NumPy
  • Memory usage: Similar to NumPy

nalgebra: The Linear Algebra Specialist

Overview and Design Philosophy

nalgebra is a pure-Rust, high-performance linear algebra library focused on correctness and efficiency. It provides both fixed-size (compile-time known dimensions) and dynamic-size matrices, with comprehensive decomposition support.

Key Design Principles:

  • Type Safety: Compile-time matrix dimension guarantees
  • Performance: Optimized for linear algebra operations
  • Flexibility: Both fixed and dynamic-size matrices
  • Completeness: Comprehensive decomposition and solver support

When to Use nalgebra

nalgebra is essential for linear algebra-heavy work:

  • Matrix decompositions: SVD, QR, LU, Cholesky, Eigenvalue decompositions
  • Linear system solving: Solve Ax = b efficiently
  • Geometric computations: Rotations, translations, projections
  • Scientific computing: Eigenvalue problems, least squares fitting
  • Game development: Physics simulations and transformations
  • Optimization: Gradient-based optimization algorithms

Practical Example: Linear Regression with nalgebra

use nalgebra::{DMatrix, DVector};

fn linear_regression(x: &DMatrix<f64>, y: &DVector<f64>) -> DVector<f64> {
    // Add a bias term: insert_column returns a new matrix rather than
    // mutating in place
    let x_with_bias = x.clone().insert_column(0, 1.0);

    // Normal equations: solve (X^T X) w = X^T y
    let xtx = x_with_bias.transpose() * &x_with_bias;
    let xty = x_with_bias.transpose() * y;

    // Solve using LU decomposition
    let lu = xtx.lu();
    lu.solve(&xty).expect("X^T X is singular")
}

fn main() {
    // Create sample data: 100 samples, 3 features, with non-degenerate
    // columns so the normal equations have a unique solution
    let x = DMatrix::from_fn(100, 3, |i, j| ((i * (j + 1)) as f64 * 0.1).sin());
    let y = DVector::from_fn(100, |i, _| i as f64 * 0.05);

    // Fit linear regression (coefficients[0] is the intercept)
    let coefficients = linear_regression(&x, &y);
    println!("Coefficients: {}", coefficients);

    // Predictions must use the same bias-augmented design matrix
    let x_with_bias = x.clone().insert_column(0, 1.0);
    let predictions = &x_with_bias * &coefficients;
    println!("Predictions shape: {}x{}", predictions.nrows(), predictions.ncols());

    // Compute residuals and the mean squared error
    let residuals = &y - &predictions;
    let mse = residuals.norm_squared() / residuals.len() as f64;
    println!("Mean Squared Error: {}", mse);
}

Strengths

✅ High-Performance Linear Algebra: Optimized for matrix operations
✅ Type-Safe Matrices: Compile-time dimension guarantees prevent shape errors
✅ Comprehensive Decompositions: SVD, QR, LU, Cholesky, Eigenvalue
✅ Fixed-Size Matrices: Stack-allocated matrices for small dimensions
✅ Geometric Operations: Rotations, translations, projections built-in
✅ Numerical Stability: Well-tested algorithms for numerical stability

Limitations

โŒ Specialized for Linear Algebra: Not suitable for general data processing
โŒ Limited Sparse Matrix Support: Better for dense matrices
โŒ Smaller Feature Set: Focused on core linear algebra (no statistical functions)
โŒ Learning Curve: Requires understanding of linear algebra concepts

Performance Characteristics

nalgebra performance is excellent for linear algebra:

  • Matrix multiplication: Comparable to BLAS libraries
  • Decompositions: Highly optimized implementations
  • Fixed-size matrices: Often faster than dynamic due to compile-time optimization
  • Memory usage: Efficient, especially for fixed-size matrices

Detailed Comparison and Decision Matrix

Use Case Comparison

| Use Case                 | Best Choice             | Why                                            |
|--------------------------|-------------------------|------------------------------------------------|
| Data loading/ETL         | Polars                  | Optimized for tabular data, lazy evaluation    |
| Feature engineering      | Polars + ndarray        | Polars for data manipulation, ndarray for math |
| Algorithm implementation | ndarray                 | General-purpose numerical computing            |
| Linear regression        | nalgebra                | Optimized for matrix operations                |
| Data normalization       | ndarray                 | Efficient element-wise operations              |
| Complex queries          | Polars                  | Query optimization and lazy evaluation         |
| Geometric transforms     | nalgebra                | Built-in rotation/translation support          |
| Statistical analysis     | ndarray + ndarray-stats | Rich statistical functions                     |

Performance Comparison (Approximate)

| Operation                    | Polars       | ndarray | nalgebra     |
|------------------------------|--------------|---------|--------------|
| CSV loading (1 GB)           | 1x (fastest) | 5-10x   | N/A          |
| Filtering                    | 1x (fastest) | 3-5x    | N/A          |
| Matrix multiply (1000x1000)  | N/A          | 1x      | 1x (similar) |
| Groupby aggregation          | 1x (fastest) | 5-10x   | N/A          |
| Memory usage                 | Low          | Medium  | Low          |

Real-World Example: Complete ML Pipeline

Here’s how you’d combine all three libraries for a production ML workflow:

use nalgebra::DMatrix;
use ndarray::{Array2, Axis};
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Step 1: Load data lazily with Polars (API shown for recent versions)
    let df = LazyCsvReader::new("training_data.csv")
        .with_infer_schema_length(None)
        .finish()?;

    // Step 2: Feature engineering with Polars
    let features_df = df
        .filter(col("target").is_not_null())
        .with_columns([
            (col("price") * col("quantity")).alias("total_value"),
            col("price")
                .rolling_mean(RollingOptionsFixedWindow {
                    window_size: 7,
                    ..Default::default()
                })
                .alias("price_ma7"),
        ])
        .collect()?;

    // Step 3: Convert to ndarray for normalization
    // (DataFrame::to_ndarray requires Polars' "ndarray" feature)
    let feature_matrix: Array2<f64> = features_df
        .select(["total_value", "price_ma7"])?
        .to_ndarray::<Float64Type>(IndexOrder::C)?;

    // Normalize features
    let mean = feature_matrix.mean_axis(Axis(0)).unwrap();
    let std = feature_matrix.std_axis(Axis(0), 0.0);
    let normalized = (&feature_matrix - &mean) / &std;

    // Step 4: Hand the matrix to nalgebra for linear algebra
    let x_nalgebra = DMatrix::from_row_slice(
        normalized.nrows(),
        normalized.ncols(),
        normalized.as_slice().expect("array is contiguous"),
    );

    // Get the target variable as a nalgebra vector
    let targets = features_df.column("target")?;
    let y_vec: Vec<f64> = targets.f64()?.into_iter().flatten().collect();
    let y_nalgebra = nalgebra::DVector::from_vec(y_vec);

    // Fit a model, e.g. with a normal-equations solver like the
    // linear_regression function shown in the nalgebra section:
    // let coefficients = linear_regression(&x_nalgebra, &y_nalgebra);

    println!("Pipeline complete!");
    Ok(())
}

Integration with the Broader Rust ML Ecosystem

These three libraries integrate with other important tools:

  • tch-rs: PyTorch bindings for deep learning (works with all three)
  • Burn: Native Rust deep learning framework (integrates with ndarray)
  • Candle: Hugging Face’s minimalist ML framework (alternative to tch-rs)
  • Linfa: Scikit-learn-inspired ML algorithms (uses ndarray)
  • ort: ONNX Runtime bindings for model inference
  • Polars SQL: SQL interface for Polars DataFrames
  • Arrow: Apache Arrow for columnar data interchange

Recommendations and Decision Framework

Choose Polars if:

✅ Your workflow involves tabular data manipulation
✅ You need fast data loading and aggregation
✅ You’re building ETL pipelines or feature engineering workflows
✅ You want lazy evaluation and query optimization
✅ You’re familiar with pandas and want a faster alternative
✅ You need to handle large datasets efficiently

Choose ndarray if:

✅ You’re implementing ML algorithms from scratch
✅ You need general-purpose numerical computing
✅ You want a NumPy-like interface
✅ You’re doing data transformations and normalizations
✅ You need to integrate with multiple libraries
✅ You’re familiar with NumPy and want a direct translation

Choose nalgebra if:

✅ You’re doing linear algebra-heavy work
✅ You need matrix decompositions or geometric transforms
✅ You value compile-time matrix dimension guarantees
✅ You’re building physics simulations or game engines
✅ You need high-performance linear system solving
✅ You want type-safe matrix operations

Most production ML systems benefit from using all three:

  1. Polars for data loading, cleaning, and feature engineering
  2. ndarray for data transformations and general numerical computing
  3. nalgebra for specialized linear algebra operations

This combination leverages each library’s strengths while maintaining clean separation of concerns.


Conclusion

Choosing the right Rust ML library depends on your specific use case, but the good news is that you don’t have to choose just one. The Rust ML ecosystem is designed for interoperability, allowing you to use Polars for data pipelines, ndarray for general computing, and nalgebra for specialized math, all in the same project.

Key Takeaways

  • Polars is your go-to for data processing and ETL workflows
  • ndarray provides NumPy-like functionality for general numerical computing
  • nalgebra excels at linear algebra and geometric computations
  • These libraries integrate seamlessly for complete ML pipelines
  • Rust’s type system and performance make it increasingly viable for production ML

As the Rust ML ecosystem continues to mature, expect even better integration, more pre-built models, and broader adoption in production systems. For now, understanding these three libraries positions you to build robust, high-performance ML systems in Rust.

