⚡ Calmops

Rust for Machine Learning: Polars vs. ndarray vs. nalgebra

Choosing the Right Data Processing and Linear Algebra Library for Your ML Workflows

Introduction

Machine learning in Rust is experiencing rapid growth. The ecosystem has matured significantly, with libraries like Polars, ndarray, and nalgebra providing production-grade tools for data processing, numerical computing, and linear algebra. However, choosing the right library for your specific use case can be challenging, especially for developers transitioning from Python’s rich ML ecosystem.

This comprehensive guide compares three major Rust ML libraries, helping you make informed decisions about which tool best fits your project requirements. Whether you’re building data pipelines, implementing algorithms, or performing scientific computing, understanding the strengths and trade-offs of each library is essential for success.

Why Rust for Machine Learning?

Before diving into library comparisons, let’s understand why Rust is increasingly attractive for ML work:

  • Memory Safety: Rust’s ownership system eliminates entire categories of bugs (null pointers, buffer overflows, data races) without garbage collection overhead
  • Performance: Near-C/C++ speeds with zero-cost abstractions, making Rust ideal for computationally intensive ML workloads
  • Fearless Concurrency: Built-in parallelism primitives enable efficient multi-threaded data processing
  • Type Safety: Compile-time guarantees catch errors before they reach production
  • Deployment: Single binary deployment with no runtime dependencies simplifies production ML systems
  • Predictability: No GC pauses mean consistent latency for inference servers and real-time systems

The Three Libraries at a Glance

| Feature           | Polars                   | ndarray              | nalgebra             |
|-------------------|--------------------------|----------------------|----------------------|
| Primary Use       | DataFrames & ETL         | NumPy-like arrays    | Linear algebra       |
| Data Type         | Tabular (rows/columns)   | N-dimensional arrays | Matrices/vectors     |
| Performance       | Blazingly fast           | Very good            | Excellent            |
| Ease of Use       | High (pandas-like)       | Medium (NumPy-like)  | Medium (specialized) |
| Ecosystem         | Growing rapidly          | Mature               | Mature               |
| GPU Support       | Limited (experimental)   | Via external libs    | Via external libs    |
| Lazy Evaluation   | Yes (query optimization) | No                   | No                   |
| Memory Efficiency | Excellent                | Good                 | Good                 |
| Sparse Data       | Good                     | Good                 | Limited              |
| Best For          | Data pipelines           | General computing    | Math-heavy work      |

Polars: The DataFrame Powerhouse

Overview and Design Philosophy

Polars is a modern DataFrame library written entirely in Rust, designed from the ground up for speed and efficiency. Unlike pandas (which evolved from older code), Polars was built with performance as a first-class concern, leveraging Rust’s memory model and parallelism capabilities.

Key Design Principles:

  • Lazy evaluation: Define computation graphs that are optimized before execution
  • Query optimization: Automatically reorders operations for efficiency
  • Parallel execution: Multi-threaded operations by default
  • Memory efficiency: Zero-copy operations where possible

When to Use Polars

Polars excels in scenarios involving tabular data manipulation:

  • Data loading and exploration: Read CSV, Parquet, JSON, or database files efficiently
  • ETL pipelines: Transform, filter, aggregate, and join data at scale
  • Feature engineering: Create and manipulate features for ML models
  • Data analysis: Groupby operations, window functions, and complex aggregations
  • Lazy evaluation: Define complex queries that are optimized before execution
  • Production pipelines: Handle large datasets with minimal memory overhead

Practical Example: Feature Engineering Pipeline

use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Note: the Polars Rust API changes between releases. This targets recent
    // versions, where `groupby` became `group_by` and rolling windows take an
    // options struct rather than a bare window size.
    let df = LazyCsvReader::new("sales_data.csv")
        .with_infer_schema_length(None)
        .finish()?;

    // Complex feature engineering with lazy evaluation
    let features = df
        // Filter for recent data
        .filter(col("date").gt(lit("2024-01-01")))
        // Create new features
        .with_columns([
            (col("price") * col("quantity")).alias("total_value"),
            col("price")
                .rolling_mean(RollingOptionsFixedWindow {
                    window_size: 7,
                    ..Default::default()
                })
                .alias("price_ma7"),
        ])
        // Group and aggregate
        .group_by([col("customer_id")])
        .agg([
            col("total_value").sum().alias("total_spent"),
            col("quantity").mean().alias("avg_quantity"),
            col("price").std(1).alias("price_std"),
        ])
        // Sort by total spent
        .sort(["total_spent"], Default::default())
        .collect()?;

    println!("{features}");
    Ok(())
}

Strengths

✅ Blazing Performance: Optimized query engine with parallel execution
✅ Lazy Evaluation: Only computes what you need, optimizing query plans
✅ Rich API: Comprehensive DataFrame operations (joins, groupby, window functions)
✅ Multiple Formats: Native support for CSV, Parquet, JSON, and database connectors
✅ Memory Efficient: Zero-copy operations and efficient memory management
✅ Familiar Syntax: Similar to pandas for easy transition from Python

Limitations

โŒ Specialized for DataFrames: Not suitable for general N-dimensional array operations
โŒ Limited Math Operations: Lacks advanced mathematical functions (use with ndarray/nalgebra)
โŒ Smaller Ecosystem: Fewer third-party integrations compared to pandas
โŒ Learning Curve: Lazy evaluation requires different thinking than eager evaluation

Performance Characteristics

In public benchmarks, Polars typically outperforms pandas by roughly 5-50x, depending on the operation and the number of available cores:

  • CSV loading: 10-20x faster
  • Filtering: 5-15x faster
  • Groupby operations: 10-30x faster
  • Joins: 5-20x faster

ndarray: The NumPy Alternative

Overview and Design Philosophy

ndarray is Rust’s primary answer to NumPy, providing N-dimensional arrays with broadcasting, slicing, and mathematical operations. It’s designed to be familiar to NumPy users while leveraging Rust’s performance and safety guarantees.

Key Design Principles:

  • NumPy compatibility: Familiar API for Python developers
  • Broadcasting: Automatic shape alignment for operations
  • Slicing and indexing: Powerful array manipulation
  • Integration: Works seamlessly with other Rust libraries

When to Use ndarray

ndarray is ideal for numerical computing tasks:

  • Numerical computing: Matrix operations, element-wise computations, reductions
  • Algorithm implementation: Build custom ML algorithms from scratch
  • Scientific computing: General-purpose numerical work
  • Bridge from Python: Familiar API for NumPy/SciPy users
  • Integration: Works well with Polars, nalgebra, and other Rust libraries
  • Data transformations: Normalize, scale, and transform numerical data

Practical Example: Data Normalization and Transformation

use ndarray::{s, Array2, Axis};

fn normalize_features(data: &Array2<f64>) -> Array2<f64> {
    // Compute mean and standard deviation along axis 0 (columns)
    let mean = data.mean_axis(Axis(0)).unwrap();
    let std = data.std_axis(Axis(0), 0.0);

    // Normalize: (x - mean) / std, broadcasting the 1-D statistics across rows
    (data - &mean) / &std
}

fn main() {
    // Create a sample feature matrix (100 samples, 5 features) with
    // non-constant columns so the standard deviations are non-zero
    let data = Array2::from_shape_fn((100, 5), |(i, j)| ((i + 1) * (j + 1)) as f64);

    // Normalize features
    let normalized = normalize_features(&data);

    // Compute the (biased) correlation matrix of the standardized features
    let correlation = normalized.t().dot(&normalized) / 100.0;
    println!("Correlation matrix:\n{}", correlation);

    // Slicing and indexing
    let first_10_samples = normalized.slice(s![0..10, ..]);
    println!("First 10 samples:\n{}", first_10_samples);

    // Element-wise operations
    let squared = &normalized * &normalized;
    let sum_of_squares = squared.sum_axis(Axis(1));
    println!("Sum of squares per sample:\n{}", sum_of_squares);
}

Strengths

✅ NumPy-like Syntax: Familiar for data scientists transitioning from Python
✅ Rich Ecosystem: ndarray-linalg, ndarray-stats for advanced operations
✅ Broadcasting and Slicing: Intuitive array operations with powerful indexing
✅ Integration-Friendly: Works seamlessly with Polars, nalgebra, and other libraries
✅ Type Safety: Compile-time dimension checking prevents shape errors
✅ Performance: Competitive with NumPy for most operations

Limitations

โŒ Eager Evaluation: No lazy evaluation or query optimization
โŒ Steeper Learning Curve: More complex than Polars for simple operations
โŒ GPU Support: Requires external crates (tch-rs, burn) for GPU acceleration
โŒ Memory Overhead: Larger memory footprint than specialized libraries for specific tasks

Performance Characteristics

ndarray performance is comparable to NumPy:

  • Matrix multiplication: Similar to NumPy (BLAS-optimized)
  • Element-wise operations: Comparable to NumPy, sometimes faster by avoiding Python interpreter overhead
  • Reductions: Comparable to NumPy
  • Memory usage: Similar to NumPy

nalgebra: The Linear Algebra Specialist

Overview and Design Philosophy

nalgebra is a pure-Rust, high-performance linear algebra library focused on correctness and efficiency. It provides both fixed-size (compile-time known dimensions) and dynamic-size matrices, with comprehensive decomposition support.

Key Design Principles:

  • Type Safety: Compile-time matrix dimension guarantees
  • Performance: Optimized for linear algebra operations
  • Flexibility: Both fixed and dynamic-size matrices
  • Completeness: Comprehensive decomposition and solver support

When to Use nalgebra

nalgebra is essential for linear algebra-heavy work:

  • Matrix decompositions: SVD, QR, LU, Cholesky, Eigenvalue decompositions
  • Linear system solving: Solve Ax = b efficiently
  • Geometric computations: Rotations, translations, projections
  • Scientific computing: Eigenvalue problems, least squares fitting
  • Game development: Physics simulations and transformations
  • Optimization: Gradient-based optimization algorithms

Practical Example: Linear Regression with nalgebra

use nalgebra::{DMatrix, DVector};

fn linear_regression(x: &DMatrix<f64>, y: &DVector<f64>) -> DVector<f64> {
    // Add a bias term: insert_column returns a new matrix rather than
    // mutating in place
    let x_with_bias = x.clone().insert_column(0, 1.0);

    // Normal equations: solve (X^T X) w = X^T y
    let xtx = x_with_bias.transpose() * &x_with_bias;
    let xty = x_with_bias.transpose() * y;

    // Solve using LU decomposition
    let lu = xtx.lu();
    lu.solve(&xty).expect("X^T X is singular")
}

fn main() {
    // Create sample data: 100 samples, 3 features, with non-degenerate
    // columns so the normal equations have a unique solution
    let x = DMatrix::from_fn(100, 3, |i, j| ((i * (j + 1)) as f64 * 0.1).sin());
    let y = DVector::from_fn(100, |i, _| i as f64 * 0.05);

    // Fit linear regression (coefficients[0] is the intercept)
    let coefficients = linear_regression(&x, &y);
    println!("Coefficients: {}", coefficients);

    // Predictions must use the same bias-augmented design matrix
    let x_with_bias = x.clone().insert_column(0, 1.0);
    let predictions = &x_with_bias * &coefficients;
    println!("Predictions shape: {}x{}", predictions.nrows(), predictions.ncols());

    // Compute residuals and the mean squared error
    let residuals = &y - &predictions;
    let mse = residuals.norm_squared() / residuals.len() as f64;
    println!("Mean Squared Error: {}", mse);
}

Strengths

✅ High-Performance Linear Algebra: Optimized for matrix operations
✅ Type-Safe Matrices: Compile-time dimension guarantees prevent shape errors
✅ Comprehensive Decompositions: SVD, QR, LU, Cholesky, Eigenvalue
✅ Fixed-Size Matrices: Stack-allocated matrices for small dimensions
✅ Geometric Operations: Rotations, translations, projections built-in
✅ Numerical Stability: Well-tested algorithms for numerical stability

Limitations

โŒ Specialized for Linear Algebra: Not suitable for general data processing
โŒ Limited Sparse Matrix Support: Better for dense matrices
โŒ Smaller Feature Set: Focused on core linear algebra (no statistical functions)
โŒ Learning Curve: Requires understanding of linear algebra concepts

Performance Characteristics

nalgebra performance is excellent for linear algebra:

  • Matrix multiplication: Comparable to BLAS libraries
  • Decompositions: Highly optimized implementations
  • Fixed-size matrices: Often faster than dynamic due to compile-time optimization
  • Memory usage: Efficient, especially for fixed-size matrices

Detailed Comparison and Decision Matrix

Use Case Comparison

| Use Case                 | Best Choice             | Why                                            |
|--------------------------|-------------------------|------------------------------------------------|
| Data loading/ETL         | Polars                  | Optimized for tabular data, lazy evaluation    |
| Feature engineering      | Polars + ndarray        | Polars for data manipulation, ndarray for math |
| Algorithm implementation | ndarray                 | General-purpose numerical computing            |
| Linear regression        | nalgebra                | Optimized for matrix operations                |
| Data normalization       | ndarray                 | Efficient element-wise operations              |
| Complex queries          | Polars                  | Query optimization and lazy evaluation         |
| Geometric transforms     | nalgebra                | Built-in rotation/translation support          |
| Statistical analysis     | ndarray + ndarray-stats | Rich statistical functions                     |

Performance Comparison (Approximate)

| Operation                    | Polars       | ndarray | nalgebra     |
|------------------------------|--------------|---------|--------------|
| CSV loading (1 GB)           | 1x (fastest) | 5-10x   | N/A          |
| Filtering                    | 1x (fastest) | 3-5x    | N/A          |
| Matrix multiply (1000x1000)  | N/A          | 1x      | 1x (similar) |
| Groupby aggregation          | 1x (fastest) | 5-10x   | N/A          |
| Memory usage                 | Low          | Medium  | Low          |

Real-World Example: Complete ML Pipeline

Here’s how you’d combine all three libraries for a production ML workflow:

use nalgebra::DMatrix;
use ndarray::{Array2, Axis};
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Step 1: Load data lazily with Polars (API shown for recent versions)
    let df = LazyCsvReader::new("training_data.csv")
        .with_infer_schema_length(None)
        .finish()?;

    // Step 2: Feature engineering with Polars
    let features_df = df
        .filter(col("target").is_not_null())
        .with_columns([
            (col("price") * col("quantity")).alias("total_value"),
            col("price")
                .rolling_mean(RollingOptionsFixedWindow {
                    window_size: 7,
                    ..Default::default()
                })
                .alias("price_ma7"),
        ])
        .collect()?;

    // Step 3: Convert to ndarray for normalization
    // (DataFrame::to_ndarray requires Polars' "ndarray" feature)
    let feature_matrix: Array2<f64> = features_df
        .select(["total_value", "price_ma7"])?
        .to_ndarray::<Float64Type>(IndexOrder::C)?;

    // Normalize features
    let mean = feature_matrix.mean_axis(Axis(0)).unwrap();
    let std = feature_matrix.std_axis(Axis(0), 0.0);
    let normalized = (&feature_matrix - &mean) / &std;

    // Step 4: Hand the matrix to nalgebra for linear algebra
    let x_nalgebra = DMatrix::from_row_slice(
        normalized.nrows(),
        normalized.ncols(),
        normalized.as_slice().expect("array is contiguous"),
    );

    // Get the target variable as a nalgebra vector
    let targets = features_df.column("target")?;
    let y_vec: Vec<f64> = targets.f64()?.into_iter().flatten().collect();
    let y_nalgebra = nalgebra::DVector::from_vec(y_vec);

    // Fit a model, e.g. with a normal-equations solver like the
    // linear_regression function shown in the nalgebra section:
    // let coefficients = linear_regression(&x_nalgebra, &y_nalgebra);

    println!("Pipeline complete!");
    Ok(())
}

Integration with the Broader Rust ML Ecosystem

These three libraries integrate with other important tools:

  • tch-rs: PyTorch bindings for deep learning (works with all three)
  • Burn: Native Rust deep learning framework (integrates with ndarray)
  • Candle: Hugging Face’s minimalist ML framework (alternative to tch-rs)
  • Linfa: Scikit-learn-inspired ML algorithms (uses ndarray)
  • ort: ONNX Runtime bindings for model inference
  • Polars SQL: SQL interface for Polars DataFrames
  • Arrow: Apache Arrow for columnar data interchange

Recommendations and Decision Framework

Choose Polars if:

✅ Your workflow involves tabular data manipulation
✅ You need fast data loading and aggregation
✅ You’re building ETL pipelines or feature engineering workflows
✅ You want lazy evaluation and query optimization
✅ You’re familiar with pandas and want a faster alternative
✅ You need to handle large datasets efficiently

Choose ndarray if:

✅ You’re implementing ML algorithms from scratch
✅ You need general-purpose numerical computing
✅ You want a NumPy-like interface
✅ You’re doing data transformations and normalizations
✅ You need to integrate with multiple libraries
✅ You’re familiar with NumPy and want a direct translation

Choose nalgebra if:

✅ You’re doing linear algebra-heavy work
✅ You need matrix decompositions or geometric transforms
✅ You value compile-time matrix dimension guarantees
✅ You’re building physics simulations or game engines
✅ You need high-performance linear system solving
✅ You want type-safe matrix operations

Most production ML systems benefit from using all three:

  1. Polars for data loading, cleaning, and feature engineering
  2. ndarray for data transformations and general numerical computing
  3. nalgebra for specialized linear algebra operations

This combination leverages each library’s strengths while maintaining clean separation of concerns.


Conclusion

Choosing the right Rust ML library depends on your specific use case, but the good news is that you don’t have to choose just one. The Rust ML ecosystem is designed for interoperability, allowing you to use Polars for data pipelines, ndarray for general computing, and nalgebra for specialized math, all in the same project.

Key Takeaways

  • Polars is your go-to for data processing and ETL workflows
  • ndarray provides NumPy-like functionality for general numerical computing
  • nalgebra excels at linear algebra and geometric computations
  • These libraries integrate seamlessly for complete ML pipelines
  • Rust’s type system and performance make it increasingly viable for production ML

As the Rust ML ecosystem continues to mature, expect even better integration, more pre-built models, and broader adoption in production systems. For now, understanding these three libraries positions you to build robust, high-performance ML systems in Rust.

