Gated Linear Attention: Efficient Transformers with Data-Dependent Gating
GLA combines linear attention efficiency with learned gating for expressivity. Learn how it achieves RNN-like inference with transformer-like training.
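To make the teaser concrete, here is a minimal NumPy sketch of the recurrent view of gated linear attention: the hidden state decays under a data-dependent gate and accumulates key-value outer products, so each step is O(1) in sequence length at inference time. The per-key sigmoid gate and names like `gla_recurrent` are illustrative assumptions, not the paper's exact parameterization or its hardware-efficient chunked training kernel.

```python
import numpy as np

def gla_recurrent(q, k, v, alpha):
    """Sketch of a gated linear attention recurrence (single head).

    q, k:  (T, d_k)  queries / keys
    v:     (T, d_v)  values
    alpha: (T, d_k)  data-dependent gates in (0, 1)

    The state S has shape (d_k, d_v). Each step decays S with the gate
    (broadcast over the value dimension), writes the new key-value
    outer product, then reads out with the query.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = np.empty((T, d_v))
    for t in range(T):
        # S_t = diag(alpha_t) @ S_{t-1} + k_t v_t^T
        S = alpha[t][:, None] * S + np.outer(k[t], v[t])
        # o_t = S_t^T q_t
        outputs[t] = S.T @ q[t]
    return outputs

# Toy usage: gates come from a sigmoid, keeping them in (0, 1).
rng = np.random.default_rng(0)
T, d_k, d_v = 8, 4, 4
q, k, v = (rng.standard_normal((T, d)) for d in (d_k, d_k, d_v))
alpha = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d_k))))
out = gla_recurrent(q, k, v, alpha)
print(out.shape)  # (8, 4)
```

During training this same computation can be evaluated in parallel over chunks of the sequence (transformer-like training), while the loop above is the RNN-like mode used at inference.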