Gated Linear Attention: Efficient Transformers with Data-Dependent Gating

GLA combines the efficiency of linear attention with learned, data-dependent gating for greater expressivity. Learn how it achieves RNN-like inference with transformer-like training.

2026-03-19
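The "RNN-like inference" claim can be made concrete with a toy sketch of a gated linear attention step: a matrix-valued state is decayed by a per-channel, data-dependent gate and updated with a key-value outer product, then read out with the query. This is a minimal NumPy illustration under assumed shapes and a random sigmoid gate, not the paper's full parallel training algorithm; all names (`q`, `k`, `v`, `alpha`, `S`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 4  # assumed toy sequence length and head dimensions

# In a real model q, k, v and the gate alpha are learned linear
# projections of the input; here they are random placeholders.
q = rng.standard_normal((T, d_k))
k = rng.standard_normal((T, d_k))
v = rng.standard_normal((T, d_v))
alpha = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, d_k))))  # gates in (0, 1)

# RNN-like inference: constant-size state, one update per token.
S = np.zeros((d_k, d_v))  # matrix-valued hidden state
outputs = []
for t in range(T):
    # Gated state update: decay each key channel, then add k_t v_t^T.
    S = alpha[t][:, None] * S + np.outer(k[t], v[t])
    # Read out with the current query.
    outputs.append(q[t] @ S)
outputs = np.stack(outputs)  # shape (T, d_v)
```

Because the gate is applied multiplicatively per step, unrolling the recurrence gives an attention-like sum over past tokens weighted by cumulative decay products, which is what allows an equivalent parallel, transformer-like formulation at training time.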