Mixture of Depths: Dynamic Computation in Transformers

Mixture of Depths (MoD) dynamically adjusts computation per token, enabling a 2-4x speedup in long-sequence processing. Learn how DeepSeek uses this technique for efficient inference.

2026-03-19
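
To make "adjusts computation per token" concrete, here is a minimal sketch of one common formulation of MoD routing: a router scores every token, only the top-scoring tokens (up to a fixed capacity) pass through the block, and the rest skip it via the residual path. This is plain Python for illustration under stated assumptions. The names `mod_block`, `router_weights`, `block_fn`, and `capacity` are hypothetical, and nothing here is taken from DeepSeek's code.

```python
def mod_block(tokens, router_weights, block_fn, capacity):
    """Apply block_fn only to the `capacity` highest-scoring tokens.

    tokens:         list of token vectors (lists of floats)
    router_weights: hypothetical learned router vector, same width as a token
    block_fn:       stand-in for the expensive block (attention + MLP)
    capacity:       how many tokens are allowed through the block
    """
    # Router score per token: a simple dot product with the router vector.
    scores = [sum(x * w for x, w in zip(tok, router_weights)) for tok in tokens]

    # Indices of the top-`capacity` tokens by router score.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    out = []
    for i, tok in enumerate(tokens):
        if i in selected:
            # Selected token: residual plus block output scaled by its score,
            # so the router would receive gradient in a real training setup.
            update = block_fn(tok)
            out.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            # Skipped token: passes through unchanged, costing no block compute.
            out.append(list(tok))
    return out, sorted(selected)


# Toy example: 4 tokens, only 2 are processed by the (stand-in) block.
tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0], [0.5, 0.5]]
router_weights = [1.0, 0.5]
out, selected = mod_block(tokens, router_weights, lambda t: [2 * x for x in t], capacity=2)
print(selected)
```

With `capacity` set to half the sequence length, the block does roughly half the work per layer. Any end-to-end speedup depends on the model, the sequence length, and how many layers use routing.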