Mixture of Depths: Dynamic Computation in Transformers
Mixture of Depths (MoD) dynamically adjusts computation per token, letting low-priority tokens skip a block via the residual path and enabling a 2-4x speedup in long-sequence processing. Learn how DeepSeek uses this technique for efficient inference.
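The core mechanism is a learned router that scores every token and sends only the top-k through the block, with the rest passing through unchanged. The sketch below illustrates this under stated assumptions: the function name `mod_block`, the router weight `router_w`, and the 50% capacity are illustrative choices, not DeepSeek's actual implementation.

```python
import numpy as np

def mod_block(x, router_w, block_fn, capacity=0.5):
    """Illustrative MoD routing sketch: only the top-k highest-scoring
    tokens are processed by block_fn; the rest take the residual path."""
    seq_len, d = x.shape
    k = max(1, int(seq_len * capacity))   # per-block token budget
    scores = x @ router_w                 # (seq_len,) router logits
    topk = np.argsort(scores)[-k:]        # indices of tokens to process
    out = x.copy()                        # skipped tokens: identity (residual only)
    # Selected tokens: residual plus router-weighted block output, so the
    # routing decision stays differentiable during training.
    out[topk] = x[topk] + scores[topk, None] * block_fn(x[topk])
    return out
```

Because unselected tokens bypass `block_fn` entirely, the attention and MLP cost of the block scales with the capacity fraction rather than the full sequence length.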