Reasoning, Memory, and Inference-Time Control
Breakthrough
Why it's included
Gated DeltaNet-2 improves linear attention by decoupling the erase and write operations that update the recurrent memory state.
The method generalizes earlier gated delta and Kimi Delta Attention variants with channel-wise erase and write gates, plus efficient chunkwise training and backward-pass machinery.
The reported results show strong long-context retrieval behavior and competitive language-modeling performance among recurrent and hybrid sequence models.
For this repository, the paper matters as a reusable memory-update primitive for efficient long-context modeling and non-softmax attention architectures.
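
To make the decoupling idea concrete, here is a minimal sketch of a single recurrent memory update with separate channel-wise erase and write gates. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the gate shapes (one value per key channel), and the scalar decay `alpha` are all hypothetical. The update assumed here is `S_new = alpha * (S - (S k)(erase * k)^T) + v (write * k)^T`, chosen because it reduces to the familiar gated delta rule when both gates equal the same scalar.

```python
def matvec(S, k):
    """Read from memory: multiply the d_v x d_k state S by the key k."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]


def decoupled_delta_step(S, k, v, erase, write, alpha=1.0):
    """One recurrent update with separate erase and write gates (assumed form).

    S      : d_v x d_k memory state, as a list of rows
    k, v   : key (length d_k) and value (length d_v)
    erase  : per-key-channel gate in [0, 1], scales removal of the old association
    write  : per-key-channel gate in [0, 1], scales insertion of the new value
    alpha  : scalar decay on the retained state (hypothetical parameter)
    """
    old = matvec(S, k)                          # value currently stored under k
    ek = [e * kj for e, kj in zip(erase, k)]    # erase-gated key
    wk = [w * kj for w, kj in zip(write, k)]    # write-gated key
    return [
        [alpha * (S[i][j] - old[i] * ek[j]) + v[i] * wk[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


# Usage: with both gates fully open and a unit-norm key, the old value
# stored under k is replaced by v, and other key directions are untouched.
S = [[2.0, 0.0], [0.0, 3.0]]
k = [1.0, 0.0]
v = [5.0, 7.0]
S_new = decoupled_delta_step(S, k, v, erase=[1.0, 1.0], write=[1.0, 1.0])
print(matvec(S_new, k))  # [5.0, 7.0]
```

Setting `erase` to zeros while keeping `write` open adds the new association without removing the old one, and the reverse clears the old association without writing anything. A single shared gate cannot express either case, which is the practical point of separating the two operations. The chunkwise training scheme mentioned above is not shown here.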