Reasoning, Memory, and Inference-Time Control · Breakthrough
Published: 2026-05-21
arXiv: 2605.22791

Curated Summary

Gated DeltaNet-2 improves linear attention by decoupling the erase and write operations that update the recurrent memory state.
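
To make the erase/write distinction concrete, here is a minimal NumPy sketch of a single recurrent step. It assumes a key-by-value state matrix `S` of shape (d_k, d_v) that is read out as `k @ S`. In the classic delta rule, one strength `beta` both erases the value currently stored under key `k` and writes the new value `v`. The hypothetical `decoupled_step` gives erasing and writing separate scalar strengths. This illustrates the general idea only and is not the paper's exact update rule.

```python
import numpy as np

def delta_step(S, k, v, beta):
    """Classic delta rule: a single strength couples erase and write.

    S: (d_k, d_v) memory state; k: (d_k,) unit-norm key; v: (d_v,) value.
    """
    old = k @ S  # value currently retrieved by key k, shape (d_v,)
    return S - beta * np.outer(k, old) + beta * np.outer(k, v)

def decoupled_step(S, k, v, erase, write):
    """Hypothetical decoupled update: separate erase and write strengths."""
    old = k @ S
    return S - erase * np.outer(k, old) + write * np.outer(k, v)

rng = np.random.default_rng(0)
d_k, d_v = 4, 3
S = rng.standard_normal((d_k, d_v))
k = rng.standard_normal(d_k)
k /= np.linalg.norm(k)  # the delta rule assumes normalized keys
v = rng.standard_normal(d_v)

# With erase == write, the decoupled step reduces to the ordinary delta rule.
print(np.allclose(decoupled_step(S, k, v, 0.5, 0.5), delta_step(S, k, v, 0.5)))  # True

# Full erase plus full write makes key k retrieve exactly v afterwards.
S_new = decoupled_step(S, k, v, erase=1.0, write=1.0)
print(np.allclose(k @ S_new, v))  # True
```

Setting `erase=0.0` and `write=1.0` gives the purely additive update `S + outer(k, v)` of vanilla linear attention. Letting the two strengths differ is what allows a model to, for example, write strongly without fully overwriting what was already stored.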

The method generalizes the earlier gated delta-rule (Gated DeltaNet) and Kimi Delta Attention variants by introducing channel-wise erase and write gates. It also provides efficient chunkwise training together with the corresponding backward-pass machinery.
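
The generalization claim can be checked on a toy example. The sketch below assumes a state `S` of shape (d_k, d_v), a per-key-channel decay `alpha` (the fine-grained forgetting used by Kimi Delta Attention, as we read it), and, as a further assumption, erase and write gates that act per value channel. The function names `kda_step` and `channelwise_step` and the choice of value-channel gating are hypothetical, and the paper's actual parameterization may differ. When both gates are all ones, the step reduces to a coupled KDA-style update. When the decay is also a scalar, it reduces to the Gated DeltaNet update `a * (I - beta k k^T) S + beta k v^T`.

```python
import numpy as np

def kda_step(S, k, v, alpha, beta):
    """KDA-style step: channel-wise decay, then a coupled delta-rule update.

    S: (d_k, d_v); k: (d_k,) unit-norm key; alpha: (d_k,) decay in (0, 1]; beta: scalar.
    """
    d_k = S.shape[0]
    decayed = alpha[:, None] * S  # Diag(alpha) @ S
    return (np.eye(d_k) - beta * np.outer(k, k)) @ decayed + beta * np.outer(k, v)

def channelwise_step(S, k, v, alpha, beta, erase, write):
    """Hypothetical decoupled step with channel-wise erase and write gates.

    erase, write: (d_v,) gates in [0, 1], applied per value channel (an assumption).
    """
    S = alpha[:, None] * S                    # channel-wise forgetting
    old = k @ S                               # value currently stored under key k
    S = S - beta * np.outer(k, erase * old)   # gated erase
    return S + beta * np.outer(k, write * v)  # gated write

rng = np.random.default_rng(0)
d_k, d_v = 4, 3
S = rng.standard_normal((d_k, d_v))
k = rng.standard_normal(d_k)
k /= np.linalg.norm(k)
v = rng.standard_normal(d_v)
alpha = rng.uniform(0.8, 1.0, d_k)
ones = np.ones(d_v)

# All-ones gates recover the coupled KDA-style step.
print(np.allclose(channelwise_step(S, k, v, alpha, 0.7, ones, ones),
                  kda_step(S, k, v, alpha, 0.7)))  # True

# A scalar decay a additionally recovers the Gated DeltaNet update.
a = 0.9
gdn = a * (np.eye(d_k) - 0.7 * np.outer(k, k)) @ S + 0.7 * np.outer(k, v)
print(np.allclose(channelwise_step(S, k, v, np.full(d_k, a), 0.7, ones, ones), gdn))  # True
```

This per-token recurrence is what chunkwise training must reproduce in parallel form. The sketch shows only the sequential reference semantics, not the chunkwise algorithm or its backward pass.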

The reported results show strong long-context retrieval behavior and competitive language-modeling performance among recurrent and hybrid sequence models.

For this repository, the paper is relevant because it offers a reusable memory-update primitive for efficient long-context modeling and for non-softmax attention architectures.

Links