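Key Points
- Problem/Background
- This arXiv paper proposes a Sleep paradigm for continual learning in language models: Knowledge Seeding distills short-term in-context memories into long-term parametric capabilities, and Dreaming uses RL to generate a synthetic curriculum for self-improvement. Unlike the fast-weight/KV consolidation paper already in the library (R81), it focuses on self-modification, memory consolidation...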
Original Abstract and Chinese Translation
Chinese Translation
语言模型需要Sleep:学习自我修改和巩固记忆。过去几十年见证了机器学习算法设计方面的重大进展——从早期针对特定任务的浅层模型研究到更通用的深度大型语言模型(LLMs)。尽管现有模型在需要即时预测或上下文学习的任务中表现出有希望的结果,但它们缺乏持续学习并将其临时的上下文知识有效转移到其长期参数的能力。受人类学习过程的启发,我们引入了一种“Sleep”范式,它允许模型持续学习,通过重放将其短期的脆弱记忆提炼成稳定的长期知识,并通过“Dreaming”过程递归地改进自身。更详细地说,Sleep包括两个阶段:(1) Memory Consolidation:一个称为Knowledge Seeding的向上蒸馏过程,其中较小自我的记忆被蒸馏到更大的网络中,以提供更大的容量,同时保留知识。作为概念验证,我们提出了一种用于Knowledge Seeding的新型广义蒸馏过程(即,策略内蒸馏与基于强化学习(RL)的模仿学习的结合);(2) Dreaming:一个自我改进阶段,模型在此阶段使用RL生成合成数据课程,以排练新知识并完善现有能力,而无需人工监督。我们在长周期、持续学习、知识整合和少样本泛化任务上的实验支持了Sleep阶段的重要性。
Original Abstract
The past few decades have witnessed significant advances in the design of machine learning algorithms–from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by human learning process, we introduce a “Sleep” paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves with “Dreaming” process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
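The abstract names on-policy distillation as one ingredient of Knowledge Seeding but does not give a loss. As a rough illustration only, the sketch below uses one common formulation: a per-token reverse KL, KL(student || teacher), averaged over the positions of a sequence that the student itself sampled. That choice of loss, the function names, and the toy logits are all assumptions and are not taken from the paper. The RL-based imitation component is not modeled here. In the paper's framing, the teacher would be the smaller self that still holds the in-context memories, and the student would be the larger network.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one position (assumed loss, not from the paper)."""
    log_s = log_softmax(student_logits)
    log_t = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(log_s, log_t))


def on_policy_distill_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL.

    Both arguments are lists of per-position logit vectors, scored on the
    same sequence; in on-policy distillation that sequence is sampled from
    the student, which is why the expectation is taken under the student.
    """
    if len(student_steps) != len(teacher_steps):
        raise ValueError("student and teacher must score the same positions")
    kls = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(kls) / len(kls)


# Toy example: two positions, vocabulary of three tokens (hypothetical values).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[1.5, 1.0, -0.5], [0.0, 0.0, 1.0]]
loss = on_policy_distill_loss(student, teacher)
```

The loss is zero when the student matches the teacher at every position and positive otherwise, so minimizing it pulls the student's own samples toward what the teacher considers likely.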