Reflexion: Language Agents with Verbal Reinforcement Learning

Noah Shinn; Federico Cassano; Edward Berman; Ashwin Gopinath; Karthik Narasimhan; Shunyu Yao

Agents and Autonomous Science · Disruptive tier

Published: 2023-03-20
arXiv: 2303.11366

Inclusion commentary

Reflexion addresses a simple but foundational weakness in early language agents: they can act, but they do not reliably turn a failed attempt into a lesson they can reuse. Instead of treating each attempt as a stateless prompt, the paper frames agent behavior as an iterative loop in which the model performs a task, evaluates the outcome, and writes a verbal reflection that conditions the next attempt.
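The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ReflexionLoop`, the three callables, and the toy stand-ins that replace real LLM calls and a real evaluator are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReflexionLoop:
    """Attempt -> evaluate -> reflect -> retry, with no weight updates."""

    act: Callable[[str, List[str]], str]         # (task, reflections) -> attempt
    evaluate: Callable[[str], Tuple[bool, str]]  # attempt -> (success, feedback)
    reflect: Callable[[str, str], str]           # (attempt, feedback) -> reflection
    max_trials: int = 3                          # assumed trial budget
    memory: List[str] = field(default_factory=list)

    def run(self, task: str) -> Tuple[str, int]:
        attempt = ""
        for trial in range(1, self.max_trials + 1):
            # Earlier reflections condition the next attempt.
            attempt = self.act(task, self.memory)
            success, feedback = self.evaluate(attempt)
            if success:
                return attempt, trial
            # Store a verbal lesson instead of updating model weights.
            self.memory.append(self.reflect(attempt, feedback))
        return attempt, self.max_trials


# Toy stand-ins (hypothetical): a real agent would call an LLM here.
def toy_act(task: str, reflections: List[str]) -> str:
    return f"{task}:v{len(reflections)}"


def toy_evaluate(attempt: str) -> Tuple[bool, str]:
    ok = attempt.endswith("v2")
    return ok, "passed" if ok else f"{attempt} failed the check"


def toy_reflect(attempt: str, feedback: str) -> str:
    return f"Previous attempt {attempt!r} failed because: {feedback}"


loop = ReflexionLoop(toy_act, toy_evaluate, toy_reflect, max_trials=5)
final_attempt, trials_used = loop.run("sort-list")
```

In this toy run the agent succeeds on its third trial, after two reflections have accumulated in `loop.memory`. The only state carried between trials is text, which is what makes the mechanism inspectable.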

The paper’s core contribution is to externalize reinforcement into natural-language self-critique rather than gradient updates. This makes the adaptation mechanism cheap, inspectable, and broadly reusable across environments where scalar rewards or environment feedback exist but model weights are fixed. In practice, Reflexion made the self-feedback loop itself a first-class agent primitive.

This matters for the repository because a large share of later self-evolving, memory-augmented, and post-deployment agent work inherits this exact pattern: attempt, feedback, reflection, retry. Even when newer systems add memory routing, tool traces, or skill distillation, Reflexion remains one of the clearest early papers showing that language-space feedback can function like lightweight reinforcement for agents.

It is not ranked higher because it is an early framework paper rather than a mature long-horizon agent system. Its evaluation scope is narrower than later computer-use and open-ended agent settings, and many later papers improve on its stability, transfer, and memory structure. But as a durable conceptual template, it clears the bar comfortably.

Original abstract

Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning. We propose Reflexion, a novel framework to reinforce language agents not by updating weights, but instead through linguistic feedback. Concretely, Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials. Reflexion is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals, and obtains significant improvements over a baseline agent across diverse tasks (sequential decision-making, coding, language reasoning). For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%. We also conduct ablation and analysis studies using different feedback signals, feedback incorporation methods, and agent types, and provide insights into how they affect performance. We release all code, demos, and datasets at https://github.com/noahshinn024/reflexion.
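The abstract mentions two feedback types (scalar values or free-form language) and an episodic memory buffer of reflective text. One way these pieces could fit together is sketched below. The names `feedback_to_text` and `EpisodicMemory` are hypothetical, and the bound of three stored reflections is an assumption made for illustration, not a value taken from this page.

```python
from collections import deque
from typing import Deque, Union

Feedback = Union[float, str]


def feedback_to_text(feedback: Feedback) -> str:
    """Normalize scalar or free-form feedback into text the agent can read."""
    if isinstance(feedback, (int, float)):
        return f"Scalar reward: {float(feedback):.2f}"
    return f"Feedback: {feedback.strip()}"


class EpisodicMemory:
    """Bounded buffer of reflections; the oldest entry is dropped when full."""

    def __init__(self, max_items: int = 3) -> None:  # assumed bound
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, reflection: str) -> None:
        self._items.append(reflection)

    def as_prompt(self) -> str:
        # Render the stored reflections as a bullet list for the next prompt.
        return "\n".join(f"- {r}" for r in self._items)


memory = EpisodicMemory(max_items=2)
memory.add(feedback_to_text(0))
memory.add(feedback_to_text("  unit test 3 failed on empty input "))
memory.add(feedback_to_text(1.0))
prompt_block = memory.as_prompt()
```

Bounding the buffer keeps the prompt within a fixed context budget. With `max_items=2`, the first entry is evicted once the third is added.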
