All Papers Index, Page 13

TextGrad: Automatic "Differentiation" via Text

Published: 2024-06-11 · Venue: unknown · Agents and Autonomous Science

TextGrad takes a useful systems idea and makes it explicit: if many LLM pipelines are made of textual intermediate states, then optimization can also happen...

Certifiably Robust RAG against Retrieval Corruption

Published: 2024-05-24 · Venue: unknown · Reasoning, Memory, and Inference-Time Control

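A core vulnerability of RAG systems is retrieval corruption: an attacker who injects malicious passages into the retrieved results can steer the final answer off course. Most existing defenses rely on heuristic filtering, reranking, or prompt-level mitigation and lack any mechanism that formally bounds what an attack can achieve, so it is hard to say how robust a system really is against adaptive attacks. RobustRAG proposes...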

Evaluating Very Long-Term Conversational Memory of LLM Agents

Published: 2024-02-27 · Venue: unknown · Agents and Autonomous Science

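Before LongMemEval, evaluation of very long-term conversational memory had long lacked high-quality, long-span conversational data in which temporal and causal consistency could be checked. LoCoMo aims to fill exactly this gap: rather than simply lengthening the context, it builds a conversational benchmark around personas, temporal event graphs, and cross-session interactions that genuinely requires long-term memory and timeline understanding. The core novelty of this work lies in its mac...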

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Published: 2023-10-05 · Venue: unknown · Agents and Autonomous Science

DSPy reframes prompt engineering as program compilation. Rather than hand-writing brittle prompts end to end, it lets developers specify declarative languag...

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Published: 2023-07-31 · Venue: unknown · Agents and Autonomous Science

ToolLLM is one of the earliest serious attempts to turn open-source LLMs into broad tool-using agents at realistic API scale. Rather than treating tool use...

Voyager: An Open-Ended Embodied Agent with Large Language Models

Published: 2023-05-25 · Venue: unknown · Agents and Autonomous Science

Voyager is one of the earliest strong demonstrations that an LLM agent can accumulate reusable skills in an open-ended embodied environment instead of merel...

Self-Refine: Iterative Refinement with Self-Feedback

Published: 2023-03-30 · Venue: unknown · Agents and Autonomous Science

Self-Refine studies a broad pattern that later became ubiquitous in LLM systems: generate an answer, critique it in natural language, and then rewrite it us...

Reflexion: Language Agents with Verbal Reinforcement Learning

Published: 2023-03-20 · Venue: unknown · Agents and Autonomous Science

Reflexion addresses a simple but foundational weakness in early language agents: they can act, but they do not reliably turn failure into reusable internal...