TextGrad: Automatic "Differentiation" via Text
TextGrad takes a useful systems idea and makes it explicit: if many LLM pipelines are made of textual intermediate states, then optimization can also happen...
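A core vulnerability of RAG systems is retrieval corruption: an attacker only needs to inject malicious passages into the retrieved results to pull the final answer off course. Most existing defenses rely on heuristic filtering, reranking, or prompt-level mitigation, and lack any mechanism that formally bounds what an attack can achieve, so it is hard to say how robust a system really is against adaptive attacks. RobustRAG proposes...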
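Before LongMemEval, evaluation of very long-term conversational memory had long lacked high-quality, long-span dialogue data whose temporal and causal consistency could be verified. LoCoMo tries to fill exactly this gap: rather than simply lengthening the context, it builds a conversational benchmark around personas, temporal event graphs, and cross-session interactions that genuinely requires long-term memory and timeline understanding. The core novelty of this work lies in its mac...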
DSPy reframes prompt engineering as program compilation. Rather than hand-writing brittle prompts end to end, it lets developers specify declarative languag...
ToolLLM is one of the earliest serious attempts to turn open-source LLMs into broad tool-using agents at realistic API scale. Rather than treating tool use...
Voyager is one of the earliest strong demonstrations that an LLM agent can accumulate reusable skills in an open-ended embodied environment instead of merel...
Self-Refine studies a broad pattern that later became ubiquitous in LLM systems: generate an answer, critique it in natural language, and then rewrite it us...
Reflexion addresses a simple but foundational weakness in early language agents: they can act, but they do not reliably turn failure into reusable internal...