Arbor: Tree Search as a Cognition Layer for Autonomous Agents

Neha Prakriya; Chaojun Hou; Zheng Gong; Huasha Zhao; Xi Zhao; Mou Li; Zhenyu Gu; Emad Barsoum

Category: Agents and Autonomous Science (breakthrough tier)

Published: 2026-06-10
arXiv: 2606.12563

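Key Points

Problem/Background: Arbor positions tree search as the cognition layer of an autonomous agent, rather than treating agent execution as a one-shot trajectory or a stateless optimization.
Method/Mechanism: The system maintains a search tree of scored hypotheses that serves as working memory shared across agents; a failure becomes a diagnostic signal that redirects subsequent exploration.
Results/Evidence: The validation setting is full-stack LLM inference optimization, which spans the application, framework, compiler, kernel, and hardware layers; the system consists of an Orchestrator, a Critic, and Domain Specialists. The abstract reports up to 193% throughput-latency Pareto improvement over vendor-optimized baselines, versus a plateau at +33% throughput improvement for a single agent without the harness.
Why it is included: It combines agent memory, diagnosis, search, and multi-specialist collaboration into a reusable agent harness, which gives it clear methodological value for complex engineering optimization and long-horizon autonomous execution.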
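The scored-hypothesis tree described above can be pictured with a small sketch. This is not Arbor's implementation, which this summary does not specify. The `Hypothesis` class, the `penalty` factor, and the rule that a failure down-weights its open siblings are all illustrative assumptions, chosen only to show how a shared tree can evolve with each measurement, expand after a success, and steer away from a failed direction.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Hypothesis:
    """One node of the shared tree (hypothetical structure, not from the paper)."""
    description: str
    score: float = 0.0               # estimated gain before measurement (assumed heuristic)
    status: str = "open"             # "open", "succeeded", or "failed"
    diagnosis: Optional[str] = None  # root-cause note attached when the node fails
    children: List["Hypothesis"] = field(default_factory=list)

    def add_child(self, description: str, score: float) -> "Hypothesis":
        child = Hypothesis(description, score)
        self.children.append(child)
        return child


def open_nodes(root: Hypothesis) -> Iterator[Hypothesis]:
    """Yield every hypothesis in the tree that has not been measured yet."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.status == "open":
            yield node
        stack.extend(node.children)


def select_next(root: Hypothesis) -> Optional[Hypothesis]:
    """Pick the open hypothesis with the highest score, or None if none remain."""
    return max(open_nodes(root), key=lambda n: n.score, default=None)


def record_result(parent: Hypothesis, node: Hypothesis, measured_gain: float,
                  diagnosis: Optional[str] = None, penalty: float = 0.5) -> None:
    """Write a measurement back into the tree.

    Assumed rule: a failure keeps its diagnosis in the tree and halves the
    score of its still-open siblings, so the search drifts to other branches.
    """
    if measured_gain > 0:
        node.status = "succeeded"
        node.score = measured_gain
        return
    node.status = "failed"
    node.diagnosis = diagnosis
    for sibling in parent.children:
        if sibling is not node and sibling.status == "open":
            sibling.score *= penalty


# Example campaign with made-up hypotheses and scores.
root = Hypothesis("vendor baseline", status="succeeded")
batching = root.add_child("increase batch size", 0.9)
scheduler = root.add_child("change scheduler policy", 0.5)

record_result(root, batching, measured_gain=0.2)        # success: expand below it
fuse = batching.add_child("fuse attention kernels", 0.8)
retune = batching.add_child("retune kernel tile sizes", 0.7)

record_result(batching, fuse, measured_gain=-0.1,
              diagnosis="register pressure caused spills")
next_up = select_next(root)
```

After the failed kernel fusion, the sibling kernel hypothesis is down-weighted, so the scheduler hypothesis on a different branch becomes the next candidate. Keeping the failed node with its diagnosis, rather than deleting it, is what lets other agents read the tree as shared memory.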

Abstract

Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autonomous optimization systems operate on isolated targets with stateless evaluation. Arbor instead maintains an explicit search tree of scored hypotheses that serves as the shared working memory across agents, evolving with every measurement, treating failures as diagnostic signal that reshapes subsequent exploration, and expanding as prior successes shift the bottleneck distribution. We validate Arbor on full-stack LLM inference optimization, a domain where achieving peak performance has historically required coordinated effort from engineering teams across the application, framework, compiler, kernel, and hardware stack. Arbor pairs an Orchestrator agent, which drives optimization by delegating to Domain Specialists across the inference stack, with a Critic agent that safeguards stability through root-cause analysis, introspection, and measurement validation—a checks-and-balances architecture where neither agent can unilaterally drive the system. Agent capabilities are decomposed into hard skills (domain expertise) and soft skills (coordination protocols that determine how contributions compose), enabling fully autonomous multi-day campaigns. Arbor achieves up to 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, while a single agent without the harness plateaus at +33% throughput improvement and crashes irrecoverably within hours. Arbor generalizes to multiple generations of hardware platform, and run-to-run variance is within 2 percentage points demonstrating that the method is hardware-agnostic and reproducible.
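The checks-and-balances idea in the abstract, where the Orchestrator proposes changes and the Critic validates measurements, can be illustrated with a minimal sketch. The abstract does not state the Critic's acceptance criteria, so everything here is an assumption: the function names (`dominates`, `critic_accepts`, `step`), the use of medians over repeated runs, and the rule that a change is kept only if it Pareto-dominates the current baseline on throughput and latency.

```python
from statistics import median
from typing import Callable, List, Tuple

# (throughput, latency); higher throughput is better, lower latency is better.
Measurement = Tuple[float, float]


def dominates(a: Measurement, b: Measurement) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def summarize(samples: List[Measurement]) -> Measurement:
    """Collapse repeated runs into per-axis medians to damp run-to-run noise."""
    return (median(s[0] for s in samples), median(s[1] for s in samples))


def critic_accepts(baseline: List[Measurement], candidate: List[Measurement]) -> bool:
    """Assumed validation rule: the candidate's medians must dominate the baseline's."""
    return dominates(summarize(candidate), summarize(baseline))


def step(baseline: List[Measurement],
         propose: Callable[[], List[Measurement]]) -> Tuple[List[Measurement], bool]:
    """One round: the proposer supplies measurements, the critic decides.

    Neither side acts alone: a proposal without acceptance leaves the
    baseline unchanged, and the critic never generates changes itself.
    """
    candidate = propose()
    if critic_accepts(baseline, candidate):
        return candidate, True
    return baseline, False


# Made-up numbers for illustration only.
baseline = [(100.0, 50.0), (102.0, 49.0), (98.0, 51.0)]
solid_gain = [(120.0, 45.0), (118.0, 46.0), (122.0, 44.0)]
noisy_gain = [(130.0, 60.0), (90.0, 40.0), (99.0, 52.0)]

state, kept_solid = step(baseline, lambda: solid_gain)
state_after_noise, kept_noisy = step(state, lambda: noisy_gain)
```

In this sketch the noisy candidate contains one run with much higher throughput, but its medians do not dominate the accepted state, so it is rejected and the state is unchanged. A real validator would likely need a noise threshold as well; the abstract's reported run-to-run variance of about 2 percentage points suggests the scale such a threshold might take.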
