Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Agents and Autonomous Science · Breakthrough tier

Published: 2026-04-01
arXiv: 2604.00842

Inclusion Notes

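For proactive assistants to be genuinely useful, responding to user requests is not enough: they must intervene at the right moment, infer the user's goal, and carry out the task. Yet this direction has long lacked an adequate evaluation environment, because many existing frameworks abstract apps into flat tool-calling APIs, which cannot simulate the sequential interaction between a real user and a stateful digital environment. Pare fills this gap.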

The paper proposes the Proactive Agent Research Environment, which models applications as finite state machines and gives the user simulator stateful navigation and a state-dependent action space, thereby supporting active user simulation. On top of this environment the authors build Pare-Bench, a set of 143 tasks spanning communication, productivity, scheduling, lifestyle, and related categories, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
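To make the contrast with a flat tool-calling API concrete, here is a minimal sketch of an app modeled as a finite state machine whose available actions depend on the current screen. It is an illustration under stated assumptions, not Pare's actual interface: the class `AppFSM`, the helper `make_messaging_app`, and the screen and action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AppFSM:
    """Toy app modeled as a finite state machine (hypothetical, not Pare's API)."""

    state: str
    # transitions[state][action] -> next state
    transitions: dict
    # Log of (state, action, next_state) triples an observing assistant could read.
    history: list = field(default_factory=list)

    def available_actions(self) -> list:
        """State-dependent action space: only actions valid on the current screen."""
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        """Apply one user action, or reject it if the current screen lacks it."""
        allowed = self.transitions.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        next_state = allowed[action]
        self.history.append((self.state, action, next_state))
        self.state = next_state
        return next_state


def make_messaging_app() -> AppFSM:
    """Build an illustrative three-screen messaging app (assumed layout)."""
    return AppFSM(
        state="inbox",
        transitions={
            "inbox": {"open_thread": "thread", "compose": "draft"},
            "thread": {"reply": "draft", "back": "inbox"},
            "draft": {"send": "inbox", "discard": "inbox"},
        },
    )


if __name__ == "__main__":
    app = make_messaging_app()
    print(app.available_actions())  # actions valid on the inbox screen
    app.step("open_thread")
    print(app.available_actions())  # the action set changes with the state
```

In a flat API mock, `send` would be callable at any time. Here a simulated user has to navigate to the draft screen first, and the recorded `history` gives a proactive assistant a sequence of observations from which it could infer the user's goal.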

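This work merits inclusion because the biggest gap for proactive agents today is the evaluation interface, and Pare offers an environment model that is closer to real usage than flat API mocks. For proactive assistants, computer-use agents, and context-aware agent evaluation, this kind of stateful user simulation has clear long-term reference value.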

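It was not placed in a higher tier because it is still a fairly new benchmark/environment proposal; whether it becomes the default evaluation substrate for this direction depends on community adoption and sustained maintenance. It is already strong, but not yet at the next tier.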

Links

Paper link