Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Agents & Autonomous Science · Breakthrough · Explainer video available

Inclusion Notes

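Claw-Anything widens the evaluation scope for personal assistant agents. A realistic assistant needs access to the user's long-term digital world, whereas existing benchmarks usually expose only a partial web page, a single piece of software, or short-lived task state.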

It broadens the agent's context along three dimensions: long-horizon activity histories, interdependent backend services, and integrated GUI/CLI interaction across devices.

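Using multi-round event injection to simulate months of user activity, it builds environments containing noise, conflicting signals, and complex world states, and evaluates proactive assistance and context-sensitive reasoning in them.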
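The summary does not describe the benchmark's generator, so the following is only a minimal Python sketch, under our own assumptions, of what multi-round event injection could look like. Each round appends roughly a month of timestamped events spread over a few hypothetical services (`calendar`, `email`, `files`); a fraction of events is pure noise, and another fraction deliberately contradicts a value a service already holds, producing cross-service conflicts an assistant would have to reconcile. All names (`inject_rounds`, `WorldState`, `find_conflicts`), rates, and the 30-day round length are illustrative, not Claw-Anything's actual API or parameters.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical services, fields, and values; NOT taken from Claw-Anything.
SERVICES = ("calendar", "email", "files")
VALUES = {
    "meeting_time": ("09:00", "14:00", "16:30"),
    "deadline": ("2024-03-01", "2024-03-15"),
    "travel_city": ("Berlin", "Tokyo"),
}
KEYS = tuple(VALUES)


@dataclass
class Event:
    day: int            # simulated day on which the event occurs
    service: str        # service that records the event
    key: str            # world-state field the event touches ("" for noise)
    value: str
    noise: bool = False


@dataclass
class WorldState:
    facts: dict = field(default_factory=dict)  # (service, key) -> latest value
    log: list = field(default_factory=list)    # full activity history

    def apply(self, ev: Event) -> None:
        self.log.append(ev)
        if not ev.noise:  # noise is logged but never changes a fact
            self.facts[(ev.service, ev.key)] = ev.value


def inject_rounds(state, rounds, per_round, days_per_round=30,
                  noise_rate=0.3, conflict_rate=0.2, seed=0):
    """Append `rounds` batches of events; batch r covers days [r*D, (r+1)*D)."""
    rng = random.Random(seed)
    for r in range(rounds):
        batch = []
        for _ in range(per_round):
            day = r * days_per_round + rng.randrange(days_per_round)
            service = rng.choice(SERVICES)
            roll = rng.random()
            if roll < noise_rate:
                batch.append(Event(day, service, "", "irrelevant chatter", noise=True))
                continue
            key = rng.choice(KEYS)
            known = [v for (_, k), v in state.facts.items() if k == key]
            if known and roll < noise_rate + conflict_rate:
                # Conflicting signal: prefer a value no service currently holds.
                options = [v for v in VALUES[key] if v not in known]
                value = rng.choice(options) if options else rng.choice(VALUES[key])
            else:
                value = rng.choice(VALUES[key])
            batch.append(Event(day, service, key, value))
        for ev in sorted(batch, key=lambda e: e.day):  # apply in time order
            state.apply(ev)
    return state


def find_conflicts(state):
    """Return fields whose current values disagree across services."""
    by_key = defaultdict(set)
    for (_, key), value in state.facts.items():
        by_key[key].add(value)
    return {k: sorted(v) for k, v in by_key.items() if len(v) > 1}


state = inject_rounds(WorldState(), rounds=4, per_round=25, seed=42)
print(len(state.log))          # -> 100 events over ~4 simulated months
print(find_conflicts(state))   # fields an assistant would need to reconcile
```

Keeping the full `log` alongside the collapsed `facts` mirrors the distinction the benchmark draws between a long activity history and the current world state: an agent that reads only the latest values cannot explain why two services disagree.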

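It merits formal inclusion because it moves agent benchmarks from single-task execution toward the long-term user state and cross-service world model of an always-on personal assistant, which is central to both training and evaluating personal agents.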