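Key points
- Problem / background
This paper addresses the system-boundary problem for long-running agents: a chat loop plus a tool registry cannot express identity, authority, suspend/resume, child processes, human approval, or auditing.
- Method / mechanism
Agent libOS models an agent as an AgentProcess and provides process identity, parent-child lineage, typed Object Memory, capabilities, tool tables, human queues, checkpoints, events, and audit records.
- Results / evidence
Its central design rule is "tools are libc-like wrappers; runtime primitives are the authority boundary": a tool being visible to the model does not grant authority over a resource. Files, objects, human approval, shell, and JIT tool registration are all capability-checked at the runtime primitive layer.
- Why it is worth including
It is worth including because it is not an ordinary agent benchmark; it gives computer-use, coding, and long-running agents reusable runtime, permission, and audit primitives. Its limitation is that the current evaluation focuses on system properties and prototype tests, and lacks real-task throughput, cost, and red-team deployment evaluation.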
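The design rule summarized above can be illustrated with a minimal sketch. It is not the paper's code: the class names `Runtime` and `CapabilityError`, the capability string format `fs.read:<path>`, the field names, and the in-memory file store are all assumptions made for illustration. The only idea taken from the source is that the tool is a thin wrapper and the capability check lives in the runtime primitive.

```python
from dataclasses import dataclass, field
from typing import Optional


class CapabilityError(PermissionError):
    """Raised when a process lacks the capability a primitive requires."""


@dataclass
class AgentProcess:
    # Hypothetical subset of fields; the paper's AgentProcess also carries
    # lifecycle state, Object Memory, human queues, checkpoints, and events.
    pid: int
    parent: Optional[int] = None
    capabilities: set = field(default_factory=set)


class Runtime:
    def __init__(self, files):
        self.files = files   # in-memory stand-in for a filesystem (assumption)
        self.audit = []      # (pid, primitive, target, outcome) records

    def fs_read(self, proc, path):
        """Runtime primitive: the authority check happens here, not in the tool."""
        needed = f"fs.read:{path}"   # hypothetical capability naming scheme
        if needed not in proc.capabilities:
            self.audit.append((proc.pid, "fs_read", path, "denied"))
            raise CapabilityError(needed)
        self.audit.append((proc.pid, "fs_read", path, "allowed"))
        return self.files[path]


def read_file_tool(runtime, proc, path):
    """Model-visible tool: a libc-like wrapper with no authority of its own."""
    return runtime.fs_read(proc, path)


if __name__ == "__main__":
    rt = Runtime({"notes.txt": "hello"})
    parent = AgentProcess(pid=1, capabilities={"fs.read:notes.txt"})
    child = AgentProcess(pid=2, parent=1)   # sees the same tool, holds no capability
    print(read_file_tool(rt, parent, "notes.txt"))
    try:
        read_file_tool(rt, child, "notes.txt")
    except CapabilityError as exc:
        print("denied:", exc)
    print(rt.audit)
```

In this sketch both processes can call the same tool, but only the one holding the matching capability gets data back, and both attempts leave an audit record. That is the sense in which tool visibility and resource authority are kept separate.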
Original abstract
Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human authority, generate tools, and perform side effects that must be resumed and audited. Existing frameworks typically implement these behaviors as a chat loop plus a model-facing tool registry. This abstraction is convenient, but it often conflates action visibility with resource authority: a tool schema visible to the model may be backed by a wrapper that directly touches the host filesystem, terminal, network, shell, or credentials. This paper presents Agent libOS, a library-OS-inspired runtime substrate for LLM agents. Agent libOS runs above a conventional host operating system; it does not implement hardware drivers, kernel-mode isolation, or a POSIX-compatible operating system. Instead, it treats an agent as an AgentProcess: a schedulable execution subject with process identity, parent-child lineage, lifecycle state, a tool table derived from an AgentImage, typed Object Memory, explicit capabilities, human queues, checkpoints, events, and audit records. Its central design rule is tools are libc-like wrappers; runtime primitives are the authority boundary. Filesystem access, object access, sleeps, human approval, JIT tool registration, and external side effects are checked at primitive boundaries under explicit capabilities and policy. We describe the design, threat model, Python prototype, and safety-oriented evaluation. The current prototype implements async scheduling, namespace-local Object Memory, runtime-integrated human approval, one-shot permission grants, per-process working directories, shell and image-registration primitives, Deno/TypeScript JIT tools over a libOS syscall broker, filesystem/object bridge tools, an injectable Resource Provider Substrate, deterministic demos, real-model smoke scripts, and 123 regression tests at the time of writing. Rather than improving planner accuracy, Agent libOS demonstrates a runtime substrate in which long-running LLM agents can be scheduled, authorized, resumed, and audited without treating tool dispatch as the trust boundary.
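The abstract mentions runtime-integrated human approval and one-shot permission grants without giving an interface. One way these could fit together is sketched below. Everything here is hypothetical: the names `Proc`, `ApprovalRuntime`, `approve_next`, and the `shell.exec` capability string are illustrative, and the shell call is simulated rather than executed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Proc:
    pid: int
    capabilities: set = field(default_factory=set)   # standing authority
    one_shot: set = field(default_factory=set)       # grants consumed on first use


class ApprovalRuntime:
    def __init__(self):
        self.human_queue = deque()   # pending (pid, capability) requests
        self.audit = []              # (pid, action, detail, outcome) records

    def _authorize(self, proc, cap):
        if cap in proc.capabilities:
            return True
        if cap in proc.one_shot:
            proc.one_shot.discard(cap)   # a one-shot grant is spent here
            return True
        return False

    def shell(self, proc, command):
        """Shell primitive (simulated): checked at the primitive boundary."""
        cap = "shell.exec"
        if not self._authorize(proc, cap):
            self.human_queue.append((proc.pid, cap))
            self.audit.append((proc.pid, "shell", command, "pending_approval"))
            return None   # caller would suspend and retry after approval
        self.audit.append((proc.pid, "shell", command, "allowed"))
        return f"(simulated) ran: {command}"

    def approve_next(self, procs):
        """A human approves the oldest request, issuing a single-use grant."""
        pid, cap = self.human_queue.popleft()
        procs[pid].one_shot.add(cap)
        self.audit.append((pid, "human_approval", cap, "granted_once"))
```

Under these assumptions, a process without `shell.exec` is parked on the human queue at its first call, succeeds exactly once after approval, and must ask again for the next call, with every step recorded in the audit list.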