APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

Agents and Autonomous Science, Breakthrough tier

Published: 2026-03-30
arXiv: 2603.29093

Curation notes

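Many LLM autonomous agents appear able to plan, execute, and reflect, yet when they meet a task that is structurally similar to an earlier one but superficially different, they still start from scratch; they lack procedural memory that genuinely accumulates. Existing memory methods often store only short summaries or pure semantic vectors, which makes structural reuse across tasks hard to support. APEX-EM is designed around this problem.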

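The paper proposes a non-parametric online learning framework that encodes each execution as a structured procedural-episodic experience, explicitly retaining planning steps, artifacts, iteration history, error analysis, and quality scores. It pairs this with the Plan-Retrieve-Generate-Iterate-Ingest (PRGII) workflow, multi-dimensional rewards from Task Verifiers, and hybrid retrieval that combines semantic search, structural signature matching, and plan DAG traversal, so the agent can reuse both successful and failed experiences without changing model weights.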
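
To make the idea of a structured experience record and hybrid retrieval concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Experience` schema, the verb-based structural signature, the token-overlap stand-in for semantic search, and the equal weights are all assumptions chosen for illustration, and plan DAG traversal is left out entirely.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


def step_verb(step: str) -> str:
    # Hypothetical signature element: the first word of a plan step.
    words = step.split()
    return words[0].lower() if words else ""


@dataclass
class Experience:
    """One stored execution. Field names are illustrative, not the paper's schema."""
    task: str
    plan_steps: list[str]
    artifacts: list[str] = field(default_factory=list)
    iteration_history: list[str] = field(default_factory=list)
    error_analysis: str = ""
    quality_score: float = 0.0  # assumed to lie in [0, 1]

    def signature(self) -> tuple[str, ...]:
        return tuple(step_verb(s) for s in self.plan_steps)


def text_similarity(a: str, b: str) -> float:
    # Stand-in for semantic search: Jaccard overlap of lowercase tokens.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def structural_similarity(a: tuple[str, ...], b: tuple[str, ...]) -> float:
    # Compares the ordered step signatures of two plans.
    return SequenceMatcher(None, a, b).ratio()


def retrieve(memory, task, draft_plan, k=1, w_sem=0.5, w_struct=0.5):
    """Rank stored experiences by a weighted mix of both similarity signals."""
    query_sig = tuple(step_verb(s) for s in draft_plan)
    scored = [
        (w_sem * text_similarity(task, e.task)
         + w_struct * structural_similarity(query_sig, e.signature()), e)
        for e in memory
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def ingest(memory, experience):
    # Failures are stored too, so their error analysis can be reused later.
    memory.append(experience)


memory: list[Experience] = []
ingest(memory, Experience(
    task="Export sales table to CSV",
    plan_steps=["query db", "transform rows", "write file"],
    quality_score=0.9,
))
ingest(memory, Experience(
    task="Summarize a PDF report",
    plan_steps=["load document", "chunk text", "summarize chunks"],
    error_analysis="chunks exceeded context window",
    quality_score=0.3,
))

# A new task with different wording but the same plan shape as the first entry.
best = retrieve(memory, "Dump customer records as JSON",
                ["query db", "transform rows", "write file"])[0]
```

In this toy setup the new task shares no words with either stored task, so the text signal alone cannot separate them; the structural signal is what surfaces the earlier export run. That is the kind of reuse across superficially different tasks the paragraph above describes, though a real system would use learned embeddings and a richer plan representation.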

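This work merits inclusion because it moves agent memory from 'jotting down notes' to 'storing executable, procedural experience', and it lays out a clear online learning loop. For this repository's growing coverage of agent memory, self-improving agents, and capability accumulation, structured procedural-episodic replay is a clearly reusable pattern, not just a benchmark trick.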

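It was not placed in a higher tier because the current evidence still comes mainly from a handful of author-selected benchmarks and a fixed backbone, and external adoption has not yet taken shape. It is already a strong agent memory paper, but whether it becomes the dominant blueprint for this direction still needs follow-up validation.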

Links

Paper link