RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks

Ruiying Li; Yunlang Zhou; YuYao Zhu; Kylin Chen; Jingyuan Wang; Sukai Wang; Kongtao Hu; Minhui Yu; Bowen Jiang; Zhan Su; Jiayao Ma; Xin He; Yongjian Shen; Yang Yang; Guanghui Ren; Maoqing Yao; Wenhao Wang; Yao Mu

Agents and Autonomous Science | Breakthrough tier

Published: 2026-03-12
arXiv: 2603.11558

Curation notes

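Problem and background: A core bottleneck in long-horizon robotic tasks is that data collection, policy learning, and deployment run as separate pipelines. This leads to extensive manual resets, brittle policy composition, and semantic inconsistency between the collection and execution phases.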

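Method / novelty: RoboClaw uses a single VLM-driven controller to unify data collection, policy learning, and task execution. It introduces Entangled Action Pairs (EAP), which couple each forward manipulation behavior with a recovery action so that the two form a self-resetting loop. This supports continuous on-policy data acquisition and long-horizon task orchestration.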
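To make the self-resetting idea concrete, here is a minimal sketch of an EAP-style collection loop, assuming a toy scene with two states ("table" and "box") and two stochastic stand-in skills. The forward skill places an object in a box, the inverse skill takes it out, and each rollout starts where the previous one ended, so a human is only needed when a skill fails. All names here (`EntangledActionPair`, `collect_with_eap`, `judge`, `human_reset`) are hypothetical; in RoboClaw the skills are learned policies and success is judged by the VLM agent, which this simple equality check only imitates.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Episode:
    skill: str     # "forward" or "inverse"
    start: str     # scene state before the skill ran
    end: str       # scene state after the skill ran
    success: bool  # verdict of the stand-in success judge


@dataclass
class EntangledActionPair:
    # Hypothetical structure: a forward skill plus the inverse skill that undoes it.
    forward: Callable[[str], str]
    inverse: Callable[[str], str]
    forward_goal: str  # state the forward skill should produce
    inverse_goal: str  # state the inverse skill should restore


def collect_with_eap(pair: EntangledActionPair, state: str, n_cycles: int,
                     judge: Callable[[str, str], bool],
                     human_reset: Callable[[str], str]) -> Tuple[List[Episode], int]:
    """Alternate forward/inverse rollouts so the scene resets itself between episodes.

    A human is called only when a skill fails, to put the scene into the state the
    next skill expects. Returns all logged episodes and the number of human resets.
    """
    episodes: List[Episode] = []
    resets = 0
    for _ in range(n_cycles):
        for name, skill, goal in (("forward", pair.forward, pair.forward_goal),
                                  ("inverse", pair.inverse, pair.inverse_goal)):
            start = state
            state = skill(state)
            ok = judge(state, goal)
            episodes.append(Episode(name, start, state, ok))
            if not ok:
                state = human_reset(goal)  # manual intervention only on failure
                resets += 1
    return episodes, resets


# Toy skills: move an object between "table" and "box", succeeding 80% of the time.
rng = random.Random(0)


def place_in_box(state: str) -> str:
    return "box" if state == "table" and rng.random() < 0.8 else state


def take_out_of_box(state: str) -> str:
    return "table" if state == "box" and rng.random() < 0.8 else state


pair = EntangledActionPair(place_in_box, take_out_of_box,
                           forward_goal="box", inverse_goal="table")
episodes, resets = collect_with_eap(pair, "table", n_cycles=50,
                                    judge=lambda s, g: s == g,
                                    human_reset=lambda g: g)
print(len(episodes), resets)
```

Both halves of the pair produce on-policy episodes, so every cycle yields training data for the forward skill and the recovery skill at the same time, and the human-reset count falls as the skills improve.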

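Significance / place in this repository: This paper belongs to the agentic robotics / long-horizon manipulation track. Its importance lies in bringing the collection, training, and execution stages of the robot lifecycle into a single agentic framework.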

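Limitations / why it is not ranked higher: Although its value on real robots is high, the method's impact remains mainly within the long-horizon manipulation track.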

Original Abstract

Abstract. Vision-Language-Action (VLA) systems have shown strong potential for language-driven robotic manipulation. However, scaling them to long-horizon tasks remains challenging. Existing pipelines typically separate data collection, policy learning, and deployment, resulting in heavy reliance on manual environment resets and brittle multi-policy execution. We present RoboClaw, an agentic robotics framework that unifies data collection, policy learning, and task execution under a single VLM-driven controller. At the policy level, RoboClaw introduces Entangled Action Pairs (EAP), which couple forward manipulation behaviors with inverse recovery actions to form self-resetting loops for autonomous data collection. This mechanism enables continuous on-policy data acquisition and iterative policy refinement with minimal human intervention. During deployment, the same agent performs high-level reasoning and dynamically orchestrates learned policy primitives to accomplish long-horizon tasks. By maintaining consistent contextual semantics across collection and execution, RoboClaw reduces mismatch between the two phases and improves multi-policy robustness. Experiments in real-world manipulation tasks demonstrate improved stability and scalability compared to conventional open-loop pipelines, while significantly reducing human effort throughout the robot lifecycle, achieving a 25% improvement in success rate over baseline methods on long-horizon tasks and reducing human time investment by 53.7%.
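The deployment side can be pictured as a closed loop: decide on the next primitive, run it, observe the result, and decide again, so a failed primitive is retried or routed around instead of breaking a fixed open-loop sequence. The sketch below assumes a symbolic world made of fact sets and replaces the VLM's high-level reasoning with a plain breadth-first planner; that substitution, the drawer-and-cup primitives, and the `flaky_execute` failure model are all hypothetical and are not RoboClaw's actual method. The `Primitive.name` field hints at the shared-semantics idea: the same labels used to describe skills during collection are the ones the controller reasons over at execution time.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]


@dataclass(frozen=True)
class Primitive:
    name: str        # same label used when this primitive's data was collected
    requires: State  # facts that must hold before the policy runs
    adds: State      # facts the policy makes true on success
    removes: State   # facts the policy makes false on success


def plan(state: State, goal: State, library: List[Primitive]) -> Optional[List[Primitive]]:
    """Breadth-first search over primitives: a symbolic stand-in for VLM task decomposition."""
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        s, path = frontier.popleft()
        if goal <= s:
            return path
        for p in library:
            if p.requires <= s:
                nxt = (s - p.removes) | p.adds
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [p]))
    return None


def orchestrate(state: State, goal: State, library: List[Primitive],
                execute: Callable[[Primitive], bool],
                max_steps: int = 20) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Closed loop: plan, run only the first primitive, observe the outcome, re-plan."""
    trace: List[Tuple[str, bool]] = []
    for _ in range(max_steps):
        if goal <= state:
            return True, trace
        steps = plan(state, goal, library)
        if not steps:
            return False, trace  # goal unreachable with the current library
        p = steps[0]
        ok = execute(p)
        trace.append((p.name, ok))
        if ok:
            state = (state - p.removes) | p.adds  # on failure the state is unchanged
    return goal <= state, trace


def prim(name, requires, adds, removes) -> Primitive:
    return Primitive(name, frozenset(requires), frozenset(adds), frozenset(removes))


library = [
    prim("open_drawer", {"drawer_closed"}, {"drawer_open"}, {"drawer_closed"}),
    prim("put_cup_in_drawer", {"drawer_open", "cup_on_table"}, {"cup_in_drawer"}, {"cup_on_table"}),
    prim("close_drawer", {"drawer_open"}, {"drawer_closed"}, {"drawer_open"}),
]

attempts: Dict[str, int] = {}


def flaky_execute(p: Primitive) -> bool:
    # Simulated policy rollout: the first grasp attempt fails, every other call succeeds.
    attempts[p.name] = attempts.get(p.name, 0) + 1
    return not (p.name == "put_cup_in_drawer" and attempts[p.name] == 1)


done, trace = orchestrate(frozenset({"drawer_closed", "cup_on_table"}),
                          frozenset({"cup_in_drawer", "drawer_closed"}),
                          library, flaky_execute)
print(done, trace)
# True [('open_drawer', True), ('put_cup_in_drawer', False), ('put_cup_in_drawer', True), ('close_drawer', True)]
```

Because the controller re-plans after every primitive, the failed grasp simply triggers another attempt from the observed state; an open-loop script that ran the three primitives once in order would have finished with the cup still on the table.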

Links

Paper link