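Curator's Notes
This paper attempts to unify embodied reasoning, planning, correction, pointing, and VLA adaptation in a single embodied foundation model.
Its key mechanisms are large-scale automated data construction, multi-task RL, and a Planner-Grounder-Corrector closed loop, which together let one model execute long-horizon tasks and correct its own mistakes.
It merits inclusion as a high-signal work that pushes embodied foundation models toward real-robot generalization and better VLA data efficiency.
Its limitation is that the current evidence comes mainly from preprint experiments and evaluations built by the authors; independent replication and validation across broader deployments are still needed.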
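To make the Planner-Grounder-Corrector closed loop more concrete, here is a minimal Python sketch of one way such a loop could be organized. It is an illustration under stated assumptions, not the authors' implementation: the function `run_pgc_loop`, the `StepResult` record, the `max_retries` parameter, the text "hint" passed from corrector to grounder, and all `toy_*` stand-ins are hypothetical names introduced here. In the paper a single model plays all three roles; the sketch passes them in as plain callables so it runs without a model or a robot.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # hypothetical: a 2D pixel coordinate produced by pointing


@dataclass
class StepResult:
    step: str
    point: Point
    attempts: int
    succeeded: bool


def run_pgc_loop(
    instruction: str,
    plan: Callable[[str], List[str]],
    ground: Callable[[str, str], Point],
    execute: Callable[[str, Point], None],
    check: Callable[[str], Tuple[bool, str]],
    max_retries: int = 2,
) -> List[StepResult]:
    """Plan once, then ground/execute/check each step, retrying on failure."""
    results: List[StepResult] = []
    for step in plan(instruction):
        hint = ""  # corrector feedback passed to the next grounding call
        point: Point = (0, 0)
        succeeded = False
        attempts = 0
        # One initial attempt plus up to max_retries corrected attempts.
        while not succeeded and attempts <= max_retries:
            attempts += 1
            point = ground(step, hint)
            execute(step, point)
            succeeded, hint = check(step)
        results.append(StepResult(step, point, attempts, succeeded))
        if not succeeded:
            break  # stop the episode when a step cannot be recovered
    return results


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_plan(instruction: str) -> List[str]:
    return ["open drawer", "place cup in drawer"]


def toy_ground(step: str, hint: str) -> Point:
    # Shift the target point when the corrector supplied a hint.
    return (120, 80) if hint else (100, 80)


executed: List[Tuple[str, Point]] = []


def toy_execute(step: str, point: Point) -> None:
    executed.append((step, point))


def toy_check(step: str) -> Tuple[bool, str]:
    # Pretend the first attempt at opening the drawer fails.
    tries = sum(1 for s, _ in executed if s == step)
    if step == "open drawer" and tries == 1:
        return False, "gripper missed the handle; aim further right"
    return True, ""


results = run_pgc_loop(
    "put the cup in the drawer", toy_plan, toy_ground, toy_execute, toy_check
)
for r in results:
    print(r)
```

In this toy run the first step fails once, the corrector's hint changes the grounded point, and the retry succeeds before the loop moves on to the second step. How the real system represents plans, points, and correction signals is not specified in the abstract, so those details should be read as placeholders.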
Original Abstract
We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like π0.5 across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.
Project: https://embodied-r.github.io/
Code: https://github.com/pickxiguapi/Embodied-R1.5
EmbodiedEvalKit: https://github.com/pickxiguapi/EmbodiedEvalKit
Models & Datasets: https://huggingface.co/collections/IffYuan/embodied-r15