Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Robotics and Embodied Intelligence | Breakthrough-level

Published: 2026-06-09
arXiv: 2606.11324

Editorial Commentary

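This paper attempts to unify embodied reasoning, planning, correction, pointing, and VLA adaptation in a single embodied foundation model.

Its key mechanisms are large-scale automated data construction, multi-task RL, and a Planner-Grounder-Corrector closed loop, which together let one model execute long-horizon tasks and correct its own mistakes.

It merits inclusion as a high-signal work that pushes embodied foundation models toward real-robot generalization and better VLA data efficiency.

Its limitation is that the current evidence comes mainly from preprint experiments and the authors' own evaluation suite; independent replication and validation across broader deployments are still needed.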
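To make the Planner-Grounder-Corrector closed loop mentioned above more concrete, here is a minimal Python sketch of how such a loop could be wired. It is an illustration under stated assumptions, not the authors' implementation: the class `PGCLoop`, its callables (`plan`, `ground`, `execute`, `correct`), the retry budget, and the stubbed `demo` are all hypothetical names invented for this example. The abstract says a single model fills all three roles; in the sketch the three callables could simply wrap the same model with different prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical; this is not the Embodied-R1.5 API.
Point = Tuple[int, int]


@dataclass
class PGCLoop:
    plan: Callable[[str], List[str]]        # Planner: instruction -> ordered subtasks
    ground: Callable[[str], Point]          # Grounder: subtask -> target point in the image
    execute: Callable[[str, Point], bool]   # Low-level policy; True if the subtask succeeded
    correct: Callable[[str], str]           # Corrector: failed subtask -> revised subtask
    max_retries: int = 2                    # Assumed retry budget per subtask
    log: List[str] = field(default_factory=list)

    def run(self, instruction: str) -> bool:
        """Execute every subtask, re-grounding and retrying after each failure."""
        for subtask in self.plan(instruction):
            attempt = subtask
            for _ in range(self.max_retries + 1):
                point = self.ground(attempt)
                if self.execute(attempt, point):
                    self.log.append(f"ok: {attempt}")
                    break
                self.log.append(f"fail: {attempt}")
                attempt = self.correct(attempt)
            else:
                # Retry budget exhausted without success: abort the whole task.
                return False
        return True


def demo(n_failures: int = 1) -> Tuple[bool, List[str]]:
    """Run the loop with stubs; 'open drawer' fails n_failures times first."""
    failures: Dict[str, int] = {"open drawer": n_failures}

    def execute(subtask: str, point: Point) -> bool:
        base = subtask.replace("retry: ", "")
        if failures.get(base, 0) > 0:
            failures[base] -= 1
            return False
        return True

    loop = PGCLoop(
        plan=lambda instr: ["open drawer", "pick cup"],
        ground=lambda subtask: (0, 0),
        execute=execute,
        correct=lambda s: s if s.startswith("retry: ") else f"retry: {s}",
    )
    return loop.run("put the cup in the drawer"), loop.log
```

Calling `demo()` simulates one failed attempt that the corrector recovers from, while `demo(3)` exhausts the assumed retry budget and aborts. A real system would replace the stubs with model calls and robot feedback.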

Abstract


Original Abstract

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like π0.5 across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.

Project: https://embodied-r.github.io/
Code: https://github.com/pickxiguapi/Embodied-R1.5
EmbodiedEvalKit: https://github.com/pickxiguapi/EmbodiedEvalKit
Models & Datasets: https://huggingface.co/collections/IffYuan/embodied-r15

Links

Paper: arXiv:2606.11324
