WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

Arnav Kumar Jain; Yilin Wu; Jesse Farebrother; Gokul Swamy; Andrea Bajcsy

Tags: Robotics and Embodied AI; breakthrough-level

Published: 2026-06-11
arXiv: 2606.13672

Curation Notes

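This paper proposes WEAVER, which advances robot world models beyond plain video or state prediction into a multi-view simulator that can be used for policy evaluation, policy improvement, and test-time planning.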

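The core method is a multi-view latent world model that uses flow matching to predict future latents and rewards. Its architecture, memory, and prediction objectives are designed to jointly achieve fidelity, long-horizon consistency, and efficiency.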
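The flow-matching objective can be illustrated with a minimal NumPy sketch. It assumes the common linear (rectified-flow) interpolation path between Gaussian noise and the target, treats the target as the next latent concatenated with a scalar reward, and packs the current latent and action into one conditioning vector. These layouts, the toy dynamics, and the names `flow_matching_loss`, `sample_next`, `toy_dynamics`, and `oracle_velocity` are hypothetical, not WEAVER's published implementation. An analytic "oracle" velocity field stands in for a trained network so the example runs without a deep-learning framework.

```python
import numpy as np

# Minimal conditional flow-matching sketch on a linear (rectified-flow) path.
# ASSUMPTION: target x1 = [next latent, reward]; cond = [current latent, action].
# WEAVER's actual network and multi-view tokenization are not reproduced here.

def flow_matching_loss(velocity_fn, target, cond, rng):
    """Mean squared error between predicted and true path velocity."""
    noise = rng.standard_normal(target.shape)                # x0 ~ N(0, I)
    t = rng.uniform(0.0, 1.0, size=(target.shape[0], 1))     # one time per sample
    x_t = (1.0 - t) * noise + t * target                     # point on the straight path
    v_true = target - noise                                  # constant velocity of that path
    v_pred = velocity_fn(x_t, t, cond)
    return float(np.mean(np.sum((v_pred - v_true) ** 2, axis=-1)))

def sample_next(velocity_fn, cond, dim, rng, steps=10):
    """Euler-integrate dx/dt = v(x, t, cond) from noise (t=0) to a prediction (t=1)."""
    x = rng.standard_normal((cond.shape[0], dim))
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((cond.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x

# Hypothetical toy dynamics: 4-d latent, 2-d action, reward = -||z_next||^2.
def toy_dynamics(cond):
    z, a = cond[:, :4], cond[:, 4:]
    z_next = 0.9 * z + 0.1 * np.concatenate([a, a], axis=1)
    reward = -np.sum(z_next ** 2, axis=1, keepdims=True)
    return np.concatenate([z_next, reward], axis=1)

# Exact velocity for deterministic targets: points from x_t straight at x1.
# A trained network would replace this by minimizing flow_matching_loss.
def oracle_velocity(x, t, cond):
    return (toy_dynamics(cond) - x) / (1.0 - t)

rng = np.random.default_rng(0)
cond = rng.standard_normal((8, 6))
print(flow_matching_loss(oracle_velocity, toy_dynamics(cond), cond, rng))  # close to 0
pred = sample_next(oracle_velocity, cond, dim=5, rng=rng)
print(np.abs(pred - toy_dynamics(cond)).max())  # close to 0
```

Predicting the reward as an extra target dimension lets one sampling pass return both the imagined next state and its score, which is what the downstream evaluation and planning uses need.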

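The experiments go beyond offline metrics and are validated on real robot hardware: outcomes of simulated trajectories correlate strongly with real-world success rates (ρ = 0.870), policy improvement raises the real-world success rate by 38% on top of π0.5, and test-time planning runs 5-10× faster than with prior world models.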
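Policy evaluation with a world model amounts to estimating each policy's success rate from imagined rollouts and checking how well those estimates track real-world success across policies. The sketch below assumes a rollout counts as a success when its final predicted reward reaches a threshold, and uses the Pearson coefficient; the summary does not state which correlation ρ denotes, so treat that choice, the threshold, the function names, and all numbers as illustrative assumptions, not results from the paper.

```python
import numpy as np

def simulated_success_rate(final_rewards, threshold=0.5):
    """Fraction of imagined rollouts whose final predicted reward reaches a
    (hypothetical) success threshold."""
    return float(np.mean(np.asarray(final_rewards, dtype=float) >= threshold))

def pearson_r(x, y):
    """Pearson correlation between per-policy simulated and real success rates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Illustrative numbers only (NOT from the paper): one list of rollouts per policy.
rollouts = [[0.9, 0.8, 0.2, 0.7], [0.1, 0.3, 0.6, 0.2], [0.9, 0.9, 0.95, 0.6]]
sim = [simulated_success_rate(r) for r in rollouts]   # [0.75, 0.25, 1.0]
real = [0.70, 0.20, 0.80]                             # hypothetical hardware results
print(sim, pearson_r(sim, real))
```

A high correlation means the world model ranks policies in roughly the same order as hardware trials, which is what makes it usable for selecting checkpoints without running every candidate on a robot.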

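It merits inclusion because it brings embodied world models into a concrete closed loop of policy evaluation, policy improvement, and planning for real-world manipulation, with high spillover value for robot foundation models and for learning from limited interaction.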

Abstract

The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching: policy evaluation, policy improvement, and test-time planning, all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity (i.e., producing simulated trajectories that correlate with reality), (ii) consistency (i.e., producing simulated trajectories that are coherent over long horizons), and (iii) efficiency (i.e., producing simulated trajectories quickly). We propose WEAVER (World Estimation Across Views for Embodied Reasoning): a WM architecture that simultaneously achieves all three desiderata, providing state-of-the-art results on robotic manipulation tasks. WEAVER is a multi-view WM trained to predict future latents and reward values via a flow-matching loss. We distill the key design decisions across model architecture, memory, and prediction objectives required to unlock the kinds of long-horizon dynamic manipulation tasks that have confounded prior world modeling approaches. We apply WEAVER on robotic hardware, demonstrating its effectiveness at policy evaluation (ρ = 0.870 correlation with real-world success rate), policy improvement (real-world success rate improvement of 38% on top of the π0.5 robot foundation model), and test-time planning (real-world success rate improvement of 14% with a 5-10× speedup over prior WMs). WEAVER also demonstrates better performance than prior WMs when evaluated on out-of-distribution scenarios. Code, models, and videos at: https://arnavkj1995.github.io/WEAVER/.
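Test-time planning with a learned simulator can take many forms, and the exact planner is not described here. As one hedged illustration, the random-shooting (best-of-N) sketch below samples N action sequences from a base policy, rolls each one out inside the world model, and keeps the sequence with the highest predicted return. The policy, the 2-d point-mass "world model", and the names `plan_best_of_n`, `toy_policy`, and `toy_wm_step` are hypothetical stand-ins.

```python
import numpy as np

def plan_best_of_n(policy_sample, wm_step, z0, n_candidates, horizon, rng):
    """Random-shooting planner: imagine N policy rollouts in the world model
    and keep the action sequence with the highest predicted return."""
    best_actions, best_return, all_returns = None, -np.inf, []
    for _ in range(n_candidates):
        z, total, actions = z0, 0.0, []
        for _ in range(horizon):
            a = policy_sample(z, rng)      # propose an action from the base policy
            z, r = wm_step(z, a)           # imagined transition and predicted reward
            total += r
            actions.append(a)
        all_returns.append(total)
        if total > best_return:
            best_actions, best_return = actions, total
    return best_actions, best_return, all_returns

# Hypothetical stand-ins: a noisy proportional policy and a point-mass "world model".
def toy_policy(z, rng):
    return -0.5 * z + 0.3 * rng.standard_normal(z.shape)

def toy_wm_step(z, a):
    z_next = z + a
    return z_next, -float(np.linalg.norm(z_next))  # reward: negative distance to the origin

rng = np.random.default_rng(0)
actions, ret, returns = plan_best_of_n(toy_policy, toy_wm_step, np.array([1.0, -1.0]), 16, 5, rng)
print(ret, np.mean(returns))  # best imagined return vs. the average candidate
```

In a receding-horizon loop, only the first action would be executed before replanning. Each decision costs `n_candidates × horizon` world-model calls, which is why rollout speed matters so much for this use case.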
