Kimodo: Scaling Controllable Human Motion Generation

Agents & Autonomous Science · Breakthrough tier

Published: 2026-03-16

Curation Notes

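Kimodo addresses a piece of infrastructure that humanoid robotics, simulation, and animation all lack: high-quality, controllable 3D human motion data that can be generated at scale. Earlier text-to-motion and constraint-to-motion models were usually limited by small public mocap datasets. As a result, their motion quality, control precision, and generalization fell short of what robot data pipelines actually need.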

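This technical report presents a kinematic motion diffusion model trained on 700 hours of optical mocap data. It uses a purpose-built motion representation and a two-stage denoiser that splits root prediction from body prediction, which reduces common artifacts such as foot skating and floating. The model accepts several kinds of constraints: text, full-body keyframes, sparse joint positions and rotations, 2D waypoints, and dense paths. The release also includes models and an authoring demo for both the SOMA and Unitree G1 skeletons.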
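The summary does not say how Kimodo encodes these different constraints internally. A common approach in motion diffusion works in two steps. First, each constraint becomes a per-frame mask and target over the motion features. Second, the constrained entries are overwritten during denoising (inpainting-style replacement).

The sketch below shows that idea under stated assumptions. The following are all hypothetical illustrations, not Kimodo's API or its actual conditioning mechanism:

- the feature layout (`[root_x, root_z]` followed by xyz per joint);
- the constraint classes;
- the helpers `build_mask_and_target` and `impose`.

Joint rotation constraints and text conditioning are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Vec3 = Tuple[float, float, float]

# Hypothetical per-frame feature layout: [root_x, root_z] then (x, y, z) per joint.
ROOT_DIMS = 2


@dataclass
class FullBodyKeyframe:  # pins the root and every joint at one frame
    frame: int
    root_xz: Tuple[float, float]
    joints: List[Vec3]


@dataclass
class SparseJoint:  # pins a single joint position at one frame
    frame: int
    joint: int
    position: Vec3


@dataclass
class Waypoint2D:  # pins the root's ground-plane position at one frame
    frame: int
    xz: Tuple[float, float]


@dataclass
class DensePath:  # pins the root's ground-plane position on consecutive frames
    start_frame: int
    xz: List[Tuple[float, float]]


Constraint = Union[FullBodyKeyframe, SparseJoint, Waypoint2D, DensePath]


def build_mask_and_target(constraints: List[Constraint], num_frames: int, num_joints: int):
    """Convert heterogeneous constraints into per-frame (mask, target) matrices."""
    dims = ROOT_DIMS + 3 * num_joints
    mask = [[0.0] * dims for _ in range(num_frames)]
    target = [[0.0] * dims for _ in range(num_frames)]

    def put(frame: int, start: int, values) -> None:
        # Mark the dims as constrained and record their target values.
        for k, v in enumerate(values):
            mask[frame][start + k] = 1.0
            target[frame][start + k] = v

    for c in constraints:
        if isinstance(c, FullBodyKeyframe):
            put(c.frame, 0, c.root_xz)
            for j, p in enumerate(c.joints):
                put(c.frame, ROOT_DIMS + 3 * j, p)
        elif isinstance(c, SparseJoint):
            put(c.frame, ROOT_DIMS + 3 * c.joint, c.position)
        elif isinstance(c, Waypoint2D):
            put(c.frame, 0, c.xz)
        elif isinstance(c, DensePath):
            for i, xz in enumerate(c.xz):
                put(c.start_frame + i, 0, xz)
    return mask, target


def impose(sample, mask, target):
    """Inpainting-style replacement: keep constrained dims, leave the rest to the model."""
    return [
        [m * t + (1.0 - m) * s for s, m, t in zip(srow, mrow, trow)]
        for srow, mrow, trow in zip(sample, mask, target)
    ]


if __name__ == "__main__":
    cons = [
        Waypoint2D(frame=10, xz=(1.0, 2.0)),
        SparseJoint(frame=5, joint=1, position=(0.1, 0.9, 0.2)),
        DensePath(start_frame=0, xz=[(0.0, 0.0), (0.1, 0.0)]),
    ]
    mask, target = build_mask_and_target(cons, num_frames=20, num_joints=3)
    noisy = [[0.5] * len(mask[0]) for _ in range(20)]  # stand-in for a denoiser sample
    print(impose(noisy, mask, target)[10][:3])
```

Root-only constraints (waypoints, paths) touch only the root dims here. That separation loosely mirrors the root/body split described above, but how Kimodo's two-stage denoiser actually consumes constraints is not specified in this summary.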

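This work merits inclusion because it is more than an ordinary motion generation paper. It explicitly pushes controllable motion generation toward data generation and authoring infrastructure that robots can use. For embodied AI and humanoid learning, it connects three things:

- large-scale, high-quality motion data;
- a controllable generation interface;
- a downstream demonstration pipeline for policy training.

Its spillover value therefore clearly exceeds that of entertainment-oriented motion synthesis.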

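It is rated Breakthrough rather than a higher tier because its core results still center on offline motion authoring and demonstration generation. It does not yet close the loop with broader robot control or world model training. Whether it moves up will depend on whether Kimodo actually becomes the standard foundation for humanoid motion data generation.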

Links

Paper · Project