HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models

Zixing Chen; Yifeng Gao; Li Wang; Yunhan Zhao; Yi Liu; Jiayu Li; Xiang Zheng; Zuxuan Wu; Cong Wang; Xingjun Ma; Yu-Gang Jiang

Reinforcement learning · Breakthrough-level

Curation and commentary: DAST AI

Published: 2026-04-14
arXiv: 2604.12447

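Curator's commentary

This paper points out that evaluations of VLA models often check only whether an action executes successfully, and overlook that the same action can become dangerous in a different semantic context. Executing an action correctly is not the same as executing it safely.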

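HazardArena controls for confounds with safe/unsafe twin scenarios: the objects, layout, and action requirements are identical, and only the semantic risk differs. This isolates whether a VLA actually ties visual-linguistic semantics to action safety.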
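The twin-scenario control can be sketched in a few lines of Python. This is only an illustration of the idea, not the benchmark's actual data format or API: the names `Scenario`, `TwinPair`, `evaluate`, the `context` field, and the microwave example are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Scenario:
    objects: Tuple[str, ...]  # object categories present in the scene
    layout: str               # identifier of the spatial arrangement
    instruction: str          # action requirement given to the policy
    context: str              # semantic context (hypothetical free-text field)
    hazardous: bool           # ground-truth label: is executing the action unsafe?


@dataclass(frozen=True)
class TwinPair:
    safe: Scenario
    unsafe: Scenario

    def is_controlled(self) -> bool:
        """True when the twins differ only in semantic context and risk label."""
        s, u = self.safe, self.unsafe
        return (
            s.objects == u.objects
            and s.layout == u.layout
            and s.instruction == u.instruction
            and s.context != u.context
            and not s.hazardous
            and u.hazardous
        )


# A policy is modeled as a callable returning True if it carries out the action.
Policy = Callable[[Scenario], bool]


def evaluate(pairs: Sequence[TwinPair], policy: Policy) -> Tuple[float, float]:
    """Return (success rate on safe twins, unsafe-execution rate on unsafe twins)."""
    if not pairs:
        raise ValueError("need at least one twin pair")
    if not all(p.is_controlled() for p in pairs):
        raise ValueError("twins must differ only in semantic context")
    safe_success = sum(policy(p.safe) for p in pairs) / len(pairs)
    unsafe_exec = sum(policy(p.unsafe) for p in pairs) / len(pairs)
    return safe_success, unsafe_exec


# Invented example pair: same objects, layout, and instruction; only the context differs.
example_pair = TwinPair(
    safe=Scenario(("container", "microwave"), "kitchen_a",
                  "put the container in the microwave", "container is ceramic", False),
    unsafe=Scenario(("container", "microwave"), "kitchen_a",
                    "put the container in the microwave", "container is metal", True),
)
```

A policy that ignores `context` scores identically on both twins, so a high success rate on the safe twin automatically shows up as a high unsafe-execution rate on the unsafe one. Reporting the two numbers side by side is what makes the gap between correct execution and safe execution visible.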

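By this library's standards, it is a high-value benchmark for embodied safety and VLA evaluation, providing a reusable risk taxonomy, an asset set, and a training-free Safety Option Layer.

Its limitation is that the benchmark's risks are still confined to controlled scenarios; real-world robot safety also involves physical uncertainty, long-term consequences, and human interaction.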

Original abstract

Vision-Language-Action (VLA) models inherit rich world knowledge from vision-language backbones and acquire executable skills via action demonstrations. However, existing evaluations largely focus on action execution success, leaving action policies loosely coupled with visual-linguistic semantics. This decoupling exposes a systematic vulnerability whereby correct action execution may induce unsafe outcomes under semantic risk. To expose this vulnerability, we introduce HazardArena, a benchmark designed to evaluate semantic safety in VLAs under controlled yet risk-bearing contexts. HazardArena is constructed from safe/unsafe twin scenarios that share matched objects, layouts, and action requirements, differing only in the semantic context that determines whether an action is unsafe. We find that VLA models trained exclusively on safe scenarios often fail to behave safely when evaluated in their corresponding unsafe counterparts. HazardArena includes over 2,000 assets and 40 risk-sensitive tasks spanning 7 real-world risk categories grounded in established robotic safety standards. To mitigate this vulnerability, we propose a training-free Safety Option Layer that constrains action execution using semantic attributes or a vision–language judge, substantially reducing unsafe behaviors with minimal impact on task performance. We hope that HazardArena highlights the need to rethink how semantic safety is evaluated and enforced in VLAs as they scale toward real-world deployment. Correspondence: xingjunma@fudan.edu.cn; ygj@fudan.edu.cn
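
The abstract describes the Safety Option Layer only at a high level, so the following is a minimal sketch of what a training-free gate around a frozen policy could look like, not the authors' implementation. The observation key `target_attributes`, the attribute list, the fallback action, and every function name here are assumptions introduced for illustration.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Action = Tuple[float, ...]
Observation = Dict[str, object]
Policy = Callable[[Observation, str], Action]
Judge = Callable[[Observation, str], bool]  # True means "safe to proceed"

# Illustrative attribute list; not taken from the paper.
HAZARD_ATTRIBUTES: FrozenSet[str] = frozenset({"hot", "sharp", "flammable"})


def attribute_judge(obs: Observation, instruction: str) -> bool:
    """Rule-based judge: block when the target object carries a hazard attribute."""
    attrs = set(obs.get("target_attributes", ()))
    return attrs.isdisjoint(HAZARD_ATTRIBUTES)


def with_safety_layer(policy: Policy, judge: Judge, fallback: Action) -> Policy:
    """Wrap a frozen policy; no weights are updated, so the gate is training-free."""
    def guarded(obs: Observation, instruction: str) -> Action:
        if not judge(obs, instruction):
            return fallback  # e.g. a zero-velocity "hold" command
        return policy(obs, instruction)
    return guarded
```

Because `judge` is just a callable, the rule-based check could be swapped for a function that queries a vision-language model with the current image and instruction, which is how this sketch would accommodate the "vision-language judge" variant mentioned in the abstract.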
