RobotValues: Evaluating Household Robots When Human Values Conflict

Jongwook Han; Hyeongjin Kim; Yohan Jo

Category: Robotics and Embodied Intelligence; Rating: Breakthrough

Published: 2026-06-02
arXiv: 2606.03312

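Key Points

Problem/Background: The paper argues that evaluating household robots on task completion or safety compliance alone is insufficient, because real homes often require trade-offs among values such as autonomy, efficiency, privacy, and social appropriateness.
Method/Mechanism: RobotValues builds 10K value-conflict scenarios. Each instance pairs a realistic household image with multiple plausible robot actions, each prioritizing a different human value.
Results/Evidence: The evaluation shows that current VLMs used as robot planners have default value preferences, for example toward safety and accommodation. When instructed to prioritize a conflicting value such as privacy, they often fail to override their default action, choosing an incorrect action 80% of the time.
Why it matters: The work extends embodied safety from "is this dangerous?" to "can the robot choose according to the user's or the scenario's values when values conflict?", an evaluation boundary that household robots and VLA deployment badly need.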
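
To make the benchmark structure concrete, here is a minimal sketch of how one value-conflict instance and a tally of default value preferences might be represented. The `Scenario` class, its field names, and the `value_preference_rates` function are hypothetical and are not taken from the paper or any released code. The toy data is invented for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One hypothetical value-conflict instance (field names are assumptions)."""
    image_path: str          # household image shown to the planner
    actions: dict[str, str]  # value label -> candidate robot action


def value_preference_rates(scenarios, choices):
    """Fraction of scenarios in which each value's action was chosen.

    `choices[i]` is the value label of the action the model picked for
    `scenarios[i]` when given no instruction about which value to prioritize.
    """
    if len(scenarios) != len(choices):
        raise ValueError("need exactly one choice per scenario")
    for scenario, choice in zip(scenarios, choices):
        if choice not in scenario.actions:
            raise ValueError(f"{choice!r} is not an option in this scenario")
    counts = Counter(choices)
    return {value: n / len(choices) for value, n in counts.items()}


# Toy data, invented purely for illustration.
scenarios = [
    Scenario("kitchen.png", {"safety": "move the knife away",
                             "autonomy": "ask before touching"}),
    Scenario("bedroom.png", {"safety": "open the door to check",
                             "privacy": "wait outside"}),
    Scenario("hallway.png", {"efficiency": "vacuum now",
                             "privacy": "return later"}),
    Scenario("study.png", {"safety": "unplug the heater",
                           "autonomy": "leave it running as asked"}),
]
choices = ["safety", "safety", "efficiency", "safety"]
rates = value_preference_rates(scenarios, choices)
```

A skewed `rates` dictionary, with some values chosen far more often than others and some never chosen, is the kind of pattern the summary describes as a default value preference.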

Abstract

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize other values than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchmarks for evaluating robots' value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation and automatic quality control. Using RobotValues we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions for 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.

Keywords: Household Robots, Human values
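
The override result in the abstract can be read as a simple rate: among trials where the instructed value conflicts with the model's default choice, how often does the model fail to pick the action for the instructed value? The sketch below follows that reading. The `Trial` record and the `override_failure_rate` function are hypothetical, the trials are invented, and the paper's exact metric definition may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One hypothetical instructed trial (field names are assumptions)."""
    default_choice: str     # value chosen with no instruction
    instructed_value: str   # value the prompt asks the model to prioritize
    instructed_choice: str  # value chosen after receiving the instruction


def override_failure_rate(trials):
    """Share of conflicting trials where the instructed value was not chosen.

    Only trials whose instructed value differs from the default choice count,
    since those are the cases where the model must override its default.
    Returns None when there are no such trials.
    """
    conflicting = [t for t in trials if t.instructed_value != t.default_choice]
    if not conflicting:
        return None
    failures = sum(t.instructed_choice != t.instructed_value for t in conflicting)
    return failures / len(conflicting)


# Toy trials, invented purely for illustration.
trials = [
    Trial("safety", "privacy", "safety"),                 # failed to override
    Trial("safety", "privacy", "privacy"),                # overrode default
    Trial("accommodation", "autonomy", "accommodation"),  # failed to override
    Trial("safety", "safety", "safety"),                  # no conflict, skipped
    Trial("safety", "efficiency", "safety"),              # failed to override
]
rate = override_failure_rate(trials)
```

Restricting the denominator to conflicting trials is a design choice in this sketch: it keeps cases where the instruction happens to match the default from lowering the measured failure rate.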

Links

Paper: arXiv:2606.03312
