Agents and Autonomous Science
Breakthrough tier
Why it was included
On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment addresses a reusable AI system or evaluation problem rather than a one-off demo.
It performs on-policy, self-evolving safety alignment using agent failures scored by a verifier.
It treats safety as trajectory repair for tool-using agents, not just response filtering, and preserves constraints on utility and over-refusal.
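It is not rated higher because new arXiv work like this still needs more independent replication, deployment in real systems, and long-term community adoption to confirm its impact.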
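To make the trajectory-repair idea concrete, here is a minimal sketch of how verifier-scored failures from the current agent might be turned into repair pairs, with a simple over-refusal check on benign tasks. Everything in it is an assumption for illustration: the `Trajectory` type, `collect_repair_pairs`, `refusal_rate`, the 0.5 threshold, and the toy rollout, verifier, and repair functions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    task: str
    steps: List[str]  # tool calls issued by the agent (hypothetical format)


def collect_repair_pairs(
    tasks: List[str],
    rollout: Callable[[str], Trajectory],
    verifier: Callable[[Trajectory], float],
    repair: Callable[[Trajectory], Trajectory],
    threshold: float = 0.5,  # assumed safety cutoff, not from the paper
) -> List[Tuple[Trajectory, Trajectory]]:
    """Gather (failed, repaired) pairs from the current agent's own rollouts."""
    pairs = []
    for task in tasks:
        traj = rollout(task)  # on-policy: sampled from the current agent
        if verifier(traj) >= threshold:
            continue  # already safe, nothing to repair
        fixed = repair(traj)
        # Keep the pair only if the repair passes the verifier and still acts,
        # so an empty refusal does not count as a repair.
        if verifier(fixed) >= threshold and fixed.steps:
            pairs.append((traj, fixed))
    return pairs


def refusal_rate(benign_tasks: List[str], rollout: Callable[[str], Trajectory]) -> float:
    """Fraction of benign tasks on which the agent takes no action at all."""
    refused = sum(1 for t in benign_tasks if not rollout(t).steps)
    return refused / len(benign_tasks)


# Toy stand-ins for the agent, verifier, and repair step (all hypothetical).
UNSAFE = "rm -rf /"


def toy_rollout(task: str) -> Trajectory:
    if "clean" in task:
        return Trajectory(task, ["ls /tmp", UNSAFE])
    return Trajectory(task, ["ls ."])


def toy_verifier(traj: Trajectory) -> float:
    return 0.0 if any(UNSAFE in s for s in traj.steps) else 1.0


def toy_repair(traj: Trajectory) -> Trajectory:
    return Trajectory(traj.task, [s for s in traj.steps if UNSAFE not in s])


pairs = collect_repair_pairs(
    ["clean tmp dir", "list files"], toy_rollout, toy_verifier, toy_repair
)
benign_refusals = refusal_rate(["list files"], toy_rollout)
```

In a real training loop the collected pairs would feed some preference or fine-tuning update, and the refusal rate on benign tasks would act as a guard against the agent becoming safer only by refusing more; both of those steps are left out here.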