Agents and Autonomous Science · Breakthrough tier
Published
2026-05-12
arXiv
2605.11882

Selection notes

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment tackles a reusable AI-system or evaluation problem, not a one-off demo.

On-policy, self-evolving safety alignment driven by verifier-scored agent failures.

It treats safety as trajectory repair for tool-using agents, not merely response filtering, while keeping utility and over-refusal constraints in place.
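The summary does not describe the training loop in detail. As a rough illustration only, the general idea of mining failures from the agent's own (on-policy) rollouts, repairing them, and keeping a repair only if it is safe *and* does not over-refuse a legitimate task can be sketched as below. `Trajectory`, `acceptable`, `build_repair_set`, `toy_repair`, and the boolean verdict fields are hypothetical stand-ins for a real verifier and repair model; this is not the paper's method or API.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one agent rollout plus verifier verdicts."""
    task_id: str
    benign: bool      # assumed label: is the underlying request legitimate?
    actions: tuple    # tool calls the agent issued
    safe: bool        # assumed verifier verdict on safety
    solved: bool      # did the agent complete the task?
    refused: bool     # did the agent decline to act?


def acceptable(t: Trajectory) -> bool:
    """Hypothetical acceptance rule combining safety and over-refusal checks."""
    if not t.safe:
        return False
    if t.benign:
        # Utility / over-refusal constraint: a benign task must be solved, not refused.
        return t.solved and not t.refused
    # Harmful request: any safe outcome (including refusal) is fine.
    return True


def build_repair_set(
    rollouts: List[Trajectory],
    repair: Callable[[Trajectory], Trajectory],
) -> List[Tuple[Trajectory, Trajectory]]:
    """Collect (failure, repaired) pairs from the agent's own rollouts.

    Only failing trajectories are sent to `repair`, and a repair is kept
    only if it passes the same acceptance rule.
    """
    pairs = []
    for t in rollouts:
        if acceptable(t):
            continue  # not a failure, nothing to repair
        fixed = repair(t)
        if acceptable(fixed):
            pairs.append((t, fixed))
    return pairs


def toy_repair(t: Trajectory) -> Trajectory:
    """Deliberately naive repair: always drop tool calls and refuse.

    Fine for harmful requests, but it gets rejected on benign tasks,
    which shows the over-refusal constraint filtering a bad repair.
    """
    return replace(t, actions=(), safe=True, refused=True, solved=False)


r1 = Trajectory("a", benign=False, actions=("delete_user_data",), safe=False, solved=False, refused=False)
r2 = Trajectory("b", benign=True, actions=("read_file",), safe=True, solved=True, refused=False)
r3 = Trajectory("c", benign=True, actions=(), safe=True, solved=False, refused=True)

pairs = build_repair_set([r1, r2, r3], toy_repair)
print(len(pairs), pairs[0][0].task_id)  # prints "1 a"
```

Only the unsafe rollout on the harmful request yields a training pair; the over-refused benign rollout is a failure too, but the naive "just refuse" repair is rejected, so it never becomes a training target.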

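It is not rated higher because new arXiv work like this still needs independent replication, real-world system deployment, and long-term community adoption before its impact can be confirmed.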

Links