Theory, Robustness, and Core Machine Learning | Breakthrough
Published
2026-05-18
arXiv
2605.18991

Curated summary

This position paper reframes agent security away from model robustness alone and toward system-level security invariants around tools, identity, authority, memory, and execution boundaries.

Its central claim is that the model inside an agent should be treated as an untrusted component, with security guarantees enforced by the surrounding system's mechanisms rather than by relying on the model to refuse malicious instructions.

The paper analyzes representative real-world agent attacks through this lens and maps them to classical systems-security principles such as isolation, least privilege, mediation, and auditable control boundaries.
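To make the mapping concrete, here is a minimal sketch of how mediation, least privilege, and an auditable control boundary might look around an agent's tool calls. It is not taken from the paper: `Principal`, `ToolGateway`, `read_file`, and `send_email` are hypothetical names chosen for illustration, and the tools are stubs. The one idea it tries to show is that the allow/deny decision lives in code outside the model, so a model that has been tricked into requesting a tool still cannot reach it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Principal:
    """Hypothetical identity the agent acts for, with an explicit tool allow-list."""
    name: str
    allowed_tools: FrozenSet[str]


@dataclass
class ToolGateway:
    """Hypothetical single chokepoint between the untrusted model and real tools."""
    tools: Dict[str, Callable[..., Any]]
    audit_log: List[dict] = field(default_factory=list)

    def call(self, principal: Principal, tool: str, **args: Any) -> Any:
        # Mediation: every request is checked here, whatever the model "intended".
        allowed = tool in self.tools and tool in principal.allowed_tools
        # Auditable boundary: record the decision before acting on it.
        self.audit_log.append(
            {"principal": principal.name, "tool": tool, "args": args, "allowed": allowed}
        )
        if not allowed:
            # Least privilege: anything not explicitly granted is denied.
            raise PermissionError(f"{principal.name} may not call {tool}")
        return self.tools[tool](**args)


# Stub tools, standing in for real side effects.
def read_file(path: str) -> str:
    return f"<contents of {path}>"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


gateway = ToolGateway(tools={"read_file": read_file, "send_email": send_email})
summarizer = Principal(name="summarizer", allowed_tools=frozenset({"read_file"}))
```

With this setup, `gateway.call(summarizer, "read_file", path="notes.txt")` goes through, while `gateway.call(summarizer, "send_email", ...)` raises `PermissionError` and still leaves an entry in `audit_log`. A real deployment would also need isolation of the execution environment and argument-level policy, which this sketch leaves out.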

For this repository, the value is the reusable threat-model shift: secure agents need operating-system-like boundaries, not only alignment tuning or prompt-level defenses.

Links