Inclusion Commentary
This position paper reframes agent security away from model robustness alone and toward system-level security invariants around tools, identity, authority, memory, and execution boundaries.
Its central claim is that the model inside an agent should be treated as an untrusted component, with security guarantees enforced by the surrounding system mechanisms rather than by hoping the model refuses bad instructions.
The paper analyzes representative real-world agent attacks through this lens and maps them to classical systems-security principles such as isolation, least privilege, mediation, and auditable control boundaries.
For this repository, the value is the reusable threat-model shift: secure agents need operating-system-like boundaries, not only alignment tuning or prompt-level defenses.
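
To make the threat-model shift concrete, here is a minimal sketch of a mediation layer that treats the model's output as untrusted. It is not taken from the paper: `ToolCall`, `Mediator`, `read_note`, and `send_email` are hypothetical names chosen for illustration. The assumption is that the agent runtime, not the model, decides whether a requested tool call runs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass
class ToolCall:
    """A tool request proposed by the model; treated as untrusted input."""
    tool: str
    args: Dict[str, Any]


@dataclass
class Mediator:
    """Hypothetical reference monitor sitting between the model and its tools."""
    tools: Dict[str, Callable[..., Any]]
    granted: FrozenSet[str]  # least privilege: tools this session may use
    audit_log: List[str] = field(default_factory=list)

    def execute(self, call: ToolCall) -> Any:
        # Mediation: every call is checked here, regardless of what the
        # model was told or what its output claims.
        if call.tool not in self.granted or call.tool not in self.tools:
            self.audit_log.append(f"DENY {call.tool}")
            raise PermissionError(f"tool {call.tool!r} is not granted")
        self.audit_log.append(f"ALLOW {call.tool}")
        return self.tools[call.tool](**call.args)


# Illustrative tools (assumptions, not real integrations).
def read_note(name: str) -> str:
    return f"contents of {name}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


mediator = Mediator(
    tools={"read_note": read_note, "send_email": send_email},
    granted=frozenset({"read_note"}),  # this session may only read notes
)

# A granted call passes through the mediator.
allowed = mediator.execute(ToolCall("read_note", {"name": "todo"}))

# A call the model might emit after a prompt injection is blocked by the
# boundary, not by the model's own refusal.
try:
    mediator.execute(ToolCall("send_email", {"to": "x@example.com", "body": "hi"}))
    denied = False
except PermissionError:
    denied = True
```

In this sketch the guarantee does not depend on the model's behavior: an injected instruction can change what the model asks for, but not what the session is authorized to do, and both outcomes are recorded in `audit_log`.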