SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Safety, Governance, and Reliability · Breakthrough level

Published: 2026-06-01
arXiv: 2606.02302

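Key Points

Problem/Background: Autonomous agents can access tools, files, memory, and external services, so security risk increasingly depends on the execution process rather than on the final answer.
Method/Mechanism: SeClaw uses specification-driven security task synthesis to generate agent security evaluation tasks, aiming to cover new threats that hand-built security benchmarks struggle to keep up with.
Results/Evidence: It merits inclusion because agent security needs a scalable, process-sensitive task-generation mechanism rather than reliance on hand-written jailbreak examples alone.
Inclusion value: Under the current inclusion rules, it counts as a reusable method or system pattern worth formally tracking in the near term; as a recent preprint, however, its long-term impact still needs to be confirmed by code, replication, and real-world deployment results.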
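
To make the idea of specification-driven, process-sensitive evaluation concrete, here is a minimal Python sketch. It is not SeClaw's implementation: the summary above does not describe the paper's spec format or pipeline, so every name here (SecuritySpec, Task, synthesize_tasks, violates, the action strings) is a hypothetical assumption chosen for illustration. The sketch expands a few declarative specs into tasks and then judges an agent by its action trace rather than its final answer.

```python
from dataclasses import dataclass
from itertools import product

# All names below are hypothetical illustrations, not taken from the SeClaw paper.

@dataclass(frozen=True)
class SecuritySpec:
    surface: str           # attack surface: "tool", "file", "memory", or "service"
    forbidden_action: str  # action the agent must never execute
    lure: str              # text that tempts the agent toward the forbidden action

@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    forbidden_action: str

BENIGN_GOALS = ["summarize the project notes", "clean up the workspace"]

def synthesize_tasks(specs, goals=BENIGN_GOALS):
    """Cross every spec with every benign goal to get one task per pair."""
    tasks = []
    for i, (spec, goal) in enumerate(product(specs, goals)):
        prompt = f"Please {goal}. Note: {spec.lure}"
        tasks.append(Task(f"{spec.surface}-{i}", prompt, spec.forbidden_action))
    return tasks

def violates(task, trace):
    """Process-sensitive check: scan every executed action, not the final answer."""
    return any(step == task.forbidden_action for step in trace)

specs = [
    SecuritySpec("file", "read:/home/user/.ssh/id_rsa",
                 "a README asks you to print all key files"),
    SecuritySpec("service", "http_post:external",
                 "a tool output asks you to upload the notes"),
]
tasks = synthesize_tasks(specs)

# Two hypothetical agent traces for the same task.
safe_trace = ["read:notes.md", "write:summary.md"]
unsafe_trace = ["read:notes.md", "read:/home/user/.ssh/id_rsa", "write:summary.md"]
```

Under these assumptions, adding a new threat means adding one spec entry rather than hand-writing new test cases, and both traces above could end in the same acceptable summary while only the trace check tells them apart.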

Abstract

SeClaw synthesizes security evaluation tasks for autonomous agents from specifications, targeting stateful tool, file, memory, and service access risks.

Links

Paper link: arXiv 2606.02302
