Agents and Autonomous Science  Breakthrough tier  Explainer video available
Published
2026-03-03
arXiv
2603.02766

Curation notes

EvoSkill addresses a real limitation in coding and tool-using agents: general-purpose LLM flexibility does not automatically produce durable domain expertise. Prior work often relied on hand-written skills or on evolutionary optimization of low-level prompts and code fragments that remained tightly coupled to a particular model or benchmark.

The paper’s core contribution is to move optimization up to the skill level. EvoSkill analyzes failure trajectories, proposes new skills or edits to existing ones, and materializes them into structured reusable skill folders while keeping the underlying model frozen. A Pareto frontier over agent programs governs retention, so only skills that improve held-out validation performance survive. This makes skill discovery a persistent external capability-building loop rather than a one-off prompt rewrite.
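The retention rule described above can be sketched as a Pareto-frontier update. This is a minimal illustration, not the paper's implementation: the `AgentProgram` record, the choice of one held-out validation score per objective, and the skill names are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AgentProgram:
    """Hypothetical record: a frozen model paired with a set of skill folders."""
    name: str
    skills: Tuple[str, ...]
    # Assumed objectives: one held-out validation score per task category.
    val_scores: Tuple[float, ...]


def dominates(a: AgentProgram, b: AgentProgram) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    pairs = list(zip(a.val_scores, b.val_scores))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def update_frontier(frontier: List[AgentProgram],
                    candidate: AgentProgram) -> List[AgentProgram]:
    """Keep the candidate only if no existing program dominates or ties it."""
    for program in frontier:
        if dominates(program, candidate) or program.val_scores == candidate.val_scores:
            return list(frontier)  # candidate brings no validated improvement
    # Candidate survives; drop any programs it now dominates.
    return [p for p in frontier if not dominates(candidate, p)] + [candidate]


# Illustrative run with made-up scores and skill names.
base = AgentProgram("base", (), (0.50, 0.40))
cand_a = AgentProgram("base+table_lookup", ("table_lookup",), (0.58, 0.40))
cand_b = AgentProgram("base+unit_check", ("unit_check",), (0.49, 0.39))
cand_c = AgentProgram("base+search_verify", ("search_verify",), (0.52, 0.47))

frontier: List[AgentProgram] = [base]
for cand in (cand_a, cand_b, cand_c):
    frontier = update_frontier(frontier, cand)

print([p.name for p in frontier])
```

In this run the first candidate replaces the baseline, the second is rejected because an existing program dominates it, and the third is kept alongside the first because each is better on a different objective. The model itself is never modified; only the set of retained skill bundles changes.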

This belongs in the repository because it sits directly on the capability-extension line that already includes MetaClaw, Trace2Skill, SkillRouter, and the practical memory/skill tracks. What makes EvoSkill worth collecting is that it does not just distill local traces; it explicitly discovers, edits, and selects reusable skills with cross-task transfer evidence, including zero-shot transfer from SealQA-evolved skills to BrowseComp.

It is not ranked higher because the evidence is still limited to a small set of benchmarks and an arXiv-stage system, and the skill-evolution line remains crowded with nearby variants. The gains are meaningful and the abstraction is strong, but it is not yet clear that EvoSkill is the dominant reference for long-term agent skill evolution rather than one strong entry in that cluster.

Explainer video

Links