ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Category: Agents and Autonomous Science · Tier: Breakthrough

Published: 2025-02-06
arXiv: 2502.04306

Collection Notes

ScoreFlow extends the line of work on automatic workflow optimization by targeting a concrete weakness of prior methods: many workflow-search systems depend on discrete search or brittle hand-crafted modification operators, which limits how well they scale and adapt. The paper instead proposes a gradient-based, preference-driven route to optimizing agent workflows.

Its central idea is to optimize workflows in a continuous space with score-based preference optimization, using Score-DPO, a variant of direct preference optimization (DPO) that accounts for quantitative feedback. This reduces reliance on purely discrete search and treats preference learning as a reusable training signal for multi-step agent orchestration.
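To make the preference-optimization idea concrete, here is a minimal sketch in plain Python. It turns scored workflow candidates into (winner, loser) pairs and evaluates the standard DPO loss on one pair. The helper names (`build_pairs`, `dpo_loss`, `score_weighted_dpo_loss`) are hypothetical. The score-gap weighting in the last function is an illustrative assumption about how quantitative feedback could enter the objective, not the paper's exact Score-DPO formulation.

```python
import math
from itertools import combinations


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def build_pairs(scored, min_gap=0.0):
    """Turn (workflow_id, score) items into (winner, loser, gap) tuples.

    Pairs whose score gap is <= min_gap are skipped, since they carry
    little or no preference signal. (Hypothetical helper.)
    """
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        gap = abs(score_a - score_b)
        if gap <= min_gap:
            continue
        winner, loser = (a, b) if score_a > score_b else (b, a)
        pairs.append((winner, loser, gap))
    return pairs


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair.

    logp_* are sequence log-probabilities under the policy being trained;
    ref_logp_* are the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def score_weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, gap, beta=0.1):
    """ASSUMPTION: scale the DPO loss by the score gap so that pairs with a
    larger quality difference contribute more. Illustrative only."""
    return gap * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)


if __name__ == "__main__":
    # Hypothetical evaluation scores for three generated workflows.
    candidates = [("wf_a", 0.75), ("wf_b", 0.25), ("wf_c", 0.25)]
    for winner, loser, gap in build_pairs(candidates):
        print(winner, "preferred over", loser, "with gap", gap)
```

In a real training loop the log-probabilities would come from the workflow-generator model and a frozen reference copy, and the loss would be averaged over a batch and backpropagated. The sketch only shows how numeric scores can shape the preference signal.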

This is relevant to the repository because it broadens the workflow-optimization toolbox beyond tree search and graph abstractions. It shows that workflow quality can be optimized through preference learning, an approach that carries over directly to coding agents, reasoning pipelines, and multi-agent system design.

It is not ranked higher because it is one method family within the larger workflow-optimization line, and it is still uncertain whether it will outperform search-based approaches in the long run. It is nonetheless strong enough to include as a representative next-stage optimization paper.

Links

Paper link