Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models

Agents and Autonomous Science · Breakthrough tier

Published: 2025-03-03

Inclusion Notes

This paper focuses on a practical but under-measured bottleneck in tool-using agents: selecting the right tools from large tool inventories. Many agent benchmarks quietly assume a curated small candidate set, which hides the real retrieval problem and makes downstream agent results overly optimistic.

The main contribution is ToolRet, a heterogeneous tool retrieval benchmark with thousands of retrieval tasks and tens of thousands of tools, along with a large-scale training dataset that improves tool-aware retrieval. Just as important, the paper shows that strong general-purpose IR models are surprisingly weak at this task.
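To make the task concrete, here is a minimal sketch of how a tool retrieval benchmark might score a retriever: each task pairs a query with the tools that should be returned, the retriever ranks the whole inventory, and recall@k is averaged over tasks. This is not ToolRet's actual protocol or data. The tool inventory, the tasks, and the bag-of-words retriever are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
from collections import Counter

# Hypothetical toy inventory: tool name -> natural-language description.
TOOLS = {
    "get_weather": "return the current weather forecast for a city",
    "convert_currency": "convert an amount of money between two currencies",
    "search_flights": "search available flights between two airports on a date",
    "send_email": "send an email message to a recipient",
}

# Hypothetical tasks: (query, set of tool names that count as relevant).
TASKS = [
    ("what is the weather forecast in Paris", {"get_weather"}),
    ("convert 100 dollars into euros", {"convert_currency"}),
    ("book me a trip from Berlin to Rome", {"search_flights"}),
]


def tokenize(text):
    """Lowercase whitespace tokenization; deliberately naive."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, tools, k):
    """Rank every tool by lexical similarity to the query; return the top k names."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(desc))), name) for name, desc in tools.items()]
    # Sort by descending score, breaking ties by name so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant tools that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


def evaluate(tasks, tools, k):
    """Mean recall@k over all tasks."""
    scores = [recall_at_k(retrieve(query, tools, k), relevant) for query, relevant in tasks]
    return sum(scores) / len(scores)
```

In this toy setup the third query shares only function words with the flight-search description, so a surface-matching retriever ranks an unrelated tool first. That is a small-scale version of the mismatch between user intent and tool documentation that makes retrieval over a large inventory hard; comparing `evaluate(TASKS, TOOLS, 1)` with `evaluate(TASKS, TOOLS, 3)` shows how much a larger cutoff hides the problem.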

For the repository, this is worth collecting because it clarifies a real systems bottleneck and provides a reusable benchmark for future tool-use work. It makes tool retrieval legible as its own subproblem rather than a detail buried inside end-to-end agent scores.

It is not ranked higher because it is primarily a benchmark-and-dataset paper rather than a system abstraction that sets a broader research direction. But the benchmark is durable and directly useful for evaluating real large-scale tool-use agents.

Links

Paper link