Inclusion Notes
This paper focuses on a practical but under-measured bottleneck in tool-using agents: selecting the right tools from a large tool inventory. Many agent benchmarks quietly assume a small, curated candidate set of tools. This hides the real retrieval problem and makes downstream agent results look overly optimistic.
The main contribution is ToolRet, a heterogeneous tool retrieval benchmark with thousands of retrieval tasks and tens of thousands of tools. It comes with a large-scale training dataset that improves tool-aware retrieval. Just as important, the paper shows that general-purpose information retrieval (IR) models, despite their strength elsewhere, perform surprisingly poorly on this task.
For the repository, this is worth collecting because it clarifies a real systems bottleneck and provides a reusable benchmark for future tool-use work. It makes tool retrieval legible as its own subproblem rather than a detail buried inside end-to-end agent scores.
It is not ranked higher because it is primarily a benchmark-and-dataset paper rather than one that defines a new system abstraction or research direction. Still, the benchmark is likely to stay relevant and is directly useful for evaluating tool-use agents that must work with realistically large tool inventories.