Inclusion Rationale
ToolLLM is one of the earliest serious attempts to turn open-source LLMs into broad tool-using agents at realistic API scale. Rather than treating tool use as a handful of handcrafted functions, it frames the problem around more than 16,000 real-world REST APIs collected from RapidAPI and builds the full stack needed to train and evaluate that capability.
The paper’s importance comes from the combination of four pieces: ToolBench (the instruction-tuning dataset), ToolEval (the automatic evaluator), a neural API retriever, and search-based calling via a depth-first search-based decision tree (DFSDT). It does not just show a model using tools; it provides the data construction pipeline, evaluation infrastructure, and training recipe that made large-scale tool-use research more concrete and reproducible.
ToolLLM is highly relevant to the repository because later tool-use agent work repeatedly builds on the same premise: tool use is an infrastructure problem involving datasets, retrieval, execution traces, and scalable evaluation, not just prompting. It therefore acts as a foundational reference for the tool-use ecosystem.
It is not ranked higher because subsequent work improves specific pieces such as retrieval quality, unified generation, and orchestration. But as a durable early systems reference for large-scale tool learning, it merits formal collection at a high grade.