ResearchMath-14K: Scaling Research-Level Mathematics via Agents

Mathematics and Formal Reasoning · Breakthrough level

Published: 2026-05-27
arXiv: 2605.28003

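Key Points

Problem/Background: The paper targets the data bottleneck in research-level mathematical reasoning: ordinary competition problems are not enough to measure whether a model can handle unknown or open mathematical problems.
Method/Mechanism: The authors use a multi-agent pipeline to curate 14,056 research-level math problems from academic sources and generate 220K open-model teacher trajectories.
Results/Evidence: A key finding is that newer-generation models cite more references in their attempts at open mathematical problems, but fake references also increase sharply; after agentic filtering, these imperfect attempts still work as a supervision signal that improves the Qwen3 series.
Why it is included: The paper moves math agents from benchmark solving toward research-level problem corpora, trajectory filtering, and the construction of training data from open problems.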
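To make the filtering idea concrete, here is a minimal sketch of one step such a pipeline might include: dropping teacher trajectories that cite references which cannot be verified. This is not the authors' method. The paper describes agentic filtering, while this sketch swaps in a plain set lookup against a list of verified references. The names `Trajectory`, `filter_trajectories`, and `known_refs` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One teacher attempt at a problem (hypothetical structure)."""
    problem_id: str
    text: str
    cited_refs: list[str] = field(default_factory=list)


def filter_trajectories(trajectories, known_refs):
    """Keep only trajectories whose cited references are all verified.

    Assumption: `known_refs` is a set of reference identifiers that some
    earlier step has already confirmed to exist. A trajectory with no
    citations passes trivially.
    """
    kept = []
    for traj in trajectories:
        if all(ref in known_refs for ref in traj.cited_refs):
            kept.append(traj)
    return kept
```

A call such as `filter_trajectories(trajectories, known_refs)` returns the retained trajectories in their original order. In a real system the membership check would presumably be replaced by an agent that looks each citation up, which is considerably harder than a set lookup.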

Abstract

ResearchMath-14K curates 14,056 research-level mathematical problems from academic sources via a multi-agent pipeline and generates 220K teacher trajectories. It studies avoidance and fake-reference behavior in open models and shows that agentically filtered open-problem attempts can improve Qwen3 math reasoning after fine-tuning.
