MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Reasoning, Memory, and Inference-Time Control · Disruptive tier · Explainer video available

Curation and commentary: DAST AI · Inclusion methodology and content transparency

Published: 2026-06-11
arXiv: 2606.13473

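Curator's Commentary

This paper frames mathematical proving as a test-time population search built from four steps: generation, verification, repair, and ranking.

The core contribution is the pairing of generative-verifier RL with a low-false-positive verifier, so that a single model serves as generator, verifier, refiner, and ranker at test time.

It merits inclusion because it marks a shift in mathematical reasoning from single-shot generation to population-level test-time scaling.

The main limitation is that the current evidence comes largely from preprint experiments and the authors' own evaluations; independent replication and validation in broader deployment are still needed.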

Abstract

We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities—proof generation, proof verification, and critique-conditioned proof repair—using a defense-in-depth generative verifier engineered for low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
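The test-time procedure described in the abstract (generate a population, verify, repair using critiques, then pick one proof by tournament) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `population_search`, `generate`, `verify`, `refine`, and `prefer`, the parameters `population_size` and `repair_rounds`, the single-elimination bracket, and the toy string-encoded "proofs" are all hypothetical. In MaxProof the four roles are played by one M3 model; here they are replaced by simple stub functions so the control flow can be run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    proof: str
    accepted: bool = False
    critique: str = ""


def population_search(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    refine: Callable[[str, str, str], str],
    prefer: Callable[[str, str, str], bool],
    population_size: int = 8,
    repair_rounds: int = 2,
) -> Candidate:
    """Hypothetical sketch of a generate / verify / repair / rank loop."""
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    # Generator role: sample an initial population of candidate proofs.
    population: List[Candidate] = [
        Candidate(proof=generate(problem)) for _ in range(population_size)
    ]
    # Verifier role: accept or reject each candidate and record a critique.
    for cand in population:
        cand.accepted, cand.critique = verify(problem, cand.proof)
    # Refiner role: repair rejected candidates using the critique, then re-verify.
    for _ in range(repair_rounds):
        for cand in population:
            if not cand.accepted:
                cand.proof = refine(problem, cand.proof, cand.critique)
                cand.accepted, cand.critique = verify(problem, cand.proof)
    # Keep verifier-accepted proofs; fall back to everything if none passed.
    pool = [c for c in population if c.accepted] or population
    # Ranker role: single-elimination tournament over pairwise comparisons.
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if prefer(problem, a.proof, b.proof) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd one out gets a bye
        pool = next_round
    return pool[0]


# Toy stand-ins for the model (assumptions, for illustration only).
# A "proof" is a string like "gaps=1;steps=3".
def parse(proof: str) -> Tuple[int, int]:
    fields = dict(part.split("=") for part in proof.split(";"))
    return int(fields["gaps"]), int(fields["steps"])


def toy_verify(problem: str, proof: str) -> Tuple[bool, str]:
    gaps, _ = parse(proof)
    return gaps == 0, f"{gaps} gap(s) remain"


def toy_refine(problem: str, proof: str, critique: str) -> str:
    gaps, steps = parse(proof)
    return f"gaps={max(gaps - 1, 0)};steps={steps + 1}"


def toy_prefer(problem: str, a: str, b: str) -> bool:
    # True if proof a is at least as short as proof b.
    return parse(a)[1] <= parse(b)[1]


if __name__ == "__main__":
    drafts = iter(["gaps=0;steps=9", "gaps=1;steps=3",
                   "gaps=0;steps=7", "gaps=0;steps=8"])
    winner = population_search(
        "toy problem", lambda _p: next(drafts),
        toy_verify, toy_refine, toy_prefer,
        population_size=4, repair_rounds=1,
    )
    print(winner)
```

In this sketch, filtering on the verifier's verdict before the tournament means the final ranking mostly compares proofs the verifier already accepted, which is one reason a low false-positive rate matters: a wrongly accepted proof can win the bracket. Whether MaxProof filters before ranking in exactly this way is not stated in the abstract.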

Explainer Video

Video page · Bilibili · YouTube

Links

Paper link
