Crushing Google! DeepSeek Open-Sources the First "Math Olympiad Gold Medal" AI
Ge Long Hui· 2025-11-28 07:09
DeepSeek is back! Last night, DeepSeek quietly released a new model: DeepSeekMath-V2. According to the accompanying technical paper, "DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning," the model performs strongly on the IMO-ProofBench benchmark and in recent mathematics competitions, outperforming Google's Gemini DeepThink series on some measures. It is a mathematics-focused model and currently the industry's first open-source model to reach gold-medal level at the IMO (International Mathematical Olympiad).

Olympiad gold plus open source: a double win

Experimental results show the model solved 5 of 6 problems at IMO 2025, a gold-medal performance; reached gold-medal level at CMO 2024 (Chinese Mathematical Olympiad); and scored 118 of 120 at Putnam 2024, near the maximum and above the top human contestant's score of 90.

| Contest | Problems | Points |
| --- | --- | --- |
| IMO 2025 | P1, P2, P3, P4, P5 | 83.3% |
| CMO 2024 | P1, P2, P4, P5, P6 | 73.8% |
| Putnam 2024 | A1~B4, B5, B6 | 98.3% |

...
Not Just a "Problem Solver"! DeepSeek's Latest Model Breaks the Limits of Mathematical Reasoning, Beating Gemini DeepThink on Some Measures
Tai Mei Ti APP· 2025-11-28 05:45
DeepSeek says the model demonstrates strong theorem-proving ability. In other words, unlike most earlier large models' performance in mathematics, Math-V2 is no longer just a "problem solver": it may genuinely come to influence scientific research through comprehensive, rigorous mathematical reasoning of its own.

DeepSeek cites several results as evidence of the model's strength: Math-V2 achieved gold-medal-level scores at both IMO 2025 (International Mathematical Olympiad) and CMO 2024 (Chinese Mathematical Olympiad), and reached a near-perfect score (118/120) at the North American collegiate competition Putnam 2024 by scaling test-time compute.

DeepSeek trains the proof generator with the verifier as a reward model, incentivizing the generator to identify and resolve as many issues in its own proofs as possible before finalizing them; by scaling verification compute, it automatically flags new, hard-to-verify proofs, creating training data that further improves the verifier.

The result is Math-V2.

Earlier, in July this year, OpenAI and Google both announced gold-medal-level results at IMO 2025, which for a time defined the ceiling of large-model mathematical ability. Compared with those two, DeepSeek's Math-V2 is not only the first open-source IMO gold-medal-level model; in testing it also showed an advantage on some measures.

On the IMO-ProofBench evaluation, Math-V2 scored ...
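The generator-verifier loop described above can be illustrated with a toy best-of-n sketch: the verifier acts as a reward model, and the candidate proof it scores highest is kept. All function names and the keyword-based scoring heuristic are invented for illustration and are not DeepSeek's actual implementation.

```python
def verify(proof: str) -> float:
    """Toy stand-in for the proof verifier (hypothetical heuristic):
    rewards proofs that state a claim and resolve their own issues."""
    score = 0.0
    if "claim" in proof:
        score += 0.5
    if "issue resolved" in proof:
        score += 0.5
    return score

def generate_candidates(problem: str, n: int = 4) -> list[str]:
    """Toy generator: emits n candidate proofs of varying quality."""
    drafts = [
        f"{problem}: claim only",
        f"{problem}: claim with issue resolved",
        f"{problem}: sketch",
        f"{problem}: claim, issue resolved, finalized",
    ]
    return drafts[:n]

def best_of_n(problem: str, n: int = 4) -> tuple[str, float]:
    """Verifier-as-reward-model selection: keep the candidate the
    verifier scores highest. In training, this score would be the
    RL reward pushing the generator toward self-corrected proofs."""
    scored = [(p, verify(p)) for p in generate_candidates(problem, n)]
    return max(scored, key=lambda s: s[1])

best, score = best_of_n("P1")
```

In the real system the verifier is itself a learned model and the selected (or scored) proofs feed back into reinforcement learning; this sketch only shows the shape of the reward signal.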
DeepSeek Again Breaks the Google/OpenAI Monopoly: an Open-Source IMO Gold-Medal Math Model
量子位· 2025-11-28 01:53
Core Insights
- DeepSeek has released a new mathematical model, DeepSeekMath-V2, focusing on self-verifiable mathematical reasoning [1][7]
- The model has achieved gold-medal-level scores at IMO 2025 and CMO 2024, and scored 118/120 at Putnam 2024, surpassing the highest human score of 90 [2][43]
- DeepSeekMath-V2 is the first open-source IMO gold-medal model, raising competitive pressure on companies like Google and OpenAI [4][5]

Model Performance
- DeepSeekMath-V2 outperforms GPT-5-Thinking-High and Gemini 2.5 Pro across all CNML problem categories, including algebra, geometry, number theory, combinatorics, and inequalities [2][34]
- The model's architecture comprises 685 billion parameters, emphasizing strong proof-verification capabilities [7]

Training Methodology
- The training process is an iterative reinforcement learning loop that alternates between optimizing the proof verifier and the proof generator [9]
- A large dataset of 17,500 proof-required math problems was collected from AoPS competitions to train the proof verifier [12]
- The verifier is trained to identify issues in proofs and assign scores on a three-level correctness scale [10]

Meta-Verification Mechanism
- A meta-verification mechanism was introduced to improve the verifier's accuracy by assessing the validity of the issues it reports [14]
- The meta-verifier is trained on a dataset created from expert evaluations of the verifier's output [15]

Proof Generation
- The trained verifier serves as a reward model for the proof generator, which learns to self-review and correct its outputs [23]
- The reward structure encourages accurate self-assessment and correction of errors in generated proofs [27]

Automation and Efficiency
- The collaboration between the verifier and generator yields a fully automated data-labeling process, replacing time-consuming manual annotation [29][35]
- The automated process maintains high consistency with expert evaluations, significantly improving efficiency [35]

Experimental Results
- The model's average quality score for proof analysis improved from 0.85 to 0.96, demonstrating the effectiveness of the meta-verification mechanism [21]
- The model's ability to generate correct proofs was validated through rigorous testing, showing superior performance across mathematical problem categories [34][39]
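The three-level correctness scale and the meta-verification step described above can be sketched as follows. The grade thresholds, the `Finding` record, and the filtering rule are all hypothetical simplifications, not DeepSeek's actual design; in the real system the meta-verifier is itself a trained model, not an oracle.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One issue the verifier reports in a proof."""
    description: str
    valid: bool  # ground-truth validity, used only by this toy meta-verifier

def verifier(findings: list[Finding]) -> str:
    """Toy verifier: grades a proof on a three-level correctness scale
    from the number of issues reported (thresholds are invented)."""
    n = len(findings)
    if n == 0:
        return "correct"
    return "minor_issues" if n <= 2 else "incorrect"

def meta_verify(findings: list[Finding]) -> list[Finding]:
    """Toy meta-verification: discard findings judged invalid, so
    spurious issues do not unfairly drag the grade down."""
    return [f for f in findings if f.valid]
```

The point of the sketch: a verifier that hallucinates issues would grade this proof "incorrect", while filtering its findings through meta-verification restores the fairer "minor_issues" grade, which is the failure mode the article says meta-verification addresses.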