Crushing Google! DeepSeek Open-Sources the First "Math Olympiad Gold Medal" AI
Ge Long Hui· 2025-11-28 07:09
DeepSeek is back! Last night, DeepSeek quietly released a new model: DeepSeekMath-V2. According to the accompanying technical paper, "DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning," the model performs strongly on the IMO-ProofBench benchmark and in recent mathematics competitions, outperforming Google's Gemini DeepThink series on some measures. It is a mathematics-focused model and currently the industry's first open-source model to reach gold-medal level at the IMO (International Mathematical Olympiad).

Olympiad gold plus open source: a double win

Experimental results show the model solved 5 of 6 problems at IMO 2025, a gold-medal performance; reached gold-medal level at CMO 2024 (Chinese Mathematical Olympiad); and scored 118 of 120 at Putnam 2024, near the maximum and above the top human contestant's score of 90.

| Contest | Problems | Points |
| --- | --- | --- |
| IMO 2025 | P1, P2, P3, P4, P5 | 83.3% |
| CMO 2024 | P1, P2, P4, P5, P6 | 73.8% |
| Putnam 2024 | A1~B4, B5, B6 | 98.3% |

...
Not Just a "Problem Solver"! DeepSeek's Latest Model Breaks the Limits of Mathematical Reasoning, Beating Gemini DeepThink on Some Measures
Tai Mei Ti APP· 2025-11-28 05:45
DeepSeek says the model demonstrates strong theorem-proving ability. In other words, unlike most earlier large models' performance in mathematics, Math-V2 is no longer just a "problem solver": it may genuinely come to influence scientific research through comprehensive, rigorous mathematical reasoning of its own.

DeepSeek cites several results as evidence of the model's strength: Math-V2 achieved gold-medal-level scores at both IMO 2025 (International Mathematical Olympiad) and CMO 2024 (Chinese Mathematical Olympiad), and reached a near-perfect score (118/120) at the North American collegiate competition Putnam 2024 by scaling test-time compute.

DeepSeek trains the proof generator with the verifier as a reward model, incentivizing the generator to identify and resolve as many issues in its own proofs as possible before finalizing them; by scaling verification compute, it automatically flags new, hard-to-verify proofs, creating training data that further improves the verifier.

The result is Math-V2.

Earlier, in July this year, OpenAI and Google both announced gold-medal-level results at IMO 2025, which for a time defined the ceiling of large-model mathematical ability. Compared with those two, DeepSeek's Math-V2 is not only the first open-source IMO gold-medal-level model; in testing it also showed an advantage on some measures.

On the IMO-ProofBench evaluation, Math-V2 scored ...
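The generator-verifier loop described above can be illustrated with a toy best-of-n sketch: the verifier acts as a reward model, and the candidate proof it scores highest is kept. All function names and the keyword-based scoring heuristic are invented for illustration and are not DeepSeek's actual implementation.

```python
def verify(proof: str) -> float:
    """Toy stand-in for the proof verifier (hypothetical heuristic):
    rewards proofs that state a claim and resolve their own issues."""
    score = 0.0
    if "claim" in proof:
        score += 0.5
    if "issue resolved" in proof:
        score += 0.5
    return score

def generate_candidates(problem: str, n: int = 4) -> list[str]:
    """Toy generator: emits n candidate proofs of varying quality."""
    drafts = [
        f"{problem}: claim only",
        f"{problem}: claim with issue resolved",
        f"{problem}: sketch",
        f"{problem}: claim, issue resolved, finalized",
    ]
    return drafts[:n]

def best_of_n(problem: str, n: int = 4) -> tuple[str, float]:
    """Verifier-as-reward-model selection: keep the candidate the
    verifier scores highest. In training, this score would be the
    RL reward pushing the generator toward self-corrected proofs."""
    scored = [(p, verify(p)) for p in generate_candidates(problem, n)]
    return max(scored, key=lambda s: s[1])

best, score = best_of_n("P1")
```

In the real system the verifier is itself a learned model and the selected (or scored) proofs feed back into reinforcement learning; this sketch only shows the shape of the reward signal.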
DeepSeek Again Breaks the Google/OpenAI Monopoly: an Open-Source IMO Gold-Medal Math Model
量子位· 2025-11-28 01:53
Core Insights
- DeepSeek has released a new mathematical model, DeepSeekMath-V2, focusing on self-verifiable mathematical reasoning [1][7]
- The model has achieved gold-medal-level scores at IMO 2025 and CMO 2024, and scored 118/120 at Putnam 2024, surpassing the highest human score of 90 [2][43]
- DeepSeekMath-V2 is the first open-source IMO gold-medal model, raising competitive pressure on companies like Google and OpenAI [4][5]

Model Performance
- DeepSeekMath-V2 outperforms GPT-5-Thinking-High and Gemini 2.5 Pro across all CNML problem categories, including algebra, geometry, number theory, combinatorics, and inequalities [2][34]
- The model's architecture comprises 685 billion parameters, emphasizing strong proof-verification capabilities [7]

Training Methodology
- The training process is an iterative reinforcement learning loop that alternates between optimizing the proof verifier and the proof generator [9]
- A large dataset of 17,500 proof-required math problems was collected from AoPS competitions to train the proof verifier [12]
- The verifier is trained to identify issues in proofs and assign scores on a three-level correctness scale [10]

Meta-Verification Mechanism
- A meta-verification mechanism was introduced to improve the verifier's accuracy by assessing the validity of the issues it reports [14]
- The meta-verifier is trained on a dataset created from expert evaluations of the verifier's output [15]

Proof Generation
- The trained verifier serves as a reward model for the proof generator, which learns to self-review and correct its outputs [23]
- The reward structure encourages accurate self-assessment and correction of errors in generated proofs [27]

Automation and Efficiency
- The collaboration between the verifier and generator yields a fully automated data-labeling process, replacing time-consuming manual annotation [29][35]
- The automated process maintains high consistency with expert evaluations, significantly improving efficiency [35]

Experimental Results
- The model's average quality score for proof analysis improved from 0.85 to 0.96, demonstrating the effectiveness of the meta-verification mechanism [21]
- The model's ability to generate correct proofs was validated through rigorous testing, showing superior performance across mathematical problem categories [34][39]
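The three-level correctness scale and the meta-verification step described above can be sketched as follows. The grade thresholds, the `Finding` record, and the filtering rule are all hypothetical simplifications, not DeepSeek's actual design; in the real system the meta-verifier is itself a trained model, not an oracle.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One issue the verifier reports in a proof."""
    description: str
    valid: bool  # ground-truth validity, used only by this toy meta-verifier

def verifier(findings: list[Finding]) -> str:
    """Toy verifier: grades a proof on a three-level correctness scale
    from the number of issues reported (thresholds are invented)."""
    n = len(findings)
    if n == 0:
        return "correct"
    return "minor_issues" if n <= 2 else "incorrect"

def meta_verify(findings: list[Finding]) -> list[Finding]:
    """Toy meta-verification: discard findings judged invalid, so
    spurious issues do not unfairly drag the grade down."""
    return [f for f in findings if f.valid]
```

The point of the sketch: a verifier that hallucinates issues would grade this proof "incorrect", while filtering its findings through meta-verification restores the fairer "minor_issues" grade, which is the failure mode the article says meta-verification addresses.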