DeepSeek's new release: open-source model is the first to reach IMO gold-medal level, AI reasoning bids farewell to "rote memorization"
Guan Cha Zhe Wang · 2025-11-28 07:17
Against the backdrop of OpenAI releasing GPT-5.1 and Google rolling out the Gemini 3 series, domestic AI unicorn DeepSeek has yet to deliver a major update to its base model, but on Wednesday evening it quietly released its latest technical result, DeepSeek-Math-V2.

According to the official technical report, DeepSeek-Math-V2 has 685B parameters and focuses on improving large language models' mathematical reasoning and theorem-proving abilities. The model posted strikingly strong results on several high-difficulty math-competition benchmarks.

First, top-tier contest performance: Math-V2 reached gold-medal level at both the 2025 International Mathematical Olympiad (IMO 2025) and the 2024 Chinese Mathematical Olympiad (CMO 2024). Most notably, on the Putnam 2024 competition, sometimes called the "purgatory of the math world", the model used scaled test-time compute to reach a near-perfect 118 out of 120, far surpassing the human record of roughly 90 points.

| Contest | Problems | Points |
| --- | --- | --- |
| IMO 2025 | P1, P2, P3, P4, P5 | 83.3% |
| CMO 2024 | P1, P2 ... | |
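The "scaled test-time compute" mentioned above is commonly implemented as best-of-n sampling: draw many independent proof attempts and keep the one a verifier scores highest. The article does not describe DeepSeek's exact procedure, so the sketch below is a generic illustration; every function name and the scoring rule are hypothetical placeholders.

```python
import random

def generate_proof(problem: str, seed: int) -> str:
    """Stand-in for sampling one proof attempt from a model (hypothetical)."""
    return f"candidate proof #{seed} for {problem}"

def verifier_score(proof: str) -> float:
    """Stand-in for a learned verifier's quality score in [0, 1];
    here a deterministic pseudo-score so the sketch is runnable."""
    rng = random.Random(hash(proof))
    return rng.random()

def best_of_n(problem: str, n: int = 8) -> str:
    """Scaled test-time compute: sample n independent proofs and keep
    the one the verifier rates highest."""
    candidates = [generate_proof(problem, seed) for seed in range(n)]
    return max(candidates, key=verifier_score)
```

Spending more compute here means simply raising `n`: each extra sample is another chance for the verifier to find a fully correct proof.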
DeepSeek again breaks the Google/OpenAI monopoly: an open-source IMO math gold-medal model
QbitAI (量子位) · 2025-11-28 01:53
Core Insights
- DeepSeek has released a new mathematical model, DeepSeekMath-V2, focused on self-verifiable mathematical reasoning [1][7]
- The model achieved gold-medal-level scores at IMO 2025 and CMO 2024, and scored 118/120 on Putnam 2024, surpassing the highest human score of 90 [2][43]
- DeepSeekMath-V2 is the first open-source IMO gold-medal model, raising competitive pressure on companies such as Google and OpenAI [4][5]

Model Performance
- DeepSeekMath-V2 outperforms GPT-5-Thinking-High and Gemini 2.5-Pro across all CNML problem categories, including algebra, geometry, number theory, combinatorics, and inequalities [2][34]
- The architecture has 685 billion parameters and emphasizes strong proof-verification capabilities [7]

Training Methodology
- Training uses an iterative reinforcement-learning loop that alternates between optimizing the proof verifier and the proof generator [9]
- A dataset of 17,500 proof-required math problems collected from AoPS competitions was used to train the proof verifier [12]
- The verifier is trained to identify issues in proofs and assign scores on three levels of correctness [10]

Meta-Verification Mechanism
- A meta-verification mechanism improves the verifier's accuracy by assessing whether the issues it flags are genuine [14]
- The meta-verifier is trained on a dataset built from expert evaluations of the verifier's output [15]

Proof Generation
- The trained verifier serves as a reward model for the proof generator, which learns to self-review and correct its own outputs [23]
- The reward structure encourages accurate self-assessment and correction of errors in generated proofs [27]

Automation and Efficiency
- Collaboration between the verifier and generator yields a fully automated data-labeling process, replacing time-consuming manual annotation [29][35]
- The automated process maintains high consistency with expert evaluations while significantly improving efficiency [35]

Experimental Results
- The model's average quality score for proof analysis improved from 0.85 to 0.96, demonstrating the effectiveness of the meta-verification mechanism [21]
- The model's ability to generate correct proofs was validated through rigorous testing, showing superior performance across various mathematical problem categories [34][39]
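The three-level verifier scoring described under Training Methodology can be sketched as a simple reward mapping. The level names, numeric rewards, and per-issue penalty below are illustrative assumptions, not values from DeepSeek's technical report.

```python
from enum import Enum

class Correctness(Enum):
    """Three-level verdict a trained verifier assigns to a proof
    (labels are illustrative; the report's exact rubric may differ)."""
    CORRECT = 2        # rigorous, no real issues found
    MINOR_ISSUES = 1   # recoverable gaps or sloppiness
    INCORRECT = 0      # a fatal logical flaw

# Hypothetical base rewards per level and a per-issue penalty.
BASE_REWARD = {
    Correctness.CORRECT: 1.0,
    Correctness.MINOR_ISSUES: 0.5,
    Correctness.INCORRECT: 0.0,
}

def verifier_reward(level: Correctness, issues_found: int) -> float:
    """Map a verifier verdict to a scalar RL reward, penalizing each
    flagged issue and flooring at zero."""
    return max(0.0, BASE_REWARD[level] - 0.1 * issues_found)
```

A graded reward like this gives the generator a smoother training signal than a binary correct/incorrect label.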
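The meta-verification mechanism amounts to a second model that audits the first: only issues the meta-verifier confirms should count against a proof. A minimal sketch, with a toy stand-in rule in place of the trained meta-verifier (both function names are hypothetical):

```python
def meta_verify(issue: str) -> bool:
    """Stand-in for a trained meta-verifier that judges whether an issue
    flagged by the verifier is a genuine flaw. Toy rule: only issues that
    point at a concrete proof step count as valid."""
    return "step" in issue.lower()

def filtered_issues(flagged: list[str]) -> list[str]:
    """Keep only meta-verified issues, so spurious criticism does not
    distort the verifier's training signal or a proof's score."""
    return [issue for issue in flagged if meta_verify(issue)]
```

Filtering here is what lifts the quality of the verifier's feedback, consistent with the reported improvement in average proof-analysis quality.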
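The generator's self-review-and-correct behavior described under Proof Generation can be pictured as a draft/verify/revise loop. The control flow below is a generic sketch, not DeepSeek's implementation; `generate`, `verify`, and `revise` are placeholders for model calls.

```python
from typing import Callable

def self_review_loop(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Draft a proof, check it with the verifier, and revise until no
    issues remain or the round budget runs out. Returns the final proof
    and whether it ultimately passed verification."""
    proof = generate(problem)
    for _ in range(max_rounds):
        issues = verify(proof)
        if not issues:
            return proof, True
        proof = revise(proof, issues)
    return proof, not verify(proof)
```

Because the verifier both gates the loop and supplies the reward, the same machinery that trains the generator also labels its outputs, which is the automated-annotation pipeline the article describes.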