Self-Verifiable Mathematical Reasoning
Crushing Google! DeepSeek Open-Sources the First "Math Olympiad Gold Medal" AI
Ge Long Hui · 2025-11-28 07:09
Core Insights
- DeepSeek has launched a new model, DeepSeekMath-V2, the first open-source model to reach International Mathematical Olympiad (IMO) gold medal level [2][4]
- The model performs strongly across benchmarks, outperforming Google's Gemini DeepThink series in some areas [2][4]

Performance Metrics
- On the Basic subset of the IMO-ProofBench benchmark, DeepSeekMath-V2 scored nearly 99%, well above Gemini DeepThink's 89% [4]
- On the Advanced subset, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7%, still a competitive result [4]
- The model reached gold medal level at IMO 2025 by solving 5 of 6 problems, also reached gold level at CMO 2024, and scored 118 at Putnam 2024, close to the maximum of 120 [4][7]

Technological Advancements
- DeepSeekMath-V2 introduces a self-verifying mathematical reasoning approach, a significant milestone for AI mathematical reasoning [10]
- The model's new training mechanism combines (see the sketch after this summary):
  1. A reliable verifier that checks each step of a theorem proof for logical consistency [10]
  2. A generator that learns to self-improve by identifying and correcting issues during proof generation [11]
  3. A verification capability that evolves as the generator improves, focusing on difficult-to-verify proofs for further training [11]

Industry Impact
- The release of DeepSeekMath-V2 is a strategic move in a competitive landscape, coinciding with releases from other major players such as OpenAI and Google [10]
- Open-sourcing under the Apache 2.0 license lets developers worldwide explore and fine-tune a gold-medal-level model, breaking the closed-source monopoly on top-tier mathematical reasoning [10]
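The verify-and-revise loop described under Technological Advancements can be pictured as follows. This is a minimal sketch under assumed interfaces (generate_proof, verify_step, revise_proof, Proof are all hypothetical names); the article describes the mechanism only at a high level and does not publish DeepSeekMath-V2's actual implementation.

```python
# Minimal sketch of the self-verifying generation loop described above.
# All names (generate_proof, verify_step, revise_proof, Proof) are
# hypothetical illustrations, not DeepSeek's real interfaces.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Proof:
    steps: List[str] = field(default_factory=list)

@dataclass
class Issue:
    step_index: int
    description: str

def self_verifying_generation(
    problem: str,
    generate_proof: Callable[[str], Proof],
    verify_step: Callable[[str, Proof, int], str],  # "" means the step passes
    revise_proof: Callable[[str, Proof, List[Issue]], Proof],
    max_rounds: int = 4,
) -> Proof:
    """Generate a proof, check every step, and revise until no issues remain."""
    proof = generate_proof(problem)
    for _ in range(max_rounds):
        # The verifier checks each step of the proof for logical consistency.
        issues = [
            Issue(i, msg)
            for i in range(len(proof.steps))
            if (msg := verify_step(problem, proof, i))
        ]
        if not issues:
            return proof  # every step passed verification
        # The generator self-improves by correcting the flagged issues.
        proof = revise_proof(problem, proof, issues)
    return proof  # best effort after max_rounds of self-correction
```

Under this reading, proofs that still fail after several rounds would be the "difficult-to-verify" cases the article says are fed back to strengthen the verifier.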
Not Just a "Problem-Solving Machine"! DeepSeek's Latest Model Breaks Through the Limits of Mathematical Reasoning, Partially Outperforming Gemini DeepThink
Tai Mei Ti APP · 2025-11-28 05:45
Core Insights
- DeepSeek has released its latest mathematical model, DeepSeek Math-V2, generating significant excitement in the AI community thanks to its self-verifying deep reasoning, particularly in mathematics [1][2]

Model Performance
- Math-V2 demonstrates strong theorem-proving ability, distinguishing it from earlier models that merely solved problems without rigorous reasoning [2]
- The model achieved gold-medal-level results at IMO 2025 and CMO 2024 and scored 118 out of 120 at Putnam 2024 [2]

Benchmarking Results
- In the IMO-ProofBench evaluation, Math-V2 scored 99%, outperforming Google's Gemini Deep Think (89%) and GPT-5 (59%) [3]
- On the advanced tier, Math-V2 scored 61.9%, just behind Gemini Deep Think's 65.7% [3]

Community Impact
- The release has sparked discussion across social media platforms and communities, with particular attention to its potential to automate verification-heavy tasks in programming [5][8]
- AI experts have praised DeepSeek's return and the significance of Math-V2, reading it as part of a shift from the "chatbot" era to the "reasoner" era [8][9]
DeepSeek Breaks the Google and OpenAI Monopoly Again: Open-Sourcing an IMO Math Gold Medal Model
量子位 · 2025-11-28 01:53
Core Insights
- DeepSeek has released a new mathematical model, DeepSeekMath-V2, focused on self-verifiable mathematical reasoning [1][7]
- The model achieved gold-medal-level scores at IMO 2025 and CMO 2024 and scored 118/120 at Putnam 2024, surpassing the highest human score of 90 [2][43]
- DeepSeekMath-V2 is the first open-source IMO gold medal model, raising competitive pressure on companies like Google and OpenAI [4][5]

Model Performance
- DeepSeekMath-V2 outperforms GPT-5-Thinking-High and Gemini 2.5 Pro across all CNML problem categories, including algebra, geometry, number theory, combinatorics, and inequalities [2][34]
- The model has 685 billion parameters and emphasizes strong proof-verification capability [7]

Training Methodology
- Training uses an iterative reinforcement learning loop that alternates between optimizing the proof verifier and the proof generator [9]
- A dataset of 17,500 proof-based math problems was collected from AoPS competitions to train the proof verifier [12]
- The verifier is trained to identify issues in proofs and to assign scores on a three-level correctness scale [10]

Meta-Verification Mechanism
- A meta-verification mechanism improves the verifier's accuracy by assessing whether the issues it flags are valid [14]
- The meta-verifier is trained on a dataset built from expert evaluations of the verifier's output [15]

Proof Generation
- The trained verifier serves as a reward model for the proof generator, which learns to self-review and correct its own outputs [23]
- The reward structure encourages accurate self-assessment and correction of errors in generated proofs (see the sketch at the end of this summary) [27]

Automation and Efficiency
- Verifier-generator collaboration yields a fully automated data-labeling pipeline, replacing time-consuming manual annotation [29][35]
- The automated process stays highly consistent with expert evaluations while significantly improving efficiency [35]

Experimental Results
- The average quality score of the model's proof analyses improved from 0.85 to 0.96, demonstrating the effectiveness of the meta-verification mechanism [21]
- Rigorous testing validated the model's ability to generate correct proofs, with superior performance across mathematical problem categories [34][39]
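Reading the Training Methodology, Meta-Verification Mechanism, and Proof Generation sections together, one round of the alternating loop might look like the sketch below. Every interface (generator.prove_and_review, verifier.score, meta_verifier.is_valid, verifier.fit) and the reward weighting are assumptions made for illustration; the article describes the training only at a high level.

```python
# Illustrative sketch of one round of the alternating verifier/generator
# training loop summarized above. All interfaces and constants here are
# assumptions for illustration, not DeepSeek's published training code.
from enum import IntEnum

class Correctness(IntEnum):
    # The three-level correctness scale the article attributes to the verifier.
    INCORRECT = 0
    PARTIALLY_CORRECT = 1
    CORRECT = 2

def training_iteration(problems, generator, verifier, meta_verifier):
    """RL-update the generator with the verifier as reward model, then
    refresh the verifier on proofs that were hard to verify."""
    hard_cases = []
    for problem in problems:
        # The generator produces a proof plus its own self-review verdict.
        proof, self_verdict = generator.prove_and_review(problem)
        verdict, issues = verifier.score(problem, proof)
        # Meta-verification: drop flagged issues the meta-verifier judges
        # invalid, so spurious criticism does not corrupt the reward signal.
        valid_issues = [i for i in issues
                        if meta_verifier.is_valid(problem, proof, i)]
        # Reward correctness, plus a bonus when the generator's self-review
        # matches the verifier's verdict; this encourages accurate
        # self-assessment (the 0.5 weight is purely illustrative).
        reward = float(verdict) / float(Correctness.CORRECT)
        if self_verdict == verdict:
            reward += 0.5
        generator.rl_update(problem, proof, reward)
        # Proofs where verifier and meta-verifier disagree are ambiguous:
        # collect them as further training data for the verifier, so
        # verification keeps pace with the improving generator.
        if len(valid_issues) < len(issues):
            hard_cases.append((problem, proof, valid_issues))
    verifier.fit(hard_cases)
```

On this reading, the verifier's verdicts stand in for the manual annotations mentioned under Automation and Efficiency, which is what would make the labeling pipeline fully automatic.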