DeepSeek's New Release: First Open-Source Model to Reach IMO Gold-Medal Level, AI Reasoning Says Goodbye to "Rote Memorization"
Guan Cha Zhe Wang· 2025-11-28 07:17
Core Insights
- DeepSeek has released its latest technology achievement, DeepSeek-Math-V2, a 685-billion-parameter model focused on enhancing mathematical reasoning and theorem-proving capabilities in large language models [1][5]

Performance Highlights
- DeepSeek-Math-V2 achieved gold-medal level on the 2025 International Mathematical Olympiad (IMO) and the 2024 Chinese Mathematical Olympiad (CMO), and scored 118 out of 120 on the Putnam 2024 competition, surpassing the historical human record of approximately 90 points [1][3]
- On the IMO-ProofBench benchmark, Math-V2 scored nearly 99% on the Basic set, significantly outperforming Google's Gemini DeepThink at 89%; on the Advanced set, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [4]

Technological Innovations
- DeepSeek-Math-V2 addresses the "illusion of reasoning" problem highlighted by former OpenAI chief scientist Ilya Sutskever, moving beyond mere answer correctness to ensure rigorous logical reasoning [5][6]
- The model employs a strict process-focused strategy, requiring clear, logical step-by-step derivations, and withholds reward for a correct final answer if the intermediate steps are flawed [6]
- A multi-level "Meta-Verification" mechanism improves the reliability of scoring, raising the confidence level from 0.85 to 0.96 [9]

Industry Impact
- The release of DeepSeek-Math-V2 has generated significant buzz in the overseas developer community, marking a strong comeback for DeepSeek and breaking the long-standing dominance of closed-source models in top-tier reasoning capabilities [11]
- The model's success in mathematical reasoning is expected to carry over to the coding-model space, potentially disrupting existing code-assistance tools [11]
- The global AI landscape is shifting from "text generation" to "logical reasoning," with DeepSeek's approach providing a clear path for technological evolution through rigorous verification mechanisms rather than sheer computational power [11]
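The process-focused strategy described above can be sketched as a toy scoring rule. This is a minimal illustration, not DeepSeek's actual training code: the function names (`score_proof`, `toy_verifier`) and the step representation are assumptions made for the example. The key property it demonstrates is that a correct final answer with a flawed derivation still earns zero reward.

```python
# Hypothetical sketch of process-focused scoring: a proof earns credit
# only when every intermediate step verifies, not merely when the final
# answer matches. All names here are illustrative, not DeepSeek's API.

def score_proof(steps, final_answer, expected_answer, step_verifier):
    """Return 1.0 only if the answer is right AND every step verifies."""
    if final_answer != expected_answer:
        return 0.0
    # A correct final answer with a flawed derivation still scores zero.
    if not all(step_verifier(s) for s in steps):
        return 0.0
    return 1.0

def toy_verifier(step):
    # Stand-in for a learned verifier: here a step "verifies" if it is
    # marked valid. A real verifier would judge logical soundness.
    return step.get("valid", False)

proof = [{"text": "x = 2", "valid": True},
         {"text": "so x^2 = 5", "valid": False}]
print(score_proof(proof, 4, 4, toy_verifier))  # -> 0.0 (flawed step)
```

A result-oriented reward would have returned 1.0 here because the final answer matched; the process-focused rule rejects the proof at the flawed second step.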
"In Mathematics, Chinese Models Have Never Lost": DeepSeek Sweeps the Leaderboards Overnight as Math V2 Decisively Ends the "Strongest Math Model" Debate
AI前线· 2025-11-28 02:54
Core Insights
- DeepSeek has released a new mathematical reasoning model, DeepSeek-Math-V2, with 685 billion parameters; it is the first open-source model to reach International Mathematical Olympiad (IMO) gold-medal level [2][9]
- The model far outperforms its predecessor, DeepSeek-Math-7B, which had only 7 billion parameters and was comparable to GPT-4 and Gemini-Ultra [4]
- On the IMO-ProofBench benchmark, it scored nearly 99% on the Basic subset, surpassing Gemini DeepThink's 89%, while on the Advanced subset it scored 61.9%, slightly below Gemini DeepThink's 65.7% [5][9]

Performance Metrics
- On real competition problems, DeepSeek-Math-V2 achieved gold-medal level on IMO 2025 and CMO 2024 and scored 118 out of 120 on Putnam 2024, demonstrating strong theorem-proving capabilities [7][8]
- Its scores on specific contests include 83.3% on IMO 2025 and 73.8% on CMO 2024 [8]

Technical Advancements
- The accompanying technical paper highlights significant breakthroughs in the rigor of mathematical reasoning and in theorem-proving capability, surpassing some benchmarks set by Google's Gemini DeepThink [9][12]
- A key innovation of DeepSeek-Math-V2 is its self-verification mechanism, which allows the model to check its own reasoning chain for completeness and logical consistency, a property crucial for mathematical tasks [13][16]

Community Response
- The open-source release has drawn strong reactions from developer communities, with many expressing surprise at the model's performance and at its potential future applications in programming [18][20]
- Users have noted the importance of mathematical correctness in AI-generated code, indicating demand for models that excel at mathematical reasoning [20][23]

Industry Implications
- The release of DeepSeek-Math-V2 is redefining the competitive landscape of large-model mathematical reasoning research, with self-verification becoming a key technological pathway for the next generation of mathematical AI [25]
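The self-verification mechanism described above can be sketched as a generate-verify-refine loop. All helper names here (`generate`, `verify`, `refine`) are hypothetical stand-ins for illustration, not DeepSeek's actual interface; the sketch only shows the control flow of a model that critiques and repairs its own reasoning chain.

```python
# Illustrative sketch of a self-verification loop: draft a proof,
# critique the reasoning chain, revise the flagged steps, and repeat
# until the critique is clean or a budget runs out.

def self_verifying_solve(problem, generate, verify, refine, max_rounds=4):
    """Draft a proof, then iteratively repair it via self-critique."""
    proof = generate(problem)
    for _ in range(max_rounds):
        issues = verify(proof)           # critique the current chain
        if not issues:                   # complete and logically consistent
            return proof
        proof = refine(proof, issues)    # rewrite only the flagged steps
    return proof                         # best effort within the budget

# Toy stand-ins: each "step" is a string; a step containing "gap" is flawed.
def toy_generate(problem):
    return ["assume n is even", "gap: unjustified leap", "so n^2 is even"]

def toy_verify(proof):
    return [i for i, step in enumerate(proof) if "gap" in step]

def toy_refine(proof, issues):
    return [("n = 2k, so n^2 = 4k^2" if i in issues else s)
            for i, s in enumerate(proof)]

result = self_verifying_solve("show n even implies n^2 even",
                              toy_generate, toy_verify, toy_refine)
print(result)  # the "gap" step is replaced after one refinement round
```

The point of the loop is that verification gates acceptance: a proof is only returned once the model's own critique finds no incomplete or inconsistent steps.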
DeepSeek's New Release: "Olympiad Gold-Medal Level"
Di Yi Cai Jing· 2025-11-28 00:40
Core Insights
- DeepSeek has released a new model, DeepSeek-Math-V2, the first open-source model to achieve International Mathematical Olympiad (IMO) gold-medal-level performance [3][5]
- The model outperforms Google's Gemini DeepThink on certain benchmarks, showcasing its capabilities in mathematical reasoning [5][9]

Performance Metrics
- DeepSeek-Math-V2 scored 83.3% on IMO 2025 and 73.8% on CMO 2024, and 98.3% on the Putnam 2024 competition [4]
- On the Basic benchmark, Math-V2 scored nearly 99%, significantly higher than Gemini DeepThink's 89%; on the Advanced subset, Math-V2 scored 61.9%, slightly lower than Gemini's 65.7% [5]

Research Implications
- The paper, titled "DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning," emphasizes the importance of rigorous mathematical proof processes rather than merely correct answers [8]
- DeepSeek advocates self-verification in mathematical reasoning as a path toward more powerful AI systems [8]

Industry Reactions
- The release of Math-V2 has generated excitement in the industry, with comments highlighting its unexpected win over Google's model [9]
- The competitive landscape is evolving as other major players such as OpenAI and Google release new models, raising anticipation for DeepSeek's next moves [10]
DeepSeek's New Release: "Olympiad Gold-Medal Level"
Di Yi Cai Jing· 2025-11-28 00:35
Core Viewpoint
- DeepSeek has released an open-source model, DeepSeek-Math-V2, the first model to achieve IMO gold-medal level in mathematics; it outperforms Google's Gemini DeepThink on certain benchmarks [3][5].

Group 1: Model Performance
- DeepSeek-Math-V2 achieved nearly 99% on the Basic benchmark, significantly outperforming Gemini DeepThink, which scored 89% [5].
- On the more challenging Advanced subset, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [5].
- The model has demonstrated gold-medal-level performance on IMO 2025 and CMO 2024, and a nearly perfect score on the Putnam 2024 exam (118/120) [8].

Group 2: Research and Development Insights
- DeepSeek emphasizes verifying mathematical reasoning comprehensively and rigorously, moving from a result-oriented approach to a process-oriented one [8].
- The model is designed to review proof processes the way a mathematician would, enhancing its ability to complete complex mathematical proofs without human intervention [8].

Group 3: Industry Reactions and Expectations
- The release of Math-V2 has generated excitement in the industry, with reactions noting that DeepSeek surpassed expectations by beating Google's IMO Gold model by a 10% margin [9].
- There is anticipation regarding DeepSeek's next moves, especially concerning updates to its flagship models, as the industry awaits further developments [9].
DeepSeek's New Release! The First Olympiad Gold-Medal-Level Model Is Here
Di Yi Cai Jing· 2025-11-28 00:22
Core Insights
- DeepSeek has released a new model, DeepSeek-Math-V2, the first open-source model to achieve International Mathematical Olympiad (IMO) gold-medal-level performance [1]
- The model outperforms Google's Gemini DeepThink on certain benchmarks, showcasing its capabilities in mathematical reasoning [1][5]

Performance Metrics
- DeepSeek-Math-V2 achieved 83.3% on IMO 2025 problems and 73.8% on CMO 2024 problems [4]
- On the Putnam 2024 competition, it scored 98.3%, an exceptional performance [4]
- On the Basic benchmark, Math-V2 scored nearly 99%, while Gemini DeepThink scored 89% [5]
- On the Advanced subset, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [5]

Research and Development Focus
- The model emphasizes self-verification in mathematical reasoning, moving from a result-oriented approach to a process-oriented one [8]
- DeepSeek aims to strengthen the rigor and completeness of mathematical proofs, which is crucial for attacking open problems [8]
- The research indicates that self-verifying mathematical reasoning is a viable direction for developing more powerful AI systems [8]

Industry Reaction
- The release has generated significant interest, with comments highlighting DeepSeek's competitive edge over Google's model [9]
- The industry keenly awaits further developments from DeepSeek, especially regarding updates to its flagship models [10]