DeepSeek's New Release: Open-Source Model Reaches IMO Gold-Medal Level for the First Time as AI Reasoning Moves Beyond "Rote Memorization"

Core Insights
- DeepSeek has released its latest technology achievement, DeepSeek-Math-V2, which focuses on enhancing mathematical reasoning and theorem proving capabilities in large language models, boasting 685 billion parameters [1][5]

Performance Highlights
- DeepSeek-Math-V2 achieved gold medal levels in the 2025 International Mathematical Olympiad (IMO) and the 2024 Chinese Mathematical Olympiad (CMO), and scored 118 out of 120 in the Putnam 2024 competition, surpassing the historical human record of approximately 90 points [1][3]
- In the IMO-ProofBench benchmark, Math-V2 scored nearly 99% on the basic set, significantly outperforming Google's Gemini DeepThink, which scored 89%. On the advanced set, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [4]

Technological Innovations
- DeepSeek-Math-V2 addresses the "illusion of reasoning" problem highlighted by former OpenAI chief scientist Ilya Sutskever, moving beyond mere answer correctness to ensure rigorous logical reasoning [5][6]
- The model employs a strict "process-focused" strategy, requiring clear and logical step-by-step derivations, and does not reward correct final answers if intermediate steps are flawed [6]
- A unique multi-level "Meta-Verification" mechanism enhances the reliability of scoring, increasing the confidence level from 0.85 to 0.96 [9]

Industry Impact
- The release of DeepSeek-Math-V2 has generated significant buzz in the overseas developer community, marking a strong comeback for DeepSeek and breaking the long-standing dominance of closed-source models in top reasoning capabilities [11]
- The model's success in mathematical reasoning is expected to influence the coding model space, potentially disrupting existing code assistance tools [11]
- The global AI landscape is transitioning from "text generation" to "logical reasoning," with DeepSeek's approach providing a clear path for technological evolution through rigorous validation mechanisms rather than sheer computational power [11]
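
The "process-focused" scoring rule described under Technological Innovations can be illustrated with a toy sketch. This is not DeepSeek's implementation. The names `ProofStep`, `process_focused_reward`, and `outcome_only_reward` are hypothetical, the per-step `is_valid` flag stands in for whatever verdict a verifier would produce, and the binary 0/1 reward is an assumption made only to keep the contrast clear.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProofStep:
    """One derivation step (hypothetical structure for illustration)."""
    text: str
    is_valid: bool  # assumed verdict from some external verifier


def outcome_only_reward(steps: List[ProofStep], final_answer_correct: bool) -> float:
    """Baseline: reward depends only on the final answer."""
    return 1.0 if final_answer_correct else 0.0


def process_focused_reward(steps: List[ProofStep], final_answer_correct: bool) -> float:
    """Reward only when every step is valid AND the final answer is correct.

    A correct answer reached through a flawed step earns nothing.
    An empty derivation also earns nothing (assumed convention).
    """
    if not steps:
        return 0.0
    all_steps_valid = all(step.is_valid for step in steps)
    return 1.0 if (all_steps_valid and final_answer_correct) else 0.0


# A derivation that reaches the right answer through an invalid step.
flawed_proof = [
    ProofStep("Assume n is even, so n = 2k.", True),
    ProofStep("Divide both sides by zero.", False),
    ProofStep("Therefore the answer is 4.", True),
]

# A derivation in which every step holds.
sound_proof = [
    ProofStep("Assume n is even, so n = 2k.", True),
    ProofStep("Then n^2 = 4k^2, which is divisible by 4.", True),
]

flawed_outcome = outcome_only_reward(flawed_proof, final_answer_correct=True)
flawed_process = process_focused_reward(flawed_proof, final_answer_correct=True)
sound_process = process_focused_reward(sound_proof, final_answer_correct=True)
```

Under these assumptions the flawed derivation scores 1.0 with outcome-only scoring but 0.0 with process-focused scoring, which is the distinction the article draws between answer correctness and rigorous reasoning.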