New Breakthrough! DeepSeek Releases a New Model

Core Insights
- The DeepSeek team has developed a new model, DeepSeekMath-V2, which demonstrates significant advances in mathematical reasoning, achieving gold-medal performance in major competitions such as IMO 2025 and CMO 2024 and scoring 118/120 on the Putnam 2024 [2][4][6].

Model Performance
- DeepSeekMath-V2 achieved an 83.3% success rate on IMO 2025 and 73.8% on CMO 2024, and scored 98.3% on the Putnam 2024 [3].
- On a self-constructed test of 91 CNML-level problems, DeepSeekMath-V2 outperformed both GPT-5-Thinking-High and Gemini 2.5-Pro across all categories: algebra, geometry, number theory, combinatorics, and inequalities [6].

Validation Mechanism
- The model employs a self-driven verification-generation loop, using one large language model (LLM) as a "reviewer" that validates proofs and another as an "author" that generates them, with both roles improved through a reinforcement learning mechanism [4].
- A "meta-validation" layer is introduced to suppress model hallucinations, addressing the critical challenge of ensuring correct reasoning in mathematical tasks [4].

Benchmark Testing
- On the IMO-ProofBench benchmark, DeepSeekMath-V2 outperformed DeepMind's DeepThink (itself at IMO gold-medal level) on the basic set, remained strongly competitive on the more challenging advanced set, and significantly surpassed all other benchmark models [8].

Future Directions
- The DeepSeek team acknowledges that substantial work remains, but the results indicate that self-verifying mathematical reasoning is a viable research direction that could lead to more powerful mathematical AI systems [11].