Mathematical AI
New Breakthrough! DeepSeek Launches a New Model
Xinhuanet Finance · 2025-11-28 01:15
Core Insights
- DeepSeek launched a new mathematical reasoning model, DeepSeekMath-V2, on HuggingFace, which uses a self-verifying training framework [2]
- The model is built on DeepSeek-V3.2-Exp-Base and employs an LLM verifier to automatically review generated mathematical proofs, continuously improving performance on high-difficulty samples [3]
- DeepSeekMath-V2 achieved gold-medal level in both the 2025 International Mathematical Olympiad (IMO) and the 2024 Chinese Mathematical Olympiad (CMO), and scored 118/120 in the 2024 Putnam Mathematical Competition [3][4]

Performance Metrics
- In the IMO 2025, the model scored 83.3% across problems P1 to P5 [4]
- In the CMO 2024, it achieved 73.8% on problems P1, P2, P4, P5, and P6 [4]
- In the Putnam 2024, it scored 98.3% on problems A1 to B6 [4]

Model Architecture
- The core of DeepSeekMath-V2 is a self-driven verification-generation loop: one LLM acts as a "reviewer" that verifies proofs and another as a "creator" that generates them, with the two trained to cooperate via reinforcement learning; a minimal sketch of this loop appears after this summary [5]
- A "meta-verification" layer is introduced to effectively suppress model hallucinations [5]

Competitive Edge
- On a self-constructed test of 91 CNML-level problems, DeepSeekMath-V2 demonstrated superior mathematical reasoning, outperforming GPT-5-Thinking-High and Gemini 2.5-Pro across all categories, including algebra, geometry, number theory, combinatorics, and inequalities [7]
- The model also excelled on the IMO-ProofBench benchmark, surpassing DeepMind's DeepThink at the IMO gold-medal level on the basic set and remaining strongly competitive on the more challenging advanced set [8]

Future Directions
- The DeepSeek team notes that while significant work remains, these results suggest that self-verifying mathematical reasoning is a viable research direction, potentially aiding the development of more powerful mathematical AI systems [10]
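To make the verification-generation loop described above concrete, here is a minimal, runnable Python sketch. It is an illustration only, not DeepSeek's actual training code: `generate_proof`, `verify_proof`, the scoring threshold, and the hard-negative collection are hypothetical stand-ins for the LLM "creator", the LLM "reviewer", and the reinforcement-learning signal the article describes.

```python
# Hypothetical sketch of a self-driven verification-generation loop.
# All model calls are stubs; a real system would query two LLMs.

import random

def generate_proof(problem: str, attempt: int) -> str:
    """Stub 'creator' LLM: returns a candidate proof for the problem."""
    return f"proof of ({problem}), attempt {attempt}"

def verify_proof(problem: str, proof: str) -> float:
    """Stub 'reviewer' LLM: scores a proof in [0, 1]; higher means
    fewer detected flaws. Deterministic here for reproducibility."""
    random.seed(hash((problem, proof)) % (2 ** 32))
    return random.random()

def self_verifying_loop(problem: str, attempts: int = 8, threshold: float = 0.9):
    """Generate candidate proofs, keep the reviewer's top-scoring one,
    and collect low-scoring candidates as hard negatives for training."""
    best_proof, best_score = None, -1.0
    hard_negatives = []  # high-difficulty samples fed back into training
    for i in range(attempts):
        proof = generate_proof(problem, i)
        score = verify_proof(problem, proof)
        if score > best_score:
            best_proof, best_score = proof, score
        if score < threshold:
            hard_negatives.append((proof, score))
    return best_proof, best_score, hard_negatives

if __name__ == "__main__":
    proof, score, negatives = self_verifying_loop("IMO-style inequality")
    print(f"best score {score:.2f}; {len(negatives)} hard negatives collected")
```

The design point the sketch tries to capture is that low-scoring candidates are not discarded: they become the high-difficulty samples that, per the article, are used to keep improving both the creator and the reviewer.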
New Breakthrough! DeepSeek Launches a New Model
Shanghai Securities News · 2025-11-27 16:18
Core Insights
- The DeepSeek team has developed a new model, DeepSeekMath-V2, which demonstrates significant advances in mathematical reasoning, achieving gold-medal level in major competitions such as the IMO 2025 and CMO 2024 and scoring 118/120 in the Putnam 2024 [2][4][6].

Model Performance
- DeepSeekMath-V2 achieved an 83.3% success rate in the IMO 2025 and a 73.8% success rate in the CMO 2024, while scoring 98.3% in the Putnam 2024 [3].
- On a self-constructed test of 91 CNML-level problems, DeepSeekMath-V2 outperformed both GPT-5-Thinking-High and Gemini 2.5-Pro across all categories, including algebra, geometry, number theory, combinatorics, and inequalities [6].

Validation Mechanism
- The model employs a self-driven verification-generation loop, with one large language model (LLM) acting as a "reviewer" that validates proofs and another as an "author" that generates them, coordinated by a reinforcement learning mechanism [4].
- A "meta-validation" layer is introduced to effectively suppress model hallucinations, addressing the critical issue of ensuring correct reasoning in mathematical tasks; a hedged sketch of this idea follows this summary [4].

Benchmark Testing
- On the IMO-ProofBench benchmark, DeepSeekMath-V2 outperformed DeepMind's DeepThink at the IMO gold-medal level on the basic set and remained strongly competitive on the more challenging advanced set, significantly surpassing all other benchmarked models [8].

Future Directions
- The DeepSeek team acknowledges that while substantial work remains, the results indicate that self-verifying mathematical reasoning is a viable research direction, potentially leading to more powerful mathematical AI systems [11].
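As a complement to the loop sketched earlier, here is a hedged sketch of what a "meta-validation" layer could look like: sample several independent reviewer verdicts and accept a proof only if all of them clear a floor, so a single hallucinated approval cannot pass a flawed proof. The `verifier_verdict` stub, the number of passes, and the acceptance floor are assumptions for illustration; the article does not specify DeepSeek's actual mechanism.

```python
# Hypothetical sketch of a meta-validation layer: cross-check several
# independent verifier passes before trusting any single verdict.

import random
from statistics import mean

def verifier_verdict(proof: str, pass_id: int) -> float:
    """Stub reviewer pass: returns a confidence in [0, 1] that the proof
    is sound. A real run would query the LLM verifier with varied prompts."""
    random.seed(hash((proof, pass_id)) % (2 ** 32))
    return random.random()

def meta_verify(proof: str, passes: int = 5, floor: float = 0.6) -> bool:
    """Accept only if every independent pass clears the floor, so one
    hallucinated 'looks fine' verdict cannot approve a flawed proof."""
    scores = [verifier_verdict(proof, p) for p in range(passes)]
    print(f"verdicts: {[f'{s:.2f}' for s in scores]}, mean {mean(scores):.2f}")
    return min(scores) >= floor

if __name__ == "__main__":
    accepted = meta_verify("candidate proof of a CMO combinatorics problem")
    print("accepted" if accepted else "sent back for revision")
```

Requiring the minimum verdict (rather than the average) to clear the floor is one conservative way to damp hallucinated approvals, matching the article's framing of meta-validation as a hallucination suppressor.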