AI Mathematical Reasoning
Crushing Google! DeepSeek Open-Sources the First "Olympiad Gold Medal" AI
Ge Long Hui· 2025-11-28 07:09
Core Insights
- DeepSeek has launched a new model, DeepSeekMath-V2, the first open-source model to reach International Mathematical Olympiad (IMO) gold-medal level [2][4]
- The model has shown superior performance across benchmarks, outperforming Google's Gemini DeepThink series in some areas [2][4]

Performance Metrics
- On the Basic benchmark, DeepSeekMath-V2 scored nearly 99%, well above Gemini DeepThink's 89% [4]
- On the Advanced subset, DeepSeekMath-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7%, indicating competitive performance [4]
- The model reached gold-medal level at IMO 2025 by solving 5 of 6 problems, also reached gold level at CMO 2024, and scored 118 of a possible 120 on Putnam 2024 [4][7]

Technological Advancements
- DeepSeekMath-V2 introduces a self-verifying mathematical reasoning approach, a significant milestone in AI mathematical reasoning [10]
- The model features a new training mechanism that includes:
  1. A reliable verifier that checks each step of a theorem proof for logical consistency [10]
  2. A generator that learns to self-improve by identifying and correcting issues during proof generation [11]
  3. A verification capability that evolves as the generator improves, focusing further training on difficult-to-verify proofs [11]

Industry Impact
- The release of DeepSeekMath-V2 is seen as a strategic move in a competitive landscape, coinciding with releases from other major players such as OpenAI and Google [10]
- The model's open-source release under the Apache 2.0 license allows global developers to explore and fine-tune a gold-medal-level model, breaking the monopoly of closed-source models in top-tier mathematical reasoning [10]
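The generator-verifier training loop described above can be sketched in miniature. This is a toy illustration of the self-verification idea only, not DeepSeek's actual implementation: the proof domain (arithmetic equations), and the `generate_proof`, `verify_step`, and `repair` functions are all hypothetical stand-ins.

```python
# Toy sketch of a self-verifying reasoning loop: a generator drafts a
# "proof" (here, a list of arithmetic equations), a verifier checks each
# step, and the generator repairs any step the verifier rejects.

def verify_step(step: str) -> bool:
    """Hypothetical verifier: accept a step iff its equation holds."""
    lhs, rhs = step.split("=")
    return eval(lhs) == eval(rhs)  # toy check on trusted input only

def generate_proof() -> list[str]:
    """Hypothetical generator: a draft proof with one flawed step."""
    return ["2+2=4", "4*3=11", "12-5=7"]  # second step is wrong

def repair(step: str) -> str:
    """Hypothetical repair pass: recompute the right-hand side."""
    lhs, _ = step.split("=")
    return f"{lhs}={eval(lhs)}"

def self_verify_loop(max_rounds: int = 3) -> list[str]:
    proof = generate_proof()
    for _ in range(max_rounds):
        flagged = [i for i, s in enumerate(proof) if not verify_step(s)]
        if not flagged:          # verifier accepts every step: done
            return proof
        for i in flagged:        # generator corrects only flagged steps
            proof[i] = repair(proof[i])
    return proof

print(self_verify_loop())  # ['2+2=4', '4*3=12', '12-5=7']
```

In the real system both roles are learned models trained jointly, with the verifier's reward signal steering the generator; the loop above only shows the flag-and-repair control flow.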
Gemini Takes Gold Again, Outperforming Top University Students: The Era of AI Mathematical Reasoning Has Arrived
36Kr· 2025-08-12 00:56
Core Insights
- Gemini's performance in the International Mathematics Competition (IMC) exceeded the gold-medal threshold, set at the top 8% of participants [1][4][7]
- Evaluation of AI models on mathematical reasoning tasks indicates that Gemini models, particularly Gemini Deep Think, demonstrate superior clarity and originality in their proofs compared to human participants [21][22][37]

Group 1: Competition Overview
- The IMC is organized by University College London and hosted by the American University in Bulgaria, held July 28 to August 3, 2025, for undergraduate students up to age 23 [8][10]
- The competition runs for two days, each featuring five problems worth 10 points apiece [10]

Group 2: AI Model Performance
- Three models were evaluated: Gemini Deep Think, Gemini-2.5-Pro, and Gemini-2.5-Pro Best-of-32, all scoring well above the gold-medal threshold [4][7]
- Gemini Deep Think and Gemini Agent solved all problems with minimal errors, while Gemini Best-of-32 performed significantly better than in its previous IMO results [5][7]

Group 3: Evaluation Criteria and Results
- The models were ranked by the quality and clarity of their proofs, with Gemini Deep Think rated highest, followed by Gemini Agent and then Gemini Best-of-32 [7][21]
- Qualitative analysis found that Gemini Deep Think produced clearer, more engaging proofs, often employing innovative methods rather than relying solely on computational techniques [21][22]

Group 4: Implications for AI in Mathematics
- AI's rising performance in mathematical competitions suggests growing capability in mathematical reasoning, with models able to tackle complex problems and provide novel proofs [37][43]
- The results indicate that AI's strengths in computation and data processing may lead to fewer errors than human participants make, raising questions about the future role of AI in mathematics [43]
Computer Industry Major Event Commentary: DeepSeek-Prover-V2 Released, Focused on Mathematical Reasoning
Huachuang Securities· 2025-05-04 09:28
Investment Rating
- The report maintains a "Recommendation" rating for the computer industry, expecting the industry index to outperform the benchmark index by more than 5% over the next 3-6 months [20]

Core Insights
- The release of DeepSeek-Prover-V2, a large language model focused on mathematical reasoning, marks a significant advance in AI mathematics capabilities. The model features 671 billion parameters and employs a mixture-of-experts (MoE) architecture, enhancing its efficiency and performance in formal theorem proving [4][7]
- The report highlights the successful integration of formal and informal mathematical proof, with DeepSeek-Prover-V2 achieving an 88.9% pass rate on the miniF2F-test dataset, setting a new industry benchmark [7]
- Investment opportunities are identified across sectors related to AI applications, including office software, finance, large models, industrial applications, and medical technology, suggesting a broad range of companies to consider [7][8]

Industry Overview
- The computer industry comprises 336 listed companies with a total market capitalization of 426.57 billion yuan and a circulating market value of 363.99 billion yuan [4]
- The industry's absolute performance over the past 12 months is 27.0%, with relative performance of 22.4% against the benchmark index [5]

Related Research Reports
- The report references several related studies, including a commentary on Alibaba's open-source Qwen3 model and a weekly report on development trends in the intelligent driving industry [7]
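The efficiency claim behind the mixture-of-experts (MoE) architecture mentioned above is that each token activates only a few experts, so only a small fraction of the total parameters does work per inference. A minimal top-k router illustrates the mechanism; this is a generic MoE sketch, not DeepSeek's architecture, and the expert count and scores are made up.

```python
import math

# Minimal top-k mixture-of-experts routing: each input is sent to the
# k highest-scoring experts, so only a fraction of the experts (and
# hence of the parameters) is active for any one input.

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def top_k_route(scores, k=2):
    """Return (expert_index, weight) pairs for the k best-scoring experts."""
    idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return list(zip(idx, softmax([scores[i] for i in idx])))

# 8 toy "experts", each a simple scalar function standing in for an FFN
experts = [lambda x, i=i: x * (i + 1) for i in range(8)]

def moe_forward(x, scores, k=2):
    """Weighted sum over only the k routed experts (the rest stay idle)."""
    return sum(w * experts[i](x) for i, w in top_k_route(scores, k))

routing = top_k_route([0.1, 2.0, 0.3, 1.5, -1.0, 0.0, 0.7, 0.2], k=2)
print(routing)  # experts 1 and 3 are selected
```

With 8 experts and k=2, only a quarter of the expert parameters run per input; scaled up, this is how a 671B-parameter model can activate only a fraction of its weights per token.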
Just In! DeepSeek-Prover-V2-671B Released; Netizens: DS Is the Holiday Terminator
程序员的那些事· 2025-05-01 02:04
Core Viewpoint
- DeepSeek has launched DeepSeek-Prover-V2-671B, marking a significant advance in AI mathematical reasoning, particularly in automated theorem proving [2][4]

Group 1: Model Overview
- DeepSeek-Prover-V2-671B is a next-generation automated theorem-proving expert model with 671 billion parameters, optimized for proof generation and verification in the Lean 4 framework [4][6]
- The model employs a mixture-of-experts (MoE) architecture, activating approximately 37 billion parameters per inference, enhancing computational efficiency while maintaining strong reasoning capability [4][6]

Group 2: Key Breakthroughs
- The release marks three major milestones, including the potential for innovation across various application domains [6]
- The model's specifications include a context length of approximately 128,000 tokens, allowing it to handle complex reasoning chains and lengthy proofs [6][7]
- The attention mechanism is likely multi-head latent attention (MLA), which compresses the key-value (KV) cache, significantly reducing memory requirements [6][7]

Group 3: Applications and Impact
- The model supports formal verification in areas such as cryptographic security proofs and chip-design validation, enabling rigorous mathematical checks in automated processes [7]
- It aids mathematicians in formalizing theorems, exploring new conjectures, and proving complex mathematical problems, potentially accelerating mathematical research [7]
- It can serve as an interactive educational tool, guiding students in mastering rigorous mathematical proof methods [7]
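To give a sense of what "proof generation and verification in the Lean 4 framework" means, here is a small Lean 4 example of the kind of machine-checkable statement such a prover emits. This is an illustrative hand-written snippet, not output from DeepSeek-Prover-V2; `Nat.add_comm` is an existing core-library lemma, restated here only to show the shape of a proved theorem.

```lean
-- A concrete equality, closed by `rfl` since both sides compute to 5.
theorem two_add_three : 2 + 3 = 5 := rfl

-- A general statement with an explicit proof term, in the style an
-- automated prover would generate and Lean's kernel would check.
theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

The key property is that Lean's kernel mechanically checks every such proof, so a model's output is either accepted as fully rigorous or rejected; there is no partial credit, which is what makes pass rates on formal benchmarks meaningful.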