DeepSeek新数学模型刷爆记录!7B小模型自主发现671B模型不会的新技能
量子位·2025-05-01 03:53

Core Insights - DeepSeek has launched a new model, DeepSeek-Prover-V2, which significantly improves performance in mathematical theorem proving, achieving a record of solving 49 problems in the Putnam test [2][36]. - The model demonstrates unique capabilities, particularly in solving problems that previous models, including a larger 671B model, could not address [9][10]. Model Development - DeepSeek-Prover series has evolved through several iterations: Prover-V1, Prover-V1.5, and now Prover-V2, with each version introducing enhancements in methodology and model architecture [11][12][14]. - Prover-V2 integrates a unified approach for formal and informal mathematical proofs, utilizing a more advanced base model, DeepSeek-V3, which enhances context handling and natural language reasoning [15][18]. Training Methodology - The training process for Prover-V2 involves a two-phase approach, focusing on both non-CoT and CoT (Chain of Thought) generation modes to improve reasoning capabilities [28][29]. - The model employs a unique reinforcement learning strategy, utilizing a binary reward system based on the correctness of generated proofs, which enhances its ability to connect informal reasoning with formal proof construction [32][33]. Performance Metrics - Prover-V2 achieved an impressive pass rate of 88.9% on the miniF2F test and solved 49 problems in the Putnam test, showcasing its advanced capabilities compared to previous models [36][40]. - The model's performance is further validated through a benchmark dataset, ProverBench, which includes 325 formalized problems from various mathematical domains [38][39]. Community Response - The release of Prover-V2 has garnered significant attention within the research community, with rapid engagement on platforms like GitHub and social media, indicating strong interest and validation of its contributions to formal mathematics [51][57]. - Notable figures in the field have praised the advancements made by DeepSeek, highlighting the competitive landscape of formal mathematics and the model's impact on the state-of-the-art [59].

DeepSeek新数学模型刷爆记录!7B小模型自主发现671B模型不会的新技能 - Reportify