Formal Reasoning
DeepMind officially launches its "math problem-solver AI"
Ke Ji Ri Bao (Science and Technology Daily) · 2025-11-13 01:00
DeepMind revealed in 2024 that its hybrid AI system had performed strongly at that year's IMO, falling just one point short of a gold medal. It has now formally published a paper introducing and detailing the system.

[Editor-in-Chief's Note] This breakthrough is seen as another milestone in AI research, because testing AI systems on high-level competition problems has become an important standard for evaluating their logical reasoning, abstract thinking, and problem-solving ability. Such problems demand not only rigorous deductive reasoning but also creative strategies and cross-domain knowledge integration, going far beyond ordinary question answering or pattern-recognition tasks. Strong performance in authoritative competitions such as the IMO is therefore regarded as a key touchstone for whether an AI possesses "human-like" deep reasoning.

Mathematicians have long relied on computational tools to tackle complex problems and construct rigorous proofs, and AI promises to accelerate this process. AI has now taken a key step in formal reasoning: unlike general-purpose AI built on fuzzy language models, this latest system operates within a strict logical framework in which every inference step can be verified, greatly improving the reliability of its results. This not only pushes past existing limits of AI reasoning but also provides a new tool for exploring complex mathematical conjectures, and it opens a realistic path toward human-machine collaboration on frontier scientific problems. Its influence is expected to extend to theoretical computer science, automated theorem proving, and even foundational mathematics research.

Science and Technology Daily, Beijing, Nov. 12 (reporter Zhang Mengran): Nature on the 12th published a major result: UK-based DeepMind formally released its "math ...
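The article's central point is that each inference step is machine-checkable. In a proof assistant such as Lean (shown here purely as an illustrative example of formal reasoning, not as DeepMind's actual code), a theorem only compiles if the kernel can verify every step of the proof:

```lean
-- A tiny Lean 4 theorem: commutativity of addition on naturals.
-- The kernel checks the whole derivation; `Nat.add_comm` is a
-- standard library lemma supplying the proof term.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the proof term were wrong or incomplete, compilation would fail; there is no way for an unverified claim to slip through, which is what distinguishes formal provers from free-form language-model reasoning.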
Princeton-led team releases the strongest open-source math theorem-proving model: 32B far surpasses previous SOTA DeepSeek 671B
机器之心· 2025-07-17 05:03
Core Insights
- The article discusses the launch of Goedel-Prover-V2, a new open-source mathematical theorem-proving model led by Princeton University in collaboration with several top institutions, including Tsinghua University and Stanford University. The model significantly outperforms previous state-of-the-art models across various benchmarks [1][10].

Performance Highlights
- The 32B flagship model achieved an 8.0% improvement in Pass@32 accuracy on the MiniF2F test compared to the previous SOTA model, DeepSeek-Prover-V2-671B [6].
- The 8B model matched the performance of the 671B SOTA model, a notable efficiency breakthrough [7][22].
- Goedel-Prover-V2 ranked first on the challenging PutnamBench, solving 64 problems at Pass@64 and outperforming DeepSeek-Prover-V2-671B, which solved 47 problems at Pass@1024 [9][14][20].

Technical Innovations
- The development of Goedel-Prover-V2 combines expert iteration and reinforcement learning with three key innovations:
  - Model averaging enhances robustness and overall performance by merging model weights from different training checkpoints [12][32].
  - Scaffolded data synthesis automatically generates proof tasks of progressively increasing difficulty, smoothing the training curriculum [13][26].
  - Verifier-guided self-correction lets the model iteratively refine its proofs using feedback from the Lean compiler, mimicking human self-correction [13][32].

Benchmark Results
- On the MiniF2F test, the 8B model achieved a Pass@32 rate of 83.3%, surpassing the 671B SOTA model [12].
- The flagship model reached Pass@32 rates of 88.1% in standard mode and 90.4% in self-correction mode, significantly exceeding previous models [12].
- Goedel-Prover-V2-32B remained consistently superior to earlier models across a range of inference sampling budgets [21][22].
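The model-averaging idea above merges parameter weights from checkpoints taken at different points in training. The article does not give the exact recipe, so the following is a minimal uniform-averaging sketch over plain Python dicts; a real implementation would average framework tensors (e.g. PyTorch `state_dict`s) in the same way.

```python
def average_checkpoints(checkpoints):
    """Uniformly average parameters across checkpoints that share the
    same parameter names. Each checkpoint is a dict name -> value;
    values here are plain floats standing in for weight tensors."""
    n = len(checkpoints)
    keys = checkpoints[0].keys()
    return {k: sum(ckpt[k] for ckpt in checkpoints) / n for k in keys}
```

The design intuition is that checkpoints from nearby training stages sit in the same loss basin, so their average tends to be at least as good as any single one and is more robust to noise in any individual checkpoint.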
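The Pass@k figures quoted above report whether any of k sampled proof attempts verifies. The article does not say which estimator was used; a standard unbiased estimator (popularized by the HumanEval evaluation methodology) computed from n total samples of which c are correct is:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n attempts (c correct) succeeds.
    Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer incorrect samples than k: a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=4 attempts of which c=2 verify, pass@2 is 1 - C(2,2)/C(4,2) = 1 - 1/6 ≈ 0.833.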
Model and Dataset Availability
- The Goedel-Prover-V2 model and the new MathOlympiadBench benchmark dataset have been publicly released to support research in the field [28][30].
- MathOlympiadBench includes 360 formalized problems from international mathematics competitions, aimed at strengthening preparation for events like the International Mathematical Olympiad [30][31].
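The verifier-guided self-correction described in the article can be sketched as a generate-verify-regenerate loop. The sketch below uses a hypothetical stub in place of a real Lean compiler call and a caller-supplied generation function; it illustrates the control flow only, not Goedel-Prover-V2's actual implementation.

```python
def mock_lean_verify(proof: str):
    """Stub standing in for invoking the Lean compiler (hypothetical).
    Returns (ok, error_message)."""
    if "sorry" in proof:
        return False, "proof contains a 'sorry' placeholder"
    return True, ""

def self_correct(generate, verify, max_rounds: int = 3):
    """Generate a proof, then repeatedly feed verifier errors back into
    the generator until the proof checks or the budget is exhausted."""
    proof = generate(None)  # initial attempt, no feedback yet
    for _ in range(max_rounds):
        ok, err = verify(proof)
        if ok:
            return proof
        proof = generate(err)  # regenerate conditioned on compiler feedback
    return None  # failed within budget
```

In the real system, `generate` would be the prover model prompted with the failed proof and the compiler's error output, and `verify` would compile the candidate proof with Lean.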