Formal Theorem Proving
Imitating the Human Reason-and-Revise Process, StepFun (阶跃星辰) Proposes a New Paradigm for Formal Proving | Open Source
量子位· 2025-08-15 10:05
Core Viewpoint
- The article covers StepFun's release and open-sourcing of the formal theorem proving models StepFun-Prover-Preview-7B and StepFun-Prover-Preview-32B, highlighting their ability to generate and refine formal proofs through interactive learning [1][16].

Technical Highlights
- StepFun-Prover is trained with reinforcement learning driven by environmental feedback, allowing the model to iteratively correct and improve formal proofs through real-time interaction with the proof environment (a hedged sketch of such a feedback loop follows this summary) [2].
- A two-stage supervised fine-tuning (SFT) strategy is used; the first stage equips the model with basic tool-usage capabilities [4].
- Tool-integrated reinforcement learning (RL) then trains the model to generate outputs that leverage Lean 4 for code completion and mathematical problem-solving [5].
- An iterative "RL-SFT-RL" optimization scheme lets the model tackle increasingly difficult reasoning tasks, improving its performance over successive rounds [8].

Performance Metrics
- StepFun-Prover-Preview-32B achieved a pass@1 accuracy of 70.0% on the miniF2F-test benchmark, surpassing all known models by over 4% [9].
- StepFun-Prover-Preview-7B also outperformed other models, including DeepSeek-Prover-V2-671B and Kimina-Prover-72B, with a pass@1 accuracy of 66.0% [10].

Case Studies
- Case 1 demonstrates the model actively removing redundant steps from a formal proof, showcasing its natural-language understanding and feedback-analysis capabilities [11].
- Case 2 illustrates how the model restructures a formal proof in response to timeout feedback, demonstrating its adaptability [13].
- Case 3 highlights the model correcting errors based on environmental feedback, further improving the robustness of its reasoning [12].

Future Directions
- StepFun-Prover Preview marks a significant milestone for StepFun in the field of formal proofs, with continued exploration of formal reasoning models anticipated [16].
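The verifier-in-the-loop pattern described above is easier to picture as pseudocode. The sketch below is not StepFun's implementation: every name (`run_lean_checker`, `refine_proof`, the `model.complete` call) is a hypothetical placeholder, and it assumes a local Lean 4 toolchain with a `lean` executable on the PATH that can check a standalone file.

```python
# Hedged sketch of a feedback-driven proof-refinement loop, assuming a local
# Lean 4 CLI ("lean" on PATH) and a hypothetical model.complete() interface.
# None of these names come from the StepFun-Prover release.
import subprocess
import tempfile
from dataclasses import dataclass


@dataclass
class LeanFeedback:
    ok: bool       # True if the Lean checker accepted the file
    message: str   # compiler errors, warnings, or a timeout notice


def run_lean_checker(lean_code: str, timeout_s: float = 60.0) -> LeanFeedback:
    """Write the candidate proof to a temp file and let the Lean 4 CLI check it."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(lean_code)
        path = f.name
    try:
        proc = subprocess.run(["lean", path], capture_output=True,
                              text=True, timeout=timeout_s)
        return LeanFeedback(ok=(proc.returncode == 0),
                            message=proc.stdout + proc.stderr)
    except subprocess.TimeoutExpired:
        return LeanFeedback(ok=False, message="proof check timed out")


def refine_proof(model, problem_statement: str, max_rounds: int = 4) -> str | None:
    """Generate a proof, feed checker errors back to the model, and retry."""
    feedback_history: list[str] = []
    for _ in range(max_rounds):
        prompt = problem_statement + "".join(feedback_history)
        candidate = model.complete(prompt)      # hypothetical generation call
        result = run_lean_checker(candidate)
        if result.ok:
            return candidate                    # verified proof found
        # Surface the error or timeout so the next attempt can repair it.
        feedback_history.append(f"\nPrevious attempt failed:\n{result.message}")
    return None                                 # no verified proof within budget
```

In an RL setup like the one the article describes, the reward signal would come from whether the checker ultimately accepts the proof; this loop only illustrates the interaction pattern, not the training procedure itself.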
When AI Meets Mathematics: How Are Large Language Models Sparking a Revolution in Formalized Mathematics? | Deep Talk
锦秋集· 2025-05-12 09:13
Core Viewpoint
- The article discusses the transformative impact of large language models (LLMs) on mathematics, particularly through the integration of formalized mathematics methods that enhance the accuracy and reliability of theorem proofs [1][4].

Group 1: Challenges and Opportunities
- The growing complexity of modern mathematical theories has outstripped the capacity of traditional peer review and manual verification, necessitating a shift toward formalized mathematics [4][6].
- The "hallucination" problem in LLMs, where models generate plausible but incorrect content, poses significant challenges in the highly logical domain of mathematics and highlights the need for rigorous verification methods [6][7].

Group 2: Formalized Theorem Proving
- Formalized theorem proving expresses mathematical statements within a system of axioms and inference rules so that they can be checked mechanically, giving high certainty in the validation results (a minimal Lean 4 illustration follows this summary) [8][9].
- Successful applications of formal methods in mathematics and software engineering demonstrate their potential to guarantee consistency between implementation and specification, overcoming the limitations of traditional methods [9].

Group 3: Recent Advances Driven by LLMs
- Advanced systems such as AlphaProof and DeepSeek-Prover V2 have shown remarkable performance on competition-level mathematical problems, indicating significant progress in formalized theorem proving [10].
- Research is evolving from pure proof generation toward knowledge accumulation and the construction of theoretical frameworks, as seen in projects like LEGO-Prover [10].

Group 4: Transition to Proof Engineering Agents
- Moving from static "theorem provers" to dynamic "proof engineering agents" is essential for addressing the high labor costs and low collaboration efficiency of formalized mathematics [11].
- APE-Bench was developed to evaluate and advance language-model performance in long-term, dynamic proof-maintenance scenarios, filling a gap in current assessment tools [12][16].

Group 5: Impact and Future Outlook
- Integrating LLMs with formal methods is expected to raise verification efficiency in both mathematics and industrial applications, accelerating the growth of mathematical knowledge [17].
- The long-term vision is "Certified AI," which combines formal verification with dynamic learning mechanisms and promises a new paradigm for knowledge production and decision-making [17].
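To make "expressed in a verifiable format" concrete, here is a minimal Lean 4 illustration (a toy example of the general idea, not taken from the article): both the statement and its proof are objects that the Lean kernel checks mechanically, with no appeal to human judgment.

```lean
-- Toy machine-checkable statements (illustrative, not from the article).
-- `Nat.add_comm` is a lemma from Lean 4's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact discharged by a decision procedure.
theorem two_plus_two : 2 + 2 = 4 := by decide
```

If either proof were wrong, the kernel would reject the file outright, which is the kind of certainty the article contrasts with LLM hallucination.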
DeepSeek Open-Sources the Prover-V2 Strong-Reasoning Model; Netizens: Olympiad Math Has Never Been This Easy
机器之心· 2025-05-01 02:11
Core Insights
- DeepSeek has released DeepSeek-Prover-V2, an open-source large language model specifically designed for formal theorem proving, achieving industry-leading performance on theorem-proving tasks [1][3][4].

Model Overview
- Two versions of DeepSeek-Prover-V2 have been released, with 7 billion and 671 billion parameters. The larger model is based on DeepSeek-V3-Base, while the smaller one is built on DeepSeek-Prover-V1.5-Base and supports a maximum context length of 32,000 tokens [3][4].
- DeepSeek-Prover-V2 targets Lean 4, the proof-assistant language widely used for formal mathematics, and focuses on formal theorem proving [3][4].

Technical Implementation
- The model uses a recursive theorem-proving pipeline to generate cold-start training data: DeepSeek-V3 decomposes complex problems into manageable sub-goals and formalizes the reasoning steps (an illustrative sub-goal skeleton in Lean 4 follows this summary) [9][11].
- Training covers two modes: a non-CoT (non-Chain-of-Thought) mode for rapid formal proof generation and a CoT mode that spells out detailed reasoning steps, improving transparency and logical progression [17][19].

Performance Metrics
- DeepSeek-Prover-V2-671B achieved an 88.9% pass rate on the miniF2F-test benchmark and solved 49 of 658 problems in the PutnamBench dataset [15][23].
- The model was evaluated against multiple benchmarks, demonstrating unprecedented accuracy and efficiency compared to other advanced models in the industry [20][23].

Dataset Release
- DeepSeek also introduced ProverBench, a benchmark dataset of 325 problems, including 15 from recent AIME math competitions, aimed at comprehensive evaluation of models on high-school and undergraduate mathematics [25][26].
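The sub-goal decomposition step can be pictured with a Lean 4 skeleton like the one below. This is an illustrative example of the general technique, not code from the DeepSeek release: the intermediate `have` steps are first left as `sorry` placeholders, and a prover model is then asked to fill each one in with a checked proof.

```lean
-- Illustrative decomposition skeleton (not from the DeepSeek-Prover-V2 paper).
-- Each `sorry` marks a sub-goal handed off to the prover model.
theorem double_eq_add_self (n : Nat) : n * 2 = n + n := by
  have step1 : n * 2 = n * 1 + n * 1 := by sorry
  have step2 : n * 1 = n := by sorry
  calc n * 2 = n * 1 + n * 1 := step1
    _ = n + n := by rw [step2]
```

In the pipeline the article summarizes, sub-proofs that pass the Lean checker are recombined with the original decomposition to form cold-start training data, which is what the recursive process above is generating.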