ProverBench

机器之心· 2025-05-01 02:11

Core Insights
- DeepSeek has released DeepSeek-Prover-V2, an open-source large language model designed specifically for formal theorem proving, with reported industry-leading performance on theorem proving tasks [1][3][4].

Model Overview
- DeepSeek-Prover-V2 is released in two sizes: 7 billion and 671 billion parameters. The larger model is based on DeepSeek-V3-Base. The smaller one is built on DeepSeek-Prover-V1.5-Base and supports a maximum context length of 32,000 tokens [3][4].
- DeepSeek-Prover-V2 targets Lean 4, a programming language and proof assistant used for formalized mathematics, and focuses on formal theorem proving [3][4].

Technical Implementation
- Cold-start training data is generated by a recursive theorem proving process: DeepSeek-V3 decomposes a complex problem into manageable sub-goals and formalizes the reasoning steps [9][11].
- Training involves two phases: a non-CoT (non-Chain-of-Thought) mode for rapid formal proof generation, and a CoT mode that writes out detailed reasoning steps, which makes the logical progression of a proof more transparent [17][19].

Performance Metrics
- DeepSeek-Prover-V2-671B achieved an 88.9% pass rate on MiniF2F-test and solved 49 of the 658 problems in the PutnamBench dataset [15][23].
- Across the benchmarks evaluated, the model is reported to outperform other advanced models in the industry in both accuracy and efficiency [20][23].

Dataset Release
- DeepSeek has also introduced ProverBench, a benchmark dataset of 325 problems, 15 of them drawn from recent AIME math competitions, intended for comprehensive evaluation of models on high school and undergraduate mathematics [25][26].
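The sub-goal decomposition described under Technical Implementation can be illustrated with a small Lean 4 sketch. This is only an illustration under stated assumptions: the theorem `toy_sketch` / `toy_solved` is a hypothetical toy example, not taken from the article, from ProverBench, or from DeepSeek's actual training data, and the article does not show the format DeepSeek-V3 really emits. The first version states each sub-goal with `have` and leaves it open with `sorry`; the second shows the same skeleton after each sub-goal has been proved independently.

```lean
-- Hypothetical toy theorem (an assumption for illustration only).
-- Step 1: a proof sketch. Each sub-goal is stated with `have` and left
-- open with `sorry`, so it can be handed off and proved separately.
theorem toy_sketch (a b : Nat) (h : a ≤ b) : a + a ≤ b + b := by
  have h1 : a + a ≤ a + b := sorry   -- sub-goal 1: grow the right summand
  have h2 : a + b ≤ b + b := sorry   -- sub-goal 2: grow the left summand
  exact Nat.le_trans h1 h2           -- the sub-goals chain into the goal

-- Step 2: the same skeleton once every sub-goal has a proof.
theorem toy_solved (a b : Nat) (h : a ≤ b) : a + a ≤ b + b := by
  have h1 : a + a ≤ a + b := Nat.add_le_add_left h a
  have h2 : a + b ≤ b + b := Nat.add_le_add_right h b
  exact Nat.le_trans h1 h2
```

Lean accepts the first version with a warning about `sorry`, which is what makes the skeleton checkable before the sub-goals are solved: the final `exact` step confirms that the sub-goals, if proved, really do imply the original statement.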

Formal theorem proving

Lean 4, a programming language for mathematical AI

Artificial Intelligence

DeepSeek-Prover-V2

ProverBench