模仿人类推理修正过程，阶跃星辰提出形式化证明新范式

Core Viewpoint - The article discusses the release and open-sourcing of the formal theorem proving models StepFun-Prover-Preview-7B and StepFun-Prover-Preview-32B by the company, highlighting their advanced capabilities in formal proof generation and refinement through interactive learning [1][16]. Technical Highlights - StepFun-Prover employs a reinforcement learning training process based on environmental feedback, allowing the model to iteratively correct and improve formal proofs through real-time interaction [2]. - The two-stage supervised fine-tuning (SFT) strategy is utilized, where the first stage equips the model with basic tool usage capabilities [4]. - Tool-integrated reinforcement learning (RL) is implemented, where the model learns to generate outputs by utilizing Lean 4 data for code completion and understanding mathematical problem-solving [5]. - The iterative optimization method "RL-SFT-RL" enables the model to tackle increasingly difficult reasoning tasks, enhancing its performance over time [8]. Performance Metrics - The StepFun-Prover-Preview-32B achieved a pass@1 accuracy rate of 70.0% on the miniF2F-test benchmark, surpassing all known models by over 4% [9]. - The StepFun-Prover-Preview-7B also outperformed other models, including DeepSeek-Prover-V2-671B and Kimina-Prover-72B, with a pass@1 accuracy of 66.0% [10]. Case Studies - Case 1 demonstrates the model's ability to actively remove redundant steps in formal proofs, showcasing its natural language processing and feedback analysis capabilities [11]. - Case 2 illustrates how the model adjusts the structure of formal proofs based on timeout feedback, enhancing its adaptability [13]. - Case 3 highlights the model's effectiveness in correcting errors based on environmental feedback, further improving its reasoning robustness [12]. Future Directions - The StepFun-Prover Preview represents a significant milestone for the company in the field of formal proofs, with ongoing exploration in formal reasoning models anticipated [16].