北大华为联队夺冠:形式化数学竞赛33支队伍角逐,国产大模型啃下形式化证明硬骨头
量子位·2025-12-20 06:30

Core Insights - The article discusses a breakthrough in AI mathematical reasoning achieved by a team named "Lean说的都队" during the CCF formalized mathematics competition, where they emerged as champions among 33 teams [1][2]. Group 1: Competition Overview - The competition, organized by the China Computer Federation and supported by various institutions, aimed to address the core issues of "hallucination" and unreliability in large models during mathematical reasoning [2]. - The competition required models to convert natural language mathematical problems into formal proof code without any natural language explanations, effectively making AI act as both mathematicians and programmers [4]. Group 2: Team Performance - "Lean说的都队" demonstrated exceptional capabilities, answering 181 out of 220 questions correctly in the preliminary round, scoring 82.27 points, and solving 5 out of 50 difficult problems in the finals with a score of 10 points, leading to a total score of 57.21, placing them first [6]. - The team consisted of members from Peking University, including Yuan Ye, Liu Chengwu, Li Botao, Xie Jiaxuan, and Li Siqi, guided by Professor Zhang Ming [6]. Group 3: Technical Innovations - The team utilized the Huawei openPangu-Ultra-MoE-718B model, a large-scale mixed expert language model with 718 billion parameters, which demonstrated strong performance in formal mathematical reasoning tasks [9]. - The model's architecture includes advanced features such as Multi-head Latent Attention and Depth-Scaled Sandwich-Norm, enhancing its ability to handle abstract mathematical concepts [9]. Group 4: Methodology and Mechanisms - The team introduced a collaborative solving system that combines the reasoning capabilities of the openPangu model with the efficiency of specialized provers [7]. - They implemented a dynamic switching strategy and a multi-layer quality assurance system to ensure the correctness and semantic alignment of proofs [13][14]. Group 5: Semantic Verification Breakthrough - A significant innovation was the introduction of a semantic decomposition verification mechanism, which breaks down natural language problems into data types, premises, and proof goals, improving the reliability of formal results [16][19]. - This approach addresses the issue of overly lenient judgments in traditional methods, significantly reducing the error rate in formal proofs [19]. Group 6: Practical Applications - The team showcased their model's adaptability through two case studies: one involving abstract algebra and another on complex number calculations, demonstrating the model's ability to generate rigorous formal proofs [20][22]. Group 7: Challenges and Future Directions - Despite the progress, the team acknowledged limitations in the current system, particularly in handling advanced mathematics topics and the average solving time of one hour per problem [23]. - Future recommendations include developing specialized provers through active learning, exploring dynamic sampling strategies, and fostering human-AI collaboration in proof processes [23]. Group 8: Conclusion - The achievements of the Peking University and Huawei team mark a significant milestone for China in the field of AI formalized reasoning, providing a viable technical pathway for tackling rigorous mathematical proofs [31].