Formal Mathematical Reasoning
ByteDance Seed Releases Its Strongest Math Model: One "Draft-Sketching" Trick Turns IMO Silver into Gold
量子位· 2025-12-25 06:08
Core Insights
- ByteDance's latest mathematical reasoning model, Seed Prover 1.5, achieved a gold-medal score at IMO 2025, solving five problems in 16.5 hours for 35 points, which meets this year's gold medal threshold [1][3]
- This performance matches that of Google's Gemini, which was certified as an IMO gold medalist in July [3]
- The model has not been open-sourced yet, but a technical report has been released, highlighting the performance gains from large-scale reinforcement learning [5][19]

Model Performance
- Seed Prover 1.5 significantly outperformed its predecessor, which took three days to solve four of six problems and earned a silver medal [3]
- The model also set new state-of-the-art (SOTA) records on Putnam, the North American undergraduate mathematics competition [4]

Technical Innovations
- The model features a new architecture called Agentic Prover, which reasons in formal mathematics rather than natural language, making its results machine-checkable and more reliable [10][12]
- It incorporates a Sketch Model that simulates how human mathematicians draft proofs, breaking complex problems into manageable sub-goals [22][23]
- A multi-agent collaborative system improves efficiency and success rates by recursively calling the Sketch Model on difficult lemmas [25][28]

Reinforcement Learning and Efficiency
- The model's proof success rate improved from 50% to nearly 90% as reinforcement learning training steps increased [19]
- In comparative tests, Seed Prover 1.5 required significantly less compute while outperforming previous models on high-difficulty datasets [19][20]

Conclusion
- The research comes from ByteDance's Seed AI4Math team, showcasing advances in mathematical reasoning through innovative model architectures and training methodologies [30]
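The draft-then-recurse workflow described in the article (a Sketch Model splitting a hard goal into sub-goals, recursing on lemmas that resist direct proof) can be sketched as a minimal Python skeleton. All names and the toy integer "goal" representation here are illustrative assumptions, not ByteDance's actual interfaces:

```python
# Hypothetical sketch of a draft-then-prove loop. A real system would
# pass formal goal states, not integers; this toy stands in for the shape
# of the recursion only.

def sketch_model(goal):
    """Decompose a goal into simpler sub-goals (toy stand-in)."""
    if goal <= 1:
        return []                  # atomic goal: nothing to decompose
    half = goal // 2
    return [half, goal - half]     # toy decomposition into two lemmas

def prove(goal, depth=0, max_depth=10):
    """Close a goal directly if it is easy; otherwise recurse on sub-goals."""
    if depth > max_depth:
        return False               # give up past the recursion budget
    if goal <= 1:
        return True                # a leaf the base prover closes directly
    subgoals = sketch_model(goal)
    # The whole proof succeeds only if every lemma (sub-goal) is closed.
    return all(prove(g, depth + 1, max_depth) for g in subgoals)
```

Goals within the recursion budget succeed (`prove(16)`), while goals needing more decomposition levels than `max_depth` fail, mirroring how compute limits bound such a search.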
ByteDance Launches Seed Prover 1.5, a Dedicated Model for Formal Mathematical Reasoning
Cai Jing Wang· 2025-12-24 07:03
On December 24, ByteDance's Seed team announced the launch of Seed Prover 1.5, its new-generation model dedicated to formal mathematical reasoning. Through large-scale Agentic RL training, the model is claimed to achieve significant gains in both reasoning capability and efficiency. The Seed Prover 1.5 technical report has reportedly been made public, and an API will be opened later. (ByteDance Seed) ...
ByteDance's Seed Team Launches Seed Prover 1.5, a Dedicated Model for Formal Mathematical Reasoning
智通财经网· 2025-12-24 06:16
Core Insights
- ByteDance's Seed team announced the launch of Seed Prover 1.5, a specialized model for formal mathematical reasoning, which claims significant improvements in reasoning capability and efficiency through large-scale Agentic RL training [1]

Performance Metrics
- Seed Prover 1.5 generated complete, compilable Lean proof code for the first five problems of IMO 2025 in 16.5 hours, scoring 35 out of 42, which meets the gold medal threshold under the previous IMO scoring standard [1]
- On Putnam 2025, the North American undergraduate mathematics competition, the model took 9 hours to generate compilable Lean code for 11 of the 12 problems [1]

Evaluation Results
- In a comprehensive evaluation, Seed Prover 1.5 solved 88% of the complete Putnam historical evaluation set, 80% of the master's-level Fate-H set, and 33% of the doctoral-level Fate-X set, setting new state-of-the-art (SOTA) results for formal mathematical reasoning models on these benchmarks [1]

Future Developments
- The technical report for Seed Prover 1.5 has been made public, and an API will be opened so interested mathematics and AI researchers can try the model [1]
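For context, "compilable verification Lean proof code" means the proof is checked mechanically by Lean's kernel at compile time. A minimal example of what such machine-checked code looks like (a toy theorem, not Seed Prover output):

```lean
-- If this Lean 4 file compiles, the kernel has verified the proof;
-- no human grading is involved.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This is why a score like 35/42 can be awarded objectively: each problem either compiles into a verified proof or it does not.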
Reaching the Gold Medal Threshold: ByteDance Launches New-Generation Dedicated Math Reasoning Model Seed Prover 1.5
Feng Huang Wang· 2025-12-24 04:34
Core Insights
- ByteDance's Seed team has launched a new formal mathematical reasoning model, Seed Prover 1.5, which shows improved ability to produce formal proofs of mathematical competition problems [1]

Group 1: Model Performance
- The model generated complete, compilable verification code for the first five problems of IMO 2025 in 16.5 hours, achieving a score that meets the previous gold medal threshold [1]
- On Putnam 2025, the model produced verifiable code for 11 of 12 problems in 9 hours [1]
- The model solved 88% of the problems in the Putnam historical evaluation set [1]

Group 2: Model Limitations and Future Plans
- The current model focuses mainly on competition problems with "clear rules and closed backgrounds," indicating limitations for complex mathematical research that requires long reasoning chains and literature dependencies [1]
- A technical report has been made public, and an API will be opened for researchers to try the model [1]
ByteDance Launches Seed Prover 1.5, a Dedicated Model for Formal Mathematical Reasoning
Xin Lang Cai Jing· 2025-12-24 04:23
Core Insights
- The core viewpoint of the article is the ByteDance Seed team's announcement of Seed Prover 1.5, which demonstrates significant advances in formal mathematical reasoning over its predecessor [1]

Group 1: Product Launch
- ByteDance's Seed team introduced the Seed Prover 1.5 model on December 24 [1]
- The new model generated complete, compilable verification Lean proof code for the first five problems of IMO 2025 within 16.5 hours [1]
- Seed Prover 1.5 scored 35 out of 42, meeting the gold medal threshold under the previous IMO scoring standards [1]

Group 2: Technical Developments
- The technical report for Seed Prover 1.5 has been made publicly available [1]
- An API will be opened for interested mathematics and AI researchers to try the model [1]
Peking University and Huawei Team Takes the Crown: 33 Teams Competed in a Formal Mathematics Competition, with a Domestic Large Model Cracking the Hard Problem of Formal Proof
量子位· 2025-12-20 06:30
Core Insights
- The article discusses a breakthrough in AI mathematical reasoning by the team "Lean说的都队", which emerged as champion among 33 teams in the CCF formal mathematics competition [1][2]

Group 1: Competition Overview
- The competition, organized by the China Computer Federation and supported by various institutions, aimed to address the core problems of "hallucination" and unreliability in large models' mathematical reasoning [2]
- Models had to convert natural-language mathematical problems into formal proof code without any natural-language explanation, effectively requiring the AI to act as both mathematician and programmer [4]

Group 2: Team Performance
- "Lean说的都队" answered 181 of 220 questions correctly in the preliminary round for 82.27 points, then solved 5 of 50 difficult finals problems for 10 points, giving a total score of 57.21 and first place [6]
- The team consisted of Peking University members Yuan Ye, Liu Chengwu, Li Botao, Xie Jiaxuan, and Li Siqi, advised by Professor Zhang Ming [6]

Group 3: Technical Innovations
- The team used Huawei's openPangu-Ultra-MoE-718B, a 718-billion-parameter mixture-of-experts language model with strong performance on formal mathematical reasoning tasks [9]
- The model's architecture includes advanced features such as Multi-head Latent Attention and Depth-Scaled Sandwich-Norm, improving its handling of abstract mathematical concepts [9]

Group 4: Methodology and Mechanisms
- The team built a collaborative solving system that combines the openPangu model's reasoning capabilities with the efficiency of specialized provers [7]
- They implemented a dynamic switching strategy and a multi-layer quality assurance system to ensure the correctness and semantic alignment of proofs [13][14]

Group 5: Semantic Verification Breakthrough
- A key innovation was a semantic decomposition verification mechanism that breaks natural-language problems into data types, premises, and proof goals, improving the reliability of formal results [16][19]
- This approach addresses the overly lenient judgments of traditional methods, significantly reducing the error rate of formal proofs [19]

Group 6: Practical Applications
- The team demonstrated the model's adaptability through two case studies, one in abstract algebra and one on complex-number calculations, showing its ability to generate rigorous formal proofs [20][22]

Group 7: Challenges and Future Directions
- Despite the progress, the team acknowledged limitations in the current system, particularly in advanced mathematics topics and an average solving time of one hour per problem [23]
- Future recommendations include developing specialized provers through active learning, exploring dynamic sampling strategies, and fostering human-AI collaboration in proof processes [23]

Group 8: Conclusion
- The Peking University and Huawei team's achievement marks a significant milestone for China in AI formal reasoning, providing a viable technical path toward rigorous mathematical proofs [31]
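The decomposition into data types, premises, and proof goals maps naturally onto the shape of a Lean theorem statement. A hypothetical toy illustration of that mapping (not output from the team's actual pipeline):

```lean
-- Hypothetical illustration of semantic decomposition: the informal claim
-- "the sum of two even numbers is even" split into data types (n m : Nat),
-- premises (hn, hm), and a proof goal (the final ∃ statement).
theorem even_add_even (n m : Nat)
    (hn : ∃ k, n = 2 * k)      -- premise 1: n is even
    (hm : ∃ k, m = 2 * k) :    -- premise 2: m is even
    ∃ k, n + m = 2 * k :=      -- proof goal
  match hn, hm with
  | ⟨a, ha⟩, ⟨b, hb⟩ => ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩
```

Checking each of the three parts against the natural-language problem separately is what lets a verifier catch a formalization whose code compiles but states the wrong theorem.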
This Is the Real IMO Legend: A Perfect Score, 3 Golds in 5 Appearances, and a Fresh MIT Admission
机器之心· 2025-07-23 10:36
Core Viewpoint
- The article highlights the impressive IMO performance of AI models, particularly ByteDance's Seed Prover, alongside the remarkable achievement of human contestant Warren Bei, who scored a perfect 42/42, showcasing the intersection of AI and human intelligence in mathematics [3][4][5]

Group 1: AI Performance
- ByteDance's Seed Prover solved 4 of 6 IMO problems for 30 points, recognized as a silver-medal performance [4]
- The article emphasizes growing interest and rapid advances in AI's formal mathematical reasoning, particularly in competitive settings like the IMO [3][4]

Group 2: Warren Bei's Achievements
- Warren Bei, an 11th-grade student from Canada, achieved a perfect 42/42 at the IMO, a feat matched by only five contestants worldwide this year [5][6]
- Over five years of IMO participation he earned three gold and two silver medals, reflecting steady improvement and dedication [9][15]
- He has also won the Canadian Mathematical Olympiad (CMO) multiple times from a young age, establishing him as a prominent figure in the mathematics community [16][17]

Group 3: Personal Insights and Future Aspirations
- Warren Bei says the joy of mathematics lies in the process of problem-solving rather than the awards themselves [18]
- He keeps an open mind about his future, weighing various academic paths while stressing the practical applications of mathematics [12][13]
- His approach to mathematical challenges is philosophical, treating intuition and perseverance as the keys to overcoming difficulty [19]
Pushing the Limits of AI Mathematical Reasoning! Large-Scale Formal Math Benchmark FormalMATH Released; the Strongest Model's Success Rate Is Only 16%
量子位· 2025-05-07 09:33
Core Insights
- The FormalMATH benchmark, developed by institutions including The Chinese University of Hong Kong and Zhejiang University, consists of 5,560 rigorously validated mathematical problems spanning Olympiad to undergraduate level, and is 22.8 times larger than existing benchmarks [1][5][4]

Group 1: Performance of LLMs
- Current LLM-driven theorem provers perform far below expectations: the best model, Kimina-Prover, achieves only a 16.46% success rate under resource constraints [3][15]
- Most models perform close to random guessing in calculus and related areas, indicating a substantial capability gap [3][7]
- There is notable domain bias, with stronger performance in algebra and weaker results in calculus [11][12]

Group 2: Error Analysis
- Common error patterns include:
  - Redundant assumptions (34%): introducing irrelevant premises [16]
  - Incomplete proofs (62%): missing critical steps in the proof [16]
  - Misuse of automation strategies (65%): incorrectly applying automated tools [16]
  - Mishandled inequalities (13%): over-reliance on automated inequality tactics [16]
- The analysis shows that LLM provers often resort to shortcut tactics, which leads to significant errors [14]

Group 3: Future Directions
- Three directions are proposed to strengthen LLMs' formal reasoning:
  - Stronger multi-step planning to reduce reliance on single-step tactics [19]
  - Cross-domain generalization via curriculum learning to balance training data across mathematical fields [19]
  - Interactive proof-assistance tools for collaboration between LLMs and human experts [19]

Group 4: Open Source Initiative
- The research team has released the FormalMATH benchmark's code, training data, and evaluation models, encouraging collaboration between academia and industry to advance formal mathematical reasoning technologies [20][21]
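Two of the failure patterns above can be made concrete in Lean 4. These are toy illustrations constructed for this summary, not actual FormalMATH samples:

```lean
-- "Redundant assumption": the premise h2 is introduced but never used,
-- bloating the statement without contributing to the proof.
theorem redundant_example (a b : Nat) (h1 : a = b) (h2 : 0 ≤ b) :
    a + 1 = b + 1 := by
  rw [h1]

-- "Incomplete proof": `sorry` lets the file compile with a warning,
-- but the kernel marks the theorem as unproven, so an evaluator
-- must reject any proof term that contains it.
theorem incomplete_example (a b : Nat) : a * b = b * a := by
  sorry
```

Patterns like these are why the benchmark's evaluation has to inspect the proof term and the statement, not merely check that the file compiles.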
AI's Next Big Opportunity? Former DeepSeek Member Xin Huajian on Mathematical Reasoning | Deep Talk
锦秋集· 2025-05-03 08:51
Core Viewpoint
- DeepSeek has released a new model named DeepSeek-Prover-V2-671B, which focuses on formal mathematical reasoning, addressing a significant AI challenge and opening up high-value commercial opportunities [1][2]

Group 1: Model Development and Impact
- The DeepSeek-Prover series combines the generalization capabilities of large language models (LLMs) with formal tools such as Lean, achieving large-scale end-to-end conversion from natural-language descriptions to machine-verifiable proofs [2]
- This breakthrough could multiply the efficiency of mathematical research and open new possibilities for AI in fields demanding mathematical rigor, such as financial modeling, chip verification, and cryptography [2]

Group 2: Event Information
- A cross-ocean dialogue event on May 9, 2025 will feature former DeepSeek member Xin Huajian, who will discuss the formal mathematics revolution in the era of large language models [3][4]
- The event will also include a presentation by Zang Tianyu of Jinqiu Capital on AI investment trends for 2025 [3][4]

Group 3: Organizers and Participants
- Jinqiu Capital focuses on AI investment with a 12-year long-term fund, actively backing early-stage entrepreneurs with an aggressive follow-on investment strategy [6]
- The Cambridge China AI Association aims to connect the Chinese AI industry with global academia and industry, facilitating efficient resource flow between China and the UK [7]
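The "end-to-end conversion from natural-language descriptions to machine-verifiable proofs" can be illustrated with a toy input/output pair. This is a hypothetical example of the task shape, not actual DeepSeek-Prover output:

```lean
-- Natural-language input:
--   "Every natural number is at most its successor."
-- Autoformalized, machine-verifiable Lean 4 output:
theorem nat_le_succ_example (n : Nat) : n ≤ n + 1 :=
  Nat.le_succ n
```

The value of the pipeline is that the right-hand side, unlike the English sentence, can be checked by Lean's kernel with no human in the loop.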