Lean
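America's "Liang Wenfeng" Refuses to Accept the Odds
Huxiu · 2025-07-31 06:51
One reason is that, as models evolve, achieving hallucination-free AI performance is difficult even in a narrow domain, and startups rarely have the resources to compete with the large model developers. A startup called Harmonic refuses to accept this and is trying to solve the problem by building flawless, zero-hallucination AI. The company recently released a beta chatbot app for iOS and Android that gives ordinary users access to its AI model, Aristotle. CEO and co-founder Tudor Achim said Aristotle is the first product available to people that can reason and formally verify its output, and that within the domain Aristotle supports, quantitative reasoning, it can guarantee no hallucinations. Harmonic also said it plans to release an API that gives enterprises access to Aristotle. On its website, Harmonic states that it is publicly releasing Aristotle's complete proofs on GitHub; because the proofs are formally verified, they need no manual checking, which Harmonic says puts Aristotle at the forefront of advanced mathematical reasoning among frontier AI models. In the publicity push for the new product, Harmonic said Aristotle achieved a gold medal at the 66th International Mathematical Olympiad (IMO 2025). The competition is also seen as a "coming-of-age ceremony" for AI's mathematical and reasoning abilities. "AI-native ...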
How Far Are Large Language Models from Being "Math Proof Masters"? A Stanford, Berkeley, and MIT Team Proposes the IneqMath Benchmark
AI前线 · 2025-07-17 04:47
Core Viewpoint
- The article discusses the limitations of large language models (LLMs) in mathematical reasoning, particularly in proving inequalities, and introduces a new framework called IneqMath to evaluate their reasoning capabilities [1][4][28].
Group 1: Challenges in Mathematical Reasoning
- Current LLMs often give answers that look correct but are not backed by a rigorous reasoning process, which raises the question of whether they truly understand logical proofs [1][18].
- Formal systems like Lean and Coq can verify proofs, but they are complex and do not scale easily to intricate problems [1][4].
Group 2: IneqMath Framework
- Researchers from Stanford, Berkeley, and MIT propose breaking inequality proving into two informal tasks, Bound Estimation and Relation Prediction, to build a bridge between natural language and formal logic [4][8].
- The IneqMath dataset consists of 1,252 training problems with detailed solutions and 200 test problems annotated by International Mathematical Olympiad gold medalists [8].
Group 3: Evaluation of Reasoning
- An AI mathematical judging system was developed to assess the logical soundness of each reasoning step. It achieves an F1 score of 0.93, indicating strong agreement with human evaluations [15][17].
- The judging system includes several evaluators that check for logical gaps, numerical approximations, and computation accuracy [16].
Group 4: Model Performance Insights
- Despite high answer accuracy, many models fail to provide logically sound reasoning; for Grok 3 mini, only 6% of answers come with a rigorous reasoning process [18][20].
- Larger models do not necessarily reason more rigorously, and simply increasing the number of tokens does not significantly improve logical clarity [20][23].
Group 5: Effective Strategies for Improvement
- Two methods were found to be effective: self-critique, which improves accuracy by about 5%, and theorem hints, which can improve accuracy by up to 10% on complex problems [25].
- These findings suggest that improving reasoning in models requires more than computational power; it involves teaching models to self-reflect and to use tools effectively [25][28].
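
To make the reported F1 score concrete, here is a minimal sketch of how agreement between an automated judge and human graders can be scored. The label lists are invented for illustration (True means "this solution was flagged as flawed"), and the function name `f1_score` is our own; the article does not describe how the IneqMath team computed its figure beyond naming the metric.

```python
def f1_score(judge, human):
    """F1 of the judge's flags, treating the human labels as ground truth.

    Both arguments are equal-length lists of booleans (hypothetical data):
    True means the solution was flagged as flawed.
    """
    pairs = list(zip(judge, human))
    tp = sum(1 for j, h in pairs if j and h)        # both flagged it
    fp = sum(1 for j, h in pairs if j and not h)    # only the judge flagged it
    fn = sum(1 for j, h in pairs if not j and h)    # only the human flagged it
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only, not data from the IneqMath study.
judge = [True, True, False, True, False]
human = [True, False, False, True, True]
print(round(f1_score(judge, human), 3))  # 0.667
```

F1 is the harmonic mean of precision and recall, so a score of 0.93 means the judge rarely flags sound solutions and rarely misses flawed ones.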
Breakthroughs in Pure Mathematics Can Take Decades; AI Is Trying to Speed Them Up
36Kr · 2025-06-30 00:01
Core Insights
- The article discusses the challenges artificial intelligence (AI) faces in advancing mathematical discovery, particularly in complex reasoning tasks [1][6].
- DARPA has launched a new funding program to recruit researchers who will work with AI on high-level mathematical research, with the goal of accelerating breakthroughs in pure mathematics [1][2].
- Experts believe that improving AI's mathematical capabilities could bring significant benefits to both mathematicians and society at large [1][2].
Group 1: AI's Limitations in Mathematics
- AI systems such as OpenAI's ChatGPT struggle with basic mathematical problems, which highlights inherent limitations in complex reasoning [1].
- Patrick Shafto of DARPA emphasizes that overcoming these challenges could lead to more powerful AI systems that benefit the mathematical community and society [1][2].
- The article notes that the number of published papers in pure mathematics has stagnated, in contrast to explosive growth in the life sciences and technology [4].
Group 2: DARPA's Role and Historical Context
- DARPA, known for contributions such as ARPANET and advances in drone technology, is now focusing on enhancing mathematical research through AI [2][3].
- Mathematician Andrew Granville notes that the agency's funding is crucial at a time when research funding is being cut [3].
- DARPA's history of driving technological advances underscores its potential impact on the future of mathematics [2][3].
Group 3: The Future of AI in Mathematics
- Experts such as Jordan Ellenberg suggest that understanding AI's ability to generate mathematical insights will be crucial for future developments [6].
- The article raises the concern that no one fully understands how AI operates, a situation it describes as unprecedented in the history of technology [6].
- There is a call for better communication and tools to help integrate AI into mathematical proofs, as current methods can be cumbersome [5][6].
Terence Tao: Thanks to Lean, I've Rewritten My Classic Textbook from 20 Years Ago!
机器之心 · 2025-06-01 03:30
Core Viewpoint
- Terence Tao has announced a Lean companion project for his undergraduate textbook "Analysis I," which aims to offer an alternative way of learning the material through formalized mathematics in the Lean proof assistant [1][2].
Group 1: Project Overview
- The Lean project converts definitions, theorems, and exercises from "Analysis I" into Lean, allowing students to work with the material interactively [2][4].
- The project is intended to transition towards the standard Lean library Mathlib, one of the largest and most active formal mathematics projects in the world [1][2].
Group 2: Educational Goals
- "Analysis I" focuses on foundational topics such as the construction of the natural numbers, integers, rational numbers, and real numbers, and provides enough set theory and logic for rigorous proofs [2].
- The Lean project aims to enhance the learning experience by letting students complete exercises directly in Lean code, although official answers will not be provided [2][4].
Group 3: Structure and Content
- The textbook consists of 11 chapters, some of which have already been formalized in Lean [3].
- The project deliberately stays partially independent of Mathlib: it first constructs certain mathematical structures on its own and then transitions to Mathlib's definitions [5].
Group 4: Community Engagement
- The Lean version of the textbook is now available to users, including mathematics students and researchers interested in formal verification, who can work through the material and provide feedback [7].
- Users have expressed excitement about the project, noting its potential to bridge the gap between traditional mathematics education and programming-based rigor [9].
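
As a rough illustration of what "constructing a structure independently" can look like, the Lean 4 sketch below builds a Peano-style copy of the natural numbers and proves one small lemma about it. The names `Sketch`, `MyNat`, and `MyNat.add` are hypothetical and are not taken from Tao's repository; the recursion scheme here is also our own choice and may differ from the book's.

```lean
namespace Sketch

/-- A hand-rolled copy of the natural numbers (hypothetical, for illustration). -/
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

/-- Addition, defined by recursion on the second argument. -/
def MyNat.add : MyNat → MyNat → MyNat
  | n, .zero => n
  | n, .succ m => .succ (MyNat.add n m)

/-- Adding zero on the right holds by unfolding the definition. -/
theorem MyNat.add_zero (n : MyNat) : MyNat.add n .zero = n := rfl

/-- Adding zero on the left needs induction. -/
theorem MyNat.zero_add (n : MyNat) : MyNat.add .zero n = n := by
  induction n with
  | zero => rfl
  | succ m ih => simp [MyNat.add, ih]

end Sketch
```

A project that later moves to Mathlib would typically prove that such a hand-built type is equivalent to the library's `Nat` and then use the library version from that point on.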
When AI Meets Mathematics: How Are Large Language Models Sparking a Revolution in Formalized Mathematics? | Deep Talk
锦秋集 · 2025-05-12 09:13
Core Viewpoint
- The article discusses the transformative impact of large language models (LLMs) on mathematics, particularly through their integration with formalized mathematics methods, which improve the accuracy and reliability of theorem proofs [1][4].
Group 1: Challenges and Opportunities
- The growing complexity of modern mathematical theories has outstripped the capacity of traditional peer review and manual verification, which makes a shift towards formalized mathematics necessary [4][6].
- The "hallucination" problem, in which LLMs generate plausible but incorrect content, poses significant challenges in a domain as logical as mathematics and highlights the need for rigorous verification methods [6][7].
Group 2: Formalized Theorem Proving
- Formalized theorem proving uses a system of axioms and logical inference rules to express mathematical statements in a verifiable format, so that validation results carry a high degree of certainty [8][9].
- Successful applications of formalized methods in mathematics and software engineering demonstrate that they can ensure consistency between an implementation and its specification, overcoming the limitations of traditional methods [9].
Group 3: Recent Advances Driven by LLMs
- Advanced systems such as AlphaProof and DeepSeek-Prover V2 have performed remarkably well on competition-level mathematical problems, indicating significant progress in formalized theorem proving [10].
- Research is moving from mere proof generation to the accumulation of knowledge and the construction of theoretical frameworks, as seen in projects like LEGO-Prover [10].
Group 4: Transition to Proof Engineering Agents
- Moving from static "Theorem Provers" to dynamic "Proof Engineering Agents" is essential for addressing the high labor costs and low collaboration efficiency of formalized mathematics [11].
- APE-Bench has been developed to evaluate and improve the performance of language models in long-term, dynamic maintenance scenarios, filling a gap in current assessment tools [12][16].
Group 5: Impact and Future Outlook
- The integration of LLMs with formalized methods is expected to improve verification efficiency in mathematics and industrial applications, leading to rapid advances in mathematical knowledge [17].
- The long-term vision includes the emergence of "Certified AI," which combines formal verification with dynamic learning mechanisms and promises a new paradigm in knowledge production and decision-making [17].
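
For readers unfamiliar with the idea, the short Lean 4 sketch below shows what "expressing a statement in a verifiable format" means in practice. The theorem name `and_swap_sketch` is made up for this illustration; the point is only that the proof checker accepts the file if and only if every step follows from the logical rules.

```lean
-- A statement and its proof: from `p ∧ q` we may conclude `q ∧ p`.
-- The checker rejects the file if the proof term does not match the statement.
theorem and_swap_sketch (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- A concrete arithmetic fact, verified by direct computation.
example : 2 + 2 = 4 := rfl
```

A hallucinated proof step would simply fail to type-check here, which is why pairing LLMs with a formal checker is attractive.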
Terence Tao's YouTube Debut: In 33 Minutes, AI Speed-Runs a Proof That "Takes a Human a Full Page to Write"
量子位 · 2025-05-12 04:11
Core Viewpoint
- The article discusses how mathematician Terence Tao used AI to complete the formalization of a mathematical proof in just 33 minutes, showcasing AI's potential for automating complex mathematical tasks [2][10][12].
Group 1: AI in Mathematical Proofs
- Terence Tao demonstrated the use of AI to formalize a proof related to the magma equation E1689, a task that traditionally required extensive manual effort [8][9].
- The process involved breaking the proof down into small logical units and using GitHub Copilot to generate code, which was then refined using Lean's canonical tactic [10][12].
- This method not only reduced the time the proof took but also kept the final output human-readable, addressing concerns about reliance on computer-generated proofs [12][15].
Group 2: Development of Proof Assistant
- Tao announced version 2.0 of his mathematical proof assistant, a lightweight tool written in Python that aims to simplify short, tedious proof tasks [23][24].
- The assistant has two modes, assumption mode and tactic mode, with the latter being the default; in tactic mode, users apply various tactics to solve problems [28].
- The tool supports linear arithmetic and can guide users through the proof process, demonstrating its flexibility and potential for further development [32][38].
Group 3: Community Engagement and Future Plans
- Tao expressed satisfaction with the proof assistant and welcomed suggestions for new features, indicating a collaborative approach to its development [38].
- Plans include tools for estimating norms of symbolic functions and tactics that deploy established mathematical inequalities, highlighting ongoing innovation in the field [38].
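
The two ideas mentioned above, splitting a proof into small logical units and closing each one with linear arithmetic, can be sketched in Lean 4 with Mathlib as follows. The goal and hypothesis names are chosen for illustration only and are not taken from Tao's video or from his Python tool; `linarith` is Mathlib's linear arithmetic tactic, used here as a stand-in for that kind of automation.

```lean
import Mathlib

-- Illustrative goal: chain two linear bounds, using positivity of z at the end.
example (x y z : ℝ) (hz : 0 < z) (h1 : x < 2 * y) (h2 : y < 3 * z + 1) :
    x < 7 * z + 2 := by
  -- Unit 1: scale the bound on y.
  have h3 : 2 * y < 6 * z + 2 := by linarith
  -- Unit 2: since z is positive, 6z + 2 is below 7z + 2.
  have h4 : 6 * z + 2 < 7 * z + 2 := by linarith
  -- Unit 3: combine x < 2y < 6z + 2 < 7z + 2.
  linarith
```

Here `linarith` could close the goal in one step, but the intermediate `have` lines keep the argument readable for a human, which is the trade-off the article highlights.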