Lean
Large Models Begin "Batch-Solving" Hard Math Problems
Hua Er Jie Jian Wen· 2026-01-15 07:08
Core Insights
- AI breakthroughs in mathematics are accelerating: 15 of the more than 1,000 unsolved problems left by mathematician Paul Erdős have been solved since Christmas, and 11 of those solutions involved AI models [1]
- OpenAI's latest model, GPT-5.2, shows significant improvements in mathematical reasoning and can produce complete proofs in 15 minutes, surpassing previous versions [1]
- AI models have made substantial autonomous progress on 8 different Erdős problems, indicating a shift in AI's role from assistant to independent problem solver [1][3]

Impact on Mathematical Research and AI Application Market
- Advances in AI are transforming the academic research workflow, with formal tools like Lean and Aristotle being widely adopted by top mathematicians and computer science professors [2]
- The increase in the number of solved Erdős problems is attributed to top mathematicians engaging seriously with these AI tools, marking a shift from experimental use to mainstream academic application [6]

Systematic Breakthroughs and Discoveries
- Neel Somani's discovery began with a routine test of ChatGPT, which returned a complete answer to a mathematical problem and demonstrated the model's ability to reference established mathematical principles [3]
- The Erdős problem set, containing over 1,000 conjectures, has become an attractive target for AI-driven mathematical research, with GPT-5.2 outperforming previous models in advanced mathematics [3]

Cautious Evaluation by Leading Mathematicians
- Mathematician Terence Tao suggests that AI systems are better suited to systematically addressing lesser-known Erdős problems, which may now be more likely to be solved by pure AI methods than by human or hybrid approaches [4]
- This evaluation points to a potential reallocation of resources in mathematical research, with AI efficiently handling medium-difficulty problems that have been overlooked due to human limitations [4]

Formalization Tools Driving Application
- The recent shift towards formalization in mathematics is a key driver: it makes mathematical reasoning easier to verify and extend, and new automation tools significantly reduce the workload [5]
- Tools like Lean and Aristotle promise to automate much of the formalization work, making mathematical research more efficient [5]
Is AI About to Upend Mathematics Again? Terence Tao Speaks Out: Stop the Deification
36Kr· 2026-01-12 01:49
Core Viewpoint
- The article discusses the exaggerated claims regarding AI's ability to solve complex mathematical problems, particularly in relation to Erdős problems, and emphasizes the need for a more nuanced understanding of AI's contributions in mathematics [1][2].

Group 1: AI's Capabilities in Mathematics
- AI's achievements in solving certain mathematical problems are often overstated, leading to misconceptions that AI can independently innovate or replace human mathematicians [2][4].
- The difficulty of the problems solved by AI varies significantly, which makes direct comparisons misleading; some problems are much easier than others, and this can skew perceptions of AI's capabilities [2][3].
- Many problems labeled as "unsolved" may have been previously addressed in the literature, leading to potential misattributions of "first solutions" to AI [3][10].

Group 2: Evaluation of AI Contributions
- AI's contributions can be categorized into several types, including generating complete or partial solutions, conducting literature reviews, and formalizing proofs [6][12].
- Specific examples illustrate that AI has successfully provided solutions for certain problems, but these often require validation against existing literature to confirm their novelty [8][10].
- The process of formalizing AI-generated proofs can introduce risks, such as the potential for misinterpretation or the introduction of unverified axioms [4][12].

Group 3: The Role of Human Mathematicians
- Human mathematicians remain essential for formulating deep questions, creating new concepts, and integrating results into the broader knowledge network of mathematics [12].
- The future of mathematics may involve a collaborative relationship where humans guide AI in exploring mathematical landscapes, rather than AI acting as an independent entity [12].
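The risk of "unverified axioms" mentioned in Group 2 can be made concrete with a small Lean 4 sketch. The axiom and theorem names here are hypothetical and chosen only for illustration; the point is that a proof can type-check while still resting on an unproved assumption, and that Lean's `#print axioms` command reports which axioms a theorem depends on.

```lean
-- Hypothetical axiom standing in for a claim that was assumed, not proved.
axiom convenient_claim : ∀ n : Nat, n + 0 = 0 + n

-- Type-checks, but only because it relies on the axiom above.
theorem uses_axiom (n : Nat) : n + 0 = 0 + n :=
  convenient_claim n

-- The same statement proved from existing library lemmas.
theorem proved_directly (n : Nat) : n + 0 = 0 + n := by
  rw [Nat.add_zero, Nat.zero_add]

-- Reports the axioms each theorem ultimately depends on;
-- `convenient_claim` appears in the report for `uses_axiom`.
#print axioms uses_axiom
#print axioms proved_directly
```

A reviewer checking an AI-generated formalization would typically run this kind of audit, and would also compare the formal statement against the informal problem, since a proof of a misstated theorem is the other risk the summary names.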
Terence Tao: AI Brings Mathematics into an "Industrialized" Era, and Mathematicians Can Be "Contractors" Too
机器之心· 2026-01-03 01:35
Core Insights
- The article discusses the transformation of mathematical research, driven by AI and formal proof languages like Lean, away from traditional methods and towards a more industrialized approach [1][2].

Group 1: Transformation of Mathematical Research
- Terence Tao highlights that traditional mathematical research is facing a paradigm shift due to the integration of AI and formal proof systems, which reduce repetitive tasks and enhance collaboration [2][5].
- Large language models (LLMs) and automated formalization are making tedious tasks easier, allowing mathematicians to focus on more complex problems [2][9].
- The modularization of research is expected to enable non-experts, or "citizen mathematicians," to contribute to advanced research, thereby accelerating progress in the field [2][29].

Group 2: Changes in Collaboration and Roles
- The article suggests that the future of mathematics may resemble software engineering, with roles such as "architects" or project managers emerging to oversee large collaborative projects [2][23].
- Tao emphasizes the importance of collaboration, noting that the traditional model of individual research is insufficient for the complexity of modern mathematical problems [25][26].
- The integration of formal tools and AI is expected to facilitate seamless collaboration among individuals with varying skill sets, allowing for a more efficient division of labor in mathematical research [27][28].

Group 3: Impact of Formalization on Mathematical Thinking
- Formalization is changing the way mathematicians think, helping them identify implicit assumptions and refine their definitions, which leads to clearer and more concise writing [10][12].
- The process of formalization encourages a new style of proof writing that is more modular and easier to understand, in contrast to traditional linear proofs [12][13].
- Tao notes that formalization allows for a more precise understanding of the applicability of mathematical tools, potentially leading to breakthroughs in various areas [15][16].

Group 4: Future of Mathematical Research
- The article predicts a future where the role of mathematicians will expand to include project management and coordination of large-scale research efforts, rather than solely focusing on individual contributions [29][30].
- As tools and collaboration methods evolve, the barriers to entry for participating in mathematical research are expected to decrease, allowing a broader range of individuals to engage in the field [30][31].
- The potential for AI to handle repetitive tasks in mathematical research is seen as a way to unlock new levels of productivity and creativity among mathematicians [32][34].
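The modular, divide-the-labor style described above can be sketched in Lean 4 as follows. This is a toy illustration under assumed names (`step_one`, `step_two`, `main_result` are hypothetical), not an excerpt from any real project: the "architect" states the intermediate lemmas and assembles the main theorem, while `sorry` marks pieces left open for contributors or automated tools.

```lean
-- Hypothetical lemma names; `sorry` marks a proof still to be filled in.
theorem step_one (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  sorry

theorem step_two (a b : Nat) (h : a ≤ b + 1) : a ≤ b + 2 := by
  sorry

-- The main result is already assembled from the two pieces, so each
-- `sorry` can be replaced independently without touching this proof.
theorem main_result (a b : Nat) (h : a ≤ b) : a ≤ b + 2 :=
  step_two a b (step_one a b h)
```

Lean accepts the file with warnings about the open `sorry` placeholders, which is what lets contributors with different skill sets work on separate lemmas in parallel.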
Results in Ten Minutes: Terence Tao Uses Gemini Deep Think to Help a Human Mathematician Complete an Erdős Problem Argument
机器之心· 2025-11-23 04:06
Core Viewpoint
- The article discusses the Erdős Problems website, which focuses on mathematical research and problem-solving related to the famous mathematician Paul Erdős. It serves as a platform for researchers and enthusiasts to propose, discuss, and solve mathematical problems across fields such as number theory, combinatorics, and graph theory [1].

Group 1
- The Erdős Problems website collects mathematical problems proposed by Erdős, covering diverse areas like number theory, combinatorics, and graph theory [1].
- Independent researcher Wouter van Doorn provided a counterexample to Erdős Problem 367, relying on a congruence identity he believes to be valid [5].
- The problem was later submitted to Gemini 2.5 Deep Think by renowned mathematician Terence Tao, who received a complete proof from the AI in about ten minutes [9].

Group 2
- Terence Tao manually converted the AI-generated proof into a more basic form within half an hour, indicating that the proof could be formalized and verified in Lean [11].
- Two days later, mathematician Boris Alexeev used Harmonic's Aristotle tool to complete the Lean formalization of the problem, a process that took two to three hours [12].
- Terence Tao has been exploring the application of AI tools in mathematics, contributing to various research and proofs, including a recent paper on the topic [13].
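To give a sense of what "verified in Lean" means for a congruence-style claim, here is a minimal Lean 4 sketch. These are deliberately simple numeric facts chosen for illustration; they are not the congruence identity used for Erdős Problem 367, which the summary does not state.

```lean
-- 2^10 = 1024 = 93 * 11 + 1, so 2^10 leaves remainder 1 modulo 11.
example : 2 ^ 10 % 11 = 1 := by decide

-- Remainders multiply: 7 * 8 = 56 and (7 % 5) * (8 % 5) = 6 both leave 1 mod 5.
example : (7 * 8) % 5 = ((7 % 5) * (8 % 5)) % 5 := by decide
```

Here `decide` asks Lean to evaluate both sides and check them mechanically. A general identity quantified over all integers would need an actual proof rather than evaluation, which is where the hours of formalization effort mentioned above go.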
The American "Liang Wenfeng" Refuses to Back Down
Hu Xiu· 2025-07-31 06:51
Core Viewpoint
- The article discusses the emergence of Harmonic, a startup developing a zero-hallucination AI model named Aristotle, which aims to excel in mathematical reasoning and formal verification and has attracted significant investment and attention in the AI industry [2][5][46].

Group 1: Company Overview
- Harmonic is a two-year-old startup that has rapidly gained attention from top-tier investment firms, achieving a valuation close to $900 million [5][23].
- The company has attracted nearly $200 million in investments from prominent firms such as Sequoia Capital, Kleiner Perkins, and Paradigm [5][29][27].
- Founders Vlad Tenev and Tudor Achim bring backgrounds in mathematics and AI, respectively; Tenev is the CEO of Robinhood, and Achim has experience in autonomous driving [11][12][16].

Group 2: Product Development
- Harmonic's flagship product, Aristotle, is designed to perform mathematical reasoning without hallucinations, using Lean, a formal verification tool [18][30].
- Aristotle has demonstrated impressive performance in mathematical problem-solving, achieving a success rate of 90% on the MiniF2F test and significantly outperforming existing models like OpenAI's GPT-4 [37][38].
- The model addresses three main issues in traditional AI models: hallucination, unclear reasoning processes, and lack of rigor [19][20][21].

Group 3: Market Context
- The AI industry is increasingly recognizing the need for rigorous reasoning capabilities, creating opportunities for startups like Harmonic [25][24].
- Competitors in the space include DeepSeek and Google DeepMind, both of which are also developing advanced mathematical AI models [40][45].
- The competitive landscape is intensifying as major players seek to enhance their AI models' reasoning capabilities, particularly in high-stakes applications [26][46].
How Far Are Large Language Models from Being "Math Proof Experts"? Stanford, Berkeley, and MIT Teams Propose the IneqMath Benchmark
AI前线· 2025-07-17 04:47
Core Viewpoint
- The article discusses the limitations of large language models (LLMs) in mathematical reasoning, particularly in proving inequalities, and introduces a new framework called IneqMath to evaluate their reasoning capabilities [1][4][28].

Group 1: Challenges in Mathematical Reasoning
- Current LLMs often provide seemingly correct answers but lack rigorous reasoning processes, raising questions about their true understanding of logical proofs [1][18].
- Formal systems like Lean and Coq can verify proofs but are complex and not easily scalable for intricate problems [1][4].

Group 2: IneqMath Framework
- Researchers from Stanford, Berkeley, and MIT propose breaking down inequality proofs into two informal tasks, Bound Estimation and Relation Prediction, creating a bridge between natural language and formal logic [4][8].
- The IneqMath dataset consists of 1,252 training problems with detailed solutions and 200 test problems annotated by International Mathematical Olympiad gold medalists [8].

Group 3: Evaluation of Reasoning
- An AI mathematical judging system was developed to assess the logical soundness of each reasoning step, achieving a high F1 score of 0.93, indicating strong agreement with human evaluations [15][17].
- The judging system includes various evaluators to check for logical gaps, numerical approximations, and computation accuracy [16].

Group 4: Model Performance Insights
- Despite high answer accuracy, many models fail to provide logically sound reasoning; for Grok 3 mini, only 6% of answers had a rigorous process [18][20].
- Larger models do not necessarily improve reasoning rigor, and simply increasing the number of tokens does not lead to significant enhancements in logical clarity [20][23].

Group 5: Effective Strategies for Improvement
- Two effective methods identified are self-critique, which improves accuracy by about 5%, and theorem hints, which can enhance accuracy by up to 10% for complex problems [25].
- These findings suggest that improving reasoning in models requires more than just computational power; it involves teaching models to self-reflect and utilize tools effectively [25][28].
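The two task types and the F1 metric named above can be illustrated with a short worked example. The two problems below are hypothetical textbook-style instances written for this illustration, not items taken from the IneqMath dataset; the F1 formula is the standard definition in terms of precision P and recall R.

```latex
% Bound Estimation (hypothetical instance): find the best constant.
\[
\text{For all } a, b > 0, \text{ find the largest } C \text{ with } a^2 + b^2 \ge C\,ab.
\qquad
\text{Answer: } C = 2, \text{ since } a^2 + b^2 - 2ab = (a - b)^2 \ge 0,
\text{ with equality at } a = b.
\]

% Relation Prediction (hypothetical instance): choose the correct relation.
\[
\text{For all } a, b > 0: \quad \frac{a + b}{2} \;\; ? \;\; \sqrt{ab}.
\qquad
\text{Answer: } \ge, \text{ since } \frac{a + b}{2} - \sqrt{ab}
= \frac{(\sqrt{a} - \sqrt{b})^2}{2} \ge 0.
\]

% F1 score: harmonic mean of precision P and recall R.
\[
F_1 = \frac{2PR}{P + R}
\]
```

In both tasks the final answer (a constant or a relation symbol) is easy to check automatically, which is why the benchmark separately judges whether the reasoning that leads to it is sound.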
Breakthroughs in Pure Mathematics Can Take Decades; AI Is Trying to Speed Them Up
36Kr· 2025-06-30 00:01
Core Insights
- The article discusses the challenges artificial intelligence (AI) faces in advancing mathematical discoveries, particularly in complex reasoning tasks [1][6]
- DARPA has initiated a new funding program aimed at recruiting researchers to collaborate with AI in high-level mathematical research, with the goal of accelerating breakthroughs in pure mathematics [1][2]
- Experts believe that improving AI's capabilities in mathematics could have significant benefits for both mathematicians and society at large [1][2]

Group 1: AI's Limitations in Mathematics
- AI systems such as OpenAI's ChatGPT struggle with basic mathematical problems, highlighting inherent limitations in complex reasoning [1]
- Patrick Shafto from DARPA emphasizes that overcoming these challenges could lead to more powerful AI systems, benefiting the mathematical community and society [1][2]
- The article notes that the number of published papers in pure mathematics has stagnated, compared with explosive growth in life sciences and technology [4]

Group 2: DARPA's Role and Historical Context
- DARPA, known for innovative contributions like ARPANET and advancements in drone technology, is now focusing on enhancing mathematical research through AI [2][3]
- The agency's funding is seen as crucial at a time when research funding is being cut, as noted by mathematician Andrew Granville [3]
- The historical context of DARPA's involvement in technological advancements underscores its potential impact on the future of mathematics [2][3]

Group 3: The Future of AI in Mathematics
- Experts like Jordan Ellenberg suggest that understanding AI's capabilities in generating mathematical insights will be crucial for future developments [6]
- The article raises concerns about the lack of understanding of how AI operates, which is unprecedented in technological history [6]
- There is a call for better communication and tools to facilitate the integration of AI in mathematical proofs, as current methods can be cumbersome [5][6]
Terence Tao: Thanks to Lean, I've Rewritten My Classic Textbook from 20 Years Ago!
机器之心· 2025-06-01 03:30
Core Viewpoint
- Terence Tao has announced the creation of a Lean companion project for his undergraduate textbook "Analysis I," aiming to provide an alternative learning method through formalized mathematics using the Lean proof assistant [1][2].

Group 1: Project Overview
- The Lean project will convert definitions, theorems, and exercises from "Analysis I" into Lean format, allowing students to engage with the material interactively [2][4].
- The project is intended to transition towards the standard Lean library Mathlib, which is one of the largest and most active formal mathematics projects globally [1][2].

Group 2: Educational Goals
- "Analysis I" focuses on foundational topics such as the construction of natural numbers, integers, rational numbers, and real numbers, providing sufficient set theory and logic knowledge for rigorous proofs [2].
- The Lean project aims to enhance the learning experience by allowing students to complete exercises directly in Lean code, although official answers will not be provided [2][4].

Group 3: Structure and Content
- The textbook consists of 11 chapters, with some chapters already formalized in Lean [3].
- The project maintains a deliberate strategy of partial independence from Mathlib, initially constructing certain mathematical structures independently before transitioning to Mathlib's definitions [5].

Group 4: Community Engagement
- The Lean version of the textbook is now available for users, including mathematics students and researchers interested in formal verification, to engage with the material and provide feedback [7].
- Users have expressed excitement about the project, noting its potential to bridge the gap between traditional mathematics education and programming-based rigor [9].
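The idea of building a structure independently before switching to the library version, with exercises left for the student, can be sketched in Lean 4 as follows. This is a minimal illustration assuming a hypothetical type name `MyNat`; it is not code from the companion project, and the project's actual definitions and names may differ.

```lean
-- A from-scratch natural number type in the Peano style.
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

namespace MyNat

-- Addition defined by recursion on the first argument.
def add : MyNat → MyNat → MyNat
  | .zero, m => m
  | .succ n, m => .succ (add n m)

-- Holds by unfolding the definition.
theorem zero_add (m : MyNat) : add .zero m = m := rfl

-- Needs induction, because `add` recurses on its first argument.
theorem add_zero (n : MyNat) : add n .zero = n := by
  induction n with
  | zero => rfl
  | succ k ih => simp only [add, ih]

-- Left as an exercise in the style the summary describes: the statement
-- is given and the reader replaces `sorry` with a proof.
theorem add_succ (n m : MyNat) : add n (.succ m) = .succ (add n m) := by
  sorry

end MyNat
```

Once such basic facts are established for the hand-built type, a course can move over to the library's `Nat`, where the corresponding lemmas already exist.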
When AI Meets Mathematics: How Are Large Language Models Sparking a Revolution in Formalized Mathematics? | Deep Talk
锦秋集· 2025-05-12 09:13
Core Viewpoint
- The article discusses the transformative impact of large language models (LLMs) on the field of mathematics, particularly through the integration of formalized mathematics methods, which enhance the accuracy and reliability of theorem proofs [1][4].

Group 1: Challenges and Opportunities
- The increasing complexity of modern mathematical theories has surpassed the capacity of traditional peer review and manual verification methods, necessitating a shift towards formalized mathematics [4][6].
- The "hallucination" problem in LLMs, where models generate plausible but incorrect content, poses significant challenges in the highly logical domain of mathematics, highlighting the need for rigorous verification methods [6][7].

Group 2: Formalized Theorem Proving
- Formalized theorem proving utilizes a system of axioms and logical reasoning rules to express mathematical statements in a verifiable format, allowing for high certainty in validation results [8][9].
- Successful applications of formalized methods in mathematics and software engineering demonstrate their potential to ensure consistency between implementation and specifications, overcoming the limitations of traditional methods [9].

Group 3: Recent Advances Driven by LLMs
- Advanced LLMs like AlphaProof and DeepSeek-Prover V2 have shown remarkable performance in solving competitive-level mathematical problems, indicating significant progress in the field of formalized theorem proving [10].
- Research is evolving from mere proof generation to the accumulation of knowledge and the construction of theoretical frameworks, as seen in projects like LEGO-Prover [10].

Group 4: Transition to Proof Engineering Agents
- The transition from static "Theorem Provers" to dynamic "Proof Engineering Agents" is essential for addressing high labor costs and low collaboration efficiency in formalized mathematics [11].
- APE-Bench has been developed to evaluate and promote the performance of language models in long-term dynamic maintenance scenarios, filling a gap in current assessment tools [12][16].

Group 5: Impact and Future Outlook
- The integration of LLMs with formalized methods is expected to enhance verification efficiency in mathematics and industrial applications, leading to rapid advancements in mathematical knowledge [17].
- The long-term vision includes the emergence of "Certified AI," which combines formal verification with dynamic learning mechanisms, promising a new paradigm in knowledge production and decision-making [17].
Terence Tao's YouTube Debut: In 33 Minutes, AI Speeds Through a Proof That "Takes a Human a Full Page"
量子位· 2025-05-12 04:11
Core Viewpoint
- The article discusses mathematician Terence Tao's use of AI to complete a mathematical proof in just 33 minutes, showcasing the potential of AI in automating complex mathematical tasks [2][10][12].

Group 1: AI in Mathematical Proofs
- Terence Tao demonstrated the use of AI to formalize a proof related to the magma equation E1689, which traditionally required extensive manual effort [8][9].
- The process involved breaking down the proof into small logical units and using GitHub Copilot to generate code, which was then refined using Lean's canonical tactic [10][12].
- This method not only reduced the time taken for the proof but also ensured that the final output was human-readable, addressing concerns about reliance on computer-generated proofs [12][15].

Group 2: Development of Proof Assistant
- Tao announced the upgrade of his mathematical proof assistant to version 2.0, a lightweight tool developed in Python and aimed at simplifying short and tedious proof tasks [23][24].
- The assistant has two modes, assumption mode and tactic mode, with the latter being the default; it allows users to apply various tactics to solve problems [28].
- The tool supports linear arithmetic and can guide users through the proof process, demonstrating its flexibility and potential for further development [32][38].

Group 3: Community Engagement and Future Plans
- Tao expressed satisfaction with the proof assistant and welcomed suggestions for new features, indicating a collaborative approach to its development [38].
- Plans include creating tools for estimating norms of symbolic functions and deploying tactics for established mathematical inequalities, highlighting ongoing innovation in the field [38].
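The workflow of splitting a proof into small logical units and letting a linear-arithmetic tactic close each one can be sketched in Lean 4 with Mathlib. This is an illustrative toy goal, not the E1689 proof and not the interface of Tao's Python assistant; the hypothesis and step names are assumptions made for the example.

```lean
import Mathlib.Data.Real.Basic
import Mathlib.Tactic.Linarith

-- Toy goal: from x < 2y, y < 3z + 1 and z > 0, conclude x < 7z + 2.
example (x y z : ℝ) (hz : 0 < z) (h1 : x < 2 * y) (h2 : y < 3 * z + 1) :
    x < 7 * z + 2 := by
  -- Unit 1: chain the two hypotheses (x < 2y < 6z + 2).
  have step1 : x < 6 * z + 2 := by linarith
  -- Unit 2: positivity of z gives the remaining slack.
  have step2 : 6 * z + 2 < 7 * z + 2 := by linarith
  -- Combine the two units.
  linarith
```

Each `have` is a small, separately checkable claim, so a suggestion from a code assistant can be accepted or rejected one step at a time, and the finished proof remains readable as a sequence of intermediate inequalities.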