Mathematical Proof
How does AI reason about geometry? A BUPT expert leads students in exploring the nature of artificial intelligence
Xin Jing Bao· 2025-10-21 12:11
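To show the students how artificial intelligence carries out geometric proofs, Wang Xiaoru first walked through how a human proves a theorem. "Generally, we first pin down the theorem's basic assumptions, then derive some intermediate results from them, and keep deriving from those results until we reach the conclusion to be proved. This way of proving things is universal; the only difference is that sharper students pick the more promising directions to pursue."

A computer can imitate this skill entirely, working from the initial assumptions toward the conclusion one step at a time. "It may have a very low IQ and no technique to speak of, but it is extremely fast," Wang told the students, adding that many new methods and ideas have been proposed on this basis.

"At an earlier lecture, a student asked me when I would win the Turing Award. I thought for a moment and answered, 'I am still young, and I will keep working hard.' Now I pass those words on to all of you." At the end of the class, Wang encouraged the students to keep digging deeper into artificial intelligence.

One of the 2025 events in the series promoting the spirit of scientists, the activity was run by the Beijing Science and Technology Education Center (Party School of the Beijing Association for Science and Technology) under the guidance of the Beijing Association for Science and Technology, jointly with the Shijingshan District Association for Science and Technology, and with strong support from the Babaoshan Subdistrict Office of Shijingshan District and Beijing Huangzhuang Vocational High School. Going forward, the association's Party School will keep deepening publicity and education on the spirit of scientists, guiding young people to get close to scientists and understand the scientific enterprise, to cultivate a new generation with scientific literacy and a sense of devotion to home and country, and to serve Beijing's international science and technology ...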
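The brute-force derivation Wang describes (start from the assumptions, keep applying rules, stop when the conclusion appears) is usually called forward chaining. The sketch below is only an illustration of that idea, not the method used by any particular geometry prover; the rule format, the fact strings, and the names `forward_chain` and `TOY_RULES` are assumptions made for this example.

```python
def forward_chain(facts, rules, goal):
    """Return True if `goal` can be derived from `facts` using `rules`.

    `rules` is a list of (premises, conclusion) pairs, where `premises`
    is a tuple of fact strings. No heuristics are used: every rule is
    tried on every pass, which mirrors the "low IQ but very fast" style.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only if it adds something new.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
                if conclusion == goal:
                    return True
    return goal in known


# Hypothetical toy rule base for an isosceles triangle (illustration only).
TOY_RULES = [
    (("AB = AC",), "triangle ABC is isosceles"),
    (("triangle ABC is isosceles",), "angle B = angle C"),
]

if __name__ == "__main__":
    print(forward_chain({"AB = AC"}, TOY_RULES, "angle B = angle C"))
```

A student who "picks the more promising direction" corresponds to ordering or pruning the rules before they fire; this sketch deliberately leaves that out.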
Terence Tao uses GPT-5 to solve a math problem: just 29 lines of Python code
量子位· 2025-10-04 04:13
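Yishui, reporting from Aofeisi
量子位 | WeChat official account QbitAI

AI has helped Terence Tao crack yet another problem!

The news comes from Tao's own latest post, in which he says bluntly that without AI, the same task would have taken hours (mostly hand-writing and debugging code).

What is more, without AI he would not have settled on the key strategy that has now succeeded. By his account, without AI's help he would almost certainly not have attempted this kind of numerical search, and would probably have looked for a theoretical asymptotic analysis instead.

Because the model involved was GPT-5, OpenAI researcher Sebastien Bubeck (former VP of AI and Distinguished Scientist at Microsoft) quickly reshared the post, setting off a lively discussion in the community.

Besides recalling experiences similar to Tao's own, commenters kept making the same point: this marks our entry into a new era in which humans and machines explore together.

So what problem did Tao solve with AI this time, and how much of the work did the AI do? Read on.

Just 29 lines of Python code to help verify the result

The problem Tao set out to solve this time comes from MathOverflow (a Q&A community for professional mathematicians):

Is the sequence lcm(1,2,…,n) a subset of the highly abundant numbers?

...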
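To make the question concrete: a positive integer k is highly abundant if its sum of divisors σ(k) is strictly larger than σ(m) for every m < k. The sketch below is not the 29-line script the article refers to, which is not reproduced in this excerpt; it is a naive check, under that standard definition, that only reaches small n because lcm(1,2,…,n) grows very quickly. The function names are assumptions made for this example.

```python
from math import gcd


def divisor_sums(limit):
    """Return a list where sigma[k] is the sum of divisors of k, 1 <= k <= limit."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides every multiple of d, so add it to each of them.
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma


def highly_abundant_upto(limit):
    """Return the set of highly abundant numbers k <= limit."""
    sigma = divisor_sums(limit)
    best = 0
    result = set()
    for k in range(1, limit + 1):
        if sigma[k] > best:  # strictly beats every smaller number
            result.add(k)
            best = sigma[k]
    return result


def lcm_sequence(n_max):
    """Return [lcm(1), lcm(1,2), ..., lcm(1,...,n_max)]."""
    values = []
    current = 1
    for n in range(1, n_max + 1):
        current = current * n // gcd(current, n)
        values.append(current)
    return values


def lcm_terms_not_highly_abundant(n_max):
    """Return the terms lcm(1..n), n <= n_max, that are not highly abundant."""
    terms = lcm_sequence(n_max)
    highly_abundant = highly_abundant_upto(terms[-1])
    return sorted({t for t in terms if t not in highly_abundant})


if __name__ == "__main__":
    print(lcm_sequence(10))
    print(lcm_terms_not_highly_abundant(10))
```

Because the sieve allocates a list as long as the largest term, this approach stops being practical after a few dozen values of n, and a result for small n says nothing about the general question. That limit is one reason a smarter numerical search or an asymptotic argument is needed.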
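How far are large language models from being "mathematical proof experts"? Teams from Stanford, Berkeley, and MIT propose the IneqMath evaluation benchmark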
AI前线· 2025-07-17 04:47
Core Viewpoint
- The article discusses the limitations of large language models (LLMs) in mathematical reasoning, particularly in proving inequalities, and introduces a new framework called IneqMath to evaluate their reasoning capabilities [1][4][28].

Group 1: Challenges in Mathematical Reasoning
- Current LLMs often provide seemingly correct answers but lack rigorous reasoning processes, raising questions about their true understanding of logical proofs [1][18].
- Formal systems like Lean and Coq can verify proofs but are complex and not easily scalable for intricate problems [1][4].

Group 2: IneqMath Framework
- Researchers from Stanford, Berkeley, and MIT propose breaking down inequality proofs into two informal tasks, Bound Estimation and Relation Prediction, creating a bridge between natural language and formal logic [4][8].
- The IneqMath dataset consists of 1,252 training problems with detailed solutions and 200 test problems annotated by International Mathematical Olympiad gold medalists [8].

Group 3: Evaluation of Reasoning
- An AI mathematical judging system was developed to assess the logical soundness of each reasoning step, achieving a high F1 score of 0.93, indicating strong agreement with human evaluations [15][17].
- The judging system includes various evaluators to check for logical gaps, numerical approximations, and computation accuracy [16].

Group 4: Model Performance Insights
- Despite high answer accuracy, many models fail to provide logically sound reasoning, with Grok 3 mini showing only 6% of answers having a rigorous process [18][20].
- Larger models do not necessarily improve reasoning rigor, and simply increasing the number of tokens does not lead to significant enhancements in logical clarity [20][23].

Group 5: Effective Strategies for Improvement
- Two effective methods identified are self-critique, which improves accuracy by about 5%, and theorem hints, which can enhance accuracy by up to 10% for complex problems [25].
- These findings suggest that improving reasoning in models requires more than just computational power; it involves teaching models to self-reflect and utilize tools effectively [25][28].