When AI Meets Math: How Are Large Language Models Sparking a Revolution in Formalized Mathematics? | Deep Talk
锦秋集· 2025-05-12 09:13
Core Viewpoint
- The article discusses the transformative impact of large language models (LLMs) on the field of mathematics, particularly through the integration of formalized mathematics methods, which enhance the accuracy and reliability of theorem proofs [1][4].

Group 1: Challenges and Opportunities
- The increasing complexity of modern mathematical theories has surpassed the capacity of traditional peer review and manual verification methods, necessitating a shift toward formalized mathematics [4][6].
- The "hallucination" problem in LLMs, where models generate plausible but incorrect content, poses significant challenges in the highly logical domain of mathematics, highlighting the need for rigorous verification methods [6][7].

Group 2: Formalized Theorem Proving
- Formalized theorem proving uses a system of axioms and logical inference rules to express mathematical statements in a machine-verifiable format, allowing for high certainty in validation results [8][9].
- Successful applications of formal methods in mathematics and software engineering demonstrate their potential to ensure consistency between implementation and specification, overcoming the limitations of traditional review [9].

Group 3: Recent Advances Driven by LLMs
- Advanced systems such as AlphaProof and DeepSeek-Prover V2 have shown remarkable performance in solving competition-level mathematical problems, indicating significant progress in formalized theorem proving [10].
- Research is evolving from mere proof generation toward knowledge accumulation and the construction of theoretical frameworks, as seen in projects like LEGO-Prover [10].

Group 4: Transition to Proof Engineering Agents
- The transition from static "theorem provers" to dynamic "proof engineering agents" is essential for addressing the high labor costs and low collaboration efficiency of formalized mathematics [11].
- APE-Bench has been developed to evaluate and advance the performance of language models in long-horizon dynamic maintenance scenarios, filling a gap in current assessment tools [12][16].

Group 5: Impact and Future Outlook
- The integration of LLMs with formal methods is expected to improve verification efficiency in mathematics and industrial applications, leading to rapid advancement of mathematical knowledge [17].
- The long-term vision includes the emergence of "Certified AI," which combines formal verification with dynamic learning mechanisms, promising a new paradigm in knowledge production and decision-making [17].
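The workflow described above rests on statements that a proof checker can validate mechanically. As a concrete illustration (a hypothetical toy example, not drawn from AlphaProof, DeepSeek-Prover, or LEGO-Prover), the following Lean 4 snippet states and proves a simple proposition; the checker accepts it only if every inference step is justified by the axioms and rules of the system:

```lean
-- A minimal machine-checkable statement of the kind formalized theorem
-- proving works with. Hypothetical illustration only.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

If the proof term were wrong or incomplete, the checker would reject it outright; this is the source of the "high certainty" the summary refers to.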
The Legendary Man Who Is "Always" at Center Stage of Large-Model Technology
量子位· 2025-05-10 02:39
Xi Feng and Heng Yu, reporting from Aofeisi. QbitAI | WeChat official account QbitAI

"Why is it always you???" (rendered jokingly in the original as "How old are you")

That is the soul-searching question netizens have lately been directing at Noam Shazeer, one of the eight Transformer authors (for readability, we'll call him "Brother Sha"). Especially after Meta FAIR researcher Zeyuan Allen-Zhu shared a series of new results from their "Physics of Language Models" project, some netizens noticed that the 3-token causal convolution it discusses had already been studied by Shazeer and colleagues three years earlier. Yes, "again." Trace through his work history and it's not hard to find his name behind AI breakthroughs large and small. "Not to build a personality cult, but why is it always Noam Shazeer?" △ A netizen noted that the lower-right picture of Shazeer was generated by GPT-4o. Allen-Zhu himself also weighed in, saying Shazeer's results were ahead of their time: "I also think Shazeer may be a time traveler. I originally didn't believe in their gated MLP (while writing Section 3.3, because gated MLPs made training unstable), but now I'm convinced (after adding the Canon layers, we compared MLPs and gated MLPs in Section 4.1)." So, a formal introduction: who is Brother Sha? He is one of the eight Transformer ...
Professor Yu Jingyi: Large Models' Potential Lies in Spatial Intelligence, but We Are Far from Consensus on It
36Kr · 2025-05-09 09:34
The new wave of technologies represented by generative AI is advancing by the day, driving a profound technological, commercial, and social transformation and pushing human society from the information society toward an intelligent society. As the world eagerly anticipates AI, it is equally concerned with the new opportunities and challenges artificial intelligence will bring.

To that end, we launched the "AI & Society: 100 People, 100 Questions" discussion series, broadly inviting AI technology leaders, AI unicorn founders, and AI investors, along with sociologists, psychologists, international-relations experts, and science-fiction writers, to examine the wide-ranging impact of AI from diverse perspectives, surface the consensus and non-consensus of the AI era, and jointly keep artificial intelligence developing sustainably in the direction of "helping people grow and treating people well."

In this installment, we were honored to host Professor Yu Jingyi on April 16 for a voyage of ideas about AI.

Key takeaways:

6. A perception-first, disruptive technical route: never rely on complex cognition for problems perception can solve; perception is the most direct, lowest-cost solution.
7. The theoretical dilemma of spatial intelligence: 3D representations vary endlessly and are far from any consensus; without a unified representation, collecting more data is of little use.
8. A revolutionary breakthrough in sensor technology: I believe future perception systems will change dramatically, with entirely new imaging systems that can observe the front and back of an object simultaneously.
9. Redefining robot design: embodied intelligence pursues robustness and safety rather than precision, which calls for entirely new mathematical metrics.
10. Bubbles are inevitable; OpenAI ...
Goodbye, Expensive Google Search API! Alibaba Open-Sources an RL Framework That Lets Large Models Fend for Themselves, Cutting Costs by 88%. Netizens: The Game Has Changed
AI前线· 2025-05-09 05:18
Core Viewpoint
- Alibaba's new technology "ZeroSearch" significantly reduces the cost and complexity of training AI systems for information retrieval, eliminating the need for expensive commercial search engine APIs [1][2][14].

Summary by Sections

Technology Overview
- ZeroSearch is a reinforcement learning framework that allows large language models (LLMs) to develop advanced search capabilities through simulation, outperforming models trained on real search engines while incurring zero API costs [2][3].
- The technology is compatible with various model families, including Qwen-2.5 and LLaMA-3.2, and does not require a separate supervised warm-up phase [2][3].

Performance Metrics
- In comprehensive experiments across seven question-answering datasets, ZeroSearch's performance matched or exceeded that of models trained with real search engines [3][5].
- A 3-billion-parameter LLM can achieve search capabilities comparable to Google's, while a 14-billion-parameter model can surpass Google's performance [3][5].

Cost Efficiency
- Training with Google search via SerpAPI for approximately 64,000 queries costs around $586.70, while using a 14-billion-parameter simulation LLM on four A100 GPUs costs only $70.80, an 88% reduction [7][8].

Methodology
- ZeroSearch begins with a lightweight supervised fine-tuning process that turns an LLM into a retrieval module capable of generating both relevant and irrelevant documents in response to queries [9][11].
- The system employs a curriculum-based rollout mechanism, gradually increasing the difficulty of the generated documents to simulate ever more challenging retrieval scenarios [11][12].

Implications for AI Development
- ZeroSearch represents a significant shift in AI training methods, enabling AI systems to improve without relying on external tools such as search engines [14][15].
- The technology creates a more equitable competitive environment for small AI companies and startups by drastically lowering the entry barrier of high API costs [14][15].
Intel Goes Deep into Retail Stores to Build a "Smart Brain," with a Focus on Overseas Markets
Feng Huang Wang· 2025-05-09 02:45
Core Insights
- Intel is leveraging AI and computing power to transform retail experiences, enabling features such as facial recognition for personalized recommendations and quick checkout [1].
- At the 25th China Retail Industry Expo, Intel showcased smart retail solutions in collaboration with partners, emphasizing the role of AI technologies in retail transformation [1].

Group 1: Smart Retail Solutions
- Intel's smart retail architecture combines edge computing and endpoint devices, using its Core Ultra processors and Xe graphics for various retail functions [1].
- Endpoint devices powered by Core Ultra processors support smart shopping assistance, stock alerts, product recommendations, and advertising, aimed at reducing operational costs [1].
- Edge devices, backed by Core Ultra processors and multiple Xe graphics cards, handle store-management tasks such as compliance checks and customer-flow analysis [1].

Group 2: AI POS Solutions
- Intel's AI POS solutions are built on computing platforms of different tiers, optimized with Intel's oneAPI and OpenVINO toolkits for flexible algorithm models [2].
- The company aims to break the price-war cycle with these initiatives and plans to launch another Edge AI project this year to promote retail devices in overseas markets [2].
Pushing the Limits of AI Mathematical Reasoning! Large-Scale Formal Math Benchmark FormalMATH Released; the Strongest Model's Success Rate Is Only 16%
量子位· 2025-05-07 09:33
Core Insights
- The FormalMATH benchmark, developed by institutions including The Chinese University of Hong Kong and Zhejiang University, consists of 5,560 rigorously validated mathematical problems covering fields from Olympiad level to undergraduate courses, and is 22.8 times larger than existing benchmarks [1][5][4].

Group 1: Performance of LLMs
- The performance of current LLM-driven theorem provers is significantly below expectations; the best model, Kimina-Prover, achieves a success rate of only 16.46% under resource constraints [3][15].
- Most models perform close to random guessing in calculus and related areas, indicating a substantial capability gap [3][7].
- There is a notable domain bias, with stronger performance in algebra and weaker results in calculus [11][12].

Group 2: Error Analysis
- Common error patterns include:
  - Redundant assumptions (34%): introducing irrelevant premises [16].
  - Incomplete proofs (62%): missing critical steps [16].
  - Misuse of automation strategies (65%): incorrectly applying automated tactics [16].
  - Mishandled inequalities (13%): over-reliance on automated inequality tactics [16].
- The analysis shows that LLM provers often resort to shortcut tactics, which leads to significant errors [14].

Group 3: Future Directions
- To enhance the formal reasoning capabilities of LLMs, three focus areas are proposed:
  - Strengthening multi-step planning to reduce reliance on single-step tactics [19].
  - Cross-domain generalization through curriculum learning to balance training data across mathematical fields [19].
  - Development of interactive proof-assistance tools for collaboration between LLMs and human experts [19].

Group 4: Open Source Initiative
- The research team has made the FormalMATH benchmark's code, training data, and evaluation models publicly available, encouraging collaboration between academia and industry to advance formal mathematical reasoning [20][21].
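To make the error analysis concrete: benchmark items of this kind are formal statements whose proof the model must supply after `by`. The following Lean 4 snippet is a hypothetical, deliberately simple example in that style (not an actual FormalMATH item; it assumes Mathlib is available), where a single automation tactic happens to close the goal:

```lean
import Mathlib

-- Hypothetical benchmark-style statement. A prover must fill in the proof
-- after `by`; tactics like `positivity` or `nlinarith` can close easy goals
-- such as this one, but the error analysis above notes that models over-rely
-- on exactly this kind of automation (the 65% "misuse" category).
theorem sq_sum_nonneg (x y : ℝ) : 0 ≤ x ^ 2 + y ^ 2 := by
  positivity
```

When the goal is even slightly harder, a one-tactic shortcut fails, which is consistent with the gap between near-random calculus performance and the stronger algebra results.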
AI Empowers Insurance Industry Transformation: The Leap from Experience-Driven to Data-Intelligence-Driven
Huan Qiu Wang· 2025-05-06 08:17
[Huanqiu Insurance roundup] In an era when artificial intelligence is widely adopted, industries of every kind are exploring new business models built on deep integration with AI. The insurance industry, as an important part of the financial sector, is naturally no exception in this transformation.

Going further, the White Paper notes that for the insurance industry, large language models are no longer just a technical upgrade: they are triggering a deep cognitive shift from "experience-driven" to "data-intelligence-driven."

Take ZhongAn Technology (众安信科) as an example. At the 2025 InsurTech Summit its CEO Yu Feng said that, building on sustained research into the foundations of large models, the company spotted the possibility of deploying AI inside enterprises after DeepSeek's technical breakthrough. ZhongAn Technology therefore focused on 10 key business scenarios inside ZhongAn and, drawing on accumulated data from more than 600 million users, built an intelligent middle platform fully tailored to insurance, creating more than 200 insurance-vertical AI agents. ZhongAn's internal AI platform now handles more than 50 million calls per month.

"Embracing AI and doing AI-in-all transformation with an all-in-AI mindset is the core of AI + insurance," Wang Min, executive deputy general manager of ZhongAn Insurance and chairman of ZhongAn Technology, said at the 2025 InsurTech Summit, whose theme was "From the Internet Era to the AI Era: Strategic Advancement and Application Innovation of AI + Insurance."

In Wang Min's view, compared with the generic products and blockbusters of the internet-insurance era, AI's influence will permeate every ...
When Answers Become Cheap, Good Questions Are the New Scarcity
36Kr · 2025-05-04 00:03
Group 1
- The core argument of the article is that in an era where answers are easily accessible, the value lies in asking the right questions, which can reshape understanding and drive creativity [1][4][19].
- The invention of photography in the 1830s challenged traditional artistic standards, leading artists to focus on subjective experience rather than mere replication of reality [3][10][11].
- The emergence of large language models (LLMs) has made obtaining answers cheaper, but this has led to a decline in the quality of inquiry and an increase in the cost of asking good questions [15][17][26].

Group 2
- The article emphasizes that the value of information is proportional to the uncertainty it eliminates, as illustrated by Claude Shannon's information theory [21][22][23].
- It argues that in a world of information overload, the challenge is not a lack of facts but a misalignment of attention, leading to a focus on quantity over quality in answers [31][32][46].
- The piece highlights the importance of redefining problems and frameworks to navigate structural uncertainty, suggesting that good questions can expand the boundaries of understanding [37][38][39].
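The Shannon point in Group 2 has a precise form: the entropy H(X) = -Σ p·log2(p) of a distribution measures the uncertainty an answer can remove. A quick numeric illustration (a sketch, not from the article):

```python
# Shannon entropy in bits: a question whose answer is uniform over 8 options
# carries 3 bits of uncertainty to eliminate; a near-certain answer carries
# almost none -- which is the article's sense in which cheap, predictable
# answers are low-value.
import math

def entropy(probs: list[float]) -> float:
    """Entropy in bits of a discrete distribution (zero-probability terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform8 = entropy([1 / 8] * 8)   # maximal uncertainty over 8 outcomes: 3 bits
skewed = entropy([0.99, 0.01])    # almost no uncertainty left
```

On this view, a "good question" is one posed where entropy is high, so its answer eliminates the most uncertainty.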
Building a Coding Assistant in 315 Lines of Code: A Go Guru Lifts the "Veil of Mystery" from AI Agents
机器之心· 2025-05-03 04:18
Selected from ampcode.com. Author: Thorsten Ball. Compiled by 机器之心 (Machine Heart).

Well-known Go expert Thorsten Ball recently built a coding agent in 315 lines of code, remarking that "it works remarkably well" and that it has "no moat" (i.e. it is not hard to replicate). Ball is known in the programming world for his deep work on systems programming and programming languages, especially interpreters, compilers, and virtual machines; his books "Writing a Compiler in Go" and "Writing an Interpreter in Go" are widely regarded as approachable introductions to compiler theory.

Although this coding agent cannot rival the coding features shipped by Claude, Gemini, and the like, it gives beginners a good learning example for exploring agents, reflecting his consistent philosophy: demystify technology through hands-on practice and open-source projects.

Ball walks through the concrete steps in his blog post. (Note: the code screenshots in this article may be incomplete; see the original post for details.)

Blog: https://ampcode.com/how-to-build-an-agent

First, get our "stationery" ready. Pencils out! Let's dive right in and set up a new Go project with four simple commands:

At first glance, an agent that edits files, runs commands, and fixes its own errors seems complex, but ...
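The loop such an agent runs is genuinely small. Below is a minimal sketch of it, written in Python rather than Ball's Go, with the model call stubbed out; every name here is hypothetical and none of it is taken from his code. The model either answers in plain text or requests a tool call; the runtime executes the tool and feeds the result back as the next message.

```python
# Minimal agent loop: model -> (tool call | answer) -> tool result -> model.
# The "LLM" is a scripted stub and the only tool is an in-memory read_file,
# so the sketch is self-contained. All names are hypothetical.

def read_file(files: dict, path: str) -> str:
    """Toy 'read_file' tool backed by an in-memory dict instead of disk."""
    return files.get(path, f"error: no such file {path}")

def scripted_model(messages: list[dict]) -> dict:
    """Stub LLM: first asks to read main.go, then answers using the result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "read_file", "arg": "main.go"}
    tool_output = next(m for m in messages if m["role"] == "tool")["content"]
    return {"type": "text", "content": f"The file says: {tool_output}"}

def agent_loop(model, files: dict, user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = model(messages)
        if reply["type"] == "text":              # plain answer: we're done
            return reply["content"]
        result = read_file(files, reply["arg"])  # execute the requested tool
        messages.append({"role": "tool", "content": result})

answer = agent_loop(scripted_model, {"main.go": "package main"}, "What's in main.go?")
```

The "no moat" observation falls out of this shape: once the model can emit structured tool requests, the runtime is little more than a dispatch loop.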
Tangxing Capital: Insightful and Decisive, Discerning the Hidden Value in Investment Projects
Sou Hu Cai Jing· 2025-05-02 02:58
Group 1
- The emergence of DeepSeek, a large model comparable to ChatGPT, has created significant waves in the global technology and capital markets, igniting enthusiasm for innovation and investment opportunities in the tech sector [3].
- Tangxing Capital focuses on discovering and nurturing high-growth-potential hard-tech companies, aiming to drive industrial upgrades and regional economic development through a comprehensive support system [3][4].
- The investment team at Tangxing Capital possesses deep industry backgrounds and professional investment capabilities, allowing them to accurately grasp technology trends and identify quality projects [3][4].

Group 2
- Young entrepreneurs such as Liang Wenfeng and Wang Xingxing exemplify contemporary tech leaders, showcasing strong learning ability and rapid application of new technologies [4][5].
- These entrepreneurs break with traditional thinking and industry boundaries, integrating resources across sectors to create new application scenarios and business models [5][6].
- Key traits admired in successful entrepreneurs include an innovative spirit, cross-disciplinary integration, strategic vision, and focus on core business areas [6].

Group 3
- The investment style of Tangxing Capital is characterized by "insightful decisiveness," emphasizing the ability to quickly identify and act on investment opportunities [7].
- A notable decision involved a significant investment in Plater, a key player in the 3D printing industry, despite market uncertainty; it later yielded a tenfold return [9].
- Plater's technology addresses complex manufacturing needs in the aerospace, automotive, and medical sectors, contributing significantly to China's manufacturing transformation [8][9].

Group 4
- The current bull market is driven by a combination of macroeconomic stability, loose monetary policy, and positive market sentiment, creating a conducive environment for investment [10][11].
- The bull market improves the financing environment for primary markets, encouraging entrepreneurship and accelerating company growth through increased funding [12][13].
- The interaction between primary and secondary markets fosters a cycle of investment and exit opportunities, optimizing resource allocation and enhancing economic vitality [14].