AlphaProof
AI Strides Into the "Era of Experience"
Hua Er Jie Jian Wen· 2025-09-11 03:50
Author | Chai Xuchen; Editor | Zhou Zhiyu

Rapidly iterating AI appears to be reaching a key turning point. On September 11, at the 2025 Inclusion·Bund Summit, Richard Sutton, the "father of reinforcement learning," argued that the AI industry is still in the "era of human data": the goal of most machine learning today is to transfer existing human knowledge onto a static AI with no capacity for autonomous learning. The problem, he said, is that along this route the dividend from human data is approaching its limit, while continual learning is essential to intelligence. He believes AI is entering an "era of experience" centered on continual learning, with potential far exceeding what came before.

Surveying the history of the universe, Sutton divides it into four ages: the age of particles, the age of stars, the age of replicators, and the age of design. Humanity's distinctive trait, he argued, is "pushing design to its limit" by creating things that can design themselves, which is precisely the goal now being pursued through AI. Humans are at the very least a catalyst, and indeed the pioneers opening the universe's fourth great age, the "age of design."

"AI is the inevitable next step in the evolution of the universe, and we should greet it with courage, pride, and a spirit of adventure," Sutton said.

In his remarks, Sutton said the "era of experience" requires a new kind of data source: data generated by agents interacting directly with the world. This is how humans and other animals learn, and it is also the path by which AlphaProof recently won a silver medal at the International Mathematical Olympiad. Sutton explained, ...
AI Finds the "Shoulders of Giants" for Mathematicians
Ke Ji Ri Bao· 2025-08-25 01:32
Core Insights
- The integration of AI and mathematics is significantly enhancing research efficiency and enabling breakthroughs in mathematical theories [1][3][6]
- AI's ability to verify mathematical results and assist in theorem proving is a key advantage, allowing researchers to focus on original contributions rather than rediscovering established results [3][4][9]
- The development of AI tools and models is fostering a new era of mathematical research, with notable collaborations yielding new mathematical theorems [6][7][8]

Group 1: AI's Impact on Research Efficiency
- AI greatly improves the efficiency of mathematical research by validating results and expanding researchers' thinking [3]
- AI can assist in precise semantic searches, helping researchers identify previously established theories and avoid redundant work [4][5]
- The ability of AI to bridge different theories and tools enhances researchers' understanding and inspires deeper exploration [5]

Group 2: Representative Achievements
- Significant achievements in the field include collaborations between AI teams and mathematicians, leading to the formulation of new mathematical theorems [6][7]
- AI's capability to analyze data and suggest function forms accelerates the research process by revealing hidden relationships between variables [7]

Group 3: Challenges and Future Directions
- Despite progress, challenges remain, particularly in the verification of mathematical expressions and the need for a formalized language to eliminate ambiguities [9][10]
- The establishment of high-quality mathematical datasets is crucial for training AI models effectively, necessitating collaboration among mathematicians [10]
- The push for digital transformation in mathematics aims to create a simulator for mathematical reasoning, enhancing AI's practical application in research [9]
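The semantic-search idea mentioned above (finding an already-proved theorem by meaning rather than keyword) can be illustrated with a toy retrieval sketch. This is not any article's actual system: a bag-of-words cosine similarity stands in for a real embedding model, and the corpus is invented for illustration.

```python
# Toy semantic search: rank a corpus of theorem statements by cosine
# similarity to a query. A real system would use learned embeddings;
# here a simple bag-of-words vector is a hypothetical stand-in.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words "embedding": word -> count.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "Fermat's little theorem on primes and modular exponentiation",
    "The fundamental theorem of calculus links derivatives and integrals",
]
query = "modular exponentiation and primes"
best = max(corpus, key=lambda doc: cosine(embed(query), embed(doc)))
print(best)
```

A production system would swap `embed` for a neural sentence encoder, but the ranking logic is the same.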
AI Wins IMO Gold, but Mathematics Has Yet to Have Its AlphaGo Moment
36Kr· 2025-08-01 02:40
Group 1
- The core event of the 2025 International Mathematical Olympiad (IMO) was AI reaching gold medal standard, with OpenAI and DeepMind both announcing scores of 35 out of 42, a significant leap in AI's mathematical reasoning capabilities [1][4][8]
- The competition between OpenAI and DeepMind intensified, highlighted by DeepMind's criticism of OpenAI for prematurely announcing results, and by Meta's subsequent poaching of key DeepMind researchers [3][9][12]
- The IMO gold medal results, while impressive, do not yet mean AI has surpassed human capabilities in mathematics: 72 high school students also achieved gold standard, and five scored a perfect 42 [12][30]

Group 2
- AI's achievement in the IMO serves as a benchmark for evaluating AI's reasoning abilities; previous models such as AlphaGeometry and AlphaProof reached only silver standard [13][16]
- DeepMind's Gemini Deep Think model demonstrated a significant advance by solving problems in natural language without relying on formal proof systems, challenging previous assumptions about AI's reasoning capabilities [18][20]
- OpenAI and DeepMind took differing approaches: OpenAI leaned on more computational methods, while DeepMind's approach was closer to human problem-solving techniques [22][23]

Group 3
- The implications of AI's performance in the IMO are debated within the academic community; some experts believe AI can assist mathematicians by generating insightful prompts and ideas [34][40]
- Conversely, skeptics worry that AI may reduce the discipline to a mere technical product, undermining the creative and exploratory nature of mathematical research [36][39]
- The ongoing discourse highlights a divide in the mathematical community over the potential benefits and drawbacks of AI in research, underscoring the need for deeper discussion of the purpose and implications of AI in mathematics [36][40]
America's "Liang Wenfeng" Refuses to Believe the Skeptics
Hu Xiu APP· 2025-07-31 09:50
Core Viewpoint
- The article discusses the emergence of Harmonic, a startup developing a zero-hallucination AI model named Aristotle, which aims to solve the challenges of AI in mathematical reasoning and formal verification [4][5][6]

Group 1: Company Overview
- Harmonic is a startup founded by Vlad Tenev and Tudor Achim, focused on creating AI that can perform mathematical reasoning without hallucinations [9][10]
- The company has rapidly gained attention and investment, reaching a valuation close to $900 million within two years of its founding [25][26]
- Harmonic's product, Aristotle, is designed to provide rigorous mathematical proofs and reasoning, addressing the common problem of hallucinations in AI outputs [20][21]

Group 2: Technology and Innovation
- Aristotle uses the formal verification tool Lean, which ensures that every step in the reasoning process is validated, eliminating the possibility of generating false information [36][38]
- The model has demonstrated impressive performance in mathematical competitions, achieving a success rate of 90% on the MiniF2F benchmark, significantly outperforming existing models such as OpenAI's GPT-4 [41][42]
- Harmonic's approach emphasizes rigorous logical constraints, aiming to make AI a reliable assistant in high-stakes fields such as finance and healthcare [21][19]

Group 3: Market Position and Competition
- The AI industry increasingly recognizes the need for more rigorous reasoning capabilities, creating opportunities for companies like Harmonic [27][28]
- Harmonic faces competition from established players like DeepMind and OpenAI, which have their own advanced models and extensive data resources [50][51]
- The startup's unique selling proposition is its focus on zero-hallucination outputs, a critical requirement in precision-demanding applications [17][19]
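The role the article ascribes to Lean (every inference must pass the proof checker, so an unsupported claim cannot slip through as a "hallucination") can be illustrated with a toy theorem. This is a generic Lean 4 example, not Harmonic's actual code; `Nat.add_comm` is a standard library lemma.

```lean
-- Generic Lean 4 illustration (not Aristotle's code): the kernel checks
-- every step, so a "proof" that does not actually establish the goal
-- fails to compile instead of passing as plausible-sounding text.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the final line were replaced with an incorrect term, the file would simply not type-check; that mechanical rejection is the zero-hallucination guarantee the article describes.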
Nature Headline: Large AI Models Reach Gold Medal Level at the International Mathematical Olympiad
Sheng Wu Shi Jie· 2025-07-25 07:54
Core Viewpoint
- The article highlights a significant achievement in artificial intelligence (AI): large language models (LLMs) have reached gold medal level at the International Mathematical Olympiad (IMO), showcasing their advanced problem-solving capabilities [4][5][6]

Group 1: AI Achievement
- Google DeepMind's large language model successfully solved problems equivalent to those in the IMO, achieving a score of 35 out of 42 and meeting the gold medal threshold [4][5]
- This marks a substantial leap from the previous year's performance, when the model reached only silver medal level, indicating a qualitative breakthrough in AI's ability to handle complex mathematical reasoning [5][6]

Group 2: Implications of the Achievement
- The success of LLMs in the IMO demonstrates their capability to tackle highly complex tasks that require deep logical thinking and abstract reasoning, beyond mere text generation [7]
- Such AI advances can serve as powerful tools in education and research, assisting students in learning higher mathematics and aiding researchers in exploring new conjectures and theorems [7]
- Achieving gold medal level in mathematics is a significant milestone on the path to artificial general intelligence (AGI), as it requires a combination of various cognitive abilities [7][8]

Group 3: Broader Impact
- The breakthroughs by DeepMind and OpenAI not only elevate AI's status in mathematical reasoning but also suggest vast potential for future applications in scientific exploration and technological development [8]
The World's First IMO Gold Medal AI Is Born: Google's Gemini Shatters the Math Olympiad Myth, Stunning Judges with a Score of 35
Shou Xi Shang Ye Ping Lun· 2025-07-23 04:02
Core Viewpoint
- Google DeepMind has officially announced that its Gemini Deep Think model won a gold medal at the International Mathematical Olympiad (IMO), scoring 35 out of a possible 42 points and meeting the gold medal standard within the 4.5-hour time limit [1][3][4][22]

Group 1: Achievement Details
- Gemini Deep Think is a general model that successfully solved the first five problems of the IMO, earning a score of 35 [3][22]
- The model completed the tasks using pure natural language (English), a significant advance over previous AI models [5][25]
- The achievement is officially recognized by the IMO organizing committee, making Gemini Deep Think the first AI system to receive such an acknowledgment [6][7]

Group 2: Competition Context
- The IMO, held annually since 1959, is a prestigious competition that attracts top students globally, with only the top 8% of participants earning gold medals [10][12]
- The competition requires participants to solve six complex mathematical problems within a 4.5-hour timeframe, testing not only logical reasoning but also creative thinking and rigor [11][15]

Group 3: Technical Innovations
- Gemini Deep Think used an advanced reasoning mode that allows parallel thinking, enabling the model to explore multiple problem-solving paths simultaneously [29][30]
- The model was trained with novel reinforcement learning techniques, enhancing its capabilities in multi-step reasoning and theorem proving [33][94]
- The combination of training, knowledge base, and strategic approaches contributed to Gemini's outstanding performance at the IMO [33]

Group 4: Future Implications
- Google DeepMind aims to develop AI that can tackle still more complex mathematical problems, believing that AI will become an indispensable tool for mathematicians, scientists, engineers, and researchers [76][78]
- The success of Gemini Deep Think at the IMO highlights the potential for AI to contribute significantly to the field of mathematics [76][78]
DeepMind Announces Its AI Test Score Has Reached IMO Gold Medal Level
Xin Hua She· 2025-07-22 07:30
Group 1
- The core achievement of Google's DeepMind is the advanced version of the Gemini AI model, which scored 35 points at the International Mathematical Olympiad (IMO), reaching gold medal level [1]
- The Gemini model successfully solved 5 of 6 problems from the 2025 IMO, with the official score confirming its performance [1]
- The IMO, held annually since 1959, has recently become a platform for testing AI models' capabilities in solving advanced mathematical problems [1]

Group 2
- DeepMind's AI models AlphaProof and AlphaGeometry 2 solved 4 of 6 problems at the 2024 IMO, scoring 28 points, which corresponds to silver medal level [2]
- The advanced Gemini model shows significant progress over the previous year, as it can directly produce mathematical proofs from natural-language problem descriptions [2]
- The success of the Gemini model is attributed to its "deep reasoning" mode, which uses enhanced reasoning techniques to explore multiple potential solutions simultaneously [2]
DeepMind Takes the IMO's "Only" Official Gold Medal, While OpenAI Suffers a Major Public Embarrassment
Ji Qi Zhi Xin· 2025-07-22 04:25
Core Viewpoint
- Google DeepMind's Gemini model has achieved a historic milestone by winning a gold medal at the International Mathematical Olympiad (IMO), solving five of six complex problems and scoring 35 out of 42 points, making it the first AI system officially recognized as a gold medalist by the IMO committee [2][4]

Group 1: Achievement and Methodology
- The Gemini Deep Think system uses enhanced reasoning capabilities through what researchers describe as parallel thinking, allowing it to explore multiple potential solutions simultaneously, unlike traditional AI models that follow a single reasoning chain [6]
- The model operates end-to-end in natural language, generating rigorous mathematical proofs directly from the official problem descriptions, and completed the tasks within the competition's 4.5-hour time limit [7]

Group 2: Comparison with OpenAI
- Google DeepMind's cautious announcement approach has garnered widespread praise in the AI community, contrasting sharply with OpenAI's handling of similar achievements, which drew criticism for premature announcements [11][12]
- OpenAI's decision to announce its results without participating in the official IMO evaluation process has led to skepticism about the credibility of its claims, as it relied on a group of former IMO participants for scoring [15]

Group 3: Industry Implications
- The competition highlights not only a technological contest but also a demonstration of norms, timing, and collaborative spirit within the AI community: DeepMind's respect for official recognition and careful release of results earned it both a gold medal and respect, while OpenAI's timing and method sparked controversy [25]
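The "parallel thinking" described above (several reasoning chains explored at once, with the best kept) can be sketched as a best-of-n sampling loop. This is only an illustration of the general pattern, not DeepMind's implementation; `propose_solution` and `score_solution` are hypothetical stand-ins for a model call and a critic.

```python
# Best-of-n "parallel thinking" sketch: run several independent reasoning
# chains concurrently and keep the highest-scoring candidate. The proposer
# and scorer below are toy placeholders, not a real model or verifier.
from concurrent.futures import ThreadPoolExecutor

def propose_solution(problem: str, seed: int) -> str:
    # Hypothetical stand-in for one independent reasoning chain.
    return f"candidate proof #{seed} for: {problem}"

def score_solution(solution: str) -> float:
    # Hypothetical critic that rates a candidate's rigor.
    return float(len(solution))

def parallel_think(problem: str, n_paths: int = 8) -> str:
    """Explore n reasoning paths concurrently; return the best-scoring one."""
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(lambda s: propose_solution(problem, s),
                                   range(n_paths)))
    return max(candidates, key=score_solution)

best = parallel_think("IMO 2025, Problem 1")
print(best)
```

In a single-chain model, one early wrong turn dooms the answer; sampling many chains and selecting afterward trades extra compute for robustness, which is the design choice the articles attribute to Deep Think.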
The World's First IMO Gold Medal AI Is Born: Google's Gemini Shatters the Math Olympiad Myth, Stunning Judges with a Score of 35
Yuan Da Xia· 2025-07-22 03:33
Core Viewpoint
- Google DeepMind has officially announced that its model Gemini Deep Think won a gold medal at the International Mathematical Olympiad (IMO), solving five problems in 4.5 hours for a score of 35 out of 42, a significant milestone for AI in mathematics [3][4][22]

Group 1: Achievement and Recognition
- Gemini Deep Think is the first AI system to receive official gold medal recognition from the IMO committee [6][7]
- The IMO, held annually since 1959, is a prestigious competition that tests the mathematical abilities of students worldwide [11][12]
- The competition requires participants to solve six complex mathematical problems within a limited time, with only the top 8% receiving gold medals [13][16]

Group 2: Technical Aspects of Gemini Deep Think
- Unlike previous models, Gemini Deep Think operates entirely in natural language, allowing it to generate rigorous mathematical proofs directly from problem descriptions [29][32]
- The model employs advanced reasoning techniques, including parallel thinking, enabling it to explore multiple solution paths simultaneously [33][38]
- The training of Gemini involved a combination of reinforcement learning and access to a curated database of high-quality mathematical solutions [37][126]

Group 3: Problem-Solving Process
- The model's approach to the problems was methodical, breaking down complex proofs into clear, understandable steps [24][41]
- For example, in the first problem, the model simplified the problem to a specific case and established a lemma to prove the core condition [44][50]
- The solutions provided by Gemini were noted for their clarity and precision, earning praise from IMO judges [24][87]

Group 4: Future Implications
- Google plans to make the advanced version of Gemini Deep Think available to select mathematicians and Google AI Ultra subscribers in the future [39]
- The success of Gemini Deep Think highlights the potential for AI to contribute significantly to the field of mathematics, combining natural language capabilities with rigorous reasoning [102][105]
"AI's Moon Landing Moment": OpenAI Model Takes Math Olympiad Gold
Hu Xiu· 2025-07-20 01:41
Core Insights
- OpenAI's general reasoning model achieved gold medal level performance at the recently concluded International Mathematical Olympiad (IMO), solving 5 of 6 problems under the same conditions as human participants [1][22][21]
- This achievement signifies a major breakthrough in AI capabilities, demonstrating that the model can perform complex reasoning tasks without relying on specialized systems or verified reward signals [1][6][24]

Group 1: Model Performance and Achievements
- OpenAI's model o3 alpha secured second place in the AtCoder World Tour 2025 finals, showcasing its strength in programming and physics [2]
- The model's IMO performance, scoring 35 out of 42 points, indicates its ability to match human mathematicians in rigorous proof writing [1][22]
- OpenAI's advances have positioned it ahead of competitors like DeepMind and Anthropic, as well as the open-source models led by China [3]

Group 2: Research and Development
- OpenAI is testing a new reasoning model; the IMO gold medal performance is a preliminary demonstration, and a formal release is expected by the end of this year [4]
- The research, led by Alexander Wei, emphasizes the model's ability to engage in sustained creative thinking, a significant leap beyond previous benchmarks [5][27]
- The model's development relied on general reinforcement learning techniques, allowing it to tackle complex problems without task-specific training [7][20]

Group 3: Future Implications
- The success at the IMO raises expectations for AI's potential to solve significant mathematical problems, with prediction markets giving an 81% chance that AI could address a Millennium Prize Problem by 2030 [12][28]
- OpenAI's chief research officer noted that the model's broad reasoning capabilities extend beyond competition-specific tasks, indicating a shift toward more generalized AI applications [10][24]
- The rapid progress in AI, from elementary to advanced mathematical problem-solving, suggests that AI may soon play a substantial role in scientific discovery [28][29]
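Several of the articles above credit "general reinforcement learning" for these results without detailing it. The core idea (adjust a policy toward actions that earned reward) can be shown with a textbook REINFORCE loop on a two-armed bandit. This is a generic policy-gradient sketch under invented parameters, not OpenAI's or DeepMind's training method.

```python
# Minimal REINFORCE sketch on a two-armed bandit: preferences over two
# "reasoning strategies" are updated toward whichever earns more reward.
# All numbers here are illustrative, not from any lab's training setup.
import math
import random

random.seed(0)
theta = [0.0, 0.0]          # preference for each strategy (arm)
true_reward = [0.2, 0.8]    # arm 1 is genuinely better

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

alpha = 0.1                 # learning rate
for _ in range(2000):
    probs = softmax(theta)
    arm = random.choices([0, 1], weights=probs)[0]
    reward = 1.0 if random.random() < true_reward[arm] else 0.0
    # Policy-gradient update: d/dtheta_a log pi(arm) = 1[a == arm] - pi(a)
    for a in range(2):
        grad = (1.0 if a == arm else 0.0) - probs[a]
        theta[a] += alpha * reward * grad

print(softmax(theta))       # probability mass shifts toward the better arm
```

Scaling this pattern up (richer policies, reward from proof checkers or graders instead of a bandit) is the direction the coverage describes, though the labs' actual recipes are not public.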