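In seven months, AI breaks through mathematicians' "encirclement" and overtakes humans! 14 mathematicians dig into the raw reasoning tokens: it relies on intuition, not rote memorization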
量子位 (QbitAI) · 2025-06-09 07:29

Core Insights
- The article discusses the performance of the model o3-mini-high on the FrontierMath benchmark: it reached a score of 22% after initially answering only 2% of the questions correctly, a jump made within seven months [1][37].

Group 1: Model Performance
- o3-mini-high demonstrated strong knowledge retention and reasoning capabilities, relying more on intuition than on precise proofs [3][4].
- The model could elaborate on complex mathematical concepts and faced no significant barriers in the general background knowledge related to the problems [8][10].
- In the 29 reasoning records analyzed, o3-mini-high reached the correct conclusion 13 times [5].

Group 2: Model Limitations
- Despite its strengths, o3-mini-high lacks creativity and depth of understanding, often resembling a well-read graduate student who can recite information without deep comprehension [29][30].
- The model tends to skip formal proofs and guess answers directly, which some mathematicians view as a form of "cheating" [15][16].
- Approximately 75% of the reasoning records contained inaccuracies, with the model frequently misremembering mathematical terms and formulas [35].

Group 3: Future Implications
- The ongoing evolution of the FrontierMath project raises the question of whether AI can tackle even more challenging mathematical problems, possibly surpassing human mathematicians [43].
- The performance of o3-mini-high has led mathematicians to consider what AI means for their future role, especially if it becomes capable of solving open problems [43].