Terence Tao Responds to OpenAI's New Model Winning IMO Gold! GPT-5 Test Version Also Leaked
量子位 · 2025-07-20 02:49
Core Insights
- OpenAI's latest model achieved gold-medal-level performance at the 2025 International Mathematical Olympiad (IMO), solving 5 of the 6 problems and scoring 35 out of a possible 42 points, surpassing this year's gold medal threshold [1][2][11][12].

Group 1: Model Performance
- The model was evaluated under the same conditions as human participants: two 4.5-hour exams, no tools or internet access, and solutions written out as natural-language proofs [9][11].
- The 35-point gold medal score is directly comparable with human results; only 5 of approximately 600 human competitors achieved full marks this year [12].
- Grading was rigorous: each solution was assessed by three former IMO medalists, who had to reach consensus before final scoring [13].

Group 2: Breakthrough Significance
- The achievement signals a new level of creative problem-solving, with the model demonstrating rapid progress in sustained reasoning time across various benchmarks, culminating in tackling the IMO's complex problems [14].
- The model's success indicates a departure from traditional reinforcement learning methods, showcasing its ability to construct intricate proofs in the manner of human mathematicians [14].

Group 3: Upcoming Developments
- Alexander Wei of OpenAI indicated that GPT-5 is set to be released soon, although the IMO gold medal model remains an experimental research project with no immediate plans for public release [3][8].
- The model identifier "GPT-5-reasoning-alpha-2025-07-13" discovered in third-party repositories suggests that GPT-5 is on the horizon [6][8].

Group 4: Community Reactions
- The announcement sparked significant discussion in the AI community, with mathematician Terence Tao expressing skepticism about the comparability of AI results given the lack of a standardized testing environment [23][24].
- Tao emphasized that AI capabilities depend on many factors, including resources and methodology, which makes it difficult to quantify performance uniformly [25][26].

Group 5: Independent Evaluations
- The MathArena platform conducted independent assessments and found that even the best-performing model, Gemini 2.5 Pro, scored only 13 points (31%), well below the bronze medal threshold [34][35].
- The MathArena team called for transparency about OpenAI's methodology so that the reported results can be validated [37].
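
As a quick arithmetic check (not part of the original article), the percentages cited above follow directly from the standard IMO maximum of 42 points (six problems worth seven points each):

```latex
% Score fractions implied by the 42-point IMO maximum.
% 35/42 is OpenAI's reported gold-level score; 13/42 is Gemini 2.5 Pro's MathArena score.
\[
  \frac{35}{42} = \frac{5}{6} \approx 83\%,
  \qquad
  \frac{13}{42} \approx 0.31 \approx 31\%
\]
```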