Don't rush to crown OpenAI! Terence Tao: how much this kind of "gold medal" is worth depends on the "competition format"
机器之心 (Jiqizhixin) · 2025-07-20 03:11
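Reported by 机器之心 (Jiqizhixin), editorial department.

Yesterday, OpenAI announced major news: one of its reasoning models achieved gold-medal-level performance at the International Mathematical Olympiad (IMO).

Alexander Wei, the OpenAI research scientist who announced the result, said the research team evaluated the model strictly under the rules that apply to human contestants: in two 4.5-hour exam sessions, without any tools or internet access, the model had to read the official problem statements and write natural-language proofs.

In the evaluation, the model solved five of the six 2025 IMO problems and scored 35 points out of a possible 42, enough for a gold medal. Each problem was graded independently by three former IMO medalists, and the final score was set once they reached agreement.

The announcement excited the whole AI community. Alexander Wei also shared the proofs generated by OpenAI's new model.

[Screenshot: GitHub repository aw31/openai-imo-2025-proofs (Public)] ...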
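The score reported above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the standard IMO marking scheme of 7 points per problem and assuming each solved problem earned full marks while the unsolved one earned zero; the function name `imo_score` is hypothetical and used only for illustration.

```python
# Assumption: each of the six IMO problems is marked out of 7 points.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6


def imo_score(problems_solved: int) -> tuple[int, int]:
    """Return (score, maximum), assuming full marks per solved problem and zero otherwise."""
    if not 0 <= problems_solved <= NUM_PROBLEMS:
        raise ValueError("problems_solved must be between 0 and 6")
    score = problems_solved * POINTS_PER_PROBLEM
    maximum = NUM_PROBLEMS * POINTS_PER_PROBLEM
    return score, maximum


score, maximum = imo_score(5)
print(f"{score}/{maximum}")  # 35/42
```

Under these assumptions, five fully solved problems give exactly the 35 of 42 points quoted in the article.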
OpenAI researcher Noam Brown: Mid-training is the new pre-training
海外独角兽 (Overseas Unicorn) · 2025-07-02 11:03
Core Insights
- The article discusses the emergence of reasoning capabilities in AI models, highlighting a shift from mere pattern matching to complex cognitive reasoning, which is essential for scientific discovery and decision-making [4][5].

Group 1: Reasoning as an Emergent Capability
- Reasoning is an emergent ability that models can only benefit from once pre-training reaches a certain level [5][11].
- The analogy of "fast thinking and slow thinking" is used to explain the relationship between non-reasoning and reasoning models, where the former corresponds to intuitive responses and the latter to deliberate reasoning [8][11].
- The performance of models in multi-modal tasks depends on their ability to integrate complex information and logical reasoning [12][13].

Group 2: Need for a Universal Reasoning Paradigm
- Achieving superintelligence requires a universal reasoning paradigm, as merely scaling pre-training is insufficient [20][21].
- OpenAI's leadership recognized the need for a shift towards reasoning paradigms and reinforcement learning, leading to significant resource allocation in these areas [21][24].

Group 3: Efficient Data Utilization through Reinforcement Learning
- Reinforcement learning can enhance the efficiency of data usage, which is crucial as data becomes scarcer than computational power [25].
- Current machine learning models require significantly more samples than humans to learn new concepts, highlighting the need for improved sample efficiency [25][26].

Group 4: Non-Consensus Views on Reasoning Ability
- Reasoning is not limited to tasks with clear reward functions; it can also excel in subjective fields where results are harder to quantify [33].
- The alignment of AI with user preferences is critical, and reasoning capabilities can help achieve this alignment while mitigating ethical risks [34][35].

Group 5: Bottlenecks in Test-Time Compute Development
- Test-time compute faces cost limitations similar to those encountered during pre-training scaling, where increased model size leads to exponentially rising costs [36].
- The absolute time constraints on model responses hinder the speed of experimental iterations, impacting research efficiency [37][38].

Group 6: Mid-Training as a New Pre-Training Phase
- Mid-training is introduced as a phase that adds new capabilities to models before the completion of pre-training, enhancing their generalization and practicality [40][41].
- OpenAI has adopted mid-training strategies in its model training processes to improve alignment and safety [41][42].

Group 7: Insights from The Bitter Lesson for Multi-Agent Systems
- The concept of multi-agent systems may lead to the emergence of an "AI civilization" through long-term collaboration and competition among AI agents [44].
- Noam's team is exploring a principled research path that contrasts with traditional heuristic-based approaches in multi-agent research [45][46].