Pre-training
Hinton Joins the Scaling Law Debate, and He Isn't Siding with His Student Ilya
量子位· 2026-01-01 02:13
一水 from 凹非寺, QbitAI | WeChat official account QbitAI

"I don't think the Scaling Law has completely run its course." Just as his student Ilya was pouring cold water on the Scaling Law, his teacher, AI godfather Geoffrey Hinton, resolutely came out with the opposite view. The scene calls to mind two interesting episodes. First, Ilya has believed in the Scaling Law almost since his student days, pitching it to everyone around him at every opportunity and carrying the idea into OpenAI; he was arguably one of its earliest devotees. Second, Hinton, looking back on his time with Ilya, lavishly praised Ilya's "astonishing intuition," including on the Scaling Law itself, admitting: "I was wrong at the time, and Ilya was basically right. The Transformer really was a creative idea, but what actually did the work was scale: the scale of the data and the scale of the compute." Now, though, teacher and student have swapped positions entirely. So what happened in between?

The Scaling-Law-is-not-dead camp: Hinton and Hassabis

Among the challenges, the biggest is undoubtedly missing data: most high-value data is locked inside companies, and freely available internet data is essentially exhausted. This problem, the argument goes, will be solved by AI itself, that is, by models using ...
Even with $30 Billion, You Might Not "Recreate GPT-4"? Yang You of NUS, in a New Long Essay, Exposes the Truth About the AI Growth Bottleneck
量子位· 2025-12-31 03:37
Core Viewpoint - The article discusses the growing anxiety surrounding the "AI bottleneck" as the third anniversary of ChatGPT approaches, questioning whether current technological paradigms can effectively utilize increased computational power to develop models significantly stronger than GPT-4 [1][2]. Group 1: Nature of Intelligence and Its Measurement - Intelligence is fundamentally about energy conversion, where AI has transformed electricity into reusable intelligence over the past decade, but the efficiency of this conversion is now under scrutiny [6]. - The essence of intelligence is not explanation but prediction, characterized by the ability to forecast future states and bear the consequences of those predictions [7][10]. - The current models derive their intelligence primarily from the pre-training phase, which consumes the most energy and computation, raising questions about the stability of intelligence growth with continued computational investment [15][20]. Group 2: Computational Paradigms and Their Limitations - The article emphasizes that the real bottleneck is not the cessation of computational growth but rather the diminishing returns in the relationship between computational power and intelligence growth [22][27]. - It challenges the mainstream narrative by suggesting that pre-training, fine-tuning, and reinforcement learning are fundamentally about gradient computation and parameter updates, rather than distinct methodologies [12][11]. - The success of the Transformer architecture is attributed to its compatibility with GPU systems, which has enabled a stable feedback loop between computational growth, model scaling, and capability enhancement [16][18]. Group 3: Future Directions and Exploration - Future AI infrastructure should focus on the overall scalability of parallel computing systems rather than just single-chip performance, with an emphasis on maintaining or improving the ratio of computational to communication costs [24][25]. - Multiple exploration directions are proposed, including higher precision, advanced optimizers, and more scalable architectures or loss functions, all aimed at ensuring that increased computational investments yield proportional intelligence enhancements [25][26]. - The article concludes that as long as more efficient computational organization methods can be found, the upper limits of intelligence are far from being reached [27].
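The essay's central claim, that the marginal intelligence bought per unit of compute is shrinking, can be made concrete with a power-law loss curve. Below is a minimal sketch assuming a Chinchilla-style form L(C) = L_inf + a * C^(-b); the coefficients are illustrative placeholders (the exponent is roughly in the range of published compute-scaling fits), not values taken from the essay or from any real model family.

```python
# A toy illustration of the diminishing-returns point: under a power law
# L(C) = L_inf + a * C^(-b), each extra 10x of compute buys a smaller
# absolute loss improvement than the 10x before it. Coefficients below
# are hypothetical placeholders, not fitted values from any real model.

def loss(compute_flops: float, l_inf: float = 1.69,
         a: float = 13.0, b: float = 0.05) -> float:
    """Irreducible loss plus a power-law term in training compute."""
    return l_inf + a * compute_flops ** (-b)

if __name__ == "__main__":
    prev = None
    for exp in range(20, 27):                 # 1e20 .. 1e26 FLOPs
        c = 10.0 ** exp
        current = loss(c)
        gain = "" if prev is None else f"   gain from last 10x: {prev - current:.3f}"
        print(f"C = 1e{exp} FLOPs  ->  loss {current:.3f}{gain}")
        prev = current
```

Running this prints the loss at each decade of compute together with the gain over the previous decade; the gain shrinks monotonically, which is the pattern the essay argues current paradigms are hitting.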
Dwarkesh's Latest Podcast: A Year-End Review of AI Progress
36Ke· 2025-12-24 23:15
Core Insights - Dwarkesh's podcast features prominent AI figures Ilya Sutskever and Andrej Karpathy, indicating his significant standing in the AI community [1] - The article summarizes Dwarkesh's views on AI advancements, particularly regarding the timeline for achieving AGI [1] Group 1: AI Development and AGI Timeline - The focus on "mid-training" using reinforcement learning is seen as evidence that AGI is still far off, as it suggests models lack strong generalization capabilities [3][16] - The idea of pre-trained skills is questioned, as human labor's value lies in the ability to flexibly acquire new skills without heavy training costs [4][24] - AI's economic diffusion lag is viewed as an excuse for insufficient capabilities, rather than a natural delay in technology adoption [27][28] Group 2: AI Capabilities and Limitations - AI models currently lack the ability to fully automate even simple tasks, indicating a significant gap in their capabilities compared to human workers [25][30] - The adjustment of standards for AI capabilities is acknowledged as reasonable, reflecting a deeper understanding of intelligence and labor complexity [31] - The scaling laws observed in pre-training do not necessarily apply to reinforcement learning, with some studies suggesting a need for a million-fold increase in computational power to achieve similar advancements [10][33] Group 3: Future of AI and Continuous Learning - Continuous learning is anticipated to be a major driver of model capability enhancement post-AGI, with expectations for preliminary features to emerge within a year [13][40] - Achieving human-level continuous learning may take an additional 5 to 10 years, indicating that breakthroughs will not lead to immediate dominance in the field [14][41] - The potential for an explosion in intelligence once models reach human-level capabilities is highlighted, emphasizing the importance of ongoing learning and adaptation [36] Group 4: Economic Implications and Workforce Integration - The integration of AI labor into enterprises is expected to be easier than hiring human workers, as AI can be replicated without the complexities of human recruitment [29] - The current revenue gap between AI models and human knowledge workers underscores the distance AI still has to cover in terms of capability [30] - The article suggests that if AI models truly reached AGI levels, their economic impact would be profound, with businesses willing to invest significantly in AI labor [29]
In Depth | Mark Chen, OpenAI's Highest-Ranking Chinese Executive, Responds Exclusively on the Gemini Rivalry, Meta's Talent War, and OpenAI's Core AI Strategy
Z Potentials· 2025-12-20 04:03
Z Highlights: Ashlee Vance, host of the Core Memory podcast and veteran technology journalist; Mark Chen, OpenAI Chief Research Officer, long focused on AGI research and AI alignment and lead on several core model efforts. With the AI talent war white-hot and Gemini 3 newly released, the two discuss OpenAI's research roadmap and the future of AGI. Interview date: December 2, 2025.

The talent battle: Meta's aggressive recruiting and OpenAI's confidence

Ashlee Vance: Alex Wayne used to do math, right? You two must know each other.

Mark Chen: We've met a few times, but we're not particularly close.

Ashlee Vance: Why did he leave?

Ashlee Vance: The talent war is drawing a lot of attention, and Meta has been quite aggressive. What does this tug-of-war actually look like, and where are we now?

Mark Chen: There really is a core group of talent, and nearly everyone in the industry knows who they are. Many companies have realized that one key to building a top AI lab is recruiting the very best people, so it is no surprise that Meta is pushing this strategy hard. We have not sat idle; let me tell this story from OpenAI's perspective. There has been a lot in the media ...
Is RL a "Philosopher's Stone" or an "Excavator"? CMU Answers with Controlled Experiments
机器之心· 2025-12-15 01:44
Report from 机器之心, by the 机器之心 editorial team. Recently, reinforcement learning (RL) has produced notable gains in language models' reasoning abilities. What remains unclear, however, is whether post-training genuinely extends a model's reasoning capability or merely mines potential already laid down during pre-training. A core difficulty is that modern training pipelines lack controllability: large-scale pre-training corpora are not transparent, mid-training is often under-studied, and the RL objective interacts in complex ways with unknown prior knowledge. To answer this question, researchers at Carnegie Mellon University (CMU) built a controllable synthetic-data framework based on GSM-Infinite and, in a fully decoupled setting, quantitatively analyzed the causal effects of pre-training, mid-training (continued pre-training, CPT), and RL-based post-training on a model's reasoning generalization, isolating and independently measuring the causal contribution of each stage. https://x.com/xiangyue96/status/1998488030836044112 The researchers evaluate models along two dimensions: extrapolative generalization to more complex compositions, and contextual generalization across different surface contexts. Using this framework, they reconcile conflicting views on RL's effectiveness. The study shows: only when pre-training leaves enough headroom, and the RL data targets the model's capability boundary (that is, problems which, while ...
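To make "extrapolative generalization to more complex compositions" concrete, here is a toy sketch of the kind of control a synthetic corpus provides. It is not the CMU framework itself (GSM-Infinite's actual generators are far richer), and helper names such as make_chain are hypothetical: each problem is generated at an exact compositional depth, so the training and extrapolation splits differ only in depth, by construction.

```python
import random

# A toy sketch (not the CMU framework; GSM-Infinite's generators are richer)
# of a controllable synthetic corpus: each problem is built at an exact
# compositional depth, so "train" and "extrapolation" splits can differ
# only in depth, by construction.

def make_chain(depth: int, rng: random.Random) -> tuple[str, int]:
    """Build one arithmetic chain problem with `depth` dependent steps."""
    value = rng.randint(1, 9)
    lines = [f"x0 = {value}"]
    for i in range(1, depth + 1):
        delta = rng.randint(1, 9)
        op = rng.choice(["+", "-"])
        value = value + delta if op == "+" else value - delta
        lines.append(f"x{i} = x{i-1} {op} {delta}")
    question = "; ".join(lines) + f"; what is x{depth}?"
    return question, value

def build_split(n: int, depths: range, seed: int = 0) -> list[tuple[str, int]]:
    """Sample `n` problems whose depths are drawn from `depths`."""
    rng = random.Random(seed)
    return [make_chain(rng.choice(list(depths)), rng) for _ in range(n)]

if __name__ == "__main__":
    train_set = build_split(3, range(2, 5))       # shallow compositions
    extrapolation = build_split(2, range(8, 11))  # strictly deeper ones
    for question, answer in train_set + extrapolation:
        print(question, "->", answer)
```

With depth as the only knob, any accuracy gap between the two splits can be attributed to compositional extrapolation rather than to surface-form differences.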
GPT-5.2 Leaked Early? Tonight, OpenAI Aims to Take Down Gemini 3
36Ke· 2025-12-11 08:17
[Editor's note] GPT-5.2 has just quietly landed in Cursor, gunning straight for Gemini 3! With an OpenAI-versus-Google showdown about to erupt, netizens are shouting: Christmas is coming early tonight! Tonight, OpenAI may fire the opening shot of its revenge campaign, and users everywhere are on standby: GPT-5.2 could go live at any moment. Sharp-eyed users have already spotted its traces: screenshots circulating in developer communities show gpt-5.2 and gpt-5.2-thinking options appearing in Cursor's model dropdown. GPT-5.2's opening battlefield is the Cursor IDE rather than the ChatGPT web app, which suggests OpenAI has concluded that coding is not only AI's killer application but also the domain that best showcases a model's reasoning ability. In short, a sparks-flying battle between Google and OpenAI is about to begin, and excited netizens are cheering that Christmas is arriving early this year. According to the leaked "Project Garlic" documents and feedback from the Cursor community, GPT-5.2 is a thoroughly rebuilt, purpose-built model. Yes, GPT-5.2, the model carrying OpenAI's do-or-die mission, is by no means a simple fine-tune of GPT-5. According to OpenAI Chief Research Officer Mark Chen, GPT-5.2's performance on coding and logical-reasoning tasks has already ...
AI Voices | Heavyweight Guests Assemble: What Has the Dwarkesh Podcast Been Talking About Lately?
红杉汇· 2025-12-11 00:04
Key takeaways: In 2025, a podcast called the Dwarkesh Podcast became one of the most important channels for first-hand information in the AI industry; you could even call it must-watch viewing for Silicon Valley's AI circle. From Satya Nadella to Andrej Karpathy to Ilya Sutskever, core industry figures who are normally hard to book have all chosen it for long, in-depth conversations. In this issue, we share the latest and most talked-about episodes and their core views.

Ilya Sutskever, former OpenAI Chief Scientist, computer scientist, founder of SSI

■ Insight 1: The era of the "brute-force aesthetic" of mindlessly stacking compute has already turned the page. For the past five years everyone has been chanting "Scaling Law," as if with enough GPUs and enough data, feeding the entire internet into the model, AGI would pop out automatically. But Ilya threw cold water on this. He says pre-training is starting to wane: the data is nearly used up, and at the next stage (RL and post-training), "big" alone no longer works. We are back to the pre-2012 handcrafted era that rewards taste and intuition (an Age of Research).

■ Insight 2: "Emotion" is not a burden on humans but a gift from evolution. We usually assume AI is rational and humans are emotional ...
A Long Interview with OpenAI Chief Research Officer Mark Chen: Zuck Personally Delivered Soup to Poach Our People, So We Took Soup to Meta
36Ke· 2025-12-04 02:58
Goodness, OpenAI Chief Research Officer Mark Chen's latest interview is packed with information. Whether about OpenAI, himself, or his colleagues, the theme is "I can talk about all of it." For example: netizens say the interview is genuinely refreshing, and many are reposting Mark Chen's views. He reveals that Meta's poaching war has privately escalated into a soup-delivery war, with real, drinkable soup: Zuckerberg cooked it himself and delivered it to OpenAI researchers, and OpenAI retaliated by taking soup to Meta. Mark Chen, Scott Gray (OpenAI's low-profile ace in charge of GPU kernel optimization), and others often sit around playing poker, which he describes as essentially a contest of probability and expected value. OpenAI's core research team numbers roughly 500 people, with about 300 projects under way inside the company. Mark Chen says OpenAI remains, in essence, a pure AI research company. After Gemini 3's release, everyone probed the new model in their own way; there is a "42 problem" that he has never seen any language model truly solve in full. OpenAI's internal "palace drama" also came up: how Mark Chen got the researchers aligned and brought about the petition letter that returned Sam to the company. He disclosed that for the past six months he has focused on pre-training, where he is confident OpenAI can easily go head-to-head with Gemini 3, and that internally there is already a model matching Gemini 3's performance ...
Talking DeepSeek, AI Hardware, and Competitors: An Information-Dense Interview with OpenAI's Chief Research Officer
36Ke· 2025-12-03 07:46
Core Insights - OpenAI's Chief Research Officer Mark Chen discussed the company's strategic vision amid intense AI competition and technological advancements, addressing concerns about talent retention and the pursuit of AGI [1] Group 1: Talent Acquisition and Retention - OpenAI faces aggressive talent poaching from competitors like Meta, which reportedly invests billions annually in recruitment efforts, yet most OpenAI employees have chosen to stay [2] - Despite competitive salary pressures, OpenAI does not engage in salary wars, focusing instead on a shared vision of achieving AGI as the key to retaining talent [2] Group 2: Resource Allocation and Project Management - OpenAI is managing approximately 300 concurrent research projects, with a focus on prioritizing those that are most likely to advance AGI, emphasizing exploratory research over following trends [3] - The company maintains a transparent and strict resource allocation process, allowing for secondary projects but clearly defining their subordinate status to ensure efficiency [3] Group 3: Competitive Landscape and Model Development - OpenAI monitors competitor releases, such as Google's Gemini 3, but maintains its own development pace, emphasizing confidence in internal progress rather than reacting to external pressures [4] - The company is refocusing on pre-training capabilities, which had been deprioritized, believing there is still significant potential for improvement in this area [5] Group 4: AGI Development and Future Goals - Mark Chen believes that significant changes in AI capabilities will occur within the next two years, with goals set for AI to participate in research processes and eventually conduct end-to-end research autonomously [7] - The demand for computational power is expected to remain high, with Chen stating that even a threefold increase in resources would be quickly utilized [8] Group 5: Hardware Development and Future Interactions - OpenAI is collaborating with designer Jony Ive to develop next-generation AI hardware that aims to enhance user interaction by enabling continuous learning and memory capabilities [9] - The goal is to evolve AI from a passive assistant to a more intelligent entity that can remember user interactions and improve over time [9] Group 6: Strategic Focus Amid Competition - In response to the emergence of open-source models like DeepSeek, OpenAI emphasizes the importance of maintaining its research pace and innovation focus, rather than being swayed by competitive pressures [10]
A Long Interview with OpenAI Chief Research Officer Mark Chen: Zuck Personally Delivered Soup to Poach Our People, So We Took Soup to Meta
量子位· 2025-12-03 00:11
Core Insights - The interview with OpenAI's Chief Research Officer Mark Chen reveals the competitive landscape in AI talent acquisition, particularly between OpenAI and Meta, highlighting the lengths to which companies will go to attract top talent, including sending homemade soup [4][9][11] - OpenAI maintains a strong focus on AI research, with a core team of approximately 500 people and around 300 ongoing projects, emphasizing the importance of pre-training and the development of next-generation models [4][20][27] - Mark Chen expresses confidence in OpenAI's ability to compete with Google's Gemini 3, stating that internal models have already matched its performance and that further advancements are imminent [4][26][119] Talent Acquisition and Competition - Meta's aggressive recruitment strategy has led to a "soup war," where both companies are trying to entice talent through unconventional means [4][11] - Despite Meta's efforts, many OpenAI employees have chosen to stay, indicating a strong belief in OpenAI's mission and future [10][14] - The competition for talent is intense, with companies recognizing the necessity of attracting the best individuals to build effective AI labs [9][10] Research Focus and Model Development - OpenAI's research strategy prioritizes exploratory research over merely replicating existing benchmarks, aiming to discover new paradigms in AI [22][27] - The company has invested heavily in pre-training, believing it still holds significant potential, contrary to claims that scaling has reached its limits [118][119] - Mark Chen emphasizes the importance of maintaining a clear focus on core research priorities and effectively communicating these to the team [24][20] Response to Competitors - OpenAI aims to avoid being reactive to competitors, focusing instead on long-term research goals and breakthroughs rather than short-term updates [26][28] - The company has already developed models that can compete with Gemini 3, showcasing its confidence in upcoming releases [34][119] - Mark Chen highlights the significance of reasoning capabilities in language models, which OpenAI has been developing for over two years [26][116] Company Culture and Management - OpenAI's culture remains rooted in its original mission as a pure AI research organization, despite its growth and the introduction of product lines [27][28] - Mark Chen's management style emphasizes collaboration and open communication, fostering a strong sense of community among researchers [101][104] - The company has navigated internal challenges, including leadership changes, by promoting unity and a shared vision among its team [98][102]