Pre-training
The Information: Admitting Google has pulled ahead! Altman's internal memo leaked: OpenAI's lead is narrowing, with warnings that a "tough period" is coming
美股IPO· 2025-11-21 11:42
Core Insights
- OpenAI CEO Sam Altman acknowledged that the company's technological lead is shrinking as Google makes significant advances in AI, which may create temporary economic headwinds for OpenAI [1][3]
- Despite the challenges, Altman stressed the importance of focusing on ambitious technological bets, even if that means OpenAI temporarily lags behind in the current environment [1][11]

Competitive Landscape
- Google has made unexpected breakthroughs in AI pre-training, a critical phase in developing large language models, which has surprised many AI researchers [5]
- OpenAI's competitors, particularly Anthropic, are reportedly on track to surpass OpenAI in revenue from AI sales to developers and enterprises [4][9]
- Although ChatGPT remains well ahead of Google's Gemini chatbot in usage and revenue, the gap is narrowing [9]

Financial Performance
- OpenAI, valued at $500 billion and backed by over $60 billion in investment, faces unprecedented competitive pressure, raising investor concerns about its future cash burn [3][10]
- By contrast, Google, valued at $3.5 trillion, generated over $70 billion in free cash flow in the past four quarters, underscoring its financial strength [9]

Future Directions
- OpenAI is focusing on long-term, ambitious projects, including AI-generated data for training new AI and "post-training" techniques to improve model responses [11]
- Altman expressed confidence in the company's ability to maintain its performance despite short-term competitive pressure, stressing that research teams must stay focused on achieving superintelligence [11]
OpenAI veteran Karpathy throws cold water: AI agents are still a decade away from being able to "do real work"
36Kr· 2025-10-21 12:42
Group 1
- Andrej Karpathy argues that AI agents will take another decade to mature, stating that current agents such as Claude and Codex are not yet capable of being "employed" for real tasks [2][4][5]
- He critiques the current state of AI learning, arguing that reinforcement learning is inadequate and that genuine learning should resemble human cognition, involving reflection and growth rather than mere trial and error [11][12][22]
- Karpathy suggests that future breakthroughs in AI will require a shift from knowledge accumulation to self-growth capabilities and a reconstruction of cognitive structures [4][5][22]

Group 2
- He highlights the current limitations of large language models (LLMs) in coding tasks, noting that they struggle with structured, nuanced engineering design [6][7][9]
- He categorizes human interaction with code into three types, emphasizing that LLMs cannot yet function as true collaborators in software development [7][9][10]
- While LLMs can assist with certain coding tasks, Karpathy believes they are not yet able to write or improve their own code effectively [9][10][11]

Group 3
- Karpathy stresses the importance of a reflective mechanism in AI learning: models should learn to review and reflect on their own processes rather than focus solely on outcomes (a minimal sketch of such a loop follows below) [18][19][20]
- He introduces the concept of a "cognitive core," advocating that models retain essential thinking and planning abilities while discarding unnecessary knowledge [32][36]
- He proposes that a smaller, more efficient model of only about a billion parameters could suffice, arguing that high-quality data can yield strong cognitive capability without massive scale [34][36]

Group 4
- Karpathy asserts that AGI (Artificial General Intelligence) will integrate gradually into the economy rather than cause sudden disruption, with digital knowledge work as its initial application area [38][39][40]
- He predicts a collaborative structure in which agents perform 80% of tasks under human supervision of the remaining 20% [40][41]
- AGI deployment will be a gradual process, starting with structured tasks like programming and customer service before expanding to more complex roles [48][49][50]

Group 5
- On fully autonomous driving, Karpathy notes that it is a high-stakes task that cannot tolerate errors, unlike many other AI applications [59][60]
- Successful deployment of autonomous driving requires not just technological advances but also a supportive societal framework [61][62]
- The transition to widespread autonomous driving will be slow and incremental, beginning with specific use cases and expanding gradually [63]
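Karpathy does not publish an implementation of the reflection mechanism he describes, but the idea in Group 3 can be sketched in a few lines: run a task, critique the process trace rather than only the outcome, then fold the lesson back into the next attempt. Everything below (the `run_task`, `critique`, and `revise` names and the toy success criterion) is a hypothetical illustration, not his code.

```python
# Minimal sketch of an "error -> reflection -> correction" loop: the agent
# reviews *how* it worked (the trace), distills a lesson, and revises its plan.
# All names and the success criterion are hypothetical placeholders for a demo.

from dataclasses import dataclass, field

@dataclass
class Attempt:
    plan: str
    outcome: bool                                     # did the task succeed?
    trace: list[str] = field(default_factory=list)    # step-by-step process log

def run_task(plan: str) -> Attempt:
    """Hypothetical executor: runs a plan and records its process trace."""
    succeeded = "check inputs" in plan                # toy success criterion
    return Attempt(plan, succeeded,
                   trace=[f"executed: {step}" for step in plan.split("; ")])

def critique(attempt: Attempt) -> str | None:
    """Reflection step: review the trace, not just the final outcome."""
    if attempt.outcome:
        return None
    return "check inputs"     # lesson distilled from reviewing the process

def revise(plan: str, lesson: str) -> str:
    """Fold the lesson into the next plan (the 'growth' half of the loop)."""
    return lesson + "; " + plan

plan = "act on data; report result"
for _ in range(3):
    attempt = run_task(plan)
    lesson = critique(attempt)
    if lesson is None:
        break
    plan = revise(plan, lesson)

print(plan, attempt.outcome)  # revised plan succeeds on the retry
```

The point of the toy is the loop shape: the lesson is extracted from the trace and persists into the next plan, which is closer to "reflection and growth" than to reward-only trial and error.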
A Sip of VC | YC talks with Anthropic's head of pre-training: pre-training teams must also think about inference, and how to balance pre-training and post-training is still in early exploration
Z Potentials· 2025-10-16 03:03
Core Insights
- The article discusses the evolution of pre-training in AI, emphasizing its central role in improving model performance through scaling laws and effective data utilization [5][8][9]
- Nick Joseph, head of pre-training at Anthropic, shares insights on the challenges and strategies of AI model development, focusing on computational resources and alignment with human goals [2][3][4]

Pre-training Fundamentals
- Pre-training centers on minimizing the loss function, the primary objective in AI model training [5]
- "Scaling laws" indicate that increasing compute, data volume, or model parameters yields predictable improvements in model performance (a worked sketch of this relationship follows below) [9][26]

Historical Context and Evolution
- Joseph's background includes significant roles at Vicarious and OpenAI, where he contributed to AI safety and model scaling [2][3][7]
- The transition from theoretical discussions of AI safety to practical applications in model training reflects the industry's maturation [6][7]

Technical Challenges and Infrastructure
- Distributed training poses engineering challenges, including optimizing hardware utilization and managing complex systems [12][18][28]
- Anthropic's early infrastructure was limited but evolved to support large-scale model training, leveraging cloud services for compute [16][17]

Data Utilization and Quality
- The availability of high-quality data remains a concern, with ongoing debate about data saturation and the risk of overfitting on AI-generated content [35][36][44]
- Joseph emphasizes balancing data quality and quantity, noting that while data is abundant, its utility for training models is what matters [35][37]

Future Directions and Paradigm Shifts
- The conversation touches on potential paradigm shifts in AI, particularly the integration of reinforcement learning and the need for innovative approaches to achieve general intelligence [62][63]
- Joseph is concerned that hard-to-diagnose bugs emerging in complex systems could slow progress in AI development [63][66]

Collaboration and Team Dynamics
- Teams at Anthropic work collaboratively, integrating diverse expertise to tackle engineering challenges [67][68]
- Practical engineering skill is increasingly valued over purely theoretical knowledge in the AI field [68][69]

Implications for Startups and Innovation
- Opportunities exist for startups that can leverage advances in AI models, particularly in practical applications that improve user experience [76]
- Solutions for chip reliability and team management are noted as potential areas for entrepreneurial ventures [77]
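The "predictable improvements" in the scaling-laws bullet can be made concrete with the commonly cited Chinchilla-style parametric form L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The sketch below uses the approximate constants published by Hoffmann et al. (2022) purely for illustration; the talk does not disclose Anthropic's internal fits.

```python
# Chinchilla-style parametric scaling law: predicted loss as a function of
# model parameters N and training tokens D. Constants are the approximate
# published fits from Hoffmann et al. (2022); treat them as illustrative,
# not as values used by any team discussed above.

def predicted_loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling either axis yields a predictable, diminishing reduction in loss:
for n, d in [(1e9, 20e9), (10e9, 200e9), (70e9, 1.4e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> loss ~ {predicted_loss(n, d):.3f}")
```

Plugging in larger (N, D) pairs shows the loss falling along a smooth, forecastable curve, which is what makes compute planning for pre-training tractable.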
A hardcore 30-minute "argument": this large-model roundtable laid bare the AI industry's disagreements
机器之心· 2025-07-28 04:24
Core Viewpoint
- The article recounts a heated debate among industry leaders at the WAIC 2025 forum on the evolution of large-model technology, covering training paradigms, model architectures, and data sources, and highlighting a significant shift from pre-training to reinforcement learning as a dominant approach in AI development [2][10][68]

Group 1: Training Paradigms
- The forum highlighted a paradigm shift in AI from pre-training-dominant development to one emphasizing reinforcement learning, marking a significant evolution in AI technology [10][19]
- OpenAI's transition from pre-training to reinforcement learning is seen as a critical development, with some experts suggesting the pre-training era is nearing its end [19][20]
- The balance between pre-training and reinforcement learning is a key topic, with experts stressing pre-training's role in building a strong foundation for reinforcement learning [25][26]

Group 2: Model Architectures
- The Transformer architecture has dominated AI since 2017, but its limitations are becoming apparent as parameter counts grow and context windows expand [31][32]
- Two main exploration paths exist: optimizing existing Transformer architectures, and developing entirely new paradigms such as Mamba and RetNet that aim to improve efficiency and performance [33][34]
- Future architectures may return to RNN-like structures as the industry shifts toward agent-based applications that require models to interact autonomously with their environments [38]

Group 3: Data Sources
- High-quality data scarcity looms: by 2028, existing data reserves may be fully consumed, potentially stalling the development of large models [41][42]
- Synthetic data is being explored as a remedy, with companies like Anthropic and OpenAI using model-generated data to supplement training [43][44]
- Concerns about the reliability of synthetic data are raised, emphasizing the need for validation mechanisms to ensure training-data quality (a minimal generate-then-validate sketch follows below) [45][50]

Group 4: Open Source vs. Closed Source
- The open-source versus closed-source debate continues, with open models like DeepSeek gaining traction and challenging the dominance of closed-source models [60][61]
- Open-source initiatives are seen as promoting efficient resource allocation and driving industry evolution, even if they do not always produce the highest-performing models [63][64]
- The future may bring hybrid models combining open-source and closed-source approaches, addressing challenges such as model fragmentation and misuse [66][67]
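The validation mechanisms called for in Group 3 usually take a generate-then-verify shape: keep a synthetic sample only if an independent check confirms it. The sketch below is a deliberately toy example (arithmetic with a known ground truth); real pipelines substitute a stronger verifier, and none of the names here come from Anthropic's or OpenAI's actual systems.

```python
# Minimal sketch of a generate-then-validate loop for synthetic training data.
# The generator and verifier are toy stand-ins, not any company's pipeline.

import random

def generate_candidate() -> dict:
    """Toy 'model-generated' arithmetic sample; sometimes wrong on purpose."""
    a, b = random.randint(1, 99), random.randint(1, 99)
    answer = a + b if random.random() > 0.2 else a + b + 1   # ~20% corrupted
    return {"question": f"{a} + {b} = ?", "answer": answer, "truth": a + b}

def verify(sample: dict) -> bool:
    """Validation mechanism: keep only samples that pass an independent check."""
    return sample["answer"] == sample["truth"]

dataset = [s for _ in range(1000) if verify(s := generate_candidate())]
print(f"kept {len(dataset)}/1000 synthetic samples after validation")
```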
Daily AI Voice
2025-07-16 06:13
Summary of Conference Call Records

Industry Overview
- The global AI toy market is expected to grow significantly on the back of AI innovation, with long-run projections as high as roughly $600 billion and a compound annual growth rate (CAGR) exceeding 19% from a base of about $18 billion in 2024 [1][2][3]
- In China, AI toy sales have grown explosively, with some companies exceeding 500,000 yuan in daily sales in January 2025 [1]

Core Insights and Arguments
- Technological maturity: the technology behind AI toys is considered mature, enabling features such as emotional responses and educational integration, for which parents are willing to pay a premium [2][3]
- Educational value: AI toys are increasingly integrated into educational contexts, strengthening children's logical thinking through interactive programming [2]
- Emotional economy: the rise of the emotional economy is a key growth driver, as AI toys provide companionship and emotional engagement [2][3]
- Market dynamics: the AI toy market does not require high-precision model outputs, allowing broader accessibility and faster development cycles [3]

Company-Specific Developments
- One company has launched several AI-driven products, including the "Xiyangyang" AI doll with interactive modes such as chatting and Bluetooth connectivity, indicating rapid growth in AI-enabled toy offerings [4]
- Shifeng Culture, active in the toy industry for over 30 years, is focusing on combining AI with established IPs such as Disney and Conan to enhance its product offerings [5]

Additional Important Points
- China's AI toy sector is poised for rapid expansion, driven by technological advances and consumer demand [1][5]
- AI integration is expected to bring greater product complexity, including richer interaction through video and voice technologies [27][28]
- The overall toy ecosystem is likely to evolve toward more sophisticated AI applications that deepen user interaction and engagement [27][28]

Conclusion
- The AI toy industry is on the brink of a significant transformation, fueled by technological advances and shifting consumer preferences, particularly in education and emotional engagement. Companies that leverage these trends effectively are likely to see substantial growth in the coming years [1][2][3][5][27][28]
Embracing AI: viewing the transformation rationally and actively positioning for the future
创业邦· 2025-07-07 10:27
Core Viewpoint
- The discussion emphasizes integrating AI technology with business operations and focusing on long-term strategic value rather than short-term gains [1][19][29]

Group 1: AI Technology Development
- AI has reached a critical intersection of technology and product, where understanding its limitations and capabilities is essential for practical application [5][6]
- The industry consensus is that a model's core capabilities come from pre-training rather than post-training, underscoring the need for high-quality training data [6][7]
- AI tools are powerful but carry uncertainty, requiring a careful approach to integrating them into business processes [5][6]

Group 2: Practical Applications of AI
- APUS has applied AI successfully in coding, design, and healthcare, significantly improving efficiency and reducing the need for large teams [11][12][14]
- The company has developed proprietary models for coding and healthcare diagnostics, demonstrating AI's potential to raise productivity and improve service delivery [11][14][15]
- AI has transformed traditional content-creation processes, enabling rapid generation of marketing materials and interactive products [12][13][14]

Group 3: Strategic Considerations for AI Implementation
- Companies often misjudge AI's short-term capabilities while underestimating its long-term potential, leading to misguided expectations [20][21]
- A structured approach to defining AI applications is crucial, starting from the business's actual needs and aligning AI capabilities accordingly [26][27]
- Skilled project leaders who understand both AI and business operations are highlighted as a key factor in successful AI integration [22][23]

Group 4: Recommendations for CEOs
- CEOs should clearly define AI's strategic value within their organizations and ensure AI initiatives align with long-term business goals [26][27][28]
- Cultural adaptation and an understanding of how AI operates can ease its integration into daily workflows [26][27]
- Companies must avoid fixating on the technology itself and instead prioritize identifying relevant applications and the necessary data governance [27][28]
Changes at Silicon Valley's major model labs: implications for pre-training and capex?
2025-07-02 15:49
Summary of Conference Call Notes

Company and Industry Involved
- **Company**: Meta
- **Industry**: AI and technology, specifically large models and machine learning

Core Points and Arguments
1. **Talent Acquisition**: Meta is aggressively recruiting from OpenAI, Google, and Anthropic, focusing on areas such as multimodal processing and post-training to strengthen the competitiveness of its LLAMA models [1][9][10]
2. **Impact of Talent Loss on OpenAI**: Key members of OpenAI's O1 model team, including Ren Hongyu, Zhao Shengjia, and Yu Jiahui, have left, prompting OpenAI to accelerate its development pace [1][12]
3. **AI Talent Salary Surge**: Compensation for top AI talent has soared, reaching up to $100 million annually, reflecting fierce competition among tech companies for AI professionals [1][11]
4. **Shift in AI Development Strategy**: By the second half of 2025, major labs are expected to return to the pre-training phase, with Meta focusing on data, Google optimizing architecture, and OpenAI continuing its large-cluster strategy [1][29][30]
5. **Increased Demand for AI Computing Power**: The new round of AI innovation is expected to significantly raise demand for compute, training, and clusters [3][38]
6. **Meta's Role as a Catalyst**: Meta's moves are accelerating change across the U.S. AI industry, making it an investment focal point in the coming months [5][38]
7. **Challenges Faced by Meta**: Meta's LLAMA4 model has underperformed, driving a strategy shift that includes talent acquisition to improve its competitive position [6][19]
8. **Strategic Focus on Data Quality**: Meta's acquisition of Scale AI is aimed at strengthening data-filtering capabilities and extracting valuable insights from vast amounts of data [14][31]
9. **Future of AI Models**: Next-generation models will demand substantial human resources and compute, with capital expenditures focused on securing adequate training resources [39][40]

Other Important but Possibly Overlooked Content
1. **Meta's Historical Context**: Meta's AI journey began in 2013, coinciding with significant industry milestones, and has evolved through acquisitions and strategic shifts [15][17]
2. **Comparison with Competitors**: Despite its progress, Meta currently lacks globally leading large-model experts, which may limit its competitive edge [19][20]
3. **Long-term Industry Evolution**: The field has moved from CNNs to RNNs and now to Transformer architectures, with ongoing debate over the path to AGI [21]
4. **Investment in Computing Resources**: OpenAI and xAI are also expanding compute, with OpenAI planning a $30 billion order with Oracle to support a million-card cluster by 2027 [34][33]
5. **Meta's Potential for Growth**: Meta's recent moves may elevate its position in the AI landscape, potentially allowing it to compete more closely with OpenAI and xAI in the next model iteration [25][36]
End-to-end GUI agents achieve the first "error-reflection-correction" closed loop, simulating the full human cognitive process
量子位· 2025-06-11 08:07
Core Viewpoint
- The article introduces GUI-Reflection, a new framework from the MMLab team at Nanyang Technological University that endows end-to-end multimodal GUI agents with self-reflection and error-correction capabilities, addressing the limitations of current training paradigms for automation tasks on devices such as smartphones and computers [1]

Group 1: GUI-Reflection Framework Overview
- GUI-Reflection is a comprehensive framework designed to systematically impart self-reflection and error-correction abilities to multimodal GUI agents, consisting of three key stages: cognitive inspiration, behavior acquisition, and interactive reinforcement [6][27]
- During pre-training, the framework introduces the GUI-Reflection Task Suite, which exposes the model to reflection-related tasks and lays the groundwork for subsequent training stages [2][7]

Group 2: Offline Supervised Fine-Tuning
- An automated data pipeline generates behavior data incorporating reflection and error correction from existing flawless trajectories, allowing the model to learn reflective behaviors effectively (a minimal sketch of such a pipeline follows below) [3][8]
- The pipeline creates erroneous behaviors by modifying original task goals and inserting invalid operations into successful trajectories, enabling the model to reflect on mistakes and attempt new, correct actions [9][10]

Group 3: Online Training Phase
- A distributed mobile GUI learning environment with 11 apps and 215 task templates supports high-concurrency interaction, enhancing the model's adaptability to real-world scenarios [12]
- An automated, iterative online reflection-tuning algorithm optimizes the model's fault tolerance, recovery ability, and complex-planning skills through repeated training iterations and dynamic sampling strategies [12]

Group 4: Experimental Results
- Introducing reflection-oriented task data during pre-training significantly improves performance on reflection-related tasks, even for smaller models, achieving results comparable to closed-source large models [16]
- The GUI-Reflection framework achieves a 34.5% success rate on the AndroidWorld benchmark, validating the effectiveness of explicitly incorporating reflection mechanisms across multiple training stages [19][20]

Group 5: Conclusion
- GUI-Reflection injects a novel self-reflection capability into end-to-end multimodal GUI agents, creating a cognitive loop of "error-reflection-correction" that enhances robustness and flexibility when facing uncertainty in real-world environments [27]
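The offline data pipeline in Group 2 can be pictured as trajectory surgery: splice an invalid step plus a reflection into an otherwise flawless trajectory, followed by the original correct action. The sketch below assumes a simple step-dictionary format; the field names and the `inject_error` helper are illustrative assumptions, not the paper's actual data schema.

```python
# Minimal sketch of the automated pipeline described above: take a flawless
# GUI trajectory, inject a plausible wrong step, then append a reflection and
# the correct recovery action. Field names are illustrative, not the paper's.

import copy
import random

def inject_error(trajectory: list[dict]) -> list[dict]:
    """Derive a reflection-training sample from a successful trajectory."""
    traj = copy.deepcopy(trajectory)
    i = random.randrange(len(traj))                 # pick a step to corrupt
    correct = traj[i]
    wrong = {"action": "tap", "target": "wrong_button", "label": "error"}
    reflection = {
        "action": "reflect",
        "thought": (f"Tapping 'wrong_button' did not advance the task; the "
                    f"intended step was {correct['action']} on {correct['target']}."),
        "label": "reflection",
    }
    # error -> reflection -> original correct action, then the rest of the flow
    return traj[:i] + [wrong, reflection, correct] + traj[i + 1:]

demo = [
    {"action": "open_app", "target": "settings", "label": "ok"},
    {"action": "tap", "target": "wifi_toggle", "label": "ok"},
]
for step in inject_error(demo):
    print(step)
```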
Three top AI technologists share a rare stage to talk through the AI industry's biggest "Rashomon"
36Kr· 2025-05-28 11:59
Core Insights
- The AI industry is locked in a significant debate over the effectiveness of pre-trained models versus first principles, with notable figures like Ilya Sutskever (formerly of OpenAI) suggesting that pre-training has reached its limits [1][2]
- The shift from consensus-driven approaches toward non-consensus methods is evident, as companies and researchers seek innovative solutions in AI [6][7]

Group 1: Industry Trends
- The AI landscape is transitioning from a focus on pre-training to exploring alternative methodologies, with companies like Sand.AI and NLP LAB leading the application of multi-modal architectures to language and video models [3][4]
- New models such as Dream 7B demonstrate the potential of applying diffusion models to language tasks, outperforming larger models like DeepSeek V3 [3][4]
- The consensus around pre-training is being challenged, with some experts arguing it is not yet over: untapped data remains that could further improve model performance [38][39]

Group 2: Company Perspectives
- Alibaba's Qwen team, led by Lin Junyang, has faced criticism for being conservative, yet the team argues that its extensive experimentation has produced valuable insights, ultimately reaffirming the effectiveness of the Transformer architecture [5][15]
- Exploration of Mixture of Experts (MoE) models is ongoing, with the team recognizing MoE's potential for scaling while addressing training-stability challenges (a minimal routing sketch follows below) [16][20]
- The industry increasingly focuses on optimizing model efficiency and effectiveness, with particular interest in balancing model size against performance [19][22]

Group 3: Technical Innovations
- Integrating different model architectures, such as using diffusion models for language generation, reflects a broader wave of innovation in AI [3][4]
- Training models on long sequences and finding effective optimization strategies remain critical research challenges [21][22]
- Future breakthroughs may come from using greater compute to revisit previously unviable techniques, suggesting a cycle of innovation driven by hardware advances [40][41]
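For readers unfamiliar with the MoE design the Qwen team discusses, the sketch below shows the basic top-k routing pattern: a learned router scores experts per token, and only the top-k expert networks run. It is a NumPy toy in the style of standard sparse MoE layers (Shazeer et al., 2017), assumed for illustration rather than drawn from Qwen's codebase.

```python
# Minimal top-k Mixture-of-Experts routing sketch: each token activates only
# k of E expert networks, so parameter count grows with E while per-token
# compute stays near-constant. Illustrative toy, not any production system.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k, n_tokens = 8, 4, 2, 5

router_w = rng.normal(size=(d_model, n_experts))             # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

x = rng.normal(size=(n_tokens, d_model))                     # token activations
logits = x @ router_w
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

out = np.zeros_like(x)
for t in range(n_tokens):
    chosen = np.argsort(probs[t])[-top_k:]                   # top-k experts
    gate = probs[t, chosen] / probs[t, chosen].sum()         # renormalized gates
    for g, e in zip(gate, chosen):
        out[t] += g * (x[t] @ experts[e])                    # weighted expert outputs

print(out.shape)  # (5, 8): input shape preserved, but only 2 of 4 experts ran per token
```

The training-stability issues mentioned above typically arise in exactly this routing step: if the router collapses onto a few experts, auxiliary load-balancing losses are commonly added to keep all experts in use.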
公元: DeepSeek has only opened one door; large models are far from the endgame | Investor Talk
红杉汇· 2025-05-11 05:09
Core Viewpoint
- The discussion highlights the evolving landscape of AI and embodied intelligence, emphasizing clear commercialization routes and the rapid pace of technological change in the industry [1]

Group 1: AI and Embodied Intelligence Landscape
- Current entrepreneurial models differ significantly from the internet era, focusing on clear commercialization routes rather than technological disruption alone [1]
- The embodied-intelligence market resembles the AI landscape of 2018: significant breakthroughs, comparable to the emergence of GPT, have yet to appear [6]
- DeepSeek's emergence has disrupted the prevailing U.S. narrative around AGI and reshaped the domestic large-model landscape, with predictions that only a few companies will dominate the market [6]

Group 2: Investment Strategies and Market Dynamics
- Investors are increasingly challenged to keep pace with rapid model iteration, requiring a deeper understanding of model boundaries and capabilities [7]
- Investment focus is shifting from traditional metrics like DAU and MAU to the capabilities of AGI models, which can trigger sudden user shifts [7]
- Belief in AGI's future is crucial for investors, as embodied intelligence remains at an early stage with no clear prototype of a general model yet available [9]

Group 3: Entrepreneurial Challenges and Opportunities
- Entrepreneurs in AI and embodied intelligence struggle to articulate clear applications, in contrast to earlier business plans with well-defined objectives [8]
- A dual approach to both pre-training and post-training in model development is emphasized, indicating that both are essential for progress in the field [6]
- The industry is still early in its development; significant time is required before a universal model emerges [9]