Post-training
The Full Story of Tencent Hunyuan's Three-Year Transformation
Yicai (第一财经) · 2026-01-12 03:00
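The following article is from NewNewThing (新皮层), by Lu Yanjun and Wu Yangyang. Word count: 7,212; reading time: about 12 minutes. In late November 2025, university graduate Lin Feng attended a closed-door meeting of Tencent's Qingyun Program in Shenzhen. The invitation-only event ran for two days. Besides a cruise tour and a visit to Tencent headquarters, the agenda included a department meet-and-greet session, and Yao Shunyu was there in person. The session lasted about two hours. Yao Shunyu gave the opening remarks; he spoke for only about 20 minutes, but with ambition. "He said Hunyuan's goal is to match the very best large models in the world," Lin Feng told Yicai's NewNewThing. "Tencent only looks at candidates from the foundation model teams of four companies: DeepSeek, Moonshot AI, ByteDance and Alibaba. It does not consider other companies," Chen Lifeng, a person close to Tencent's recruiting, told NewNewThing. He said that in mid-2025 ByteDance had motivated employees by granting "Doubao virtual shares," which amounted to a pay raise for its large model team. Yet it was during this round of equity incentives that some ByteDance Doubao employees took the opportunity to move to Tencent Hunyuan: ByteDance employees whose total annual compensation had been about 2.5 million to 3 million yuan could get offers above 3 million yuan a year at Hunyuan. Lin Feng was deeply impressed by Yao Shunyu ...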
Upending the Order: Gemini Flash Outperforms Pro, "the Pareto Frontier Has Flipped"
36Kr · 2025-12-22 10:12
Core Insights
- Gemini 3 Flash has outperformed its predecessor Gemini 2.5 Pro and even the flagship Gemini 3 Pro in various performance metrics, achieving a score of 78% in the SWE-Bench Verified test, surpassing the Pro's score of 76.2% [1][5][6]
- The Flash version demonstrates significant improvements in programming capabilities and multimodal reasoning, with a score of 99.7% in the AIME 2025 mathematics benchmark when code execution is included [5][6]
- Flash's performance in the challenging Humanity's Last Exam test is competitive, scoring 33.7% without tools, closely trailing the Pro's 37.5% [5][6]

Performance Metrics
- In the SWE-Bench Verified test, Gemini 3 Flash scored 78%, while Gemini 3 Pro scored 76.2% [5][6]
- In the AIME 2025 mathematics benchmark, Flash scored 99.7% with code execution, while Pro scored 100% [6]
- Flash achieved 33.7% in the Humanity's Last Exam, compared to Pro's 37.5% [5][6]

Cost and Efficiency
- Gemini 3 Flash has a competitive pricing structure, with input costs at $0.50 per million tokens and output costs at $3.00 per million tokens, which is higher than Gemini 2.5 Flash but justified by its performance [7]
- Flash's inference speed is three times that of Gemini 2.5 Pro, with a 30% reduction in token consumption [7]

Strategic Insights
- Google’s core team views the Pro model as a means to distill the capabilities of Flash, emphasizing that Flash's smaller size and efficiency are crucial for users [11][12]
- The development team believes that the traditional scaling law is evolving, with a shift from merely increasing parameters to enhancing inference capabilities [12][14]
- The emergence of Flash has sparked discussions about the validity of the "parameter supremacy" theory, suggesting that smaller, more efficient models can outperform larger ones [13][14]
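The per-token prices quoted for Gemini 3 Flash can be turned into a rough per-request estimate. The following is a minimal sketch in Python, assuming the two quoted list prices are the only charges; the function name and the example token counts are hypothetical and are not taken from the article.

```python
# Prices quoted in the summary for Gemini 3 Flash (USD per million tokens).
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 3.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: it ignores any other charges or discounts
    (caching, batching, free tiers), which the summary does not describe.
    """
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 200,000 input tokens and 10,000 output tokens.
    cost = request_cost_usd(200_000, 10_000)
    print(f"{cost:.2f} USD")  # prints "0.13 USD"
```

Because output tokens are priced six times higher than input tokens, a reduction in generated tokens affects the bill more than the same reduction in prompt length.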
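In Depth | Mark Chen, OpenAI's Highest-Ranking Executive of Chinese Descent, Responds Exclusively on Competition with Gemini, the Talent War with Meta, and Core AI Strategy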
Z Potentials· 2025-12-20 04:03
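Z Highlights
Ashlee Vance, host of the Core Memory podcast and a veteran technology journalist; Mark Chen, Chief Research Officer at OpenAI, who works on AGI research and AI alignment and has led the development of several core models. With the AI talent war at fever pitch and Gemini 3 just released, the two discuss OpenAI's research agenda and the future of AGI. Interview date: December 2, 2025.
The talent battle: Meta's aggressive recruiting and OpenAI's confidence
Ashlee Vance: Alex Wayne used to be a math person, right? You must know him.
Mark Chen: I've met him a few times, but I don't know him well.
Ashlee Vance: Why did he leave?
Ashlee Vance: The talent war is getting a lot of attention, and Meta has been quite aggressive. What does this tug-of-war actually look like? What stage are we at now?
Mark Chen: There is indeed a core group of people, and almost everyone in the industry knows who they are. Many companies have realized that one key ingredient in building a top AI lab is recruiting the very best talent. It is no surprise that Meta has pushed this strategy hard. We have not sat idly by, and I want to talk about this experience from OpenAI's perspective. There is a lot in the media ...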
Is RL a "Philosopher's Stone" or an "Excavator"? CMU Answers with Controlled Experiments
Jiqizhixin (机器之心) · 2025-12-15 01:44
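Report by Jiqizhixin. Editor: Jiqizhixin editorial team. Reinforcement learning (RL) has recently delivered notable gains in the reasoning ability of language models. However, it remains unclear whether post-training truly extends a model's reasoning ability or merely unearths potential already present from pre-training. A core challenge is that modern training pipelines lack controllability: large-scale pre-training corpora are not transparent, mid-training is often understudied, and RL objectives interact in complex ways with unknown prior knowledge. To answer this question, researchers from Carnegie Mellon University (CMU) built a controllable synthetic data framework based on GSM-Infinite and, in a fully decoupled setting, quantitatively analyzed the causal effects of pre-training, mid-training (CPT) and RL on a model's reasoning generalization. The aim is to isolate and independently analyze the causal contributions of pre-training, mid-training and RL-based post-training. https://x.com/xiangyue96/status/1998488030836044112 The researchers evaluate models along two dimensions: extrapolative generalization to more complex compositions, and contextual generalization across different surface contexts. Using this framework, they reconcile differing views on the effectiveness of RL. The study shows that only when pre-training leaves enough headroom, and the RL data targets the model's capability boundary (that is, those which, although they have ...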
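Drink Some VC (喝点VC) | YC in Conversation with Anthropic's Head of Pre-training: Pre-training Teams Must Also Think About Inference, and Balancing Pre-training and Post-training Is Still in Early Exploration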
Z Potentials· 2025-10-16 03:03
Core Insights
- The article discusses the evolution of pre-training in AI, emphasizing its critical role in enhancing model performance through scaling laws and effective data utilization [5][8][9]
- Nick Joseph, head of pre-training at Anthropic, shares insights on the challenges and strategies in AI model development, particularly focusing on computational resources and alignment with human goals [2][3][4]

Pre-training Fundamentals
- Pre-training is centered around minimizing the loss function, which is the primary objective in AI model training [5]
- The concept of "scaling laws" indicates that increasing computational power, data volume, or model parameters leads to predictable improvements in model performance [9][26]

Historical Context and Evolution
- Joseph's background includes significant roles at Vicarious and OpenAI, where he contributed to AI safety and model scaling [2][3][7]
- The transition from theoretical discussions on AI safety to practical applications in model training reflects the industry's maturation [6][7]

Technical Challenges and Infrastructure
- The article highlights the engineering challenges faced in distributed training, including optimizing hardware utilization and managing complex systems [12][18][28]
- Early infrastructure at Anthropic was limited but evolved to support large-scale model training, leveraging cloud services for computational needs [16][17]

Data Utilization and Quality
- The availability of high-quality data remains a concern, with ongoing debates about data saturation and the potential for overfitting on AI-generated content [35][36][44]
- Joseph emphasizes the importance of balancing data quality and quantity, noting that while data is abundant, its utility for training models is critical [35][37]

Future Directions and Paradigm Shifts
- The conversation touches on the potential for paradigm shifts in AI, particularly the integration of reinforcement learning and the need for innovative approaches to achieve general intelligence [62][63]
- Joseph expresses concern over the emergence of difficult-to-diagnose bugs in complex systems, which could hinder progress in AI development [63][66]

Collaboration and Team Dynamics
- The collaborative nature of teams at Anthropic is highlighted, with a focus on integrating diverse expertise to tackle engineering challenges [67][68]
- The article suggests that practical engineering skills are increasingly valued over purely theoretical knowledge in the AI field [68][69]

Implications for Startups and Innovation
- Opportunities for startups are identified in areas that can leverage advancements in AI models, particularly in practical applications that enhance user experience [76]
- The need for solutions to improve chip reliability and team management is noted as a potential area for entrepreneurial ventures [77]
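Jensen Huang Confronts Controversy in His Latest Conversation, Saying China's Technology Is Only "Nanoseconds" Behind
Smart Investor (聪明投资者) · 2025-09-29 07:04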
Core Viewpoint
- The discussion emphasizes the exponential growth potential of AI, particularly in reasoning capabilities, which is expected to be a billion-fold increase, marking the onset of a new industrial revolution [8][3].

Group 1: AI Infrastructure and Investment
- NVIDIA's investment in OpenAI is seen as a strategic bet on a future giant, with expectations that OpenAI could become a trillion-dollar company [13][14].
- The projected annual capital expenditure for AI infrastructure could reach $5 trillion globally, reflecting the immense growth potential in this sector [5][32].
- NVIDIA's equity investments are not tied to procurement but are viewed as opportunities to invest in future leaders [51][53].

Group 2: AI Evolution and Market Dynamics
- The transition from general computing to accelerated computing and AI is inevitable, with traditional CPU-based systems being replaced by GPU-driven infrastructures [23][25].
- The AI market is expected to grow significantly, with estimates suggesting AI-related revenues could reach $1 trillion by 2030 [39][21].
- The integration of AI into various applications, such as search engines and recommendation systems, is driving demand for advanced computing capabilities [25][40].

Group 3: Competitive Landscape and Barriers
- NVIDIA's competitive edge lies in its ability to execute extreme collaborative design, optimizing models, algorithms, systems, and chips simultaneously [6][64].
- The barriers to entry in the AI infrastructure market are increasing due to the high costs associated with chip production and the need for extensive collaboration [71][70].
- Trust in NVIDIA's delivery capabilities is crucial for clients to commit to large-scale orders, reinforcing its market position [74][72].

Group 4: Future Outlook and Technological Integration
- The future of AI is envisioned to include the integration of robotics and AI, leading to personal AI companions for individuals [106][105].
- The potential for AI to enhance human intelligence and productivity is significant, with projections indicating that AI could contribute up to $50 trillion to global GDP [29][30].
- The rapid evolution of AI technologies necessitates continuous innovation and adaptation within the industry [61][62].
Why Has GPT-5 Stopped "Talking Nonsense"? OpenAI's New Paper Explains It Thoroughly
Tencent Research Institute (腾讯研究院) · 2025-09-12 08:58
Core Viewpoint
- The article discusses the advancements and challenges of OpenAI's GPT-5, particularly focusing on the significant reduction in hallucination rates compared to previous models, while also highlighting the underlying mechanisms and implications of these changes [5][6][25].

Group 1: Hallucination Rates and Mechanisms
- GPT-5 has a hallucination rate that is approximately 45% lower than GPT-4 and about 80% lower than OpenAI's earlier models [6].
- The reduction in hallucination rates is attributed to enhanced reinforcement learning techniques that allow models to refine their reasoning processes and recognize their errors [8][9].
- The paper published by OpenAI indicates that hallucinations are an inevitable byproduct of the statistical learning nature of language models, making it more challenging to generate reliable information than to assess its reliability [12][16].

Group 2: Theoretical Framework
- OpenAI introduces a theoretical "Is-It-Valid" (IIV) judgment mechanism that determines the validity of generated sentences based on their internal probabilities [13].
- The model's tendency to generate plausible-sounding but incorrect information is exacerbated by data sparsity, complexity, and noise in training data [14][16].
- The mathematical conclusion presented in the paper suggests that the error rate of generative models is at least double that of the IIV judgment errors, indicating a compounding effect of judgment mistakes on hallucinations [15][16].

Group 3: Post-Training Challenges
- Post-training processes have not effectively mitigated hallucinations, as current evaluation metrics tend to reward models for providing confident but potentially incorrect answers [18][24].
- The article critiques the binary scoring systems used in mainstream AI evaluations, which penalize uncertainty and discourage models from expressing "I don't know" [21][24].
- The reinforcement learning processes that utilize binary reward paths may inadvertently promote overconfidence in models, leading to increased hallucination rates [27][29].

Group 4: Future Directions and Solutions
- The article suggests that introducing a penalty-based scoring mechanism during post-training could help models better calibrate their confidence levels and reduce hallucinations [33].
- A shift from a score-optimization focus to a truth-oriented approach is proposed as a potential solution to the hallucination problem [34].
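The contrast between binary scoring and penalty-based scoring can be made concrete with a small expected-value calculation. The sketch below is illustrative only: it assumes a correct answer scores 1, "I don't know" scores 0, and a wrong answer costs a fixed penalty. The function names and the penalty values are hypothetical and are not taken from the article or the paper.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering when the answer is right with probability p_correct.

    Assumed rule: +1 for a correct answer, -wrong_penalty for a wrong one.
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Abstaining scores 0, so answering pays off only if its expected score is positive."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining have the same expected score."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    p = 0.2  # the model is only 20% sure of its answer
    # Binary scoring (no penalty): guessing always beats abstaining.
    print(should_answer(p, wrong_penalty=0.0))  # prints True
    # Penalty of 1 point per wrong answer: abstaining is now the better choice.
    print(should_answer(p, wrong_penalty=1.0))  # prints False
    print(break_even_confidence(1.0))           # prints 0.5
```

Under the binary rule the break-even confidence is 0, so a score-maximizing model never has a reason to say "I don't know". Raising the penalty raises the confidence a model needs before answering is worthwhile, which is the calibration effect the summary describes.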
Daily Voice of AI
2025-07-16 06:13
Summary of Conference Call Records

Industry Overview
- The global toy industry is expected to experience significant growth, driven by AI innovations, with projections indicating a market size of approximately $600 billion by 2023, reflecting a compound annual growth rate (CAGR) exceeding 19% from a base of $18 billion in 2024 [1][2][3]
- In China, AI toy sales have shown explosive growth, with some companies achieving daily sales exceeding 500,000 yuan in January 2025 [1]

Core Insights and Arguments
- **Technological Maturity**: The technology behind AI toys is considered mature, enabling features such as emotional responses and educational integration, which parents are willing to pay a premium for [2][3]
- **Educational Value**: AI toys are increasingly being integrated into educational contexts, enhancing children's logical thinking through interactive programming [2]
- **Emotional Economy**: The rise of the emotional economy is a key driver for the growth of AI toys, as they provide companionship and emotional engagement [2][3]
- **Market Dynamics**: The AI toy market does not require high precision in model outputs, allowing for broader accessibility and faster development cycles [3]

Company-Specific Developments
- A company has launched several AI-driven products, including the "Xiyangyang" AI doll, which features interactive modes such as chatting and Bluetooth connectivity, indicating rapid growth in AI-enabled toy offerings [4]
- Another company, Shifeng Culture, has been active in the toy industry for over 30 years and is focusing on integrating AI with established IPs like Disney and Conan to enhance product offerings [5]

Additional Important Points
- The AI toy sector in China is poised for rapid expansion, driven by technological advancements and consumer demand [1][5]
- The integration of AI in toys is expected to lead to increased complexity in product offerings, including enhanced interaction capabilities through video and voice technologies [27][28]
- The overall toy ecosystem is likely to evolve, with a shift towards more sophisticated AI applications that enhance user interaction and engagement [27][28]

Conclusion
- The AI toy industry is on the brink of a significant transformation, fueled by technological advancements and changing consumer preferences, particularly in the educational and emotional engagement sectors. Companies that effectively leverage these trends are likely to see substantial growth in the coming years [1][2][3][5][27][28]
Wahaha's Zong Fuli Is Sued; Plaintiffs Claim to Be Her Half-Siblings | Chief News Daily
Chief Business Review (首席商业评论) · 2025-07-14 04:10
Group 1
- The core viewpoint of the article emphasizes the ongoing positive trend in the A-share market, with a focus on mid-year performance reports and the theme of "anti-involution" [2][3]
- China Shenhua reported a coal sales volume of 204.9 million tons in the first half of the year, reflecting a year-on-year decrease of 10.9% [8]
- The railway sector completed fixed asset investments of 355.9 billion yuan in the first half of the year, showing a year-on-year growth of 5.5% [9]

Group 2
- The article discusses the ongoing family trust dispute involving Wahaha's chairperson, Zong Fuli, who is being sued by her half-siblings for rights to a trust fund valued at 700 million USD each [5][6][7]
- The white feather meat duck industry is undergoing a significant capacity reduction, with approximately 9 million breeding ducks eliminated, and an expectation that 30% of breeding duck enterprises may exit the market [11]
- Perplexity's CEO indicated plans to utilize the Kimi K2 model for further training, highlighting advancements in AI capabilities [12]
Welcoming AI: Viewing Change Rationally and Actively Positioning for the Future
Cyzone (创业邦) · 2025-07-07 10:27
Core Viewpoint
- The discussion emphasizes the importance of integrating AI technology with business operations, focusing on long-term strategic value rather than short-term gains [1][19][29].

Group 1: AI Technology Development
- AI has reached a critical intersection of technology and product, where understanding its limitations and capabilities is essential for practical applications [5][6].
- The industry consensus is that the core capabilities of models stem from pre-training rather than post-training, highlighting the need for high-quality training data [6][7].
- AI tools are powerful but come with uncertainties, necessitating a careful approach to their integration into business processes [5][6].

Group 2: Practical Applications of AI
- APUS has successfully implemented AI in coding, design, and healthcare, significantly improving efficiency and reducing the need for large teams [11][12][14].
- The company has developed proprietary models for coding and healthcare diagnostics, demonstrating the potential of AI to enhance productivity and service delivery [11][14][15].
- AI's role in content creation has transformed traditional processes, allowing for rapid generation of marketing materials and interactive products [12][13][14].

Group 3: Strategic Considerations for AI Implementation
- Companies often misjudge the short-term capabilities of AI while underestimating its long-term potential, leading to misguided expectations [20][21].
- A structured approach to defining AI applications is crucial, starting from understanding the business's needs and aligning AI capabilities accordingly [26][27].
- The need for skilled project leaders who understand both AI and business operations is highlighted as a key factor for successful AI integration [22][23].

Group 4: Recommendations for CEOs
- CEOs should clearly define the strategic value of AI within their organizations, ensuring that AI initiatives align with long-term business goals [26][27][28].
- Emphasizing the importance of cultural adaptation and understanding AI's operational principles can facilitate smoother integration into daily workflows [26][27].
- Companies must avoid focusing solely on technology and instead prioritize identifying relevant applications and the necessary data governance [27][28].