Technology is advancing exponentially; what's frightening is that the whole world hasn't noticed
虎嗅APP· 2026-02-18 09:47
This article comes from the WeChat public account Tencent Tech (腾讯科技); author: Xiaojing; editor: Xu Qingyang. Original title: "Technology is advancing exponentially, and what's frightening is that the whole world hasn't noticed | Latest interview with Anthropic's CEO" (《技术指数级发展,可怕的是全世界竟无察觉|Anthropic CEO最新访谈》). Header image: Visual China. "I am 90% certain that before 2035 humanity will see a 'country of geniuses in a datacenter', possibly even within a year or two." When Anthropic CEO Dario Amodei said this, his tone was as calm as if he were forecasting tomorrow's weather. Yet what truly exasperates him is not that the technology is moving too fast, but that the world has taken no notice at all. In a nearly 150-minute in-depth interview with the well-known American podcast host Dwarkesh Patel, Amodei repeatedly stressed one point: we are closer to the finish line of AGI than anyone imagines, while the public is still debating stale political topics. Patel: What exactly is the "scaling" hypothesis now? Everyone understands the pre-training scaling laws, but there seems to be no publicly known regularity for scaling reinforcement learning. Amodei: My hypothesis today is the same as when I wrote The Big Blob of Compute Hypothesis in 2017, and the same as The Bitter Lesson by Turing Award winner and father of reinforcement learning Rich Sutton ...
Meta internal memo: the new Avocado is the company's "most capable" large model to date
Sina Finance · 2026-02-05 10:08
Core Insights
- Meta Platforms is optimistic about its new AI team and the upcoming launch of its core large model, Avocado, which has completed pre-training and is described as the company's most capable pre-trained foundational model to date [2][7]
- The performance of Avocado has surpassed that of the best current open-source foundational models, and it matches top post-trained models in knowledge retention, visual perception, and multilingual capabilities, despite not yet completing the post-training phase [2][7]

Group 1
- The internal memo indicates that Meta's AI model progress is optimistic but remains untested in the external environment, raising potential risks for the company [3][8]
- Meta's previous AI model, Llama 4, underperformed, leading to a delay in its release and disappointment among developers regarding its actual performance [3][8]

Group 2
- The setbacks in AI development prompted a significant restructuring of Meta's AI business, including the acquisition of Scale AI for $14.3 billion and the establishment of Meta Superintelligence Labs led by Alexandr Wang [9]
- Meta plans to increase its capital expenditure on AI, including computing costs, by approximately 73% in 2026, projecting a total of $115 billion to $135 billion [9]

Group 3
- Avocado has demonstrated significant efficiency improvements, achieving a tenfold increase in computational efficiency compared to Maverick and over a hundredfold compared to the as-yet-unreleased Behemoth [4][9]
- The efficiency gains are attributed to higher-quality data acquisition, investment in model infrastructure, and the use of deterministic training methods, which are crucial for reducing energy consumption and costs in AI development [10]

Group 4
- Recent public statements from Meta executives align with the positive tone of the internal memo, with CTO Andrew Bosworth highlighting similar efficiency improvements and CEO Mark Zuckerberg expressing confidence in the performance of upcoming models [5][10]
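The ~73% capex increase and the $115-135 billion 2026 range quoted above imply a 2025 baseline that is easy to sanity-check. The sketch below is back-of-envelope arithmetic on the article's own numbers, not a figure Meta has disclosed:

```python
# Back-of-envelope check: if 2026 AI capex of $115B-$135B represents a
# ~73% year-over-year increase, what 2025 baseline does that imply?
# Pure arithmetic on the article's figures; the implied baseline is an
# inference, not a disclosed number.
low_2026, high_2026 = 115e9, 135e9
growth = 0.73

implied_2025_low = low_2026 / (1 + growth)
implied_2025_high = high_2026 / (1 + growth)
print(f"implied 2025 capex: "
      f"${implied_2025_low / 1e9:.0f}B to ${implied_2025_high / 1e9:.0f}B")
```

This puts the implied 2025 baseline at roughly $66-78 billion, a cross-check a reader could compare against Meta's actual guidance.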
The full story of Tencent Hunyuan's three-year transformation
第一财经· 2026-01-12 03:00
Core Viewpoint
- Tencent is aggressively recruiting talent in the AI field, particularly for its large language model (LLM) project, Hunyuan (混元), aiming to compete with top global models. The company is undergoing a significant shift in its organizational structure and talent acquisition strategy to enhance its AI development capabilities [10][20][23]

Group 1: Recruitment and Talent Acquisition
- Tencent's "Qingyun Plan" (青云计划) targets top graduates for AI roles, directly competing with ByteDance's "Top Seed" program [10]
- The company is offering substantial salary increases, with some candidates seeing their compensation double upon joining Tencent from ByteDance [10][13]
- Key hires from Microsoft and other leading AI teams have been made to bolster Tencent's LLM capabilities, with a focus on candidates from specific high-profile companies [12][18]

Group 2: Leadership Changes and Organizational Structure
- The appointment of Yao Shunyu as chief AI scientist marks a pivotal change in Tencent's approach to its LLM project, granting him a direct reporting line to the company's president [20][21]
- Yao's leadership is expected to streamline decision-making and resource allocation, in contrast with the previous complex management structure [21][46]
- Organizational adjustments have been made to align with the demands of large model development, including the establishment of new departments focused on AI infrastructure and data [45][46]

Group 3: Competitive Landscape and Market Position
- Tencent's late entry into the large model space has raised concerns about its competitive position, as it trails companies like OpenAI, Baidu, and ByteDance in model performance [23][24]
- The company is under pressure to deliver competitive models quickly, with industry insiders noting that its self-developed models have not featured prominently in benchmark comparisons [23][24]
- The shift in focus towards LLMs is seen as a response to the urgent need for Tencent to catch up in the rapidly evolving AI landscape [23][47]

Group 4: Model Development Strategy
- Yao Shunyu emphasizes a shift towards post-training and a more methodical approach to model updates, in contrast with the previous rapid release cycle [18]
- The upcoming Hunyuan 2.0 model, with 406 billion parameters, is anticipated to reflect Yao's influence, although it is unlikely to be entirely his work given typical training timelines [52]
- The strategy going forward will likely involve leveraging proven methodologies from successful models in the industry to accelerate development [47][49]
Hinton joins the Scaling Law debate, and he doesn't side with his student Ilya
量子位· 2026-01-01 02:13
Core Viewpoint
- The article discusses the ongoing debate surrounding the "Scaling Law" in AI, highlighting contrasting perspectives from key figures in the field, particularly Ilya Sutskever and Geoffrey Hinton, regarding the future and limitations of scaling AI models [1][8][21]

Group 1: Perspectives on Scaling Law
- Ilya Sutskever expresses skepticism about the continued effectiveness of the Scaling Law, suggesting that merely increasing model size may not yield significant improvements in AI performance [23][40]
- Geoffrey Hinton, on the other hand, maintains that scaling laws are still valid but face challenges, particularly data scarcity, which he believes can be addressed by AI generating its own training data [10][21]
- Demis Hassabis, CEO of DeepMind, supports Hinton's view, emphasizing the importance of scaling for achieving advanced AI systems and the potential for self-evolving AI through data generation [15][19]

Group 2: The Debate on Data and Model Scaling
- The article outlines the historical context of the Scaling Law, which posits that increasing model parameters, training data, and computational resources leads to predictable improvements in AI performance [26][27]
- Recent discussions have shifted towards concerns about data limitations, with Ilya arguing that the era of pre-training is coming to an end due to diminishing returns from scaling [32][41]
- Yann LeCun also shares skepticism about the assumption that more data and computational power will automatically lead to smarter AI, indicating a broader questioning of the Scaling Law's applicability [46][48]

Group 3: Future Directions and Research Focus
- The article suggests that while current paradigms may still yield significant economic and social impacts, achieving Artificial General Intelligence (AGI) or Artificial Superintelligence (ASI) will likely require further research breakthroughs [53]
- There is a consensus among leading researchers that while AGI is not a distant fantasy, the nature and speed of the necessary breakthroughs remain uncertain [53]
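The pre-training scaling law at the center of this debate can be made concrete with a toy power-law fit. The sketch below uses a Chinchilla-style functional form with illustrative constants (assumptions in the spirit of published fits, not any lab's production numbers):

```python
# Toy Chinchilla-style scaling law: predicted loss as a power law in
# parameter count N and training tokens D. All constants here are
# illustrative assumptions, not a lab's actual fit.

def scaling_loss(n_params: float, n_tokens: float,
                 E: float = 1.69, A: float = 406.4, B: float = 410.7,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pre-training loss: E + A/N^alpha + B/D^beta."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

# Scaling parameters and data 10x each lowers predicted loss, but the
# absolute improvement shrinks each decade -- the "diminishing returns"
# both camps in the debate are arguing about.
small = scaling_loss(1e9, 2e10)    # ~1B params, ~20B tokens
large = scaling_loss(1e10, 2e11)   # ~10B params, ~200B tokens
print(f"{small:.3f} -> {large:.3f}")
```

Under this form the loss keeps falling as compute grows, but each constant-factor improvement costs roughly ten times more than the last, which is exactly where the data-scarcity concern bites.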
Even $30 billion might not "re-create GPT-4"? NUS's Yang You, in a new long essay, dissects the truth behind the AI growth bottleneck
量子位· 2025-12-31 03:37
Core Viewpoint
- The article discusses the growing anxiety surrounding the "AI bottleneck" as the third anniversary of ChatGPT approaches, questioning whether current technological paradigms can effectively convert increased computational power into models significantly stronger than GPT-4 [1][2]

Group 1: Nature of Intelligence and Its Measurement
- Intelligence is fundamentally about energy conversion: over the past decade AI has turned electricity into reusable intelligence, but the efficiency of this conversion is now under scrutiny [6]
- The essence of intelligence is not explanation but prediction, characterized by the ability to forecast future states and bear the consequences of those predictions [7][10]
- Current models derive their intelligence primarily from the pre-training phase, which consumes the most energy and computation, raising questions about whether intelligence growth remains stable under continued computational investment [15][20]

Group 2: Computational Paradigms and Their Limitations
- The real bottleneck is not that computational growth has stopped but that the returns in the relationship between computational power and intelligence growth are diminishing [22][27]
- The essay challenges the mainstream narrative by arguing that pre-training, fine-tuning, and reinforcement learning are all fundamentally gradient computation and parameter updates, rather than distinct methodologies [12][11]
- The success of the Transformer architecture is attributed to its compatibility with GPU systems, which has enabled a stable feedback loop between computational growth, model scaling, and capability enhancement [16][18]

Group 3: Future Directions and Exploration
- Future AI infrastructure should focus on the overall scalability of parallel computing systems rather than just single-chip performance, with an emphasis on maintaining or improving the ratio of computational to communication costs [24][25]
- Multiple exploration directions are proposed, including higher precision, advanced optimizers, and more scalable architectures or loss functions, all aimed at ensuring that increased computational investments yield proportional intelligence gains [25][26]
- The essay concludes that as long as more efficient ways of organizing computation can be found, the upper limits of intelligence are far from being reached [27]
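Yang You's point about preserving the ratio of computation to communication can be illustrated with rough numbers. The sketch below is a hypothetical back-of-envelope model of one data-parallel training step, with assumed hardware figures, not a calculation from the essay itself:

```python
# Rough compute-vs-communication model for one data-parallel training
# step. Uses the common ~6*N*D FLOPs estimate for a forward+backward
# pass and ~2x gradient bytes for a ring all-reduce. All hardware
# numbers below are illustrative assumptions.

def step_times(n_params: float, tokens_per_step: float,
               flops_per_sec: float, bytes_per_sec: float,
               bytes_per_param: int = 2) -> tuple[float, float]:
    """Return (compute_seconds, communication_seconds) per step."""
    compute_s = 6 * n_params * tokens_per_step / flops_per_sec
    comm_s = 2 * n_params * bytes_per_param / bytes_per_sec
    return compute_s, comm_s

# Hypothetical: 10B params, 1M tokens per device per step,
# ~300 TFLOP/s sustained compute, ~100 GB/s interconnect.
compute_s, comm_s = step_times(1e10, 1e6, 3e14, 1e11)
print(f"compute {compute_s:.0f}s vs comm {comm_s:.1f}s "
      f"(ratio ~{compute_s / comm_s:.0f}x)")
```

As long as that ratio stays large, adding devices buys near-linear speedup; when faster chips outpace interconnect bandwidth the ratio collapses, which is the scalability concern the essay raises.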
Dwarkesh's latest podcast: a year-end summary of AI progress
36Kr · 2025-12-24 23:15
Core Insights
- Dwarkesh's podcast features prominent AI figures Ilya Sutskever and Andrej Karpathy, indicating his significant standing in the AI community [1]
- The article summarizes Dwarkesh's views on AI advancements, particularly regarding the timeline for achieving AGI [1]

Group 1: AI Development and AGI Timeline
- The focus on "mid-training" using reinforcement learning is seen as evidence that AGI is still far off, since it suggests models lack strong generalization capabilities [3][16]
- The idea of pre-trained skills is questioned, as human labor's value lies in the ability to flexibly acquire new skills without heavy training costs [4][24]
- AI's lag in economic diffusion is viewed as an excuse for insufficient capabilities rather than a natural delay in technology adoption [27][28]

Group 2: AI Capabilities and Limitations
- AI models currently cannot fully automate even simple tasks, indicating a significant capability gap relative to human workers [25][30]
- The adjustment of standards for AI capabilities is acknowledged as reasonable, reflecting a deeper understanding of intelligence and labor complexity [31]
- The scaling laws observed in pre-training do not necessarily apply to reinforcement learning, with some studies suggesting a million-fold increase in computational power would be needed for similar advances [10][33]

Group 3: Future of AI and Continuous Learning
- Continuous learning is anticipated to be a major driver of model capability gains post-AGI, with preliminary features expected within a year [13][40]
- Achieving human-level continuous learning may take an additional 5 to 10 years, meaning breakthroughs will not lead to immediate dominance of the field [14][41]
- The potential for an intelligence explosion once models reach human-level capability is highlighted, underscoring the importance of ongoing learning and adaptation [36]

Group 4: Economic Implications and Workforce Integration
- Integrating AI labor into enterprises is expected to be easier than hiring human workers, as AI can be replicated without the complexities of human recruitment [29]
- The current revenue gap between AI models and human knowledge workers underscores the distance AI still has to cover in capability [30]
- If AI models truly reached AGI levels, their economic impact would be profound, with businesses willing to invest heavily in AI labor [29]
In Depth | Mark Chen, OpenAI's highest-ranking Chinese executive, exclusively responds on the Gemini rivalry, Meta's talent war, and AI core strategy
Z Potentials· 2025-12-20 04:03
Core Insights
- The article discusses the intense talent competition in the AI industry, particularly between Meta and OpenAI, highlighting Meta's aggressive recruitment strategies and OpenAI's resilience in retaining core talent despite lower compensation offers [3][6][10]

Talent Competition
- Meta is actively recruiting top AI talent, with a budget of approximately $10 billion annually for talent acquisition, but many attempts to poach OpenAI employees have been unsuccessful [3][6]
- OpenAI emphasizes its vision and the belief in its potential to achieve AGI, which motivates employees to stay despite lower salaries than competitors offer [6][10]

Research Prioritization
- OpenAI manages around 300 projects, with a structured approach to prioritizing research efforts and allocating computational resources effectively [11][12]
- The company focuses on exploratory research rather than merely replicating existing results, which distinguishes it from other labs [12][14]

Long-term Research Philosophy
- OpenAI maintains a long-term perspective in its research strategy, avoiding reactive competition with other companies and instead pursuing groundbreaking innovations that can shape the future of AI [14][15]
- The company believes that prioritizing research excellence will naturally lead to financial success, rather than being overly focused on immediate profitability [15][16]

Pre-training Breakthroughs
- OpenAI is confident in its advancements in pre-training techniques, which are expected to significantly enhance model performance and competitiveness in the AI landscape [19][24]
- Collaboration between AI and human researchers is anticipated to yield remarkable results, as AI approaches problem-solving differently from humans [33]

Company Culture and Management
- OpenAI fosters a culture of openness and collaboration, seen as essential for innovation and talent retention [66]
- Leadership at OpenAI emphasizes the importance of management experience, with a focus on supporting and nurturing talent within the organization [58][65]
Is RL a "philosopher's stone" or an "excavator"? CMU gives the answer with controlled experiments
机器之心· 2025-12-15 01:44
Core Insights
- Recent advances in reinforcement learning (RL) have significantly improved the reasoning capabilities of language models [1]
- The true extent to which post-training expands model reasoning capabilities, rather than merely uncovering existing potential, remains unclear [2]
- A key challenge is the lack of controllability in modern training pipelines: large-scale pre-training corpora are opaque, and mid-training is often insufficiently studied [2]

Group 1: Research Framework and Methodology
- Researchers from Carnegie Mellon University developed a controllable synthetic data framework based on GSM-Infinite to quantitatively analyze the causal impact of pre-training, mid-training, and RL on reasoning generalization [2][5]
- The framework decouples reasoning structure from surface context, enabling precise quantification of reasoning complexity and testing whether models genuinely learn reasoning logic or merely memorize specific text patterns [10][12]

Group 2: Key Findings on Training Interactions
- The effectiveness of RL depends on the "capability margin": RL can only enhance reasoning when tasks are challenging yet within the model's exploration range [16][17]
- Pre-training used 10 billion tokens focused on basic reasoning primitives, while mid-training serves as a bridge that aligns the model's internal representations for RL readiness [20]
- A minimal amount of target-context data during pre-training can significantly improve cross-context generalization during RL post-training [22]

Group 3: Training Efficiency and Performance
- Mid-training is crucial for computational efficiency: combining mid-training with RL yields better performance than RL alone [26][27]
- Introducing process-level rewards can mitigate reward hacking and improve reasoning fidelity, particularly on complex reasoning tasks [29][30]

Group 4: Practical Guidelines for Training
- RL data design should target the model's capability margin, avoiding tasks that are too easy or too hard [31]
- Pre-training strategies should ensure at least 1% coverage of atomic capabilities in long-tail domains to provide interfaces for RL [32]
- Computational resources should be allocated dynamically based on task difficulty: more RL for tackling hard problems, more mid-training for stability [33]
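The "capability margin" guideline above can be sketched as a simple data filter: keep only tasks the current model sometimes, but not always, solves. The thresholds and helper below are hypothetical illustrations, not code from the CMU paper:

```python
# Hypothetical capability-margin filter for RL task selection: retain
# tasks whose measured pass rate under the current policy falls in a
# band where exploration can still find reward. The 5%-80% band is an
# assumed illustration, not the paper's setting.

def filter_by_capability_margin(tasks: list[str], pass_rates: list[float],
                                low: float = 0.05,
                                high: float = 0.8) -> list[str]:
    """Keep tasks solved sometimes but not reliably (low < rate < high).

    Below `low` there is almost no reward signal to learn from; above
    `high` the task is already mastered and mostly wastes RL compute.
    """
    return [t for t, p in zip(tasks, pass_rates) if low < p < high]

tasks = ["2-step", "4-step", "8-step", "16-step"]
pass_rates = [0.95, 0.55, 0.12, 0.01]  # e.g. estimated over k rollouts each
print(filter_by_capability_margin(tasks, pass_rates))
# Only the 4-step and 8-step tasks fall inside the margin.
```

In practice the band would be re-estimated as the policy improves, so mastered tasks graduate out of the margin and previously hopeless ones graduate in.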
GPT-5.2 leaked early? Tonight, OpenAI is gunning for Gemini 3
36Kr · 2025-12-11 08:17
Core Insights
- The imminent launch of GPT-5.2 by OpenAI is expected to intensify competition with Google's Gemini 3, with significant anticipation from the developer community [1][3][5]

Group 1: Product Development and Features
- GPT-5.2 is reportedly designed to surpass Gemini 3, showcasing improvements in programming and logical reasoning tasks, as stated by OpenAI's Chief Research Officer Mark Chen [6][7]
- The model is said to execute longer tasks effectively, maintaining context across multiple files, a critical advantage against Gemini 3 [7]
- GPT-5.2 is not merely a minor update but a thoroughly restructured model aimed at countering Gemini 3's capabilities [6][8]

Group 2: Competitive Landscape
- OpenAI's strategy appears to focus on enhancing its models in response to Google's advancements, particularly the pre-training methods that have proven effective for Gemini 3 [20][21]
- The competition is characterized as a zero-sum game, in which OpenAI must prioritize resources towards models that generate direct revenue, such as GPT-5.2, over potentially less profitable ventures [29][34]
- The release of Gemini 3 has prompted OpenAI to reconsider its long-term goals, including the pursuit of AGI, in light of immediate survival and competitive pressures [25][28]

Group 3: Future Projections
- There are indications that the Garlic model, internally referred to as GPT-5.2, may be released in early 2026, with expectations of significant improvements in coding and reasoning capabilities [10][11]
- OpenAI is also developing a larger model, codenamed Shallotpeat, which aims to address foundational issues in pre-training and enhance overall model performance [15][19]
AI Voices | With heavyweight guests assembled, what has the Dwarkesh Podcast been discussing lately?
红杉汇· 2025-12-11 00:04
Core Insights
- The "Dwarkesh Podcast" has become a crucial source of information in the AI industry, featuring in-depth discussions with key figures like Satya Nadella, Ilya Sutskever, and Andrej Karpathy [2]

Group 1: Insights from Ilya Sutskever
- The era of blindly stacking computational power is over; the focus has shifted from scaling laws to a need for research and intuition in AI development [5]
- Emotions are not a hindrance for humans but an evolutionary gift; AI lacks emotions, which limits its intelligence, and incorporating emotions may be essential for achieving true intelligence [6]
- AGI should be viewed as a "15-year-old genius" with strong learning capabilities rather than an all-knowing entity [7]

Group 2: Insights from Satya Nadella
- Model vendors may face a "winner's curse" as models become interchangeable; Microsoft emphasizes integrating AI into applications like Excel to maintain a competitive edge [10]
- GitHub is envisioned as the headquarters for future AI agents, focusing on managing multiple AI models working on code [11]
- The SaaS model is evolving; future revenue may come from providing resources for AI agents rather than traditional user-based subscriptions [12][13]

Group 3: Insights from Andrej Karpathy
- The goal is not to create "animals" but rather "ghosts" of the internet, as current AI models lack physical intuition despite having vast knowledge [16]
- Reinforcement learning (RL) is criticized for its inefficiency, as it reduces complex reasoning to a single reward signal, leading to issues like "hallucinations" in AI [17]
- Future AGI may only require 1 billion parameters, separating memory from cognition to enhance efficiency [18]

Group 4: Insights from Richard Sutton
- Current LLMs merely mimic human speech without understanding truth, lacking the grounding in objective reality necessary for true intelligence [21]
- Supervised learning is not natural; AI should learn from experience rather than labeled data, much as animals learn in the wild [22]
- Humanity is transitioning from a "copying era" to a "design era," in which AI is designed with an understanding of its principles [23]

Group 5: Insights from Sergey Levine
- Robots do not need all-encompassing world models; they require a focused approach to complete tasks effectively [25]
- High-level intelligence may involve "forgetting," allowing robots to react quickly without cognitive overload [26]
- The failure of early autonomous driving was attributed to a lack of common sense, which modern robots are beginning to incorporate [27]