Pre-training
OpenAI's Debacle: GPT-5 Is a "Reskinned" GPT-4o, with Zero Pre-training Breakthroughs in Two and a Half Years
36Kr · 2025-12-01 02:12
Core Insights
- OpenAI is facing significant challenges with its pre-training processes, particularly for the upcoming GPT-5 model, which reportedly still relies on the foundation of GPT-4o [1][3][12]
- The company has not achieved substantial progress in scaling its pre-training efforts since the release of GPT-4o, raising concerns about GPT-5's performance [7][12][20]
- Google's TPU technology is emerging as a strong competitor, potentially undermining NVIDIA's dominance in AI hardware, on which OpenAI has heavily relied [5][26]

Pre-training Challenges
- OpenAI's pre-training for GPT-5 has been described as a failure, with the internal project "Orion" downgraded to GPT-4.5 after it fell short of expectations [11][12]
- The pre-training phase is critical for developing generative AI models, and OpenAI's struggles in this area have raised questions about GPT-5's capabilities relative to its predecessors [29][39]
- Despite algorithmic advances that reduce the physical computation required for training, the Orion project exceeded the typical training duration of 1-2 months, taking over 3 months [14][36]

Performance Comparisons
- The performance improvements of GPT-5 have been perceived as modest, with industry reactions indicating it is more an enhancement of GPT-4o than a revolutionary upgrade [20][35]
- Benchmark comparisons show that Google's Gemini 3 has outperformed GPT-5 in several areas, highlighting the competitiveness of the AI model landscape [31]

Strategic Shifts
- OpenAI is reportedly shifting focus to a new model, codenamed "Shallotpeat", aimed at addressing the pre-training issues encountered with previous models [46][50]
- The company acknowledges the need for specialized models rather than a single "super model", reflecting a broader industry consensus on the diversification of AI applications [54][60]
- OpenAI's internal discussions indicate a recognition of Google's advances in pre-training, marking a significant shift in the competitive dynamics of the AI landscape [27][29]
Ilya Debunks the Claim That Scaling Laws Are Dead
AI前线 · 2025-11-30 05:33
Core Insights
- The era of relying solely on scaling resources to achieve breakthroughs in AI capabilities may be over, according to Ilya Sutskever, former chief scientist of OpenAI [2]
- Current AI technologies can still produce significant economic and social impacts, even without further breakthroughs [5]
- The consensus among experts is that achieving Artificial General Intelligence (AGI) may require more breakthroughs, particularly in continuous learning and sample efficiency, likely within the next 20 years [5]

Group 1
- Ilya Sutskever emphasized that the belief in "bigger is better" for AI development is fading, indicating a shift back to a research-driven era [16][42]
- Current models exhibit "jagged" performance, excelling on benchmarks but struggling with real-world tasks, highlighting a gap in generalization capability [16][20]
- The focus on scaling has led to a situation where the number of companies exceeds the number of novel ideas, suggesting a need for fresh thinking in AI research [60]

Group 2
- The discussion compared the role of human emotions to the value function in reinforcement learning, suggesting that emotions supply the immediate feedback that guides decision-making (a minimal formal sketch of the term follows this summary) [31][39]
- Sutskever pointed out that the evolution of human capabilities such as vision and motor skills provides strong prior knowledge that current AI lacks [49]
- The potential for rapid economic growth through the deployment of advanced AI systems was highlighted, with the caveat that regulatory mechanisms could influence this growth [82]
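"Value function" here is a reinforcement-learning term: the expected discounted return an agent can collect from a state under its policy, which the interview analogizes to emotions giving humans a fast, built-in estimate of how well things are going. A minimal formal sketch in standard RL notation (ours, not from the article):

```latex
% State-value function of a policy \pi:
% s = state, r_t = reward at step t, \gamma \in [0,1) = discount factor.
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s \right]
```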
AI Luminary Ilya Declares the End of the Scaling Era, Asserting That the Concept of AGI Is Misguided
混沌学园 · 2025-11-28 12:35
Group 1
- The era of AI scaling has ended, and the focus is shifting back to research, as merely increasing computational power is no longer sufficient for breakthroughs [2][3][15]
- A significant bottleneck in AI development is generalization ability, which remains inferior to that of humans [3][22]
- Emotions serve as a "value function" for humans, providing immediate feedback for decision-making, a capability that AI currently lacks [3][6][10]

Group 2
- Current AI models are becoming homogenized through pre-training, and the path to differentiation lies in reinforcement learning [4][17]
- SSI, the company co-founded by Ilya Sutskever, is focused solely on groundbreaking research rather than competing on computational power [3][31]
- Superintelligence is defined as an intelligence that can learn to do everything, emphasizing a growth mindset [3][46]

Group 3
- To govern AI better, it is essential to deploy it gradually and publicly demonstrate its capabilities and risks [4][50]
- The industry should aim to create AI that cares for all sentient beings, which is seen as a more fundamental and simpler goal than focusing solely on humans [4][51]
- The transition from the scaling era to a research-focused approach will require exploring new paradigms and methodologies [18][20]
After Leaving OpenAI, Sutskever's 1.5-Hour Conversation: AGI Could Arrive in as Little as 5 Years
36Kr · 2025-11-27 05:43
Core Insights
- The interview covers the strategic vision of Safe Superintelligence (SSI) and the challenges in AI model training, particularly the gap between model performance in evaluations and in real-world applications [1][3][5]

Group 1: AI Development and Economic Impact
- SSI's CEO predicts that human-level AGI will be achieved within 5 to 20 years [5]
- Current AI investments, such as allocating 1% of GDP to AI, are significant yet underappreciated by society [3][5]
- The economic impact of AI is expected to become more pronounced as the technology permeates more sectors [3][5]

Group 2: Model Performance and Training Challenges
- There is a "jagged" performance gap: models excel in evaluations but often make basic errors in practical applications [5][6]
- The reliance on large datasets and computational power for training has reached its limits, indicating a need for new approaches [5][6]
- Training environments may inadvertently optimize for evaluation metrics rather than real-world applicability, leading to poor generalization [6][21]

Group 3: Research and Development Focus
- SSI is prioritizing research over immediate commercialization, aiming for a direct path to superintelligence [5][27]
- The company believes that fostering competition among AI models can help break the "homogeneity" of current models [5][27]
- A shift from a "scaling" era back to a "research" era is anticipated, emphasizing the need for new ideas rather than just scaling existing models [17][28]

Group 4: Value Function and Learning Mechanisms
- The concept of a value function is likened to human emotions, suggesting it could guide AI learning more effectively [11][12]
- The importance of internal feedback mechanisms in human learning is highlighted, which could inform better AI training methodologies [25][39]
- SSI's approach may involve deploying AI systems that learn from real-world interactions, improving their adaptability and effectiveness [35][37]

Group 5: Future of AI and Societal Implications
- The potential for rapid, AI-driven economic growth is acknowledged, with impacts varying by regulatory environment [38][39]
- SSI's vision includes developing AI that cares for sentient beings, which may lead to more robust and empathetic AI systems [41][42]
- The company is aware of the challenges in aligning AI with human values and the importance of demonstrating AI's capabilities to the public [40][41]
Ilya's Latest Assessment: Scaling Laws Are Approaching Their Limits, and AI's Brute-Force Aesthetic Is Ending
36Kr · 2025-11-26 08:46
Core Insights
- Ilya Sutskever, co-founder of OpenAI and a key figure in deep learning, has shifted focus from scaling models to research-driven approaches in AI development [1][2][3]
- The industry is moving away from "scale-driven" methods back to "research-driven" strategies, emphasizing the importance of asking the right questions and developing new methodologies [2][3]
- Sutskever argues that even if AI companies stagnate on innovation, they can still generate significant revenue [2][3]
- The potential for narrow AI models to excel in specific domains suggests that breakthroughs may come from improved learning methods rather than merely increasing model size [3][4]
- The emergence of powerful AI could lead to transformative societal changes, including increased productivity and shifts in political and governance structures [3][4]
- Sutskever emphasizes the importance of aesthetic principles in research, advocating for simplicity and elegance in AI design [4]

Industry Trends
- The scaling laws that dominated AI development are nearing their limits, prompting a return to foundational research and exploration [2][28]
- The current phase of AI development is characterized by a shift from pre-training to reinforcement learning, which is more resource-intensive [29][30]
- The distinction between effective resource utilization and mere computational waste is becoming increasingly blurred in AI research [30][31]
- The scale of computational resources available today is substantial, but the focus should be on how effectively those resources are applied to meaningful research [42][44]

Company Insights
- Safe Superintelligence (SSI) has raised $3 billion, positioning itself to focus on foundational research without the pressures of market competition [45][46]
- SSI's approach may differ from that of companies prioritizing immediate market applications, suggesting a long-term vision for advanced AI [45][46]
- The company believes that true value lies not in the sheer amount of computational power but in the strategic application of that power to drive research [43][44]
Ilya Speaks Out: The Scaling Era Is Over, and He Admits He No Longer "Feels the AGI"
36Kr · 2025-11-26 06:54
Core Insights
- The era of scaling has ended, and the industry is transitioning into a research era [1][3][14]
- Current AI models, despite their improvements, lack the generalization capabilities necessary for achieving Artificial General Intelligence (AGI) [3][5][8]
- The disconnect between AI model performance on benchmarks and in real-world applications is a significant issue [5][6][8]

Summary by Sections

Transition from Scaling to Research Era
- Ilya Sutskever emphasizes that the AI community is moving from a focus on scaling models to a renewed emphasis on research and innovation [1][3][14]
- The previous scaling era, characterized by ever-increasing data, parameters, and computational power, has reached its limits, necessitating a change of approach [12][14][15]

Limitations of Current AI Models
- Despite advancements, current models generalize poorly compared to human intelligence and fail to develop true problem-solving intuition [3][5][8]
- Reinforcement learning (RL) training often over-optimizes for specific benchmarks at the expense of overall model quality (a toy numerical illustration of this failure mode follows this summary) [3][5][6]

Importance of Human-Like Learning
- Ilya argues that human learning is driven by an intrinsic "value function", which AI currently lacks, leading to less effective decision-making [10][11][12]
- The need for AI to incorporate human-like judgment and intuition is highlighted as essential for future advances [15][18]

Future of AI and AGI
- Predictions suggest that superintelligent AI (ASI) could emerge within 5 to 20 years, but its development must be approached cautiously [19][51]
- The concept of AGI is redefined, emphasizing continuous learning rather than a static state of intelligence [28][30][51]

Role of Research and Innovation
- The industry is expected to see a resurgence of smaller, innovative projects that can lead to significant breakthroughs, moving away from the trend of ever-larger models [16][18]
- Ilya suggests that the next major paradigm shift may come from seemingly modest experiments rather than grand scaling efforts [18][19]

Collaboration and Safety in AI Development
- As AI capabilities grow, collaboration among companies and regulatory bodies will become increasingly important for safety and ethics [43][44]
- A robustly aligned AI that cares for sentient life is emphasized as the preferable direction for future AI development [48][49]
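The over-optimization failure described under "Limitations of Current AI Models" is an instance of Goodhart's law: push hard enough on a proxy metric and it stops tracking the thing you care about. A deliberately tiny numerical sketch (the two objectives are hypothetical scalar stand-ins chosen for illustration, not anything measured in the article):

```python
# Toy Goodhart illustration: gradient ascent on a misaligned benchmark
# proxy first improves true capability, then degrades it while the
# proxy score keeps climbing. Both objectives are made-up scalars.

def true_skill(theta: float) -> float:
    """What we actually care about; peaks at theta = 1.0."""
    return -(theta - 1.0) ** 2

def benchmark_proxy(theta: float) -> float:
    """What the training loop optimizes; peaks at theta = 2.0."""
    return -(theta - 2.0) ** 2

def proxy_gradient(theta: float) -> float:
    """Analytic gradient of the proxy: d/dtheta of -(theta - 2)^2."""
    return -2.0 * (theta - 2.0)

theta, lr = 0.0, 0.1
for step in range(11):
    if step % 2 == 0:
        print(f"step {step:2d}  theta={theta:5.2f}  "
              f"proxy={benchmark_proxy(theta):6.2f}  true={true_skill(theta):6.2f}")
    theta += lr * proxy_gradient(theta)  # ascend the proxy only
```

Running it, the proxy column improves monotonically while the true column peaks around step 4 and then falls: the model "gets better at the benchmark" while getting worse at the job.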
The Information: Altman's Leaked Internal Memo Concedes Google Has Pulled Ahead: OpenAI's Lead Is Shrinking, with a Warning of "Difficult Times" Ahead
美股IPO · 2025-11-21 11:42
Core Insights
- OpenAI CEO Sam Altman acknowledged that the company's technological lead is shrinking because of significant advances by Google in AI, which may create temporary economic headwinds for OpenAI [1][3]
- Despite the challenges, Altman emphasized the importance of focusing on ambitious technological bets, even if it means OpenAI temporarily lags in the current environment [1][11]

Competitive Landscape
- Google has made unexpected breakthroughs in AI pre-training, a critical phase in developing large language models, surprising many AI researchers [5]
- OpenAI's competitors, particularly Anthropic, are reportedly on track to surpass OpenAI in revenue from AI sales to developers and enterprises [4][9]
- Although ChatGPT remains well ahead of Google's Gemini chatbot in usage and revenue, the gap is narrowing [9]

Financial Performance
- OpenAI, valued at $500 billion and having received over $60 billion in investment, is facing unprecedented competitive pressure, raising investor concerns about its future cash consumption [3][10]
- By contrast, Google, valued at $3.5 trillion, generated over $70 billion in free cash flow over the past four quarters, underscoring its financial strength [9]

Future Directions
- OpenAI is focusing on long-term, ambitious projects, including AI-generated data for training new AI and "post-training" techniques to improve model responses [11]
- Altman expressed confidence in the company's ability to sustain its performance despite short-term competitive pressure, urging the research teams to concentrate on achieving superintelligence [11]
OpenAI Veteran Karpathy Pours Cold Water: AI Agents Are Still a Decade Away from "Doing Real Work"
36Kr · 2025-10-21 12:42
Group 1
- Andrej Karpathy argues that AI agents will take another decade to mature, stating that current agents like Claude and Codex are not yet capable of being "employed" for real tasks [2][4][5]
- He critiques the current state of AI learning, arguing that reinforcement learning is inadequate and that true learning should resemble human cognition, involving reflection and growth rather than mere trial and error [11][12][22]
- Karpathy suggests that future breakthroughs will require a shift from knowledge accumulation to self-growth capabilities and a reconstruction of cognitive structures [4][5][22]

Group 2
- He highlights the current limitations of large language models (LLMs) in coding tasks, noting that they struggle with structured and nuanced engineering design [6][7][9]
- He categorizes human interaction with code into three types, emphasizing that LLMs are not yet capable of functioning as true collaborators in software development [7][9][10]
- Karpathy believes that while LLMs can assist with certain coding tasks, they cannot yet write or improve their own code effectively [9][10][11]

Group 3
- Karpathy discusses the importance of a reflective mechanism in AI learning, suggesting that models should learn to review and reflect on their processes rather than focusing solely on outcomes [18][19][20]
- He introduces the concept of a "cognitive core": models should retain essential thinking and planning abilities while discarding unnecessary memorized knowledge [32][36]
- He proposes that a smaller model of only about a billion parameters could suffice, arguing that high-quality data can yield effective cognitive capability without massive scale (a back-of-the-envelope sizing sketch follows this summary) [34][36]

Group 4
- Karpathy asserts that AGI will integrate gradually into the economy rather than causing a sudden disruption, with digital knowledge work as its initial application area [38][39][40]
- He predicts a collaborative structure in which agents perform 80% of tasks while humans supervise the remaining 20% [40][41]
- The deployment of AGI will be gradual, starting with structured tasks like programming and customer service before expanding to more complex roles [48][49][50]

Group 5
- On fully autonomous driving, Karpathy notes that it is a high-stakes task that cannot tolerate errors, unlike many other AI applications [59][60]
- He emphasizes that successful autonomous driving requires not just technological advances but also a supportive societal framework [61][62]
- The transition to widespread autonomous driving will be slow and incremental, beginning with specific use cases and expanding gradually [63]
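To give the billion-parameter "cognitive core" in Group 3 some concrete scale, here is a back-of-the-envelope sizing sketch (our arithmetic, not Karpathy's): weight memory scales linearly with parameter count and the bytes used per parameter.

```python
# Approximate weight-only memory for a 1B-parameter model at common
# numeric precisions. Ignores KV cache, activations, and runtime overhead.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory in GB for the weights alone: params * bytes-per-param."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"1B params @ {dtype:9s} ~ {weight_memory_gb(1e9, dtype):3.1f} GB")
```

At fp16 that is about 2 GB, and 0.5 GB at int4: small enough to run on a laptop or phone, which is what makes a "small core plus retrieved knowledge" design plausible.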
喝点VC | YC Talks with Anthropic's Head of Pre-training: Pre-training Teams Must Also Think About Inference, and Balancing Pre-training and Post-training Is Still in Early Exploration
Z Potentials · 2025-10-16 03:03
Core Insights
- The article discusses the evolution of pre-training in AI, emphasizing its critical role in improving model performance through scaling laws and effective data utilization [5][8][9]
- Nick Joseph, head of pre-training at Anthropic, shares insights on the challenges and strategies in AI model development, particularly around computational resources and alignment with human goals [2][3][4]

Pre-training Fundamentals
- Pre-training centers on minimizing the loss function, which is the primary objective in AI model training [5]
- "Scaling laws" indicate that increasing computational power, data volume, or model parameters leads to predictable improvements in model performance (a minimal sizing sketch follows this summary) [9][26]

Historical Context and Evolution
- Joseph's background includes significant roles at Vicarious and OpenAI, where he contributed to AI safety and model scaling [2][3][7]
- The transition from theoretical discussions of AI safety to practical applications in model training reflects the industry's maturation [6][7]

Technical Challenges and Infrastructure
- Distributed training poses engineering challenges, including optimizing hardware utilization and managing complex systems [12][18][28]
- Anthropic's early infrastructure was limited but evolved to support large-scale model training, leveraging cloud services for computational needs [16][17]

Data Utilization and Quality
- The availability of high-quality data remains a concern, with ongoing debates about data saturation and the risk of overfitting on AI-generated content [35][36][44]
- Joseph emphasizes balancing data quality and quantity, noting that while data is abundant, its utility for training models is what matters [35][37]

Future Directions and Paradigm Shifts
- The conversation touches on potential paradigm shifts in AI, particularly the integration of reinforcement learning and the need for innovative approaches to reach general intelligence [62][63]
- Joseph worries about hard-to-diagnose bugs emerging in complex systems, which could slow AI development [63][66]

Collaboration and Team Dynamics
- Teams at Anthropic are highly collaborative, integrating diverse expertise to tackle engineering challenges [67][68]
- Practical engineering skills are increasingly valued over purely theoretical knowledge in the AI field [68][69]

Implications for Startups and Innovation
- Opportunities exist for startups that can leverage advances in AI models, particularly in practical applications that improve user experience [76]
- Solutions for chip reliability and team management are noted as potential areas for entrepreneurial ventures [77]
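For a concrete sense of what "predictable improvement" means in practice, here is a minimal sketch of the compute-optimal sizing heuristics usually attached to scaling laws. The two rules of thumb, training FLOPs C ≈ 6·N·D for N parameters and D tokens, and compute-optimal training at roughly D ≈ 20·N, come from the Chinchilla paper (Hoffmann et al., 2022), not from this interview:

```python
import math

# Compute-optimal model sizing under two standard assumptions:
#   training FLOPs  C ~ 6 * N * D   (N = parameters, D = tokens)
#   optimal budget  D ~ 20 * N      (Chinchilla tokens-per-parameter rule)
# Substituting gives C = 120 * N^2, so N = sqrt(C / 120).

def compute_optimal(flops: float) -> tuple[float, float]:
    """Return (params N, tokens D) for a given training FLOP budget."""
    n = math.sqrt(flops / 120.0)
    return n, 20.0 * n

for c in (1e21, 1e23, 1e25):
    n, d = compute_optimal(c)
    print(f"C = {c:.0e} FLOPs -> N ~ {n/1e9:5.1f}B params, D ~ {d/1e12:5.2f}T tokens")
```

At 1e23 FLOPs this gives roughly a 29B-parameter model trained on about 0.6T tokens. Real labs deviate from these ratios, for example overtraining small models to make inference cheaper, which is exactly the pre-training/inference trade-off the interview title points at.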
A Hardcore 30-Minute "Argument": This Large-Model Roundtable Laid the AI Industry's Disagreements Bare
机器之心 · 2025-07-28 04:24
Core Viewpoint
- The article covers a heated debate among industry leaders at the WAIC 2025 forum on the evolution of large-model technology, focusing on training paradigms, model architectures, and data sources, and highlighting a significant shift from pre-training toward reinforcement learning as the dominant approach [2][10][68]

Group 1: Training Paradigms
- The forum highlighted a paradigm shift in AI from pre-training-dominant development to an emphasis on reinforcement learning, marking a significant evolution in AI technology [10][19]
- OpenAI's transition from pre-training to reinforcement learning is seen as a critical development, with experts suggesting the pre-training era is nearing its end [19][20]
- The balance between pre-training and reinforcement learning is a key topic, with experts stressing the importance of pre-training in establishing a strong foundation for reinforcement learning [25][26]

Group 2: Model Architectures
- The Transformer architecture has dominated AI since 2017, but its limitations are becoming apparent as parameter counts grow and context windows expand [31][32]
- Two main exploration paths exist: optimizing existing Transformer architectures, and developing entirely new paradigms such as Mamba and RetNet that aim to improve efficiency and performance [33][34]
- The future of model architecture may involve a return to RNN-like structures as the industry shifts toward agent-based applications that require models to interact autonomously with their environments [38]

Group 3: Data Sources
- High-quality data scarcity looms as a challenge, with predictions that existing data reserves may be exhausted by 2028, potentially stalling large-model development [41][42]
- Synthetic data is being explored as a remedy, with companies like Anthropic and OpenAI using model-generated data to supplement training [43][44]
- Concerns about the reliability of synthetic data are raised, emphasizing the need for validation mechanisms to ensure training-data quality (a toy verification sketch follows this summary) [45][50]

Group 4: Open Source vs. Closed Source
- The open-source versus closed-source debate continues, with open-source models like DeepSeek gaining traction and challenging the dominance of closed-source models [60][61]
- Open-source initiatives are seen as promoting efficient resource allocation and driving industry evolution, even if they do not always produce the highest-performing models [63][64]
- The future may see hybrid approaches combining open source and closed source, addressing challenges such as model fragmentation and misuse [66][67]
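The validation mechanisms called for in Group 3 usually amount to some form of verified rejection sampling: keep a synthetic example only if an independent checker confirms it. A toy sketch of the idea (ours, not from the roundtable), using exact arithmetic as a stand-in for the verifier; real pipelines substitute reward models, unit tests, or formal checkers:

```python
import random

def noisy_generator(rng: random.Random) -> tuple[str, int]:
    """Emit a synthetic Q/A pair whose answer is wrong ~30% of the time,
    simulating an imperfect data-generating model."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    answer = a + b
    if rng.random() < 0.3:
        answer += rng.choice([-10, -1, 1, 10])  # inject generator mistakes
    return f"What is {a} + {b}?", answer

def verify(question: str, answer: int) -> bool:
    """Independent checker: re-derive the ground truth and compare."""
    body = question.removeprefix("What is ").removesuffix("?")
    a, b = (int(tok) for tok in body.split(" + "))
    return a + b == answer

rng = random.Random(0)
candidates = [noisy_generator(rng) for _ in range(1000)]
kept = [pair for pair in candidates if verify(*pair)]
print(f"kept {len(kept)}/{len(candidates)} synthetic pairs after verification")
# Roughly 70% survive; only verified pairs would enter the training set.
```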