Scaling Law

Who says the Scaling Law has hit its ceiling? New research: tiny per-step improvements yield exponential gains
机器之心· 2025-09-16 04:01
Core Viewpoint - The article discusses the ongoing debate over diminishing returns from scaling AI models, particularly large language models (LLMs). It presents a new perspective: despite slower improvements in single-step accuracy, these incremental gains can produce exponential growth in task completion length, which may hold greater economic value in real-world applications [1][3].

Group 1: Scaling Law and Economic Value
- The scaling law indicates that while metrics like test loss may show diminishing returns, the real-world value of LLMs often comes from their ability to complete longer tasks. Larger models compound small improvements in single-step accuracy into exponential increases in task length (a toy calculation after this summary makes the compounding explicit) [3][6].
- The paper, "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs," argues that the economic value of an AI agent derives from the length of tasks it can complete, rather than from short-task benchmarks that may suggest stagnating progress [5][19].

Group 2: Long-Horizon Execution Challenges
- Long-horizon task execution has historically been a significant weakness of deep learning models. The paper highlights that while LLMs have improved at complex reasoning tasks, they still struggle to execute longer tasks reliably [6][11].
- The authors propose that failures in long-horizon execution are often misattributed to reasoning or planning deficiencies, when in fact execution itself remains a critical and under-researched challenge [7][22].

Group 3: Self-Conditioning Effect
- The study identifies a self-conditioning effect in which the per-step error rate of long tasks rises as the model conditions on its own earlier mistakes, compounding errors over time. This contrasts with human performance, where practice typically leads to improvement [9][30].
- The authors found that larger models do not necessarily mitigate the self-conditioning effect, which can cause performance to decline over extended tasks [29][32].

Group 4: Impact of Thinking Models
- Recent thinking models can correct for self-conditioning limitations, allowing significantly longer task execution in a single round. For instance, the thinking version of GPT-5 can execute over 1,000 steps, far surpassing competitors [10][36].
- The research emphasizes the importance of reasoning before action: models that use thinking chains execute longer tasks better than those that do not [36][37].

Group 5: Experimental Insights
- The experiments reveal that increasing model size significantly raises the number of rounds a model can execute successfully, demonstrating a clear scaling trend [27][28].
- The findings suggest that while larger models improve task execution, they still face challenges from self-conditioning, which remains a critical area for future research [29][37].
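To make the compounding argument concrete, here is a toy calculation (my sketch, not code from the paper; it assumes every step succeeds independently with the same accuracy): the longest task solvable at a fixed reliability is log(reliability)/log(step accuracy), so small accuracy gains near 1 buy outsized horizon gains.

```python
import math

def max_horizon(step_accuracy: float, target_reliability: float = 0.5) -> float:
    """Longest task length H with step_accuracy ** H >= target_reliability,
    assuming a task succeeds only if all H steps succeed independently."""
    return math.log(target_reliability) / math.log(step_accuracy)

for acc in (0.90, 0.99, 0.999):
    print(f"step accuracy {acc:.3f} -> ~{max_horizon(acc):.0f} steps at 50% reliability")
# step accuracy 0.900 -> ~7 steps at 50% reliability
# step accuracy 0.990 -> ~69 steps at 50% reliability
# step accuracy 0.999 -> ~693 steps at 50% reliability
```

Under this assumption, moving from 99% to 99.9% per-step accuracy looks like a 0.9-point gain on a single-step benchmark but a roughly tenfold gain in achievable task length, which is the compounding the paper points to.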
Academician Zhang Hongjiang: Agents will replace enterprise processes and reshape future human organizations
Xin Lang Ke Ji· 2025-09-11 02:34
Special coverage: 2025 Inclusion·Bund Summit. Sina Tech, September 11 morning: at today's Bund Summit, Zhang Hongjiang, investment partner at Source Code Capital and foreign member of the US National Academy of Engineering, said that after DeepSeek R1 appeared, its cost was only a small fraction (one part in several tens) of the world's best reasoning models at the time, while its performance came very close. This shows that, where resources are concerned, once costs fall, demand grows by an even larger margin.

He noted that in the two-plus years since ChatGPT's release, by this March ChatGPT's daily active users had reached nearly 30% of search engines', showing that large models have become part of daily life. It is also clear that, whether at OpenAI with ChatGPT or elsewhere, companies' use of large models is accelerating.

Zhang Hongjiang said that model performance is rising rapidly while usage costs are falling rapidly, and this will continue throughout the development of large models. The large-model ecosystem is in turn driving Scaling Law effects across many industries and lifting the entire economy. He added that agents' planning ability is growing exponentially, and a Moore's Law ...

AI was once our assistant, but that assistant phase will be short; it will soon become our partner. AI will have its own plans and actions, a new relationship between humans and machines, humans and models. He concluded that Agents will replace enterprise processes and will also change the makeup of future human organizations and employment. (Luo Ning)
AI giants at home and abroad bet big, startups go all-in: who can ride "memory" to become the next "DeepSeek"?
36Ke· 2025-09-07 09:07
Whoever first gives models "memory" seizes the initiative.

Will "memory" be the final puzzle piece that ignites the next wave of AI? Half a year, or even four or five months ago, the industry would likely have met this question with confusion: back then, the aftershocks of DeepSeek pushing large-model reasoning to its peak were still spreading, Manus was opening a new general AI Agent narrative worldwide, and people were immersed in the excitement of technology and applications blooming at once... What was there to say about "memory"?

Today, however, reasoning has become standard equipment across major models, while behind the "hundred-Agent melee" the seat of the "general Agent" remains empty. The slowing technology curve and the still-distant arrival of explosive applications have led the industry to realize that the key to the next round of AI gains is enabling AI, like humans, to keep learning and accumulating experience, to adapt to new tasks without forgetting old knowledge, and to understand long contexts efficiently.

In other words: giving large models human-like "memory." Some may ask whether today's large models, with long context windows and external databases, already have "memory."

Yes and no. Measured against the "human-like memory" the industry is calling for, the "memory" under discussion means a model that organizes, retrieves, and applies memories the way humans do: a "long-term" or "lifelong" memory, as opposed to the "short-term memory" of current models.

In fact, looking at large models at home and abroad ...
AI giants at home and abroad bet big, startups go all-in: who can ride "memory" to become the next "DeepSeek"?
机器之心· 2025-09-07 05:12
Core Viewpoint - The article discusses the emerging importance of "memory" in AI models, suggesting that the ability to possess human-like memory will be a key factor in the next wave of AI advancements [2][6][35].

Group 1: Importance of Memory in AI
- The concept of "memory" is evolving from short-term to long-term or lifelong memory, allowing AI to learn continuously and adapt to new tasks without forgetting previous knowledge [3][7].
- Recent developments in AI memory capabilities have been highlighted by major players like Anthropic, Google, ByteDance, and OpenAI, all of which have introduced memory features in their AI systems [4][6][35].
- The demand for memory capabilities is driven by both technical and application needs, as AI models are increasingly expected to function as long-term partners rather than just tools [20][21][23].

Group 2: Current Trends and Developments
- AI companies are exploring different approaches to implementing memory, including parameterized memory, context memory, and external databases; a minimal sketch of the external-database approach follows this summary [26][28][30].
- The industry is witnessing a surge in interest and investment in memory-related research, with many companies racing to develop and integrate these capabilities into their products [6][35].
- Competition among AI firms is intensifying, and breakthroughs in memory capabilities could redefine the market landscape, much as past pivotal moments in AI development did [35][36].

Group 3: Future Outlook
- The timeline for achieving widespread and effective memory capabilities in AI is estimated at one to two years for basic functionality, while addressing governance and privacy issues may take three to five years [36][37].
- The future of AI memory capabilities remains uncertain, with various players vying for dominance; any company could emerge as the leader in this space [38].
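As a rough illustration of the external-database approach listed above (a hypothetical minimal sketch, not any company's actual design; `toy_embed` stands in for a real embedding model):

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Cheap stand-in for a real embedding model (deterministic within one
    # process); it carries no semantics, only the retrieval loop matters here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

class VectorMemory:
    """Toy long-term memory: store (embedding, text) pairs, recall by cosine similarity."""

    def __init__(self):
        self.items: list[tuple[np.ndarray, str]] = []

    def remember(self, text: str) -> None:
        v = toy_embed(text)
        self.items.append((v / np.linalg.norm(v), text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = toy_embed(query)
        q = q / np.linalg.norm(q)
        ranked = sorted(self.items, key=lambda item: -float(item[0] @ q))
        return [text for _, text in ranked[:k]]

memory = VectorMemory()
memory.remember("User prefers concise answers.")
print(memory.recall("How should I format my reply?", k=1))
```

Recalled snippets would be prepended to the model's context at inference time; parameterized memory, by contrast, writes such facts into the model's weights, and context memory keeps them in the window itself.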
Hands-on with Alibaba's trillion-parameter model: has the open-source route paid off?
Tai Mei Ti APP· 2025-09-06 11:32
Core Insights
- Alibaba has launched its largest model to date, Qwen3-Max-Preview, with over 1 trillion parameters, surpassing Claude in programming capabilities and demonstrating the effectiveness of Scaling Law [1][4][17]
- The "model + cloud" strategy has created the shortest path from technology development to commercialization, which is a key factor in Qwen's success as a latecomer [1][19]
- The core challenge of Alibaba's open-source model lies in balancing openness with profitability, requiring continuous technological breakthroughs and proof of commercial viability [1][20]

Model Performance
- Qwen3-Max-Preview has outperformed competitors in various benchmark tests, including SuperGPQA, AIME2025, LiveCodeBench V6, Arena-Hard V2, and LiveBench [2]
- In programming capabilities, Qwen3-Max-Preview has achieved significant improvements, surprising many users with its performance [4][15]

Development Strategy
- Alibaba's approach to model development has been characterized by rapid open-sourcing of multiple model versions, from 7 billion to 1 trillion parameters, fostering a strong developer community [16][17]
- The company has made substantial investments in computing infrastructure and AI engineering, which have been crucial for training large models like Qwen3-Max-Preview [17][18]

Cloud Integration
- Alibaba Cloud plays a vital role in supporting Qwen's development by providing a stable and efficient computing infrastructure, which reduces the engineering burden on development teams [18]
- The MaaS strategy allows Qwen to penetrate various industries quickly, enabling businesses to use Qwen's API without starting from scratch (see the sketch after this summary) [18][19]

Challenges Ahead
- The open-source model presents both opportunities and challenges, as it may hinder the ability to maintain a significant technological edge over competitors [20]
- Retaining top AI talent is critical for Alibaba, as the departure of key personnel could impact team morale and project continuity [21][22]

Conclusion
- Overall, Alibaba's Qwen is a leading force in the global AI model landscape, leveraging a clear strategy of open source plus self-research, supported by Alibaba Cloud's ecosystem [22]
- The release of the trillion-parameter model underscores the company's commitment to Scaling Law, but the sustainability of its business model and talent retention will be crucial for future success [22]
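As a sketch of what MaaS consumption looks like in practice (the endpoint, model name, and key handling here are assumptions for illustration; check Alibaba Cloud's documentation for current values), a business calls the hosted model through an OpenAI-compatible client instead of training anything itself:

```python
from openai import OpenAI

# Assumed endpoint and model identifiers; verify against Alibaba Cloud docs.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3-max-preview",  # hypothetical model name for illustration
    messages=[{"role": "user", "content": "Summarize the Scaling Law in one sentence."}],
)
print(resp.choices[0].message.content)
```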
They proposed the Scaling Law back in 1993
量子位· 2025-09-02 06:17
Core Viewpoint - The article highlights that the concept of Scaling Law was proposed 32 years ago at Bell Labs, not by recent AI advancements, underscoring the historical significance of this research in machine learning [1][6].

Group 1: Historical Context
- The paper, "Learning Curves: Asymptotic Values and Rate of Convergence," showed that training and test errors converge to a common asymptotic error value as training-set size increases, following a power-law form, and introduced a method for predicting that value [4][6].
- The authors of the 1993 paper included notable figures such as Vladimir Vapnik and Corinna Cortes, who contributed significantly to the field of machine learning [6][25].

Group 2: Methodology and Findings
- The research aimed to save computational resources when training classifiers by predicting their performance on larger datasets from smaller training sets (a sketch of this fitting procedure follows this summary) [8][10].
- The study found that as the training set grows, both training and testing errors converge to a common asymptotic value, denoted 'a', at a power-law rate whose exponent typically falls between 0.5 and 1 [10][16].
- The proposed method allows a classifier's performance on larger datasets to be estimated without complete training, conserving computational resources [10][14].

Group 3: Implications and Applications
- The predictive model proved highly accurate for linear classifiers, demonstrating its potential to optimize resource allocation when training models [15][24].
- The research also showed that the harder the task, the higher the asymptotic error and the slower the convergence, indicating a relationship between task complexity and learning efficiency [22].
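A minimal sketch of the idea (my illustration with synthetic numbers, not the authors' code or data): fit the power law e(n) ≈ a + b·n^(−α) to test errors measured at small training-set sizes, then read off the asymptote a as the predicted large-data error.

```python
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, a, b, alpha):
    # Power-law form: error approaches the asymptote a as n grows.
    return a + b * n ** (-alpha)

# Test errors at small training-set sizes (synthetic example values).
sizes = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
test_err = np.array([0.31, 0.26, 0.22, 0.20, 0.185])

(a, b, alpha), _ = curve_fit(learning_curve, sizes, test_err, p0=(0.15, 2.0, 0.5))
print(f"predicted asymptotic error a = {a:.3f}, exponent alpha = {alpha:.2f}")
```

If the fitted exponent lands in the 0.5-1 range the paper reports, a few cheap small-data runs suffice to estimate whether a classifier is worth training on the full dataset.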
In Depth | Anthropic CEO: AI's potential is enormous, but disorderly expansion is the real risk, and I will steer it onto the right track
Z Potentials· 2025-08-28 03:51
Core Insights - The article discusses the rapid growth and potential of Anthropic, a leading AI company focused on developing safe and reliable AI systems with human welfare at their core. The company's annual recurring revenue exceeds $4 billion, making it one of the fastest-growing enterprises in history [12][24].

Group 1: Company Structure and Trust
- Anthropic was founded by seven co-founders, an arrangement often viewed skeptically by outsiders. However, the founders' long-standing trust and familiarity have allowed the company to maintain cohesion and core values during rapid expansion [11][10].
- The sibling dynamic between co-founders Dario and Daniela Amodei enhances the company's strategic execution and operational management, allowing each to focus on their strengths [9][10].

Group 2: AI Applications and Market Potential
- The fastest-growing application of AI is programming, driven by the close relationship between developers and AI model creators, which leads to rapid adoption [10][12].
- AI's potential extends beyond programming, with applications in customer service, biology, and pharmaceuticals showcasing its versatility across sectors [13][14].

Group 3: Business Model and Growth Expectations
- Anthropic positions itself as a platform company, focusing on broad enterprise services rather than solely vertical-specific products. This approach allows a better understanding of user needs and market demands [15][16].
- The company has experienced exponential growth, with revenue consistently exceeding initial projections, indicating strong market demand for AI solutions [24][25].

Group 4: Investment and Financial Dynamics
- The financial model of AI companies involves significant upfront investment in model training, with expectations of high returns over time. This cyclical investment pattern is common in venture capital, where initial losses are expected before profitability is achieved [34][35].
- Current capital expenditures may obscure the underlying profitability of individual models, which can be profitable when analyzed independently [43][44].

Group 5: Talent and Competitive Advantage
- Competition for talent in the AI industry is intense, but Anthropic maintains a high employee retention rate thanks to its strong mission and commitment to its values [51][53].
- The company's approach to knowledge protection combines complex engineering capabilities with a culture that balances openness against necessary information-security measures [48][49].

Group 6: Future of AI and Market Structure
- The future market structure for AI is expected to consist of a few dominant players capable of building cutting-edge models, with room for new entrants targeting specific use cases [33].
- The article suggests that AI's growth trajectory may continue to extend, with AI companies potentially becoming some of the largest enterprises globally [25][24].
OpenAI's biggest blunder ever: letting go of this MIT standout, a "three-dynasty veteran" of American AI and a real-life Wei Xiaobao
36Ke· 2025-08-21 00:39
Group 1
- The core argument of the article is that the scale of AI infrastructure development is unprecedented, surpassing both the Apollo and Manhattan projects [1][7]
- Investment in AGI computing power is growing explosively, increasing by as much as three times annually [2]
- Tom Brown, co-founder of Anthropic, is highlighted as a key figure in the AI field, having risen from a self-taught background to a leader in the development of general artificial intelligence [3][4]

Group 2
- Anthropic's Claude has become the preferred choice for developers globally, marking a significant achievement in AI infrastructure [7]
- The article details Tom Brown's journey from entrepreneurship to AI research, including his experiences at OpenAI and the founding of Anthropic [9][10]
- The scaling law's impact on AI development is discussed, noting that increased computational power leads to significant advances in intelligence [31][32]

Group 3
- The article outlines the competitive landscape, in which Anthropic's Claude is gaining market share, particularly in programming applications, with developer preference shifting toward Claude over competitors like ChatGPT [37][40]
- The success of Claude Code is attributed to its unexpected emergence as a superior product, driven by a user-centered development approach [41][42]
- Tom Brown's advice for young engineers emphasizes pursuing meaningful projects over traditional career paths, advocating risk-taking and intrinsic motivation [46][49]
GPT-5 churns out "spaghetti code": 14 prompts reveal seven years of IQ evolution from GPT-1 to GPT-5
36Ke· 2025-08-19 08:56
Group 1
- The core viewpoint of the articles is that GPT-5 has been released but has drawn criticism for not meeting the expectations set by its predecessor, GPT-4, despite years of advances in AI capabilities [1][3][5]
- A comparison of performance metrics between GPT-4 and GPT-5 shows that the Scaling Law has not hit a wall, indicating ongoing improvement in AI models [3][5]
- The evolution of the GPT family from GPT-1 to GPT-5 over seven years highlights significant advances in AI capability, with various prompts demonstrating the models' growing sophistication [5][7][8]

Group 2
- The articles give examples of how each GPT version has improved at generating creative content, such as poetry, with GPT-5 producing more coherent and human-like responses than earlier versions [19][20][40]
- On technical tasks, GPT-5 shows marked improvement at writing Python code, moving from the nonsensical output of early versions to complex yet humorous code [53][54]
- GPT-5's ability to explain complex concepts, such as integration by parts in mathematics, has also improved significantly, making it more effective as a teaching tool than its predecessors (the rule and a worked example appear after this summary) [57][64][69]

Group 3
- The articles discuss how GPT-5 can now provide structured, detailed plans for tasks such as building a running habit, showcasing its ability to act as a personal coach or advisor [125][126][127]
- The transition from GPT-1 to GPT-5 reflects a shift from random or irrelevant responses to logical, structured, and contextually relevant answers [70][75][90]
- GPT-5's responses are characterized by a more professional tone and more comprehensive information, indicating its advancement in handling complex inquiries compared to earlier models [75][90]
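For reference, the integration-by-parts rule mentioned above, with a standard worked example (my illustration of the concept being explained, not a model transcript):

```latex
\int u \, dv = uv - \int v \, du,
\qquad \text{e.g. with } u = x,\; dv = e^{x}\,dx:
\qquad \int x e^{x}\, dx = x e^{x} - \int e^{x}\, dx = (x - 1) e^{x} + C.
```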
Li Jianzhong: Research and reflections on human-computer interaction and the agent ecosystem in the AI era
AI科技大本营· 2025-08-18 09:50
Core Insights - The article discusses the transformative impact of large models on the AI industry, emphasizing the shift from isolated applications to a more integrated human-machine interaction model, termed "accompanying interaction" [1][5][60].

Group 1: Paradigm Shifts in AI
- The transition from training models to reasoning models has significantly enhanced AI's capabilities, particularly through reinforcement learning, which allows AI to generate synthetic data and innovate beyond human knowledge [9][11][13].
- The introduction of "Agentic Models" marks a shift in which AI evolves from merely offering suggestions to actively performing tasks for users [16][18].

Group 2: Application Development Transformation
- "Vibe Coding" has emerged as a new programming paradigm, enabling non-professionals to create software in natural language, in contrast to traditional programming methods [19][22].
- The concept of "Malleable Software" suggests that future software will let users customize and personalize applications extensively, leading to a more democratized software development landscape [24][26].

Group 3: Human-Machine Interaction Evolution
- The future of human-machine interaction is predicted to be dominated by natural-language interfaces, moving away from traditional graphical user interfaces (GUIs) [36][41].
- The interaction paradigm is expected to evolve so that AI agents seamlessly integrate services, eliminating the need for users to switch between isolated applications [45][48].

Group 4: Intelligent Agent Ecosystem
- The development of intelligent agents is characterized by enhanced capabilities in planning, tool use, collaboration, memory, and action, which collectively redefine the internet from an "information network" into an "action network" [66][68].
- Protocols like MCP (Model Context Protocol) and A2A (Agent to Agent) facilitate interaction between agents and traditional software, strengthening the overall ecosystem (a minimal tool-serving sketch follows this summary) [70].
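As a rough sketch of what exposing a business capability over MCP looks like (based on the Model Context Protocol's public Python SDK; the FastMCP import path and decorator are version-dependent, so treat them as assumptions):

```python
# pip install mcp  (Model Context Protocol Python SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def search_orders(customer_id: str) -> str:
    """Hypothetical business tool an agent could discover and call over MCP."""
    return f"2 open orders for customer {customer_id}"

if __name__ == "__main__":
    mcp.run()  # serve the tool to MCP-capable agents
```

A2A plays the complementary role of letting such agents talk to each other, rather than to tools.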