Scaling Law
A heated brainstorm in a PhD dorm reinvents the Scaling Law? Qwen and Zhejiang University jointly launch a new law that cuts inference memory by 95.5%!
AI前线· 2025-05-21 10:04
Core Viewpoint
- Alibaba's research team, in collaboration with Zhejiang University, has proposed a new scaling law called the Parallel Scaling Law (ParScale), which enhances the capabilities of large models during training and inference by increasing parallel computation without adding model parameters, yielding higher inference efficiency [1][3][19].

Summary by Sections

Introduction of ParScale
- ParScale enables the deployment of more powerful models in low-resource scenarios by reusing existing parameters to expand parallel computation, and it applies to any model structure, optimization process, data, or task [1][19].
- Compared with parameter scaling, ParScale requires only 4.5% of the memory increase and 16.7% of the latency increase [1][19].

Comparison with Traditional Scaling Methods
- Traditional scaling methods include parameter expansion and inference-time scaling, both of which carry significant resource demands [3][4].
- ParScale introduces multiple parallel streams during training and inference, transforming a single input into multiple inputs for forward propagation and then combining the results into a single output [5][10].

Implementation of ParScale
- The implementation involves three steps: diversifying input transformations, parallel processing, and dynamic aggregation of outputs [13].
- A two-stage post-training strategy manages the extra training cost introduced by the parallel streams, significantly reducing overall training cost while preserving the performance gains [12][14].

Performance Metrics
- As the number of parallel streams (P) increases, model performance improves across benchmarks, particularly on tasks requiring strong reasoning [15][16].
- For instance, with P increased to 8, the model showed a 4.3% improvement on coding tasks, a 7.3% improvement on math tasks, and a 10% improvement on the GSM8K benchmark [15].
Application and Future Prospects
- ParScale is particularly suitable for edge devices such as smartphones, cars, and robots, where memory is limited [17][19].
- The research team plans to explore ParScale on more model architectures and larger datasets, noting its potential to complement existing methods such as MoE architectures [19].
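The three-step recipe described above (diversify the input, run parallel forward passes through the same weights, dynamically aggregate the outputs) can be sketched in a few lines. This is a toy illustration of the idea only, not the Qwen/Zhejiang implementation: the linear `W_model` stands in for a shared backbone, and the prefix vectors, aggregation logits, names, and shapes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D, P = 8, 4                                   # hidden size, number of parallel streams
W_model = rng.standard_normal((D, D)) * 0.1   # shared "backbone" (a toy linear map)
prefixes = rng.standard_normal((P, D)) * 0.1  # per-stream input transforms (small, learnable)
agg_logits = rng.standard_normal(P)           # logits for dynamic output aggregation

def parscale_forward(x):
    """One ParScale-style forward pass (toy sketch).

    Step 1: diversify -- each stream sees the input shifted by its own prefix.
    Step 2: parallel forward passes through the SAME backbone weights.
    Step 3: dynamic aggregation -- a softmax-weighted sum of the stream outputs.
    """
    streams = x[None, :] + prefixes               # (P, D): P diversified inputs
    outs = streams @ W_model.T                    # (P, D): P parallel forward passes
    alpha = np.exp(agg_logits) / np.exp(agg_logits).sum()  # softmax weights
    return alpha @ outs                           # (D,): single combined output

x = rng.standard_normal(D)
y = parscale_forward(x)
assert y.shape == (D,)
```

Because all P streams share `W_model`, the only extra parameters are the P tiny prefixes and the P aggregation logits, which is why the memory footprint grows far more slowly than it would under parameter scaling.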
A small model trained for $100,000 beats GPT-4o on specific tasks, with 99× lower latency
36Ke· 2025-05-14 09:45
Core Insights
- Fastino has developed Task-Specific Language Models (TLMs) that perform comparably to large language models (LLMs) at a significantly lower cost and with much faster inference [3][8][9].
- The company has raised nearly $25 million in funding, indicating strong investor interest in its innovative approach to AI model development [3][4].

Company Overview
- Fastino was co-founded by Ash Lewis and George Hurn-Maloney, both experienced entrepreneurs with backgrounds in AI startups [4][6].
- The company has assembled a strong technical team with members from Google DeepMind, Stanford University, Carnegie Mellon University, and Apple [6].

Technology and Performance
- TLMs are designed to be lightweight and high-precision, focusing on specific tasks rather than general-purpose capability [8][9].
- Fastino's TLMs can achieve inference speeds 99 times faster than OpenAI's GPT-4o, with a latency of just 100 ms compared to GPT-4o's 4,000 ms [8][9].
- In benchmark tests, TLMs outperformed GPT-4o across various tasks, achieving an F1 score 17% higher [9][10].

Market Positioning
- Fastino targets developers and small-to-medium enterprises rather than consumer markets, offering more accessible subscription-based pricing [11][13].
- TLMs can be deployed on low-end hardware, allowing businesses to use advanced AI capabilities without the high costs of larger models [13][14].

Competitive Landscape
- The trend toward smaller, task-specific models is gaining traction, with companies such as Cohere and Mistral also offering competitive small models [14][15].
- Advantages of small models include lower deployment cost, reduced latency, and the ability to serve specific use cases without the overhead of general-purpose models [14][15].
Early fusion vs. late fusion: do native multimodal large models follow a different Scaling Law?
机器之心· 2025-05-10 13:10
机器之心PRO · Member Newsletter, Week 19 --- Two noteworthy AI & Robotics industry stories decoded this week ---

1. Early fusion vs. late fusion: do native multimodal large models follow a different Scaling Law?
What is a native multimodal model? Compared with the currently popular "late fusion" approach, how does training an "early fusion" native multimodal model differ? What counterintuitive new findings appear in Apple's recent "NNM" technical report? Which multimodal models have performed well recently? Is "early fusion" becoming mainstream? ...

2. Agent products: does the fastest win? A readout of the Anthropic and Databricks CEO conversation
Why does Dario Amodei say "the future of AI is Agents"? Is the data "Scaling Law" still cause for optimism? How can data innovation be built around Agents? Under the MCP and A2A paradigms, how should enterprises keep their data systems secure? How can the key gaps in Agent product iteration be overcome? How can humans grasp the double-edged sword of AI technology? ...
Agent products: does the fastest win? A readout of the Anthropic and Databricks CEO conversation
机器之心· 2025-05-10 06:07
Group 1
- The core viewpoint of the article emphasizes that the future of AI lies in the development of Agents, which can autonomously interact with data and tools, driving innovation across various sectors [6][8].
- Dario Amodei's article "Machines of Loving Grace" highlights that humanity has underestimated both the benefits and risks of AI, necessitating a focus on risk management for a positive future [7].
- The discussion indicates that while traditional companies and AI firms must collaborate for effective market implementation, the adaptation of lagging economic sectors to these innovations is crucial [7][8].

Group 2
- Data is deemed irreplaceable, with Dario Amodei asserting that it embodies the knowledge and wisdom accumulated by enterprises, essential for fine-tuning AI models [10].
- Ali Ghodsi emphasizes that proprietary data is central to building competitive moats, particularly industry-specific data that is critical for training AI models [10].
- The conversation also touches on the importance of data governance and the need for tools like Unity Catalog to manage data risks effectively [8][9].

Group 3
- The article discusses the rapid iteration of AI applications, suggesting that breakthroughs in product development hinge on closing key gaps in Agent product iteration [4].
- Both Amodei and Ghodsi express optimism about the "Scaling Law," noting that practical applications require optimization beyond pre-training, while also addressing data depletion and cost [9].
- The integration of MCP protocols is highlighted as a means to enhance the use of external data resources in AI tools [8].
李建忠: The AI ecosystem and application evolution driven by large-model technical innovation
AI科技大本营· 2025-04-24 03:39
[Introduction] After eight years of the AI wave — from perception to generation and now to the agent era — artificial intelligence is evolving at astonishing speed. At the 2025 Global Machine Learning Summit, 李建忠, Senior Vice President of CSDN and Chief Technology Expert at Boolan, sketched a grand blueprint of AI development, creatively comparing it with the evolutionary history of biological intelligence and revealing the central role of "language" in leaps of intelligence. Follow his thinking for insight into AI's past, present, and exciting future.

Author | 李建忠
Produced by | AI科技大本营 (ID: rgznai100)

Hello everyone! I founded the Global Machine Learning Summit (ML-Summit) in 2017, and with your support we have accompanied AI for eight years, a journey I find deeply moving. Over those eight years, the entire field of artificial intelligence has undergone sweeping change. I would like to share some of my research and thinking on the latest developments in large models.

I compared the stages of AI development with the stages of development from biological intelligence to human intelligence on Earth, and found some very interesting patterns. Let us first look at the four stages of AI development.

Stage one: the 1940s marked the dawn of artificial intelligence. From Turing's theoretical model of computation and the initial conception of neural networks in the 1940s, to the first proposal of "artificial intelligence" at the 1956 Dartmouth Conference, AI then entered symbolism, behaviorism ...
In depth | Microsoft CTO's latest interview: I don't believe in a general-purpose Agent; the future is an era of thousands of collaborating Agents, and the chat interface is only a transitional interaction mode
Z Finance· 2025-04-19 06:31
Core Insights
- The conversation emphasizes the importance of sustainable value in the next generation of AI, highlighting the confusion and uncertainty that often accompany major technological shifts [3][4].
- Kevin Scott argues that the current era is the best time for entrepreneurs, advocating active exploration and product development rather than passive observation [5].
- The discussion also touches on the balance of value creation between startups and established companies like Microsoft, suggesting that both can benefit from new AI capabilities [6][7].

Group 1: AI Value and Product Development
- Kevin Scott believes that while models are valuable, their worth is realized only when connected to user needs through products [6].
- Product quality is paramount, and successful exploration requires rapid iteration and responsiveness to data and feedback [5][6].
- The scaling law in AI is not currently seen as having a limit, with Scott asserting that AI capabilities will continue to expand [8].

Group 2: Data and Efficiency
- The importance of high-quality data is highlighted, with synthetic data becoming increasingly significant in model training [9][10].
- There is a noted gap in the ability to evaluate the impact of specific data on model performance, indicating a need for better assessment tools [9][10].

Group 3: Future of AI Agents
- AI agents are expected to gain improved memory and task-execution capabilities, allowing them to handle more complex tasks autonomously [21][22].
- The interaction model between humans and agents is expected to evolve toward more asynchronous operation [22].

Group 4: Industry Dynamics and Trends
- Open-source and closed-source solutions will coexist in AI, serving different needs [15].
- The roles of engineers and product managers are expected to change, with greater emphasis on specialization and collaboration with AI agents [18][19].

Group 5: AI's Impact on Technical Debt
- Kevin Scott expresses optimism that AI can help mitigate technical debt, transforming it from a zero-sum problem into a non-zero-sum opportunity [31].
- AI's potential to accelerate product development and reduce the burden of technical debt is seen as a significant advantage [30][31].
OpenAI reveals GPT-4.5 training details: data efficiency is key, and pre-training is still useful
Founder Park· 2025-04-14 11:34
New media for the intelligence industry: 智东西 focuses on frontier technologies led by artificial intelligence and the industry upgrades their applications bring across sectors.

More than a month after the release of GPT-4.5, Sam Altman held a 45-minute, information-dense conversation with three core members of the GPT-4.5 technical team, disclosing for the first time many little-known details: development ran severely over schedule, the compute cluster failed frequently, and the path to improvement was hard to predict. The conversation offers insight into future model-training paradigms, how to rethink the Scaling Law, and the role of data efficiency.

The three OpenAI participants were Alex Paino (in charge of GPT-4.5's pre-training machine-learning algorithms), Amin Tootoonchian (OpenAI's chief systems architect), and Daniel Selsam (research on data efficiency and algorithms).

Originally published by 智东西; authors 陈骏达 and 陈家阳.

01 GPT-4.5 was started two years ago; the project took far longer than expected ...
智谱's new "working Agent," no invite code required
36氪· 2025-04-01 13:52
Core Viewpoint
- The article discusses advancements in AI technology, focusing on the new AI Agent product "AutoGLM沉思" developed by 智谱, which aims to enhance AI's ability to understand and execute tasks from natural-language queries [3][4][17].

Group 1: Product Development and Features
- "AutoGLM沉思" is an autonomous AI agent capable of exploring open-ended questions and executing operations based on the results, simulating human thought processes [4][5].
- The product can access various non-public APIs and has multi-modal understanding, allowing it to comprehend both text and images on web pages [5][6].
- In one case study, "沉思" effectively managed a 小红书 account, gaining 5,000 followers in two weeks by summarizing popular topics from multiple sources [6][8].

Group 2: Comparison with Competitors
- Compared with "Manus," which focuses on action and tool use, "沉思" emphasizes the thought process, showcasing its reasoning capabilities [9][10].
- "沉思" is currently a preview version that can perform tasks like research organization but is not yet fully operational for end users [12][15].
- The new models released by 智谱, including GLM-Z1-Air, significantly improve inference speed while reducing cost, indicating a competitive edge in the market [18].

Group 3: Strategic Insights and Future Directions
- 智谱's CEO emphasized the importance of pre-trained models, suggesting that future applications will revolve around model capabilities rather than product interfaces alone [20].
- The company is exploring a "沉思大模型" aimed at enhancing AI's real-time search, dynamic tool use, and self-validation capabilities [17][20].
- The article highlights that AI agents must overcome current limits in intelligence to avoid being blocked by third-party platforms, an ongoing challenge in the industry [25].
The future of deep-thinking models, seen through DeepSeek R1 reproductions | ML-Summit 2025
AI科技大本营· 2025-03-31 06:55
The much-anticipated 2025 Global Machine Learning Summit (ML Summit 2025) will be held April 18-19 at the Radisson hotel, Hongqiao Xijiao Zhuangyuan, Shanghai. Co-hosted by CSDN & Boolan, the event gathers more than 50 top experts from academia and industry to discuss hot AI practices such as agents, federated learning, and multimodal large models.

A long-time friend of the summit, 张俊林, chief scientist of Sina Weibo and head of its AI R&D department, will present "The Future of Deep-Thinking Models, Seen Through DeepSeek R1 Reproductions."

Known as a practitioner who dissects large-model technology most thoroughly, 张俊林's hardcore breakdowns of Gemini's multimodal architecture and OpenAI's o1 technology at the 2024 summit drew praise from developers for "finally explaining the technical essence." His talk will:

- Systematically survey the technical landscape: review the reproduction studies that followed DeepSeek R1's open-sourcing, covering lightweight adaptation in the SFT stage (e.g., S1) and innovative practice in the RL stage.
- Deeply analyze the training paradigm: dissect the core two-stage training pattern — how SFT combines cold-start fine-tuning with multi-domain data optimization, and how GRPO reinforcement learning with full-scenario alignment achieves the leap to "deep thinking" capability.
- Examine key technical questions: attempt to answer a series of widely discussed core questions ...
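The GRPO stage mentioned above scores a group of sampled answers to the same prompt and uses the group itself as the baseline, so no separate value network is needed. A minimal sketch of that group-relative advantage computation (the function name and reward values are illustrative assumptions, not DeepSeek's code):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and standard deviation of its own group (all samples drawn
    for the same prompt). The group average acts as the baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled answers scored by a rule-based verifier
# (e.g. exact match against the reference answer).
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
# Correct answers get positive advantage, incorrect ones negative.
assert adv[0] > 0 and adv[1] < 0
```

These advantages then weight the policy-gradient update for each sampled token sequence, pushing the model toward answers that beat their own group's average.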
A conversation with 2025's hottest embodied-intelligence team: led by two top figures in autonomous driving, a $120 million angel round shakes the industry
量子位· 2025-03-26 10:29
衡宇, 李根, reporting from Shanghai
量子位 | WeChat account QbitAI

The thing is, it is already 2025... The earliest embodied-intelligence founders entered the water three years ago. The fast-moving embodied-intelligence companies have already begun scenario validation and deployment. And the field has never lacked genius or heavyweight founders.

So what kind of founding team could stir up such a storm at this moment?

According to one informed source, the core reason is the team's star power — a veritable dream team, and one of engineering practitioners with complete hard-tech deployment experience. Some reach for an NBA analogy: "Curry and Jokić have joined the same team; the league is settled" — Curry being the premier outside three-point shooter and Jokić regarded as the most versatile center, while the two core figures behind this company are likewise two leading figures in autonomous driving.

When word of their joint venture spread, the reported reaction was: with 陈亦伦 leading the team, it is formidable; with 李震宇 anchoring it, it is solid.

In Shanghai, they assembled their team, named it 它石智航 (TARS), to compete for embodied intelligence's GPT moment. News of their venture had circulated for a while, but with the record-setting $120 million angel round now public, it can no longer be hidden.

China's most lavish embodied-intelligence angel round
它石智航 (TARS) officially announced its latest progress: a $120 million angel round completed, opening a new chapter of embodied-intelligence entrepreneurship. The round was co-led by 蓝驰创投 and 启明创投, with participation from 线性资本, 恒旭资本, 洪泰基 ...