Scaling Law
Zhipu (智谱) Launches a "Get-Work-Done" Agent, No Invitation Code Required
36Kr · 2025-04-01 13:52
Core Viewpoint - The article discusses advances in AI technology, focusing on "AutoGLM沉思" ("Rumination"), a new AI Agent product from 智谱 (Zhipu) designed to understand natural-language queries and carry out tasks based on them [3][4][17].
Group 1: Product Development and Features
- "AutoGLM沉思" is an autonomous AI agent that explores open-ended questions and executes operations based on what it finds, simulating human thought processes [4][5].
- The product can access various non-public APIs and has multimodal understanding, allowing it to comprehend both the text and the images on web pages [5][6].
- In one case study, "沉思" managed a 小红书 (Xiaohongshu) account and gained 5,000 followers in two weeks by summarizing popular topics from multiple sources [6][8].
Group 2: Comparison with Competitors
- Whereas "Manus" focuses on action and tool use, "沉思" emphasizes the thought process, showcasing its reasoning capabilities [9][10].
- "沉思" is currently a preview version: it can perform tasks such as organizing research but is not yet fully operational for end users [12][15].
- The new models released by 智谱, including GLM-Z1-Air, significantly improve inference speed while reducing costs, indicating a competitive edge in the market [18].
Group 3: Strategic Insights and Future Directions
- The CEO of 智谱 emphasized the importance of pre-trained models, suggesting that future applications will revolve around model capabilities rather than product interfaces alone [20].
- The company is exploring the concept of a "沉思大模型" (rumination large model), which aims to strengthen AI's real-time search, dynamic tool use, and self-validation capabilities [17][20].
- The article notes that AI agents must overcome current limitations in intelligence to avoid being blocked by third-party platforms, an ongoing challenge for the industry [25].
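The summary describes 沉思 as an agent that searches in real time, chooses tools dynamically, and validates its own results before answering. The generic think-act-observe loop below is a hypothetical sketch of that pattern only; the article does not describe 智谱's implementation, and `run_agent`, the stub `search` tool, the policy, and the verifier are all illustrative stand-ins.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a tool maps an argument string to an observation string.
Tool = Callable[[str], str]
# A policy inspects the question and the notes so far and returns the next
# (tool_name, argument) to call, or None when it decides to stop.
Policy = Callable[[str, List[str]], Optional[Tuple[str, str]]]
# A verifier checks whether the collected notes support an answer.
Verifier = Callable[[str, List[str]], bool]


def run_agent(question: str, tools: Dict[str, Tool], policy: Policy,
              verify: Verifier, max_steps: int = 5) -> str:
    notes: List[str] = []
    for _ in range(max_steps):
        action = policy(question, notes)      # think: decide the next step
        if action is None:
            break
        name, arg = action
        observation = tools[name](arg)        # act with the chosen tool
        notes.append(f"{name}({arg}) -> {observation}")  # observe and record
    if verify(question, notes):               # self-validation before answering
        return "answer based on: " + "; ".join(notes)
    return "could not verify an answer"


# Usage with stub components (all hypothetical).
tools = {"search": lambda q: f"3 results for {q}"}


def one_search_policy(question: str, notes: List[str]) -> Optional[Tuple[str, str]]:
    # Search once, then stop.
    return ("search", question) if not notes else None


result = run_agent("popular topics", tools, one_search_policy,
                   verify=lambda q, notes: len(notes) > 0)
```

Keeping the policy and verifier as separate functions mirrors the distinction the article draws between deciding what to do next and checking the outcome.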
The Future of Deep-Thinking Models Through the Lens of DeepSeek R1 Reproductions | ML-Summit 2025
AI科技大本营 · 2025-03-31 06:55
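The much-anticipated 2025 Global Machine Learning Technology Conference (ML Summit 2025) will be held on April 18-19 at 上海虹桥西郊庄园丽笙大酒店 (a Radisson hotel in Shanghai's Hongqiao area). Co-hosted by CSDN & Boolan, the event brings together more than 50 leading experts from academia and industry to discuss hands-on practice with hot AI technologies such as agents, federated learning, and multimodal large models. A longtime friend of the conference, Zhang Junlin (张俊林), chief scientist at Sina Weibo and head of its AI R&D department, will present "The Future of Deep-Thinking Models Through the Lens of DeepSeek R1 Reproductions." Known as "the practitioner who dissects large-model technology most thoroughly," Zhang gave in-depth teardowns of Gemini's multimodal architecture and OpenAI o1 at the 2024 conference, prompting developers to exclaim, "Finally, someone has explained the essence of the technology."
- Systematically tracing the technical lineage: a review of the reproduction studies that followed DeepSeek R1's open-source release, covering lightweight adaptation at the SFT stage (e.g., S1) and innovative practices at the RL stage.
- An in-depth analysis of the training paradigm: a close look at R1's core two-stage training pattern, that is, how SFT is performed through cold-start fine-tuning combined with multi-domain data optimization, and how GRPO reinforcement learning plus all-scenario alignment produce a leap in the model's "deep thinking" capability.
- Exploring key technical questions: an attempt to answer a series of widely discussed core questions ...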
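The talk outline mentions GRPO (Group Relative Policy Optimization), the RL algorithm used to train DeepSeek R1. As a minimal sketch of the step that gives GRPO its name, the snippet below computes group-relative advantages: several responses are sampled for the same prompt, and each response's reward is normalized by the group's mean and standard deviation, so no separate value (critic) model is needed as a baseline. It assumes a binary rule-based reward and uses the population standard deviation (implementations differ on this); it omits the clipped policy-ratio objective and the KL penalty, so it illustrates the idea rather than implementing training.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantage for each sampled response to one prompt.

    Each reward is centered on the group mean and scaled by the group's
    standard deviation (population std here), so the group itself serves
    as the baseline instead of a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when all rewards are identical;
    # in that case every advantage is 0 and the group gives no signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1 if correct else 0 (assumed reward).
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and wrong ones a negative advantage; with outcome-level rewards, GRPO applies that value to every token of the corresponding response.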
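In Conversation with 2025's Hottest Embodied-Intelligence Team: Led by Two Top Figures in Autonomous Driving, a $120 Million Angel Round Shakes the Industry
量子位 (QbitAI) · 2025-03-26 10:29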
Core Viewpoint - The article discusses the emergence of a new embodied-intelligence startup that recently raised $120 million in angel funding, a record for China's embodied intelligence sector. The company, TARS, is led by a highly experienced team from the autonomous driving industry and aims to build reliable AI and robotics solutions that bring advanced technology into everyday life [2][3][7][12].
Group 1: Company Overview
- TARS, founded by industry leaders Chen Yilun and Li Zhenyu, aims to build a trustworthy super embodied intelligence system that integrates AI into human production and life [12][18].
- The company completed a record-breaking $120 million angel round led by prominent venture capital firms, indicating strong market interest and confidence in its potential [3][7][8].
Group 2: Team and Expertise
- The founding team is described as a "dream team" with extensive experience in autonomous driving, which the article sees as crucial to the company's success in embodied intelligence [4][12].
- Key figures include Chen Yilun, a former CTO of Huawei's autonomous driving division, and Li Zhenyu, a senior vice president at Baidu, both of whom have significant achievements in their fields [11][16].
Group 3: Technology and Innovation
- TARS is developing a core technology engine called AWE (AI World Engine), likened to a GPT model for embodied intelligence, with a focus on human-centric data collection to enhance AI capabilities [15][37].
- The company emphasizes building a complete data acquisition mechanism to ensure the reliability and effectiveness of its AI systems, distinguishing itself from competitors [39][45].
Group 4: Market Potential and Vision
- The founders believe embodied intelligence will be a key driver of global industrial upgrading over the next decade, positioning TARS to capitalize on this trend [12][51].
- The article highlights growing interest and investment in embodied intelligence, suggesting it could be the next major technological wave after AI and autonomous driving [2][3].
Large Models in a "Battle of the Gods": After a Wave of Reproductions and Major Technical Upgrades, What Should We Watch? | 万有引力
AI科技大本营 · 2025-03-25 01:45
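This article originally appeared on CSDN (ID: CSDNnews); author: 万有引力. In just the past few weeks, the density of news in the large-model race has soared to unprecedented levels. DeepSeek's five consecutive days of open-source releases set off a wave of reproductions; Alibaba's Tongyi Lab and Tencent launched, respectively, ViDoRAG, a RAG system for visual documents, and Hunyuan Turbo S, a new-generation fast-thinking model, accelerating the evolution of large models; Grok 3, which Musk trained on 200,000 GPUs, surpassed many industry benchmarks, once again confirming the "brute force works wonders" rule; Claude 3.7 Sonnet received a major coding upgrade, hastening an era of democratized AI programming; a DeepSeek paper "collided" with one from Kimi, and more and more companies are investing in sparse and linear attention mechanisms, which are becoming key directions of exploration beyond the Transformer; meanwhile, the "virtual machine" concept popularized by Manus quickly went viral and is reshaping how large models run... Behind this dazzling technology race, what truly deserves our attention? What is DeepSeek's five-release streak really aiming for? With DeepSeek reporting a cost-profit margin of 545%, can other large-model companies also find room for profit? Facing industry ...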
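The article names sparse and linear attention as post-Transformer research directions without detailing any company's method. As a generic illustration of the linear variant only, the sketch below uses the kernelized formulation from Katharopoulos et al. (2020), with the feature map elu(x) + 1 as an assumed choice. Because query-key similarity becomes a dot product of feature maps, the key-value summary S = Σ φ(k_j) v_jᵀ and the normalizer z = Σ φ(k_j) can be computed once and reused for every query, so cost grows linearly rather than quadratically with sequence length. The example computes the same output both ways to show the reordering is exact.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def phi(x: Vec) -> Vec:
    # Feature map elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise,
    # so every feature is positive and similarities stay positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q: Mat, K: Mat, V: Mat) -> Mat:
    # O(n^2): every query is scored against every key.
    out = []
    for q in Q:
        fq = phi(q)
        w = [dot(fq, phi(k)) for k in K]
        total = sum(w)
        out.append([sum(wj * v[d] for wj, v in zip(w, V)) / total
                    for d in range(len(V[0]))])
    return out


def linear_attention(Q: Mat, K: Mat, V: Mat) -> Mat:
    # O(n): summarize all keys/values once, then reuse the summary per query.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                        # z = sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out


Q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.4]]
K = [[0.2, 0.1], [-0.3, 0.8], [0.7, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = quadratic_attention(Q, K, V)
b = linear_attention(Q, K, V)
```

This is the non-causal form; a causal decoder would instead keep running sums of S and z as it moves through the sequence.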
Technology Industry Tracking Report No. 5: NVIDIA Unveils Next-Generation GPUs at GTC 2025, Driving Global AI Infrastructure Construction
EBSCN · 2025-03-21 13:33
Investment Rating
- Electronic Industry: Buy (Maintain) [6]
- Communication Industry: Overweight (Maintain) [6]
- Computer Industry: Buy (Maintain) [6]
Core Insights
- NVIDIA introduced the concept of Agentic AI, a new reasoning paradigm that it expects to keep driving global data center construction. It frames AI's evolution in three stages: Generative AI, Agentic AI, and Physical AI [12][13]
- Global investment in data center construction is expected to reach $1 trillion by 2028, driven by the need for more compute and data to train better models [2][17]
- The Blackwell Ultra chip, designed for AI inference workloads, will ship in the second half of 2025 with significant performance improvements over its predecessor [20][22]
- NVIDIA's new AI inference-serving software, Dynamo, aims to maximize token output from AI models and supports the development of AI agents [33][35]
Summary by Sections
1. Agentic AI and Data Center Development
- The introduction of Agentic AI is seen as a pivotal shift in AI technology, emphasizing autonomy and complex problem-solving [12][13]
- The Scaling Law remains relevant and will expand to cover inference and long-horizon reasoning, both of which require substantial computational resources [14][17]
2. Blackwell Ultra Chip and Future Releases
- The Blackwell Ultra chip will significantly enhance AI performance, offering 1.5 times the AI capability of the previous generation [22]
- The Vera Rubin series is expected to launch in 2026, featuring an advanced architecture and larger memory capacity [22][23]
3. Quantum-X CPO Switch Launch
- NVIDIA plans to release the 115.2T 800G Quantum-X CPO switch in the second half of 2025, offering substantial improvements in energy efficiency and network resilience [26][29]
4. Introduction of Dynamo and AI Frameworks
- Dynamo will enable efficient AI inference by optimizing GPU resource utilization across the different processing phases [33][35]
- NVIDIA also introduced the AI-Q framework to enhance AI agents' reasoning capabilities and reduce development costs [37]
5. Investment Recommendations
- The report suggests focusing on companies in the electronics, communications, and computer industries that stand to benefit from advances in AI and data center infrastructure [45][46]
- Specific companies to watch include those involved in AI computing, robotics, and data platforms, reflecting a diverse range of investment opportunities [46][47]
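The report's claim that the Scaling Law still holds refers to the empirical observation that language-model loss falls as a power law in model size, data, and compute. A minimal sketch, assuming the parameter-limited form reported by Kaplan et al. (2020), L(N) = (N_c / N)^α_N, with their approximate fitted constants N_c ≈ 8.8 × 10^13 and α_N ≈ 0.076 (treat them as illustrative): every doubling of N multiplies loss by the same factor, 2^(-0.076), roughly a 5% reduction, which is why steady quality gains demand multiplicative growth in scale and compute.

```python
def kaplan_loss(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Parameter-limited scaling law L(N) = (N_c / N) ** alpha_N.

    Default constants are the approximate fits reported by Kaplan et al. (2020)
    for non-embedding parameter count N, with loss in nats per token;
    treat them as illustrative, not as predictions for any specific model.
    """
    return (n_c / n_params) ** alpha_n


# Loss keeps falling as the model grows, but only by a constant factor per doubling.
losses = {n: kaplan_loss(n) for n in (1e8, 1e9, 1e10, 1e11)}
ratio = kaplan_loss(2e9) / kaplan_loss(1e9)  # equals 2 ** -0.076, about 0.949

for n, loss in losses.items():
    print(f"N = {n:.0e}: loss ~ {loss:.3f}")
```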
DeepSeek Reshapes Our Understanding of the Long-Term Value of Computing Infrastructure
Guotai Junan Securities · 2025-03-14 07:10
Investment Rating - The report rates the industry as "Buy" [1]
Core Insights
- The market has underestimated how strongly the DeepSeek ecosystem amplifies demand for computing power; the report expects its inference workloads alone to generate demand of nearly one million PFLOPS [3]
- Domestic AI chip makers, particularly Huawei Ascend, are poised to benefit significantly as the entry barrier for large-model training falls and the overall market expands [12]
- The DeepSeek ecosystem presents unprecedented opportunities for domestic AI chips, with Huawei Ascend's performance nearing international standards [12]
Summary by Sections
Investment Recommendations
- DeepSeek's technological breakthroughs raised short-term concerns about demand for high-end AI chips, but by lowering the entry barrier for large-model training they have expanded the overall market. Domestic chip makers, especially Huawei Ascend, are expected to gain share thanks to their cost-performance advantage in enterprise deployment [12]
- Recommended stocks include Unisplendour, Inspur Information, and iFlytek; beneficiaries include CloudWalk Technology, Topwise Information, Digital China, and Zhongke Shuguang [12]
DeepSeek
- DeepSeek-V3 set a new economic benchmark for large language model training at $5.576 million, using only 2.788 million GPU hours for its full training run, which has prompted a reassessment of AI computing costs [12]
- DeepSeek's technical innovations have not diminished demand for high-performance AI chips; instead, they have expanded the market by lowering entry barriers and generating massive inference demand [12]
Training Innovations
- DeepSeek V3 and R1 significantly reduced large-model training costs through innovations such as the MLA mechanism, FP8 mixed-precision training, and the DualPipe parallel framework [14]
- The Multi-Token Prediction (MTP) mechanism in DeepSeek-V3 enables more efficient data utilization and denser training signals, strengthening the model's ability to capture long-range dependencies [19]
Inference Optimization
- DeepSeek V3 employs a two-stage inference architecture to balance service quality and throughput, lowering deployment costs for large-scale applications [35]
- The R1 series uses model distillation to produce smaller deployable models, significantly lowering inference costs [41]
Market Dynamics
- DeepSeek's low-cost breakthroughs have prompted a reassessment of AI development paths, with a notable market reaction reflected in the drop in NVIDIA's share price [42]
- Although per-call costs have fallen, DeepSeek's rapid user growth has driven a surge in overall computing demand, underscoring the continued need for high-performance computing infrastructure [44]
Scaling Law and Future Trends
- The report emphasizes that AI development continues to follow the Scaling Law, with growing model, data, and compute scales driving demand [52]
- The shift toward multi-agent and multimodal AI systems is expected to raise computing requirements further, since these systems need complex reasoning and real-time adjustment [59][63]
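The DeepSeek-V3 training-cost figure can be reproduced from the reported 2.788 million GPU hours with one multiplication. DeepSeek's V3 technical report assumes a rental price of $2 per H800 GPU hour and notes that the figure covers only the official training run, excluding prior research and ablation experiments; the snippet simply restates that arithmetic.

```python
# GPU hours from the summary above; the $2/GPU-hour rental price is the
# assumption stated in DeepSeek's V3 technical report (not a purchase cost).
gpu_hours = 2.788e6          # total GPU hours for the full DeepSeek-V3 training run
price_per_gpu_hour = 2.0     # USD per H800 GPU hour (assumed rental price)

cost_usd = gpu_hours * price_per_gpu_hour
print(f"${cost_usd / 1e6:.3f} million")  # prints "$5.576 million"
```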
LatePost Podcast | MiniMax's Yan Junjie (闫俊杰) on Large Models in 2024: The Echoes of a Non-Consensus View
晚点LatePost · 2025-01-22 13:56
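"Better models can lead to better applications, but better applications and more users do not lead to better models." Text | Cheng Manqi (程曼祺). Header image: a classic play from the finals of The International 2019 (TI9) Dota 2 championship, in which Ana of team OG plays IO (the Wisp, the glowing orb in the image); OG went on to win TI9. Why this image? The answer is in the podcast. This is episode #99 of 晚点聊 LateTalk; follow and listen on Xiaoyuzhou, Ximalaya, Apple Podcasts, and other platforms. 晚点聊 LateTalk is a podcast from 晚点 LatePost: "First-hand business and tech interviews, the most authentic thinking from practitioners." Last Thursday we published a text interview, "LatePost Conversation with MiniMax's Yan Junjie: Never Apply Mobile-Internet Logic to AI"; this is its audio version. Some of Yan Junjie's "non-consensus" judgments have sparked considerable discussion. He argues that model capability and user scale do not form a direct flywheel: "Better models can lead to better applications, but better applications and more users do not lead to better models." Meanwhile, the technical report ByteDance released today (January 22) for its Doubao-1.5-pro model states: "Building on ByteDance's A/B testing experience in recommendation, search, and advertising, we developed, based on ...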
Why Lei Jun Poached Her
投资界 (PEdaily) · 2025-01-21 07:35
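This article originally appeared on 南风窗 (ID: shangyejingxiang); author: Zhu Qiuyu (朱秋雨). The rise of a small AI player. At the end of 2024, a small Chinese AI company drew overwhelming global attention on the strength of its technology. After Christmas, overseas social media and the developer platform GitHub were abuzz with a newly released open-source large model, DeepSeek-V3, which netizens abroad dubbed "the mysterious force from the East." In multiple evaluation reports, DeepSeek-V3 ranks in the top tier of the world's open-source models, surpassing Zuckerberg's Llama 3.1. It also holds its own against GPT-4o and Claude 3.5, two of the strongest large models, and even outperforms them on metrics such as mathematical reasoning, code generation, and long-text processing. And that is not all the Chinese AI company DeepSeek (Chinese name: 深度求索) has to show. What puzzled its Silicon Valley peers even more is that, according to DeepSeek's 53-page technical report, training its top model took a cluster of only 2,048 H800 GPUs, 53 days, and a total of $5.576 million. Experts noted that at the same level, the world's major AI companies would need at least 16,000 GPUs, and some as many as 100,000 GPUs training in parallel. Andrej Karpathy, an early member of OpenAI, remarked that D ...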
A Trillion-Scale Company Is Being Born in AI
投资界 (PEdaily) · 2024-12-25 08:24
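Artificial intelligence "plus, minus, times, divide." Report | 投资界 PEdaily. On December 10-11, 2024, the 24th China Equity Investment Annual Conference came to Chongqing, the financial center of western China. The conference was guided by the Office of the Financial Commission of the CPC Chongqing Municipal Committee, hosted by 清科创业 (Zero2IPO) and 投资界 (PEdaily), and co-hosted by 重庆渝富控股集团 (Chongqing Yufu Holding Group). As a barometer of the equity investment industry, this year's conference, themed "万象耕新" (cultivating the new across every field), set out to review the industry's ups and downs, reshape its strategic landscape, explore value discovery, and keep injecting momentum into China's equity investment industry. The roundtable "Artificial Intelligence + - × ÷" was moderated by Sun Jian (孙健), partner at 光速光合, with guests Liu Yuan (刘元), partner at 真格基金 (ZhenFund); Song Chang (宋昶), partner at 啟赋资本; Yi Sha (易沙), chief investment officer of 广州基金 (Guangzhou Fund); and Zhang Min (张敏), managing partner of 合力投资. The following transcript was edited by 投资界 (ID: pedaily2012). Sun Jian: I'm honored to moderate this roundtable on "AI plus, minus, times, divide." Why those four operations? My thought is that half a year ago AI may still have been at "×", and today, with everyone facing all kinds of questions and confusion, it has turned into plus, minus, times, divide, and no one knows which symbol to pick. Before we start, please introduce yourselves. Zhang Min: I'm from 合力投资 and mainly focus on early-stage investment. I got into angel investing in 2001 and have been at it for 23 years. Yi Sha: 广州基金 was established by the Guangzhou Municipal Party Committee and Municipal Government to advance Guangzhou's industrial ...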
LatePost Podcast | How OpenAI o1 Extends the Scaling Law: Discussing o1's New Paradigm with Yuan Jinhui (袁进辉) of SiliconFlow (硅基流动)
晚点LatePost · 2024-09-20 15:22
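"If you deal with developers every day, you don't feel that this industry has stalled or cooled down." Text | Cheng Manqi (程曼祺), He Qianming (贺乾明). This is episode 80 of 晚点聊 LateTalk; follow and listen on Xiaoyuzhou, Ximalaya, Apple Podcasts, and other platforms. 晚点聊 LateTalk is a podcast from 晚点 LatePost. Alongside our written reporting, it uses audio interviews to capture the shifting trends and enduring logic of the business world, along with the people and stories behind them. The day after OpenAI released its new o1 model, we invited Yuan Jinhui, founder of SiliconFlow, to discuss the technical significance of o1 and the changes he has observed in the AI developer community since January this year. A key change in o1 is that it allocates more compute to inference (that is, the stage where the model is used), raising the importance of test-time compute. SiliconFlow, which Yuan founded early this year, is an AI Infra (middleware software) company focused on inference acceleration and optimization. He is a serial entrepreneur: in 2017 he founded OneFlow (一流科技), and in 2023 he joined 光年之外, the large-model startup formed by Wang Huiwen (王慧文), as a co-founder. (For the stories of Yuan's previous two ventures, listen ...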
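o1's shift toward test-time compute means spending more computation per query at inference. OpenAI has not published how o1 does this, so the sketch below uses a different, well-known technique as a stand-in: self-consistency, that is, sampling several answers and taking a majority vote. Under the simplifying assumptions that samples are independent, each is correct with probability p, and all wrong answers count as a single competing answer, it computes how accuracy changes with the number of samples n.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Simplifying assumptions: each sample is correct with probability p, samples
    are independent, and all wrong answers count as one competing answer.
    Use odd n so there are no ties.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 5, 9, 15):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With p = 0.6, three votes are right 64.8% of the time versus 60% for a single sample, and accuracy keeps rising as n grows. With p below 0.5 voting makes things worse, so in this toy model extra inference compute pays off only when the base model is already right more often than not.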