scaling

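Ascend + Kunpeng double strike! Huawei clears the key bottlenecks in MoE training: another 20% speedup, 70% memory saved
Leiphone (雷峰网)· 2025-06-04 09:31
Encouragingly, the results show that MoE training throughput rose a further 20% over the previous baseline, while memory usage fell by 70%. This is not only a technical breakthrough but also a bellwether for MoE training. "Every breakthrough in Pangu Ultra MoE reflects Huawei's leading strength in underlying AI technology and its engineering deployment." Author: Li Xi. Huawei recently presented a new operator and memory optimization scheme for its MoE training system: three core operators are accelerated across the board, system throughput rises by another 20%, and Selective R/S saves 70% of memory. On the road to more powerful AI, MoE has become another preferred path for tech giants. As long as the Scaling Law has not failed, the parameter counts of large models will keep growing, and AI intelligence can keep rising with them. Thanks to its distinctive architecture and unprecedented parameter scale, MoE is becoming one of the key paths for breaking through the compute bottleneck of large-scale model training. However, how to turn MoE's potential into efficient training practice has long been a hard problem for the industry. Previously, Huawei used its Adaptive Pipe & EDPB framework to achieve efficient cluster-level distributed computing, letting communication and computation run fully in parallel and improving training-cluster efficiency. This time, through deep coordination between Ascend and Kunpeng compute, Huawei has further achieved training operator computation ...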
The world goes "All in AI": Chinese tech giants' ecosystem offense and defense
21st Century Business Herald (21 Shi Ji Jing Ji Bao Dao)· 2025-05-29 14:12
Core Viewpoint - The article discusses the competitive landscape of AI in China, highlighting the strategic moves of major tech companies as they prepare for an impending AI arms race by 2025, driven by the need for computational power and ecosystem integration [2][10]. Group 1: AI Development and Scaling Law - The emergence of AI technologies, particularly DeepSeek, is tied to the necessity of increasing computational power, as described by the Scaling Law, which states that AI development requires substantial computational resources [3][12]. - Despite initial skepticism regarding the adherence to Scaling Law, it has been observed that even advanced AI models like DeepSeek still require significant computational resources for training and operation [3][12]. Group 2: Historical Context and Cloud Computing - The evolution of cloud computing in China can be traced back to events like the success of "Double Eleven," which highlighted the need for robust computational systems to handle peak loads, leading to the development of Alibaba Cloud [4][5]. - Alibaba Cloud has grown to become the largest cloud service provider in China, serving 4 million customers and reaching 47 million small and medium-sized enterprises globally, with projected revenues of $6.513 billion in 2024 [7]. Group 3: Competitive Strategies of Major Players - Major players like Huawei and Tencent are adopting distinct strategies in the AI space, with Huawei focusing on a fully autonomous technology stack and Tencent leveraging its extensive social ecosystem to enhance its AI capabilities [9][10]. - Tencent's recent capital expenditures for AI projects have shown a decline compared to previous quarters, indicating a cautious approach amidst rising competition and evolving market dynamics [12]. 
Group 4: Market Dynamics and Challenges - The rise of open-source models like DeepSeek has created a competitive environment where traditional monetization strategies for AI services face challenges, complicating the capital expenditure return cycle for major companies [13]. - The article suggests that the future of AI in China may hinge on who can effectively control the ecosystem, as companies navigate the complexities of free service models and the need for sustainable revenue generation [13].
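Why did the "ideal state" of Tsinghua prodigy Yang Zhilin lose out to Liang Wenfeng?
ifeng Finance (凤凰网财经)· 2025-05-28 12:51
The following article is from 白鲸实验室, by 八尺. 白鲸实验室: observations on technology and business civilization in the AI era. 01 Beyond the "genius" label, Yang Zhilin is also a seasoned lover of literature and the arts. Most of the post-90s generation were at some point captivated by Haruki Murakami, and Yang Zhilin, born in 1992, is no exception. In one Murakami novel, a programmer writing code late at night left a deep impression on him and filled him with longing, which foreshadowed his later move into AI. In high school and university he loved rock music; his favorite band is Pink Floyd. While studying at Tsinghua he founded the rock band Splay, which reached the original-song final of the Tsinghua University campus singer competition. Tsinghua has long had a musical tradition: besides producing Gao Xiaosong and Shuimu Nianhua, Yang's famous junior schoolmate Yao Shunyu (who works at OpenAI) founded the Tsinghua University rap club as an undergraduate. Playing rock and rap is the rebellion and romance of science students. The confusion of the post-90s generation lies in how few dividends the era has left them, and music happens to give vent to that resentment. Yang's band wrote a song about "daydreaming of a successful startup and getting rich overnight". Wavering between pursuing ideals and making money is the typical state of youth; longing to get rich overnight may be an effective defense against the collapse of idealism. On the timeline, the post-90s generation did in fact catch the tail end of the mobile internet boom. Dai Wei, a Tsinghua alumnus just one year older than Yang, officially launched his ofo bike-sharing service in 2015, and in a world first ...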
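Yang Zhilin: the suspension of a post-90s idealist
Huxiu (Hu Xiu)· 2025-05-28 06:01
Beyond the "genius" label, Yang Zhilin is also a seasoned lover of literature and the arts. Most of the post-90s generation were at some point captivated by Haruki Murakami, and Yang Zhilin, born in 1992, is no exception. In one Murakami novel, a programmer writing code late at night left a deep impression on him and filled him with longing, which foreshadowed his later move into AI. In high school and university he loved rock music; his favorite band is Pink Floyd. While studying at Tsinghua he founded the rock band Splay, which reached the original-song final of the Tsinghua University campus singer competition. Tsinghua has long had a musical tradition: besides producing Gao Xiaosong and Shuimu Nianhua, Yang's famous junior schoolmate Yao Shunyu (who works at OpenAI) also founded the Tsinghua University rap club as an undergraduate. Playing rock and rap is the rebellion and romance of science students. The confusion of the post-90s generation lies in how few dividends the era has left them, and music happens to give vent to that resentment. Yang's band wrote a song telling a story about "daydreaming of a successful startup and getting rich overnight". They wavered constantly between pursuing ideals and making money, which is the typical state of youth; longing to get rich overnight may be an effective defense against the collapse of idealism. On the timeline, the post-90s generation did in fact catch the tail end of the mobile internet boom. Dai Wei is a Tsinghua alumnus just one year older than Yang. In 2015 his ofo bike-sharing service officially launched and pioneered the "dockless bike sharing" model worldwide, making him a well-deserved startup star. of ...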
Now, Scaling What?
Synced (机器之心)· 2025-05-24 14:12
Group 1 - The core viewpoint of the article revolves around the transition in the AI industry towards exploring "What to Scale" as the traditional Scaling Law faces diminishing returns, prompting researchers to seek new paradigms for enhancing model capabilities [3][4]. - The article highlights the emergence of new scaling targets, including "Self-Play RL + LLM," "Post-Training Scaling Law," and "Test-Time Training," as researchers aim to improve model performance beyond pre-training [4][6]. - A significant focus is placed on Test-Time Scaling (TTS), which involves increasing computational resources during the inference phase to enhance model output quality, marking a shift from pre-training to inference optimization [6][7]. Group 2 - The article discusses various scaling strategies, including Parallel Scaling, Sequential Scaling, Hybrid Scaling, and Internal Scaling, each with distinct methodologies aimed at improving model performance during testing [9][10]. - It emphasizes the equal importance of fine-tuning and inference in the post-training phase, suggesting that both aspects are crucial for adapting models to specific applications and enhancing their output quality [11].
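To make the idea of spending more compute at inference time concrete, here is a minimal sketch of one common reading of "Parallel Scaling": sample several candidate answers and aggregate them by majority vote. The summary above names the strategy but does not give an algorithm, so the voting rule is an assumption, and `toy_sampler`, `parallel_scaling`, and `accuracy` are hypothetical names standing in for real model calls.

```python
import random
from collections import Counter
from typing import Callable

Sampler = Callable[[str, random.Random], str]


def toy_sampler(question: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one stochastic model call:
    # it returns the right answer "42" about 60% of the time.
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "44"])


def parallel_scaling(question: str, sampler: Sampler,
                     n_samples: int, seed: int = 0) -> str:
    # Draw n_samples candidates, then aggregate by majority vote.
    rng = random.Random(seed)
    answers = [sampler(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 500) -> float:
    # Fraction of seeded trials in which the voted answer is correct.
    hits = sum(
        parallel_scaling("toy question", toy_sampler, n_samples, seed=t) == "42"
        for t in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, accuracy(n))
```

In this toy setting the voted answer becomes more reliable as `n_samples` grows, which is the trade the summary describes: extra inference compute in exchange for output quality. A sequential variant would instead feed each attempt back into the next call rather than sampling independently.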
2024 China Artificial Intelligence Industry Research Report
iResearch (艾瑞咨询)· 2025-05-23 09:42
Core Viewpoint - The artificial intelligence (AI) industry is recognized as a key development direction by the government, with significant policies aimed at promoting innovation and enhancing regional economic competitiveness. The rise of open-source models like DeepSeek is accelerating the domestic AI ecosystem's openness and competitiveness, marking a significant event in China's AI industry development [1][4][25]. Summary by Sections Research Background - The AI industry is positioned as a core engine for the new technological revolution and industrial transformation, with the government emphasizing its strategic importance [1]. Macro Environment - In 2024, the national focus on AI development is evident, with local governments promoting research innovation and infrastructure. Despite a slowdown in GDP growth, AI technology shows vast potential for efficiency improvement and industrial upgrading, supported by government initiatives [4]. Industry Dynamics - The AI market size in China is projected to reach 269.7 billion yuan in 2024, with a growth rate of 26.2%, slightly below expectations due to high costs and unmet client needs in real business scenarios [6]. - The demand for computing power is shifting structurally, with increased utilization expected as open-source models drive application growth [6]. - The ecosystem of AI tools is improving, with advancements in distributed AI frameworks and LLMOps platforms facilitating model training and deployment [6]. - Commercialization is primarily project-based for enterprises, while consumer products often adopt a "free + subscription" model [6]. - Many companies are actively pursuing overseas markets to mitigate domestic competition [6]. Development Trends - AI Agents are evolving product applications from simple Q&A to complex task completion, with embodied intelligence becoming a strategic focus for future AI competition [8]. 
- The open-source movement led by DeepSeek is promoting equitable access to AI technology, enhancing its application in both industrial and consumer sectors [8]. Policy Environment - The government has integrated AI into national development strategies, with various cities launching initiatives to foster local AI industries [9]. Capital Environment - Investment in the AI sector is increasing, particularly in language and multimodal applications, with a notable rise in equity investment [12]. Technology Environment - The Transformer architecture is the foundation for current large model developments, with ongoing exploration in efficiency optimization and new attention mechanisms [16][18]. Market Size - The AI industry in China is expected to exceed 1 trillion yuan by 2029, with a compound annual growth rate of 32.1% from 2025 to 2029 [24][25]. Application Layer Insights - The application layer is seeing a competitive landscape where pricing and user engagement strategies are critical, with many companies adopting aggressive pricing tactics [34]. - B-end applications are primarily driven by state-owned enterprises, focusing on sectors like government, education, and energy [37]. C-end Product Ecosystem - C-end AI products are rapidly developing, but many still face challenges in user retention and monetization [39]. AI Agent Development - AI Agents are bridging the gap between model capabilities and application needs, with a growing ecosystem of diverse vendors driving innovation [45][76]. AI Hardware - AI capabilities are increasingly integrated into consumer hardware, with significant advancements in mobile devices and educational tools [47]. Voice Modality - Voice recognition and generation capabilities are improving, with a focus on end-to-end model architectures enhancing user interaction [50]. Visual Modality - The Transformer architecture continues to dominate visual model development, with ongoing advancements in generative models [56]. 
Language Modality - Language models are primarily driven by large enterprises, with a focus on enhancing user experience and functionality [66]. AI Product Commercialization - Current AI product monetization strategies are primarily project-based and subscription-based, with potential for new models emerging [69]. International Expansion - Many companies are looking to expand into international markets, with a focus on AI image/video and social applications [71][73].
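The two market-size figures in the summary above can be cross-checked with a short compound-growth calculation. This is a sketch under stated assumptions: it treats the 269.7 billion yuan figure for 2024 and the 2029 projection as the same measure, and it assumes the 32.1% compound annual growth rate applies to each of the five years from 2025 through 2029. The report summary does not state either assumption, and the names below are illustrative.

```python
def compound(base: float, annual_rate: float, years: int) -> float:
    # Value after `years` of growth at a constant annual rate.
    return base * (1.0 + annual_rate) ** years


BASE_2024 = 269.7   # billion yuan, 2024 market size quoted in the summary
CAGR = 0.321        # 32.1% per year, quoted for 2025 to 2029
YEARS = 5           # assumption: 2025, 2026, 2027, 2028, 2029

projected_2029 = compound(BASE_2024, CAGR, YEARS)

if __name__ == "__main__":
    print(f"Projected 2029 market size: {projected_2029:.1f} billion yuan")
```

Under these assumptions the projection lands a little above 1,000 billion yuan, which is consistent with the summary's statement that the market is expected to exceed 1 trillion yuan by 2029.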
The race for the robot "strongest brain" heats up: Tesla and Figure bet on spatial intelligence
21st Century Business Herald (21 Shi Ji Jing Ji Bao Dao)· 2025-05-22 12:54
Group 1 - Tesla and Figure Robotics are making significant advancements in robotics, showcasing their capabilities in household chores and factory operations respectively [1][2] - Tesla's robots utilize a unified neural network model for training, learning from real human videos rather than traditional VR motion capture [1][4] - The rapid progress in robotics is attracting investment interest, with several companies securing substantial funding and forming strategic partnerships [2][3] Group 2 - The complexity of robotic operations in three-dimensional space is highlighted, with Tesla leveraging its experience in autonomous driving to enhance robotic models [4][5] - Current challenges in the industry include the high cost and time required for collecting real-world data, which is essential for training robots effectively [5][6] - The deployment of humanoid robots in factories is seen as a critical step towards commercialization, with several companies already integrating robots into their production lines [6][7] Group 3 - The cost of humanoid robots remains high, with prices ranging from 500,000 to 1,000,000 yuan per unit, which poses a barrier to widespread adoption [6] - Companies are exploring high-value industrial scenarios and rapid adaptation to overcome productivity bottlenecks and achieve scalability in humanoid robotics [6][7] - The integration of self-assembling robots could create a significant industrial market, as demonstrated by Figure's plans for large-scale production of humanoid robots [6][7]
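A brainstorm in a PhD dorm overhauls the Scaling Law? Qwen and Zhejiang University jointly propose a new law that cuts inference memory by 95.5%!
AI前线 (AI Frontline)· 2025-05-21 10:04
Compiled by 华卫. There are usually two mainstream Scaling Law routes for raising the intelligence of large language models (LLMs). The first is to scale parameters, using more model parameters to learn in finer detail; this approach is very demanding on GPU memory. The second is to scale inference-time thinking by lengthening the chain of thought; this approach is very time-consuming, depends on training data and training strategy (RL), and suits only some scenarios.

| Method | Inference Time | Inference Space | Training Cost | Specialized Strategy |
| --- | --- | --- | --- | --- |
| Dense Scaling | Moderate | High | Pre-training only | No |
| MoE Scaling | Low | High | Pre-training only | Load balancing |
| Inference-Time Scaling | High | Moderate | Post-training | RL / reward data |
| Parallel Scaling | Mo ...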
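She overtakes Taylor Swift to become the world's youngest self-made female billionaire
Chuangyejia (创业家)· 2025-05-16 09:55
The following article is from 投中网, by Zhang Xue. 投中网 is a leading innovation-economy information service platform with a multi-channel media matrix. It provides the innovation-economy community with in-depth, original insight and has authoritative influence in the private equity investment industry and in innovative business. Website: www.chinaventure.com.cn. The company she holds shares in is valued at over 180 billion. Author: Zhang Xue. Source: 投中网. Taylor Swift, 35, has lost the title of "world's youngest self-made female billionaire", replaced by a Chinese-American tech entrepreneur, Lucy Guo, who is just 30.

| Name | Age | Net worth | Nationality | Source of wealth |
| --- | --- | --- | --- | --- |
| Lucy Guo | 30 | 12.5 | United States | Artificial intelligence |
| Taylor Swift | 35 | 16 | United States | Music |
| Daniela Amodei | 37 | 12 | United States | Artificial intelligence |
| Melanie Perkins | 37 | 57 | Australia | Software |
| Rihanna | 37 | 14 | Barbados | Cosmetics, music |
| 卢依雯 ...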
Tencent says it has enough high-end chips to train AI for 'generations' even if the US cuts it off
Business Insider· 2025-05-15 04:30
The Chinese tech giant Tencent said it has a "pretty strong stockpile of chips" to tide it through America's chip sale restrictions. The company's president, Martin Lau, was speaking to investors during an earnings call on Wednesday when he was asked how Tencent would deal with US chip restrictions. Lau said "it's a very dynamic situation" that Tencent is managing, and it's trying to "figure out the right solution" to make sure its AI strategy "can still be executed." Lau told investors that Tencent can ...