Mixture-of-Experts (MoE) Architecture
21 Editorial | Scale effects will reinforce the market advantage of China's AI industry
21世纪经济报道· 2026-03-04 01:22
A year ago, the stunning debut of DeepSeek R1 shook the global AI community; a year later, ByteDance's Seedance 2.0 video generation model, with its multi-shot narrative and audio-visual synchronization capabilities, has been hailed by the industry as "director-grade AI", marking AIGC's shift from the "toy" stage toward an era of genuinely usable "tools".

Power infrastructure, hardware manufacturing, and physical AI applications. The combined advantages of Chinese companies in cost, efficiency, and real-world deployment are reshaping the competitive rules of the global AI industry.

Based on this judgment, Goldman Sachs recently published a report arguing that current market pricing of Chinese AI falls far short of reflecting its potential economic benefits; viewing Chinese AI through the American narrative would mean missing the structural opportunities in its areas of strength, such as power and hardware, and the Chinese AI market therefore holds substantial room for revaluation.

Meanwhile, foreign institutions such as UBS have likewise named Chinese AI a long-term investment theme. With the valuations of major US technology companies already at relatively high levels, a more global view of AI investment suggests that Chinese companies in power, hardware, and models will be core beneficiaries of the explosion in global AI compute demand.

China is not only maintaining its lead in AI applications; its foundation-model capabilities are also rising rapidly. Data from OpenRouter, the world's largest AI model API aggregation platform, shows that in February 2026 calls to Chinese models surpassed those to American models for the first time; among the top globally ranked ...
More than 2x faster than DeepEP! 无问芯穹's FUSCO breaks the MoE communication bottleneck with "in-flight reshuffling", built for the coming Agent boom
机器之心· 2025-12-31 09:31
Core Viewpoint
- The article discusses the growing adoption of the Mixture-of-Experts (MoE) architecture in large models such as ChatGPT and Gemini, highlighting the communication and data-rearrangement challenges the architecture creates, particularly under high concurrency and long contexts [1][2].

Group 1: MoE Architecture and Challenges
- Because of their sparse structure and expert parallelism, MoE models introduce heavy global distributed data exchange, which becomes a performance bottleneck in existing communication libraries such as DeepEP [2].
- Communication and data-rearrangement overhead grows with the scale of expert parallelism, making distributed data shuffling a critical bottleneck in both training and inference [11][14].

Group 2: Introduction of FUSCO
- FUSCO, developed in collaboration with several universities, optimizes communication for MoE models by fusing the communication process with data-layout transformation, eliminating redundant data rearrangement [3][4].
- Experiments show that FUSCO improves communication performance by up to 3.84x over NCCL and 2.01x over DeepEP, with the gap widening as concurrent requests and text length increase [4][44].

Group 3: FUSCO Design and Functionality
- FUSCO's design performs data rearrangement during the communication process itself, maximizing GPU and network bandwidth utilization while minimizing extra memory operations [16][27].
- Its communication interface is built around logical segments, enabling precise data access and placement without intermediate buffering or post-communication rearrangement [21][23].

Group 4: Performance Evaluation
- In tests on 64 GPUs, FUSCO delivered significant communication-efficiency gains across a range of traffic configurations, reducing communication overhead and improving load balancing [44][45].
- End-to-end, FUSCO accelerated training and inference tasks by up to 1.39x over NCCL and 1.19x over DeepEP [47][48].
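The "distributed data shuffling" bottleneck described above can be sketched in a few lines. In expert-parallel MoE dispatch, tokens bound for the same expert must be grouped (permuted) into contiguous buffers before the all-to-all exchange and un-permuted afterward; these extra passes over memory are the overhead FUSCO aims to eliminate. This is a minimal illustrative sketch of the naive two-step version, not FUSCO's or DeepEP's actual API, with top-1 routing assumed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts = 8, 4, 4
tokens = rng.standard_normal((num_tokens, hidden))

# The router assigns each token to one expert (top-1 routing).
expert_ids = rng.integers(0, num_experts, size=num_tokens)

# Step 1: permute tokens so those bound for the same expert are contiguous.
# This full extra pass over the activations is the rearrangement cost.
perm = np.argsort(expert_ids, kind="stable")
grouped = tokens[perm]
counts = np.bincount(expert_ids, minlength=num_experts)  # per-expert send sizes

# Step 2 (in a real system): all-to-all exchange of `grouped` using `counts`.

# Step 3: after the experts reply, an inverse permutation restores token order,
# costing a second full pass over the data.
inv_perm = np.argsort(perm)
restored = grouped[inv_perm]

assert np.allclose(restored, tokens)
```

Both the permute and un-permute passes scale with total token traffic, which is why the overhead grows with concurrency and context length.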
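The fusion idea behind "rearrangement during communication" can also be sketched abstractly. Instead of receiving into a staging buffer and permuting afterward, each arriving segment is written directly to its final slot using precomputed per-expert offsets, so no post-communication pass over the data is needed. The loop below emulates segments arriving one at a time; the names and structure are assumptions for illustration only, not FUSCO's segment interface:

```python
import numpy as np

rng = np.random.default_rng(1)
num_tokens, hidden, num_experts = 8, 4, 4
tokens = rng.standard_normal((num_tokens, hidden))
expert_ids = rng.integers(0, num_experts, size=num_tokens)

# Precompute where each expert's block starts in the output buffer.
counts = np.bincount(expert_ids, minlength=num_experts)
offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))

out = np.empty_like(tokens)
cursor = offsets.copy()
for i, e in enumerate(expert_ids):   # emulate segments arriving one by one
    out[cursor[e]] = tokens[i]       # placed at its final position on arrival
    cursor[e] += 1

# Result matches permute-then-send, but without a separate rearrangement pass.
perm = np.argsort(expert_ids, kind="stable")
assert np.allclose(out, tokens[perm])
```

The design trade-off sketched here is that per-segment destination bookkeeping replaces bulk post-processing, letting placement overlap with data movement rather than serialize after it.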