雷峰网
Born on Ascend, One Step Ahead: Inside the Pangu Pro MoE Full-Pipeline Optimized Inference System
雷峰网· 2025-06-06 09:26
Core Viewpoint
- Huawei's Pangu Pro MoE 72B model significantly improves inference efficiency through system-level optimizations and new parallelism strategies, setting a benchmark for MoE inference [2][25].

Group 1: Model and Performance Enhancements
- The Pangu Pro MoE model reduces computational overhead and ranks first domestically on the SuperCLUE benchmark among models with under 100 billion parameters [2].
- Inference performance improves by 6-8 times, reaching a throughput of 321 tokens/s on the Ascend 300I Duo and up to 1528 tokens/s on the Ascend 800I A2 [2][26].

Group 2: Optimization Strategies
- The Hierarchical & Hybrid Parallelism (H2P) strategy lets each module use its own specialized parallelism and communication scheme, avoiding the inefficiency of applying one traditional parallel strategy everywhere [4][5].
- The TopoComm optimization reduces static overhead and improves data transmission efficiency, yielding a 21% increase in effective bandwidth and a 39% reduction in AllGather communication time [6][12].
- The DuoStream strategy overlaps computation with communication so that both execute at the same time, which significantly raises overall efficiency [8][10].

Group 3: Operator Fusion
- Huawei has developed two specialized fused operators, MulAttention and SwiftGMM, to optimize resource access and computation scheduling, leading to substantial performance gains in inference [13][14].
- The MulAttention operator accelerates attention computation by 4.5 times, while the SwiftGMM operator reduces decoding latency by 48.7% [15][18].

Group 4: Algorithmic Innovations
- The PreMoE algorithm dynamically prunes experts in the MoE model, raising throughput by over 10% while maintaining accuracy [22].
- The TrimR and SpecReason algorithms optimize the reasoning process, cutting unnecessary computation and improving throughput by 14% and 30%, respectively [23][21].

Group 5: Overall System Performance
- The Ascend 300I Duo platform delivers low latency and high throughput, reaching 321 tokens/s under optimal conditions, which makes it a cost-effective option for a range of inference applications [29][30].
- The end-to-end optimization of the Pangu inference system lays a solid foundation for large-scale deployment and efficient rollout of general-purpose large models [31].
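The summary above mixes two kinds of figures: speedup factors (such as "4.5 times") and percentage reductions in time (such as "48.7%" or "39%"). The sketch below is a minimal Python illustration of how to convert between the two so the numbers can be compared on one scale. The function names are hypothetical, and it assumes that "accelerates by 4.5 times" means the operation takes 1/4.5 of the original time, which the summary does not state explicitly.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by a fractional time reduction.

    A reduction of 0.487 means the new time is (1 - 0.487) of the old time.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup: float) -> float:
    """Fractional time reduction implied by a speedup factor.

    Assumption: a speedup of s means the new time is 1/s of the old time.
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Figures quoted in the summary above.
    print(f"48.7% lower decoding latency -> {speedup_from_reduction(0.487):.2f}x")
    print(f"39% lower AllGather time     -> {speedup_from_reduction(0.39):.2f}x")
    print(f"4.5x faster attention        -> {reduction_from_speedup(4.5):.1%} less time")
    # A 21% bandwidth gain shortens a fixed-size transfer by less than 21%.
    print(f"21% more bandwidth           -> {reduction_from_speedup(1.21):.1%} less time")
```

Read this way, the 48.7% latency reduction corresponds to roughly a 1.95x speedup, and a 4.5x speedup corresponds to roughly 78% less time, under the stated assumption.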
The First Choice for AI Innovation at 65% of Central SOEs: How Does Baidu Smart Cloud Make Intelligence "Emerge"?
雷峰网· 2025-06-06 09:26
Core Insights
- The speed and quality of large-model deployment are becoming critical competitive factors for companies in the wave of intelligent transformation [2][3].
- Overall penetration of AI large models is still below 1%, but over half of the companies that have deployed them report significant business value [2].
- There is a gap in both understanding and action between companies that invest in the technology and those that dismiss it as an "industry bubble," which reflects how hard it is to move from pilot projects to widespread adoption [2][3].

Group 1: Challenges in Large Model Deployment
- Companies face two obstacles in their digital transformation: a lack of technical capabilities, and the "barrel effect," in which a single weak capability limits the whole [2][3].
- One large group invested 30 million in developing a corporate large model but ultimately abandoned the project because of difficulties in technical implementation, data privacy risks, and an unclear business model [2].

Group 2: Importance of Full-Stack Capabilities
- Successful deployment of large models requires deep collaboration with industry experts who have full-stack technical capabilities [3][5].
- Baidu Smart Cloud leads in the number of large-model projects, in industry coverage, and in projects won from state-owned enterprises, positioning itself as an industry expert in large-model deployment [3].

Group 3: Infrastructure and Performance
- Full-stack infrastructure is essential for deploying large models, since it addresses multiple barriers between model availability and business effectiveness [5][9].
- Baidu Smart Cloud's Kunlun P800 chip supports efficient model training, significantly reducing costs and improving performance [8][9].

Group 4: Innovations in Resource Utilization
- The Baidu "百舸" (Baige) platform has improved resource utilization by 50%, boosting the performance of Kunlun chips and keeping large-model training highly stable [9][10].
- The platform supports a hybrid-cloud approach, optimizing resource allocation and achieving over 95% effective training time on 30,000-card clusters [9][10].

Group 5: Industry-Specific Large Models
- Baidu has launched the "千帆慧金" (Qianfan Huijin) large model, which is tailored to the financial sector and outperforms general-purpose models there [14][15].
- The model supports a range of financial applications, demonstrating deep industry knowledge and reasoning capabilities [15][16].

Group 6: Cost-Effectiveness and Accessibility
- Baidu prices its large models significantly below competitors, making advanced AI technology more accessible to enterprises [16].
- The 千帆 (Qianfan) platform has supported the development of over 1 million enterprise-level AI applications, advancing the deployment of intelligent agents across industries [16][18].

Group 7: Future Directions and Strategic Goals
- Baidu aims to integrate more deeply into industry scenarios and to advance intelligent agents that can coordinate across organizations [19][30].
- The company is committed to continued investment in advanced AI infrastructure to accelerate the industrialization of large models and unlock more value across scenarios [31][32].
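Poaching at 3x Salary! JD.com Reportedly "Raids" Fliggy, Ctrip, and Qunar; Li Bin: Paid Trolls Spend 30-50 Million a Month Smearing NIO, Influencer: Smearing BYD Would Take 200 Million; Leapmotor Executive Apologizes for Unfamiliarity with the Business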
雷峰网· 2025-06-06 00:38
Group 1
- JD.com is expanding aggressively into hotel and flight booking, offering 3 times the salary to recruit talent from competitors such as Fliggy, Ctrip, and Qunar [4].
- Gross margins among domestic new energy vehicle makers are closely contested, with Seres leading at 27.62% and Xiaomi following at 23.2% [6][7].
- The merger between Changan and Dongfeng has been paused, and Changan's automotive business is becoming an independent central enterprise [8].

Group 2
- Morgan Stanley reports that Tesla has "military DNA" and the potential to become a defense technology giant, with the urban air mobility market projected to reach $1 trillion by 2040 [20][21].
- Qualcomm is preparing for a potential split from Apple, indicating that it no longer relies on Apple's business for future growth [22].
- The dismissal of OpenAI's founder has inspired a film adaptation centered on the dramatic events around the company's leadership changes [25][26].

Group 3
- XPeng Motors and Huawei have jointly launched the "Chasing Light" AR head-up display system, which will debut in the upcoming XPeng G7 [17].
- BYD has apologized for delivery delays on its Fangchengbao Ti3 model caused by production capacity issues [15].
- Alibaba senior executive Mei Fengfeng is rumored to be returning to his original business department, although no official announcement has been made [11].
RL Post-Training Enters the Supernode Era! Huawei's Black Tech Squeezes Out Every Bit of Compute, with One Card Doing Two Jobs
雷峰网· 2025-06-05 09:17
Core Viewpoint
- Reinforcement Learning (RL) post-training has become a crucial path for breaking through the performance ceiling of large language models (LLMs), and Huawei has introduced two key technologies to improve efficiency and resource utilization in this process [2][3][56].

Group 1: RL Post-Training Challenges
- RL post-training currently consumes 20% of the total compute in the training process, a share projected to rise to 50%, with significant impact on model performance and cost [3].
- Traditional RL post-training suffers from low resource utilization because training and inference tasks execute alternately, leaving substantial compute idle [11][13].
- Task scheduling in large-scale clusters has become more complex with the popularity of Mixture of Experts (MoE) models, which makes efficient collaboration challenging [15][16].

Group 2: Huawei's Innovations
- Huawei's "RL Fusion" technology allows a single card to handle both training and inference tasks simultaneously, effectively doubling resource utilization and throughput [5][18].
- The "StaleSync" mechanism takes a quasi-asynchronous approach, letting different RL tasks execute in parallel within a defined "staleness threshold" and raising horizontal scaling efficiency to over 90% [29][32].
- Combined, RL Fusion and StaleSync significantly improve the efficiency of RL post-training, increasing throughput by 150% over the baseline [52][56].

Group 3: Performance Metrics
- RL Fusion combined with StaleSync raises throughput from 14.0k tokens/sec to 35.0k tokens/sec, a 150% improvement over the baseline configuration [54].
- In a multi-node setup, StaleSync delivers near-linear scaling: throughput rises from 35k tokens/sec to 127k tokens/sec as the number of nodes increases from 1 to 4, a linearity of 91% [55].
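The two performance figures in Group 3 can be checked with simple arithmetic. The following is a minimal Python sketch, with hypothetical function names, that recomputes the 150% improvement and the 91% linearity from the throughput numbers quoted above. It assumes "linearity" means measured multi-node throughput divided by ideal throughput (single-node throughput times node count), which matches the quoted figure but is not defined in the summary.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional gain of `improved` over `baseline` (1.5 means +150%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


def scaling_linearity(single_node: float, multi_node: float, nodes: int) -> float:
    """Measured throughput divided by ideal linear throughput.

    Assumption: ideal throughput is single-node throughput times node count.
    """
    if single_node <= 0 or nodes < 1:
        raise ValueError("single_node must be positive and nodes >= 1")
    return multi_node / (single_node * nodes)


if __name__ == "__main__":
    # Throughput figures (in thousands of tokens/sec) quoted in the summary.
    gain = relative_improvement(baseline=14.0, improved=35.0)
    linearity = scaling_linearity(single_node=35.0, multi_node=127.0, nodes=4)
    print(f"RL Fusion + StaleSync gain: {gain:.0%}")
    print(f"4-node linearity:           {linearity:.0%}")
```

The first figure works out to exactly 150% (a 2.5x ratio), and the second to 127 / 140, or about 91%, so the two quoted metrics are consistent with the raw throughput numbers.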
Changan-Dongfeng Restructuring Paused; Changan's Automotive Business Becomes an Independent Central SOE
雷峰网· 2025-06-05 07:43
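Changan Automobile's announcement states that China South Industries Group Co., Ltd. (hereinafter "CSGC") will be split: the automotive business will be carved out of the CSGC system and set up as an independent central enterprise, with the State Council's SASAC directly performing the duties of investor. As CSGC's core vehicle-manufacturing asset, Chongqing Changan Automobile Co., Ltd. will accordingly see its indirect controlling shareholder change to the newly established central enterprise, while its actual controller remains unchanged.

Developments at the two automakers may introduce new variables for China's auto industry.

Author | Tian Zhe  Editor | Lin Juemin

Against the backdrop of the strategic restructuring of central-SOE automakers, Changan and Dongfeng, both veteran state-owned carmakers, have recently handed in two answers that follow very different paths.

On June 5, Changan Automobile and Dongfeng Motor issued announcements one after the other, explaining the progress of their respective controlling shareholders' restructuring. 雷峰网 noted that the keywords of the two announcements are "split" and "not currently involved," respectively.

Changan Automobile announcement; Dongfeng Motor announcement

SASAC has long planned the consolidation of central-SOE vehicle assets. In March this year, at the China EV100 Forum, SASAC deputy director Gou Ping said publicly that central-SOE automakers would undergo strategic restructuring to raise industry concentration, integrate advantageous resources, and build a first-class automotive group with global competitiveness.

Changan and Dongfeng, both rumored to be restructuring, have been stepping up their new energy vehicle businesses in recent years.

By comparison, Dongfeng Motor's disclosure appears more conservative. Although Dongfeng likewise mentioned in its February announcement that it was "planning a restructuring with other central enterprises," by June it confirmed that it was "not currently involved in any related asset or business restructuring," ...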
The Biggest Beneficiary of the National Subsidy: Can Xiaomi Keep Winning?
雷峰网· 2025-06-05 07:43
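16,000 Xiaomi Home stores are not the finish line.

Author | Xue Yin  Editor | Xiang Hui

This year's 618 has not yet ended, but the national subsidy has gone offline early.

Recently, many regions have pressed the "pause button" on the national subsidy one after another. Take Jiangsu Province, the first to adjust: from June 1, its online national subsidy was cancelled, and the offline subsidy is issued under a daily quota; consumers must claim subsidy eligibility vouchers on the relevant platforms in advance, and the subsidy applies only to home appliances and 3C products. Hebei, Gansu, and most of Guangdong have likewise cancelled online subsidies; Shanxi Province has adopted daily limited voucher-based subsidies; and Chongqing's national subsidy stopped at midnight on June 4, making it the first region in the country to go offline after using up its subsidy funds.

At the beginning of this year, the current round of the national subsidy policy formally took effect. Individual consumers buying three categories of digital products, namely mobile phones, tablets, and smartwatches (bands), can receive a subsidy of 15% of the product's sale price if the unit price does not exceed 6,000 yuan. Each consumer is limited to one unit per category, and the subsidy is capped at 500 yuan.

Counterpoint data show that from January 20 to January 26, just one week after the subsidy took effect, domestic smartphone sales rose about 65% year on year, with shipments exceeding 9.5 million units; growth was especially strong for smartphones in the 2,000-5,000 yuan price band.

Driven by the national subsidy policy, China's smartphone market in the first quarter of 2025 continued the recovery under way since 2024, with total shipments reaching 70.9 million units, up 5% year on year. Notably, Xiaomi, on the strength of its increasingly hard-core products and the policy dividend ...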
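The subsidy rule described above (15% of the sale price, capped at 500 yuan, for products priced at no more than 6,000 yuan) can be written as a short function. This is a minimal Python sketch under simplifying assumptions: the function and parameter names are hypothetical, and it ignores the one-unit-per-category limit, regional quotas, and voucher requirements mentioned in the article.

```python
def national_subsidy(
    price_yuan: float,
    rate: float = 0.15,           # 15% of the sale price
    cap_yuan: float = 500.0,      # subsidy ceiling per product
    price_limit: float = 6000.0,  # products above this price do not qualify
) -> float:
    """Return the subsidy in yuan for one eligible digital product."""
    if price_yuan <= 0:
        raise ValueError("price must be positive")
    if price_yuan > price_limit:
        return 0.0
    return round(min(price_yuan * rate, cap_yuan), 2)


if __name__ == "__main__":
    for price in (2000, 3000, 5000, 6000, 6001):
        print(f"{price:>5} yuan -> subsidy {national_subsidy(price):.2f} yuan")
```

Under this rule the cap starts to bind at a price of about 3,333 yuan (500 / 0.15), so every eligible phone priced between that point and 6,000 yuan receives the same 500 yuan, while cheaper models receive the full 15%.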
Huawei's First Million-Yuan Car: What Lets It Challenge Maybach?
雷峰网· 2025-06-05 00:29
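Design and intelligence are the 尊界 (Maextro) S800's two sharpest weapons.

Reporter | Tian Zhe  Editor | Lin Juemin

Huawei and JAC, neither of which has experience building ultra-luxury cars, have built a million-yuan-class model.

Recently, Huawei and JAC Motors jointly released the first model under the Maextro brand, the Maextro S800. Billed as surpassing the Maybach S-Class and Rolls-Royce, this D-segment sedan is currently the most premium model under 鸿蒙智行 (HIMA), with a pre-sale price of 1 million to 1.5 million yuan.

But what truly ignited market sentiment was its unexpectedly low starting price of just 708,000 yuan. Once it was announced, applause and gasps rippled through the venue. The gap between this price and the far higher expectation quickly lifted Maextro S800 sales.

A HIMA announcement shows that the Maextro S800 received more than 2,600 firm orders within three days of launch. At a firm-order deposit of 20,000 yuan, this means HIMA has taken in at least 52 million yuan from the Maextro S800 alone.

The Maextro S800 is 5,480 mm long with a 3,370 mm wheelbase. Its overall proportions are closer to those of a traditional D-segment flagship, and visually it closely resembles the Maybach S-Class, conveying steadiness and an imposing presence.

In body structure, the Maextro S800 uses a variety of steel-aluminum hybrid materials, keeping the body lightweight while ensuring the integrity and consistency of the exterior structure.

Huawei has already shown considerable strength in the new energy luxury car market through the 问界 (AITO) M8 and M9. In May this year, the AITO M8 and M9 were the sales champions of the 400,000-yuan and 500,000-yuan passenger car classes, respectively.

Compared with high-end models priced at 400,000-500,000 yuan, positioned ...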
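TSMC Accused of Being a Sweatshop, with Engineers On Call 24 Hours a Day; President Responds with Humor; Volkswagen Group Restructures German Operations, About 20,000 Employees to Leave Voluntarily; Baidu Vice President Yuan Foyu Takes Command on the Front Line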
雷峰网· 2025-06-05 00:29
Key Points
- TSMC faces accusations of being a sweatshop, with claims of labor injuries and long working hours, which chairman Wei Zhejia refutes by emphasizing employee satisfaction and commitment [4].
- Baidu's BioMap plans to go public in Hong Kong within the next 18 months; it focuses on AI in life sciences and has secured over $200 million in venture capital [10].
- NIO CEO Li Bin states that the company opposes price wars and believes the industry's shift away from cutthroat internal competition benefits NIO's path to profitability [8].
- Xiaomi CEO Lei Jun comments on Apple's struggles in the automotive sector, attributing them to unknown factors while highlighting Xiaomi's strategic approach to car manufacturing [7].
- BYD is actively combating black PR and has placed 126 accounts on a watchlist for malicious activity, emphasizing its commitment to protecting the company's reputation [12].
- Volkswagen announces a restructuring plan in Germany, with around 20,000 employees agreeing to voluntary departures by 2030 due to rising manufacturing costs and reduced demand [29].
- TikTok shifts its advertising strategy in the U.S. by discontinuing free traffic for businesses, requiring them to pay for ad placements to reach audiences [32].
- Mercedes-Benz acknowledges the poor sales of its electric G-Class model, leading it to consider reintroducing traditional fuel options [30][31].
Exclusive | Autonomous Driving Heavyweight Charges into the Yard Robot Market, Lands Funding in the Tens of Millions
雷峰网· 2025-06-05 00:29
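雷峰网 has learned that 深圳星灿智能机器人有限公司 (hereinafter "星灿智能"), founded by autonomous driving expert Dr. Li Zhanbin, recently completed an angel round in the tens of millions of RMB, with investors coming mainly from upstream and downstream of the industry chain.

Founded in March 2025, 星灿智能 focuses on R&D in autonomous driving and embodied intelligence, and aims to become a leading global brand in home intelligent robots. Founder Dr. Li Zhanbin previously worked at Alibaba, Baidu, Great Wall, and Geely, among other well-known companies, where he was responsible for R&D of autonomous driving technologies. The team also includes several senior software and hardware experts with backgrounds at Baidu, Tencent, Huawei, Ecovacs, and other well-known companies, as well as professors and PhDs from top domestic universities.

星灿智能 focuses on applying autonomous driving and embodied intelligence to home scenarios, with robotic lawn mowers as its market entry point.

The only robotic lawn mower company in China with accumulated technology spanning map providers, OEMs, and autonomous driving domain controllers.

Reporter | Liu Wei  Editor | Lin Juemin

Reportedly, in the first phase 星灿智能 will act as a solution provider and ODM, working with traditional lawn mower manufacturers and domestic robotic lawn mower startup brands to commercialize quickly; several high-quality prospective customers are already in talks. In the second phase, once the technology has been fully refined and market demand takes off, 星灿智能 ...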
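Ascend + Kunpeng Dual-Core Critical Hit! Huawei Clears the Key Bottlenecks in MoE Training for Another 20% Speedup and 70% Memory Savings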
雷峰网· 2025-06-04 09:31
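Pleasingly, the results show that MoE training throughput rose a further 20% over the previous baseline, while memory usage fell by 70%. This is not just a technical breakthrough; it also sets the direction for MoE training.

"Every breakthrough in Pangu Ultra MoE reflects Huawei's leading strength in foundational AI technology and engineering deployment."

Author | Li Xi

Recently, Huawei presented new operator and memory optimization schemes for MoE training systems: three core operators sped up across the board, system throughput up another 20%, and Selective R/S delivering 70% memory savings.

On the road to more powerful AI, MoE has become another preferred path for tech giants.

As long as the Scaling Law holds, the parameter scale of large models will keep expanding, and only then can the level of AI intelligence keep rising.

With its distinctive architecture and unprecedented parameter scale, MoE is becoming one of the key paths to breaking through the compute bottleneck of large-scale model training.

However, how to truly convert MoE's potential into efficient training practice has long been a hard problem for the industry.

Previously, Huawei used the Adaptive Pipe & EDPB framework to achieve efficient cluster-level distributed computing, letting communication and computation run fully in parallel and improving training cluster efficiency.

This time, through deep coordination between Ascend and Kunpeng compute, Huawei has further achieved training operator computation ...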