AI Computing Architecture
瑞可达: The company is actively preparing a Thailand factory to meet future market and customer demand
Mei Ri Jing Ji Xin Wen· 2026-01-15 09:52
瑞可达 (688800.SH) stated on the investor interaction platform on January 15 that the company is closely tracking the evolution of AI computing architectures and continues to iterate on its power-connector solutions. The company is also actively preparing a factory in Thailand to meet future market and customer demand. (Reporter: Wang Xiaobo) NBD AI Flash: An investor asked on the interaction platform: What is the company's strategy for NVIDIA's next-generation Rubin architecture, and does its product design account for technology upgrades at upstream AI majors? What concrete measures is the company taking this year on product design upgrades and North American sales? ...
From "Stacked Building Blocks" to an "Organic Living System": Ascend Supernode Redefines AI Computing Architecture
Huan Qiu Wang· 2025-05-26 10:06
Core Insights
- The rapid growth of large AI models is driving a new era of computing-power demand, exposing the limits of traditional cluster architectures for training these models efficiently [1][2]
- Traditional architectures face significant challenges, including communication bottlenecks, inefficient resource allocation, and reliability issues, which hinder large-model training efficiency [2][3]

Summary by Sections

Challenges in Traditional Architectures
- Communication bottlenecks have worsened exponentially: MoE models increase inter-node communication demands, leading to delays of over 2 ms on traditional 400G networks [1][2]
- Resource allocation is static and unable to adapt to dynamic changes in model structure; uneven load distribution results in a 30% decrease in overall training efficiency [1][2]
- Reliability is compromised as the probability of node failure increases with scale, causing significant resource waste during lengthy recovery processes, with some companies losing over a million dollars per training interruption [2]

Emergence of the Ascend Supernode Architecture
- The Ascend Supernode architecture represents a fundamental restructuring of computing-power systems, characterized by a "three-dimensional integration" approach [3][5]
- A breakthrough in hardware interconnectivity allows multiple NPUs to work as a single computer, increasing inter-node communication bandwidth 15-fold and reducing latency from 2 ms to 0.2 ms [3][5]
- Unified global memory addressing through virtualization enables direct memory access across nodes, improving the efficiency of parameter synchronization during model training [5][6]

Innovations in Resource Management and Reliability
- Intelligent resource scheduling allows fine-grained, dynamic task allocation matched to the MoE model structure, improving the compute-to-communication time ratio from 1:1 to 3:1 [5][6]
- System reliability has improved significantly: average uptime rose from hours to days, and recovery time fell from hours to 15 minutes [5][6]

Industry Impact and Future Prospects
- The Ascend Supernode architecture has achieved a threefold increase in training performance over traditional nodes, establishing a new benchmark in AI computing [8]
- The introduction of MindIE Motor enhances large-scale expert-parallel capabilities, achieving four times the throughput of traditional server stacks [8]
- Huawei frames its commitment to architecture innovation as a new form of Moore's Law, positioning the company as a leader in the AI computing landscape [9]
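The compute-to-communication figures above imply a concrete utilization gain. A minimal back-of-envelope sketch (assuming, hypothetically, that communication is fully serialized with compute, which the article does not state):

```python
def compute_utilization(compute_time: float, comm_time: float) -> float:
    """Fraction of a training step spent on useful compute,
    assuming communication is NOT overlapped with computation
    (an illustrative assumption, not a claim from the article)."""
    return compute_time / (compute_time + comm_time)

# Traditional cluster: 1:1 compute-to-communication time ratio
traditional = compute_utilization(1.0, 1.0)  # 0.5 -> 50% utilization

# Supernode scheduling: 3:1 ratio reported in the article
supernode = compute_utilization(3.0, 1.0)    # 0.75 -> 75% utilization

print(f"traditional: {traditional:.0%}, supernode: {supernode:.0%}")
```

Under this non-overlapped assumption, moving from 1:1 to 3:1 raises step-level utilization from 50% to 75%, a 1.5x throughput gain from scheduling alone; real gains depend on how much communication actually overlaps with compute.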