Telecommunications Equipment
Replacing Multiplication with Addition? Huawei's Mathematicians Step In: High-Powered Design and Optimization of Ascend Operators Lifts Performance by 30%!
机器之心· 2025-05-23 04:17
Core Viewpoint
- The article discusses the rapid advancement of large language models (LLMs) and the challenges they face in inference, particularly speed and energy efficiency. It highlights Huawei's solutions for optimizing these models through hardware-software integration, focusing on three key technologies that improve inference speed and energy efficiency [2][4][11].

Group 1: Key Technologies
- AMLA technology transforms complex multiplication into addition operations, raising chip utilization to 71% and improving attention-operator performance by over 30% [4][5].
- Fusion operator optimization combines multiple operators into a single composite operator, improving parallelism and reducing redundant data movement, which yields substantial performance gains in model inference [7][9].
- SMTurbo technology enables ultra-low-latency memory sharing across 384 cards, achieving sub-microsecond delays and raising memory-access throughput by over 20% in cross-machine communication scenarios [10][9].

Group 2: Future Developments
- Future AMLA research will focus on optimizing the MLA operator for quantization scenarios, expanding its applicability [12].
- Fusion operator optimization will be explored across more model architectures, promoting efficient inference of large language models on Huawei's Ascend hardware [12].
- Load/Store optimization will balance read and write loads, targeting practical benefits at large batch sizes in DeepSeek dispatch and combine scenarios [12].
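The article does not detail how AMLA turns multiplications into additions on Ascend hardware. As a loosely related, purely illustrative sketch of the general idea (plain Python, not Ascend operator code, and all names hypothetical), the snippet below multiplies a float by a power of two by adding to its IEEE-754 exponent field rather than issuing a multiply:

```python
import struct

def mul_pow2_by_exponent_add(x: float, k: int) -> float:
    """Multiply x by 2**k by adding k to the IEEE-754 exponent field.

    Toy illustration only: assumes x is a finite, normal double and
    that the result stays within the normal range.
    """
    # Reinterpret the double's bits as a 64-bit unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # The 11-bit exponent field starts at bit 52; adding k there
    # scales the value by 2**k without a floating-point multiply.
    bits += k << 52
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

print(mul_pow2_by_exponent_add(3.5, 3))   # 3.5 * 8 = 28.0
print(mul_pow2_by_exponent_add(1.0, -1))  # 1.0 / 2 = 0.5
```

The design point is that an integer add on the exponent bits replaces a floating-point multiply for power-of-two factors; AMLA's actual scheme is not disclosed in the article.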
PLP EXPANDS EUROPEAN OPERATIONS WITH NEW FACILITY IN POLAND AND MAJOR UPGRADE IN SPAIN
Prnewswire· 2025-05-22 12:00
Core Insights
- PLP has commenced construction of a new multi-purpose facility in Wieprz, Poland, set to replace operations in Bielsko-Biała and enhance manufacturing capabilities by integrating modern engineering, operations, and sales support spaces, with completion expected in 2026 [1][2]
- The new facility in Poland will serve as a key European hub for PLP's core product lines and services, reflecting the company's commitment to long-term growth in the European market [4]
- PLP is also expanding its operations in Southern Europe by relocating to a larger facility in Seville, Spain, driven by rising demand and the need to scale production [2][3]

Poland Facility Highlights
- The new facility in Wieprz will feature a 30% increase in production space and a 50% increase in warehouse space, along with a world-class research and testing laboratory [7]
- Modern offices and enhanced employee amenities will be part of the new work environment [7]

Spain Facility Highlights
- The Seville facility will see a 250% increase in operational space and a 240% increase in office capacity, allowing for team growth and collaboration [8]
- Expanded manufacturing lines will support a broader product portfolio, and improved workspaces will enhance employee amenities [8]

Strategic Vision
- These investments align with PLP's broader strategic vision of responding to the accelerating pace of global infrastructure projects, including grid modernization, renewable energy, and high-speed broadband [4]
Ascend's Trump Card FlashComm: Turning Model Inference from a Single Lane into a Multi-Lane Highway
雷峰网· 2025-05-22 11:29
Core Viewpoint
- The article discusses the communication challenges faced by MoE (Mixture of Experts) models in large-scale inference and how Huawei has addressed them with solutions that optimize performance.

Group 1: Communication Challenges
- The rapid growth of MoE model parameters, often exceeding hundreds of billions, poses significant storage and scheduling challenges, driving up communication bandwidth demands that can cause network congestion [6][10].
- Traditional communication strategies like AllReduce have limitations, particularly under high concurrency, where they contribute significantly to end-to-end inference latency [7][11].
- Tensor parallelism (TP), while effective at reducing per-device weight size, suffers from AllReduce operations that exacerbate overall network latency in multi-node deployments [7][12].

Group 2: Huawei's Solutions
- Huawei introduced a multi-stream parallel technology that processes different data streams simultaneously, significantly reducing critical-path latency and delivering a 10% speedup in the Prefill phase and a 25-30% increase in Decode throughput for the DeepSeek model [12][14].
- The AllReduce operation has been restructured to first reduce data into per-device shards (ReduceScatter) and then broadcast only the essential information (AllGather), cutting communication volume by 35% and boosting DeepSeek Prefill inference performance by 22-26% [14][15].
- By adjusting the parallel dimensions of matrix multiplication, Huawei achieved an 86% reduction in communication volume during the attention-mechanism transition phase, yielding a 33% overall inference speedup [15][19].

Group 3: Future Directions
- Huawei plans to continue innovating in areas such as multi-stream parallelism, automatic weight prefetching, and model parallelism to further enhance the performance of large-scale MoE inference systems [19][20].
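The ReduceScatter-then-AllGather restructuring described above relies on a standard collective-communication identity: the two-step path produces the same result as a single AllReduce. A minimal NumPy simulation (an illustration, not Huawei's implementation; rank counts and data are made up) shows the equivalence:

```python
import numpy as np

def allreduce_via_rs_ag(shards):
    """Simulate AllReduce as ReduceScatter followed by AllGather.

    `shards` holds one array per simulated rank. In ReduceScatter,
    rank i reduces (sums) chunk i across all ranks; in AllGather,
    every rank concatenates all reduced chunks back together."""
    world = len(shards)
    chunks = [np.array_split(s, world) for s in shards]
    # ReduceScatter: rank i ends up owning sum over ranks of chunk i.
    reduced = [sum(chunks[r][i] for r in range(world)) for i in range(world)]
    # AllGather: every rank receives the full reduced vector.
    return np.concatenate(reduced)

# Four simulated ranks, each holding a different 8-element vector.
ranks = [np.arange(8, dtype=np.float32) * (r + 1) for r in range(4)]
out = allreduce_via_rs_ag(ranks)
assert np.allclose(out, sum(ranks))  # matches a direct AllReduce (sum)
print(out)
```

The latency benefit in the real system comes from scheduling and overlap of the two smaller collectives, which a single-process simulation cannot capture; this sketch only verifies the numerical equivalence.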
Speeding Up Large Models by 80%: Huawei Unveils Ascend Inference Trump Card FlashComm, Solving the Communication-Computation Bottleneck in Three Moves
机器之心· 2025-05-22 10:25
Core Viewpoint
- The article discusses the optimization of large model inference communication through Huawei's FlashComm technology, which addresses the challenges posed by the exponential growth of model parameters and the need for efficient communication strategies in distributed computing environments [2][6][17].

Group 1: Communication Challenges
- The rapid increase in cluster scale and inference concurrency for large language models has created significant communication pressure, particularly as Mixture of Experts (MoE) models expand and their expert counts and total parameters grow exponentially [6][18].
- Traditional communication strategies such as AllReduce face limitations in high-concurrency scenarios, increasing end-to-end inference latency due to bandwidth constraints [6][8].

Group 2: FlashComm Innovations
- FlashComm1 optimizes AllReduce communication by decomposing it into ReduceScatter and AllGather operations, resulting in a 26% inference performance improvement [7][11].
- FlashComm2 rebalances computation and communication by flattening three-dimensional tensors into two-dimensional matrices, achieving a 33% increase in overall inference speed [7][14].
- FlashComm3 leverages multi-stream parallelism to improve the efficiency of MoE model inference, raising decoding-phase throughput by 25%-30% [7][15].

Group 3: Future Directions
- The Huawei team aims to continue innovating in areas such as multi-stream parallelism, automatic weight prefetching, and model parallelism to further enhance the performance of large model inference systems [17][18].
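FlashComm3's core idea, running communication concurrently with computation on separate streams, can be illustrated with a toy two-stream sketch. Python threads stand in for hardware streams and `time.sleep` for real operators; all function names here are hypothetical, and real stream scheduling on Ascend works very differently:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compute(chunk_id):
    time.sleep(0.05)  # stand-in for an operator running on one stream
    return f"compute:{chunk_id}"

def communicate(chunk_id):
    time.sleep(0.05)  # stand-in for a collective on another stream
    return f"comm:{chunk_id}"

# Overlapped: compute and communicate run concurrently on two "streams".
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(compute, 0), pool.submit(communicate, 0)]
    results = [f.result() for f in futures]
overlapped = time.perf_counter() - start

# Serial baseline: the same two operators back to back.
start = time.perf_counter()
serial = [compute(0), communicate(0)]
serial_time = time.perf_counter() - start

print(f"overlapped: {overlapped:.3f}s vs serial: {serial_time:.3f}s")
```

When the two operators take similar time and neither depends on the other's output, overlap roughly halves wall-clock latency, which is the effect the decode-phase throughput numbers reflect.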
FiberHome Telecommunication: Announcement on Participation in China Information Communication Technology Group Co., Ltd.'s Collective Performance Briefing for Fiscal Year 2024 and Q1 2025
Zheng Quan Zhi Xing· 2025-05-21 08:23
Core Viewpoint
- The company will participate in a collective performance briefing hosted by China Information Communication Technology Group Co., Ltd. to discuss its operational results and financial status for fiscal year 2024 and the first quarter of 2025 [1][2].

Group 1: Meeting Details
- The meeting is scheduled for May 29, 2025, from 14:30 to 17:00 [2][3].
- The meeting will be held at the Shanghai Securities Journal·China Securities Network Roadshow Center, accessible at https://roadshow.cnstock.com/ [2][3].
- The format will combine video with online text interaction, allowing real-time communication with investors [2][3].

Group 2: Participants and Interaction
- Key participants include Chairman Zeng Jun, President Lan Hai, and Vice President and CFO Yang Yong, among others [2][3].
- Investors can participate online during the meeting and are encouraged to submit questions via the company's email (info@fiberhome.com) by May 27, 2025, at 17:00 [3][4].
- The company will address commonly asked investor questions during the briefing [2][3].

Group 3: Post-Meeting Information
- After the meeting, investors can view the proceedings and key content on the Shanghai Securities Journal·China Securities Network Roadshow Center [4].
ERIC Elevates Digital Experience in Jordan: Will it Benefit the Stock?
ZACKS· 2025-05-20 16:55
Group 1: Company Initiatives
- Ericsson has partnered with Zain Jordan on a Business Support Systems (BSS) transformation initiative aimed at enhancing digital services and customer experiences while increasing operational agility [1]
- The initiative will transition Zain Jordan's existing BSS framework to a cloud-native architecture, aligning with the demands of the telecom and IT landscape [1][3]
- The upgrade will expand Zain Jordan's current Ericsson Charging System, introducing new features hosted on Ericsson's Cloud Native Infrastructure Solution and enabling a catalog-based business model for improved customer service [2]

Group 2: Operational Benefits
- The transformation is expected to accelerate service delivery, reduce operational costs, and improve time to market, enhancing operational flexibility and paving the way for 5G monetization [3]
- The initiative supports Zain Jordan's broader digital transformation goals and contributes to national efforts to advance the digital economy [3]

Group 3: Market Position and Financial Performance
- Ericsson is focusing on 5G system development and has undertaken various initiatives to position itself for market leadership, with innovative solutions reshaping connectivity across sectors [4]
- The company is expected to benefit from a growing customer base, which is likely to generate higher revenues in upcoming quarters, potentially leading to improved financial performance and stock-price appreciation [5]
- Over the past year, Ericsson's shares have gained 48.2%, outperforming the industry's growth of 40.1% [6]
Huawei: Getting DeepSeek's "Experts" Moving, Cutting Inference Latency by 10%!
量子位· 2025-05-20 05:12
Core Viewpoint
- The article discusses Huawei's approach to optimizing the performance of Mixture of Experts (MoE) models through a technique called OmniPlacement, which addresses load-balancing issues between "hot" and "cold" experts and yields significant improvements in inference latency and throughput.

Group 1: MoE Model and Its Challenges
- The MoE model allocates tasks to specialized expert networks, enhancing overall system performance [2]
- Load-balancing issues arise from the uneven call frequency of expert networks, limiting performance [3][5]
- The disparity in call frequency can exceed an order of magnitude, delaying inference and hurting resource utilization [4][5]

Group 2: Huawei's Solution - OmniPlacement
- Huawei's OmniPlacement technique optimizes the deployment of experts to improve MoE model performance [8]
- The approach involves three main steps: joint optimization based on computational balance, inter-layer redundant deployment of high-frequency experts, and near-real-time scheduling with dynamic monitoring [9][14][18]

Group 3: Key Features of OmniPlacement
- The OmniPlacement algorithm dynamically adjusts expert priorities and node allocations based on real-time statistics, reducing communication overhead [12]
- The inter-layer redundant deployment strategy assigns additional instances to frequently called experts, alleviating their load and enhancing system throughput [15]
- The near-real-time scheduling mechanism allows dynamic resource allocation and predictive distribution based on historical data, improving system responsiveness [19][21]

Group 4: Performance Improvements
- Deploying OmniPlacement in the DeepSeek-V3 system theoretically reduces inference latency by approximately 10% and increases throughput by about 10% [6][31]
- The system demonstrates high adaptability across MoE model scales and input data distributions, ensuring efficient resource utilization and stable operation [25][26]
- The dynamic monitoring mechanism ensures rapid response to sudden load changes, maintaining system stability under high demand [32]

Group 5: Open Source Initiative
- Huawei plans to open-source the OmniPlacement optimization method, promoting wider adoption and collaboration within the AI community [28]
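The redundant-deployment idea, assigning extra instances to high-frequency "hot" experts, can be sketched with a simple greedy placement. This is an illustration under assumed call frequencies, not Huawei's OmniPlacement algorithm; the function name and the traffic numbers are hypothetical:

```python
import numpy as np

def place_with_redundancy(call_freq, num_slots):
    """Greedy redundant placement: give each expert one instance, then
    hand each remaining slot to whichever expert currently has the
    highest per-instance load. Returns (instances, max per-instance load)."""
    freq = np.asarray(call_freq, dtype=float)
    instances = np.ones(len(freq), dtype=int)
    for _ in range(num_slots - len(freq)):
        load = freq / instances          # calls handled per instance
        instances[np.argmax(load)] += 1  # replicate the hottest expert
    return instances, (freq / instances).max()

# One hot expert receives 10x the traffic of the others (made-up numbers).
freq = [1000, 100, 100, 100]
inst, max_load = place_with_redundancy(freq, num_slots=8)
print(inst, max_load)  # [5 1 1 1] 200.0
```

With 8 slots for 4 experts, all four extra instances go to the hot expert, cutting the worst per-instance load from 1000 to 200 calls; this load flattening is what lets redundant deployment raise throughput.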
Is "One-Third of the Market" the Ceiling for HarmonyOS?
Guan Cha Zhe Wang· 2025-05-20 01:04
Core Viewpoint
- The emergence of the HarmonyOS operating system is seen as a historical inevitability, driven by the need for a new technological ecosystem in response to the declining innovation vitality of the US-led single-core technology ecosystem [1][14][17].

Group 1: Development and Challenges of HarmonyOS
- HarmonyOS is part of Huawei's "Root Technology Six Series," with the goal of making it widely recognized among consumers [2][3].
- Development has faced delays, with the full version only released in March and May of this year, indicating that the journey is just beginning [4][10].
- The system's growth hinges on user adoption, with a target of reaching 100 million users primarily through the domestic market [32].

Group 2: Ecosystem and Market Dynamics
- The ecosystem surrounding HarmonyOS is crucial to its success, requiring a collaborative effort from stakeholders including developers and enterprises [27][31].
- The relationship between HarmonyOS and WeChat reflects the complex dynamics of competition and cooperation within the ecosystem, highlighting the need for mutual survival [29][31].
- The development of HarmonyOS is not just a Huawei initiative but a broader societal and industrial mobilization [25].

Group 3: Global Context and Strategic Importance
- The current technological landscape needs competition, as the single-core ecosystem has led to stagnation and monopolistic practices [15][14].
- HarmonyOS is positioned as a necessary step toward China's high-tech self-reliance and a response to the geopolitical pressures Huawei faces [20][14].
- The historical evolution of the internet is drawn as a parallel to HarmonyOS's development, emphasizing the importance of a public-goods approach to technology [9][12].

Group 4: Future Outlook and Vision
- The vision for HarmonyOS is to provide a viable alternative to existing systems, creating a competitive landscape that fosters innovation [22][15].
- The expectation is that HarmonyOS will eventually exceed the initial goal of capturing one-third of the market, though it currently faces the challenge of establishing a new ecosystem [23][24].
- Long-term success will depend on the system's ability to adapt and grow organically, avoiding the pitfalls of rapid, unsustainable user influx [26][25].
ZTE CDO Cui Li Answers a 21st Century Business Herald Reporter: The "Fiber In, Copper Out" Trend Effectively Balances Cyclical Fluctuations in Any Single Market
Core Insights
- The company emphasizes the necessity of expanding overseas markets while maintaining cyclical balance in its development strategy [1]
- In 2024, the company's international market revenue reached 39.293 billion yuan, a year-on-year increase of 4.04%, accounting for 32.39% of total revenue, up from 30.39% in 2023 [1]
- The gross profit margin was 26.91%, a decline of 10.56 percentage points year-on-year [1]

Revenue Contribution
- Regional revenue contributions include 17.48 billion yuan from Europe, the Americas, and Oceania; 15.39 billion yuan from Asia (excluding China); and 6.42 billion yuan from Africa [1]
- The company has been expanding its international presence since 1996, focusing on key markets in Asia and Latin America [1]

Strategic Considerations
- The company aims to achieve periodic balance and complementarity across its markets, noting that regions differ in their levels of economic development [1]
- The company is particularly focused on the 5G market, as regions such as Africa and Latin America are still modernizing their 4G infrastructure [1]

Innovation and Challenges
- The company seeks to turn challenges encountered in international expansion into sources of innovation [2]
- Past collaborations, such as with Japanese partners during the 4G era, drove improvements in product quality through stringent requirements [2]
- The company advocates that Chinese enterprises collaborate and form synergies when expanding overseas [2]

Digital Economy Insights
- The company asserts that ICT technology is foundational to the digital economy, with decreasing computing costs potentially leading to a convergence of intellectual and energy costs [2]

Future Outlook
- The company plans to deepen its traditional business in major markets while exploring new growth opportunities in data centers and green low-carbon initiatives [2]
- In Q1 2025, the company reported revenue of 32.968 billion yuan, a year-on-year increase of 7.82%, while net profit was 2.453 billion yuan, down 10.50% year-on-year [2]

Revenue Structure Changes
- The revenue structure has shifted from approximately 7:2:1 (operators, government and enterprise, consumers) to 5:3:2, reflecting changing market dynamics [3]
- This transition is closely linked to declining operator investment in 5G networks and rising demand for computing power, energy, and enterprise digitalization [3]