Ascend NPU

Macro Strategy Weekly: 2025 World Artificial Intelligence Conference Sketches a New Future for AI; July Manufacturing PMI at 49.3% - 20250801
Yuan Da Xin Xi · 2025-08-01 10:47
Group 1
- The report highlights the significance of the 2025 World Artificial Intelligence Conference (WAIC) held in Shanghai, showcasing advancements in AI technology and global collaboration opportunities, which are expected to drive industrial intelligence upgrades [2][11][12]
- In June, profits of industrial enterprises above designated size decreased by 1.8% year-on-year, with a notable recovery in the equipment manufacturing sector, indicating the positive impact of the "two new" policies [14][17]
- The manufacturing PMI for July came in at 49.3%, down 0.4 percentage points from the previous month, pointing to a cooling in manufacturing activity, although the production index remained above the critical point, suggesting production continued to expand [19][22]

Group 2
- The investment strategy emphasizes the development of new productive forces as a key policy direction, suggesting a focus on sectors such as artificial intelligence, innovative pharmaceuticals, robotics, and deep-sea technology for potential excess returns [3][33]
- There is a strong recommendation to boost domestic consumption, with consumer spending expected to increase, particularly in new consumption, home appliances, and the automotive sector [3][33]
- Gold may see sustained demand as a safe-haven asset amid rising geopolitical tensions and global economic uncertainty, indicating a long-term investment opportunity [3][33]
Breaking: Afternoon Plunge! Over 4,200 Stocks Fall, Cyclical Stocks Hammered! On One Major Piece of News, This Sector Rallies Against the Trend...
Xueqiu · 2025-07-31 08:25
Market Overview
- The market declined significantly, with the Shanghai Composite Index dropping 1.18%, the Shenzhen Component Index falling 1.73%, and the ChiNext Index losing 1.66% [1]
- Turnover in the Shanghai and Shenzhen markets reached approximately 1.936 trillion yuan, up about 91.7 billion yuan from the previous trading day, with over 4,200 stocks declining [2]

Sector Performance
- Cyclical stocks, including coal, steel, oil, and non-ferrous metals, led the decline, with the steel sector falling over 3% and both the non-ferrous metals and coal sectors dropping more than 2% [5]
- Notable individual decliners included Angang Steel and Baosteel, down over 7%, while Yunnan Zinc and Northern Rare Earth fell more than 5% [5][6]

Futures Market
- Several previously popular futures contracts fell sharply, with the glass and coking coal main contracts dropping 8%, polysilicon falling over 7%, and industrial silicon and lithium carbonate down over 6% and nearly 5%, respectively [7]
- The Guangzhou Futures Exchange, where industrial silicon, polysilicon, and lithium carbonate are listed, adjusted trading limits on certain contracts to maintain market stability, including reducing daily opening-position limits for those three products [8]

Domestic Semiconductor Sector
- Following regulators summoning Nvidia over security risks in its H20 computing chip, domestic semiconductor stocks surged, with companies such as Dongxin Co. and Cambricon Technologies posting substantial gains [10]
- Domestic GPU companies are accelerating development, with firms such as Moore Threads and Muxi Integrated Circuit announcing IPO plans to raise funds for GPU research and market expansion [12]

Infant and Child Industry
- The infant and child sector continued to show strength, with companies such as Sunshine Dairy and Anzheng Fashion posting three consecutive limit-up sessions, while several others saw notable gains [14]
- The Chinese government has allocated approximately 90 billion yuan for child-rearing subsidies, and Beijing has introduced measures to strengthen support for childbirth, including establishing a subsidy system and improving maternity insurance [16]
Vertical Surge! Just Now, Major Breaking News!
Quan Shang Zhong Guo · 2025-07-31 05:59
Core Viewpoint
- The article discusses recent developments around Nvidia's H20 chip and their implications for the Chinese market, highlighting the security concerns raised by the Chinese government and the resulting rise of domestic alternatives in the GPU sector [1][2][6]

Group 1: Nvidia's H20 Chip and Security Concerns
- The Chinese government summoned Nvidia to explain the security risks associated with the H20 chip, citing laws on cybersecurity and data protection [1][2]
- The H20 chip, designed specifically for the Chinese market, has faced scrutiny over potential backdoor vulnerabilities, amid proposals that advanced chips include tracking and remote-shutdown features [2][5]

Group 2: Market Reactions and Domestic Alternatives
- Following news of Nvidia's security issues, shares of domestic companies such as Cambricon Technologies rose sharply, with Cambricon gaining over 7% [1][6]
- Several domestic GPU companies, including Moore Threads and Muxi Integrated Circuit, are accelerating development and have recently filed for IPOs to fund GPU research and market expansion [6]
- Moore Threads aims to raise 8 billion yuan for its GPU development, indicating strong market potential and a commitment to strengthening domestic capabilities in AI and graphics processing [6]

Group 3: Technological Advancements
- Huawei announced the CloudMatrix 384 AI super node, raising its computing power from 6.4 PFLOPS to 300 PFLOPS, a nearly 50-fold increase [7]
- The new architecture supports advanced AI model inference, improving throughput and reducing latency, showcasing the rapid advancement of domestic technology [7]
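The "50-fold" figure above is easy to sanity-check from the two PFLOPS numbers the article cites:

```python
# Sanity check on the CloudMatrix 384 compute jump reported above.
# Figures (in PFLOPS) are taken directly from the article summary.
old_pflops = 6.4
new_pflops = 300.0
ratio = new_pflops / old_pflops
print(f"{ratio:.1f}x")  # 46.9x, i.e. a nearly 50-fold increase
```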
Huawei's First Open-Source Large Models Are Here! Pro MoE with 72 Billion Parameters, Trained on 4,000 Ascend Chips
Hua Er Jie Jian Wen · 2025-06-30 07:27
Core Insights
- Huawei has open-sourced its Pangu models, including a 7 billion parameter dense model and the 72 billion parameter Pangu Pro MoE mixture-of-experts model, marking a significant step in the domestic large-model open-source competition [1][3][20]

Model Performance
- The Pangu Pro MoE model achieves single-card inference throughput of 1,148 tokens/s on the Ascend 800I A2, rising to 1,528 tokens/s with speculative acceleration, outperforming dense models of similar size [3][11]
- The model is built on the MoGE (Mixture of Grouped Experts) architecture, with 72 billion total parameters and 16 billion active parameters, optimized specifically for Ascend hardware [4][11]

Training and Evaluation
- Huawei used 4,000 Ascend NPUs to pre-train the model on a high-quality corpus of 13 trillion tokens, divided into general, reasoning, and annealing phases to progressively build model capabilities [11]
- The model has shown superior performance across benchmarks, including a score of 91.2 on DROP, close to the best current models [12][14]

Competitive Landscape
- The open-sourcing coincides with a wave of domestic AI model releases, with leading companies such as MiniMax and Alibaba also upgrading their open-source models, driving large-model prices down by 60%-80% [3][20]
- Pangu Pro MoE ranks fifth on the SuperCLUE Chinese large-model benchmark, surpassing several existing models and indicating a competitive market position [17][18]

Technological Integration
- Huawei's ecosystem, integrating chips (Ascend NPU), frameworks (MindSpore), and models (Pangu), represents a significant technological achievement, offering a viable high-performance alternative to Nvidia's industry dominance [20]
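The key idea of grouped-expert routing in MoGE-style architectures is that experts are partitioned into equal groups (typically one per device) and each token selects its top-k experts within every group, so the per-group load is balanced by construction. A minimal NumPy sketch of that routing step, with illustrative group and expert counts that are not Huawei's actual configuration:

```python
import numpy as np

def moge_route(router_logits, num_groups, k_per_group):
    """Grouped top-k routing sketch: router_logits is (num_tokens, num_experts);
    returns (num_tokens, num_groups * k_per_group) global expert indices."""
    num_tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_groups
    # View the experts as num_groups contiguous groups
    grouped = router_logits.reshape(num_tokens, num_groups, experts_per_group)
    # Top-k inside each group (argsort ascending, take the last k)
    topk_local = np.argsort(grouped, axis=-1)[:, :, -k_per_group:]
    # Map local indices back to global expert ids
    offsets = (np.arange(num_groups) * experts_per_group)[None, :, None]
    return (topk_local + offsets).reshape(num_tokens, -1)

logits = np.random.randn(4, 16)              # 4 tokens, 16 experts in 4 groups
ids = moge_route(logits, num_groups=4, k_per_group=2)
print(ids.shape)  # (4, 8): every token activates exactly 2 experts per group
```

Because every token draws the same number of experts from each group, no single device's experts can be overloaded, which is the load-balancing property the article attributes to the architecture.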
Training Large Models: Finally Possible to "Have It All"
Huxiu APP · 2025-05-29 10:34
Core Insights
- The article discusses advances in MoE (Mixture of Experts) model architecture, focusing on Huawei's Pangu Ultra MoE, which aims to balance model performance and efficiency while addressing the challenges of training large-scale models [1][6][33]

Group 1: MoE Model Innovations
- Pangu Ultra MoE has a parameter scale of 718 billion, designed to optimize the performance and efficiency of large-scale MoE architectures [6][9]
- The model incorporates advanced components such as MLA (Multi-head Latent Attention) and MTP (Multi-Token Prediction), enhancing its training and inference capabilities [6][7]
- The Depth-Scaled Sandwich-Norm (DSSN) and TinyInit methods improve training stability, reducing gradient spikes by 51% and enabling long-term stable training over more than 10 trillion tokens [11][12][14]

Group 2: Load Balancing and Efficiency
- The EP (Expert Parallelism) group load-balancing method ensures efficient token distribution among experts, improving training efficiency without compromising expert specialization [19][20]
- The EP-Group load-balancing loss allows flexible routing choices, promoting expert specialization while maintaining computational efficiency [20][21]

Group 3: Training Techniques and Performance
- The pre-training phase uses dropless training and achieves a long-sequence capability of 128K tokens, improving learning efficiency on target data [8][14]
- MTP enables speculative inference, improving acceptance length by 38% over single-token prediction [24][27]
- The reinforcement learning system designed for post-training centers on iterative hard-example mining and multi-capability collaboration, ensuring well-rounded performance across tasks [28][31]

Group 4: Future Implications
- The advances in Pangu Ultra MoE provide a viable path for deploying sparse large models at scale, pushing the performance limits and engineering applicability of MoE architectures [33]
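The load-balancing losses discussed above build on a standard auxiliary-loss idea: penalize the router when the fraction of tokens sent to each expert diverges from the router's mean probability mass. The sketch below is the classic Switch-Transformer-style building block, not Huawei's exact EP-Group formulation; per the article, the EP-Group variant applies this constraint at the expert-parallel-group level rather than per device, leaving routing freer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def balance_loss(router_logits, top1_assignments, num_experts):
    """Classic auxiliary balance loss: num_experts * dot(f, P), where
    f[i] = fraction of tokens dispatched to expert i and
    P[i] = mean router probability on expert i."""
    probs = softmax(router_logits)                                  # (tokens, experts)
    f = np.bincount(top1_assignments, minlength=num_experts) / len(top1_assignments)
    p = probs.mean(axis=0)
    return num_experts * float(np.dot(f, p))

logits = np.random.randn(32, 8)
loss = balance_loss(logits, logits.argmax(axis=1), num_experts=8)
print(round(loss, 3))  # equals 1.0 under perfectly uniform routing
```

Minimizing this term pushes both the hard dispatch counts and the soft router probabilities toward uniform, which is what "balancing task distribution without excessive constraints" trades off against expert specialization.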
Bye, Nvidia! Huawei NPUs Train a Near-Trillion-Parameter Large Model
QbitAI · 2025-05-08 04:04
Core Viewpoint
- Huawei has successfully trained a near-trillion-parameter model, marking a significant advance in AI capability and reducing reliance on Nvidia's technology [1][4][74]

Group 1: Challenges in Training Large Models
- Training near-trillion-parameter models faced several challenges, including load-balancing difficulties, high communication overhead, and low training efficiency [3][10]
- Architecture optimization, dynamic load balancing, distributed-communication bottlenecks, and hardware-adaptation complexity were identified as the four main challenges [10]

Group 2: Huawei's Solutions
- Huawei's Pangu team used more than 6,000 Ascend NPUs to achieve stable training of a 718 billion parameter MoE model, applying breakthrough system-optimization techniques [4][5]
- The team developed a model simulation tool that predicts performance with over 85% accuracy against actual test data [17]

Group 3: Load Balancing and Efficiency
- A new EP-group load-balancing loss algorithm balances task distribution without excessive constraints, saving communication costs [24][25]
- Training efficiency of the Pangu Ultra MoE model improved significantly, reaching a Model FLOPs Utilization (MFU) of 30.0%, a 58.7% relative increase over previous optimizations [33]

Group 4: Communication Optimization
- The team designed a hierarchical EP communication strategy to reduce inter-node communication volume, improving overall training efficiency [42][44]
- An adaptive pipeline-overlap mechanism masks communication delays, further improving performance [48]

Group 5: Model Performance and Benchmarking
- Pangu Ultra MoE demonstrated competitive performance across benchmarks, achieving high scores on general-understanding and reasoning tasks [61][62]
- The model's architecture allows significant specialization among experts, enhancing its overall expressiveness and performance [64][66]

Group 6: Future Implications
- The advances in Huawei's technology signal a shift in the global AI landscape, showcasing China's capability to lead AI innovation [74]
- Continued development and application of Pangu Ultra MoE are expected to drive intelligent transformation across industries, contributing to China's technological leadership [74]
USTC and Huawei Release a Generative Recommendation Large Model, Deployable on Ascend NPUs, with the Underlying Insights Made Public
QbitAI · 2025-04-06 02:33
Core Viewpoint
- The article discusses the emergence of generative recommendation models, particularly the HSTU framework, which has driven significant advances in recommendation systems, including successful deployment on domestic Ascend NPUs [1][4][5]

Group 1: Development of Generative Recommendation Models
- The generative recommendation paradigm, characterized by scaling laws, is becoming a future trend in recommendation systems [4][6]
- Recommendation systems evolved from manual feature engineering to complex model designs, and then back toward feature engineering as deep-learning model capacity hit its limits [5][6]
- The success of large language models has inspired recommendation researchers to explore scalable models that improve recommendation effectiveness [5][6]

Group 2: Performance Analysis of Different Architectures
- A comparative analysis of HSTU, Llama, GPT, and SASRec showed that HSTU and Llama scale significantly better as model parameters grow, while GPT and SASRec show limited scalability on recommendation tasks [7][9]
- HSTU consistently outperformed baseline models such as SASRec in multi-domain scenarios, demonstrating its potential for addressing cold-start problems [13]

Group 3: Key Components and Their Impact
- Removing the Relative Attention Bias (RAB) from HSTU caused a noticeable drop in performance, indicating its critical role in the model's scalability [9][11]
- Modifying the residual connections and adding RAB to SASRec improved its scalability, highlighting the importance of these components for enhancing traditional recommendation models [11][12]

Group 4: Future Directions
- The report identifies potential research directions for generative recommendation models, including data engineering, tokenizer efficiency, and training/inference efficiency, which could address current challenges and expand application scenarios [18]