A-Share Pre-Market Briefing | Huawei Developer Conference Opens Today; Beijing Issues 11 Measures to Support the Gaming and Esports Industry
智通财经网 · 2025-06-20 00:40
Industry Insights
- The Ministry of Commerce of China is expediting its review of export license applications related to rare earths, emphasizing the importance of maintaining the stability and security of global supply chains [1]
- The Huawei Developer Conference runs from June 20 to 22, showcasing innovations in HarmonyOS and AI, with a focus on Ascend computing power and the Harmony ecosystem [2]
- The photovoltaic industry is expected to see significant production cuts in Q3, with operating rates projected to fall by 10-15%, alongside strict audits of below-cost sales [4]
- The gaming industry in Beijing is receiving support through 11 new measures aimed at fostering development, indicating strong industry vitality [7]
- Unitree Robotics has confirmed the completion of its Series C financing at a pre-money valuation exceeding 10 billion yuan, signaling growth potential in the humanoid robotics sector [8]
- The International Solid-State Battery Technology Conference has opened, highlighting advances in materials and smart equipment, with the solid-state battery market expected to grow rapidly by 2027 [9]

Macro Insights
- China's Vice Premier He Lifeng stated that the country is an ideal, safe, and promising investment destination for multinational companies, citing its large domestic market and robust industrial system [3]
Nine Top Researchers Speak Across Three Consecutive Evenings: Unveiling the Foundational Research Behind Huawei's Pangu Large Models
机器之心 · 2025-05-26 10:59
Core Viewpoint
- The rapid development of large language models (LLMs) has made them a cornerstone of general artificial intelligence systems, but growing model capability has brought sharply higher computational and storage demands, making it challenging to achieve both high performance and high efficiency in AI [1][2]

Group 1: Technological Advancements
- Huawei's Noah's Ark Lab has developed Pangu Ultra, a general language model with over 100 billion parameters that surpasses previous models such as Llama 405B and Mistral Large 2 across a range of evaluations [2]
- The lab also introduced the sparse language model Pangu Ultra MoE, achieving long-term stable training on more than 6,000 Ascend NPUs [2]

Group 2: Key Research Presentations
- A series of sharing sessions from May 28 to May 30 will cover breakthroughs in quantization, pruning, MoE architecture optimization, and KV optimization, aimed at developers and researchers working on large models [3][4]

Group 3: Specific Research Contributions
- **CBQ**: A post-training quantization framework that tackles the high computational and storage costs of LLMs, achieving significant performance gains under ultra-low-bit quantization [6] (a generic quantization baseline is sketched after this digest)
- **SlimLLM**: A structured pruning method that reduces the computational load of LLMs while maintaining accuracy, demonstrating strong results on LLaMA benchmarks [8] (see the pruning sketch below)
- **KnowTrace**: An iterative retrieval-augmented generation framework that strengthens multi-step reasoning by tracking knowledge triplets, outperforming existing methods on multi-hop question answering [10]

Group 4: Further Innovations
- **Pangu Embedded**: A flexible language model that alternates between fast and deep thinking, designed to optimize inference efficiency while maintaining high accuracy [14]
- **Pangu-Light**: A pruning framework that stabilizes and recovers performance after aggressive structural pruning, achieving significant model compression and inference acceleration [16]
- **ESA**: An efficient selective attention method that cuts computational overhead during inference by exploiting the sparsity of attention matrices [18] (see the selective-attention sketch below)

Group 5: MoE Model Developments
- **Pangu Pro MoE**: A native MoE model with 72 billion parameters, designed to balance load across devices and improve inference efficiency through a range of optimization techniques [21] (the generic top-k routing pattern underlying such models is sketched below)
- **PreMoe**: An expert-routing optimization for MoE models that dynamically loads experts according to task-specific requirements, improving inference efficiency by over 10% while preserving model capability [24]

Group 6: KV Optimization Techniques
- **KVTuner**: A hardware-friendly algorithm for KV-cache compression that achieves near-lossless quantization without retraining, significantly improving inference speed [26] (a minimal KV-cache quantization sketch closes this digest)
- **TrimR**: An efficient reflection-compression algorithm that identifies redundant reflections in LLM reasoning, delivering a 70% improvement in inference efficiency across various models [26]
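For orientation on the quantization entries above: the core mechanic shared by post-training schemes is mapping floating-point weights to low-bit integers with a computed scale. Below is a minimal Python/PyTorch sketch of symmetric per-channel round-to-nearest quantization; it is a generic baseline of the kind CBQ refines, not the CBQ framework itself (the source describes CBQ only as a post-training framework for ultra-low-bit settings), and all names in it are illustrative.

```python
# Minimal symmetric per-channel weight quantization (a generic baseline;
# CBQ's own reconstruction machinery is NOT reproduced here).
import torch

def quantize_per_channel(w: torch.Tensor, n_bits: int = 4):
    """Round-to-nearest quantization with one scale per output channel."""
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 7 for signed 4-bit
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # [out_features, 1]
    scale = scale.clamp_min(1e-8)                     # guard all-zero rows
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(8, 16)                    # toy weight matrix
q, s = quantize_per_channel(w, n_bits=4)
print("mean abs error:", (w - dequantize(q, s)).abs().mean().item())
```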
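Structured pruning, as in SlimLLM and Pangu-Light, removes whole sub-structures (channels, heads, layers) rather than individual weights. The sketch below drops the lowest-norm output channels of a single linear layer; it shows only the mechanical step under a naive importance score, not either paper's method, and downstream layers would need their input dimensions adjusted to match.

```python
# Toy structured pruning: keep the output channels of a linear layer whose
# weight rows have the largest L2 norm (a naive importance proxy).
import torch
import torch.nn as nn

def prune_linear(layer: nn.Linear, keep_ratio: float) -> nn.Linear:
    importance = layer.weight.norm(dim=1)          # one score per out channel
    n_keep = max(1, int(layer.out_features * keep_ratio))
    keep = importance.topk(n_keep).indices.sort().values
    pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned          # NOTE: the next layer's in_features must shrink too

fc = nn.Linear(128, 64)
slim = prune_linear(fc, keep_ratio=0.5)
print(slim.weight.shape)                           # torch.Size([32, 128])
```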
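ESA's premise, per the summary, is that attention matrices are sparse, so each query effectively needs only a small subset of keys. The sketch below selects the top-k keys per query and runs softmax attention over that subset only. One honest caveat: this toy version still computes the full score matrix to pick the subset; a practical method must estimate importance cheaply, and ESA's actual selection mechanism is not described in the source.

```python
# Top-k selective attention: attend to only the k highest-scoring keys
# per query. For clarity this sketch scores all keys first; a real method
# would estimate key importance cheaply instead of paying the full pass.
import torch
import torch.nn.functional as F

def topk_attention(q, k, v, top_k: int):
    """q: [n_q, d]; k, v: [n_kv, d]. Returns [n_q, d]."""
    d = q.shape[-1]
    scores = q @ k.T / d ** 0.5                    # [n_q, n_kv]
    top_scores, idx = scores.topk(top_k, dim=-1)   # keep k keys per query
    probs = F.softmax(top_scores, dim=-1)          # softmax over subset only
    selected_v = v[idx]                            # [n_q, top_k, d]
    return (probs.unsqueeze(-1) * selected_v).sum(dim=1)

q, k, v = torch.randn(4, 64), torch.randn(1024, 64), torch.randn(1024, 64)
print(topk_attention(q, k, v, top_k=32).shape)     # torch.Size([4, 64])
```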
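The MoE entries (Pangu Pro MoE, PreMoe) both build on token-level expert routing: a gate scores the experts for each token and only the top-k experts run. The sketch below is the standard top-k gating pattern with illustrative sizes; Pangu Pro MoE's device-balanced expert grouping and PreMoe's task-conditioned expert loading are not reproduced here.

```python
# Standard top-k MoE routing: each token is dispatched to its k best experts
# and their outputs are combined with renormalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [n_tokens, d]
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):        # dispatch tokens expert by expert
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)    # torch.Size([10, 64])
```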
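Finally, KV-cache compression: during decoding, the cached keys and values dominate memory, and schemes like KVTuner store them in low precision. KVTuner's contribution, per the summary, is finding hardware-friendly mixed precisions without retraining; the sketch below shows only the underlying storage pattern, uniform per-head int8 with a scale, under assumed tensor shapes.

```python
# Int8 KV-cache storage: quantize cached keys/values per head, dequantize
# on read. Uniform 8-bit everywhere; a tuner would pick precision per layer.
import torch

def quantize_kv(t: torch.Tensor):
    """t: [n_heads, seq_len, head_dim] -> (int8 cache, per-head scales)."""
    scale = t.abs().amax(dim=(1, 2), keepdim=True) / 127.0
    scale = scale.clamp_min(1e-8)              # avoid divide-by-zero
    q = torch.clamp(torch.round(t / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

k = torch.randn(8, 512, 64)                                    # toy key cache
qk, sk = quantize_kv(k)
print("memory ratio:", qk.element_size() / k.element_size())   # 0.25 vs fp32
print("max abs error:", (k - dequantize_kv(qk, sk)).abs().max().item())
```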