Large Model Training

Jiangsu Issues Policy Measures to Innovate and Enhance Digital Trade
Xinhua Daily · 2025-07-02 21:40
Group 1
- The core viewpoint of the article is that Jiangsu Province aims to leverage digital trade to promote high-quality development of service trade, with a target of reaching a service trade scale of 600 billion yuan and digital delivery service trade of 300 billion yuan by 2030, accounting for approximately 50% of the service trade [1]
- Jiangsu will focus on institutional openness in digital trade, creating a digital trade ecosystem, and aligning with high-standard economic and trade rules, including pilot cooperation in digital trade with Singapore [1]
- The province plans to establish national service trade innovation development demonstration zones and national digital trade demonstration zones, enhancing infrastructure and public services in key areas like Nanjing Software Valley to facilitate domestic and international industrial chain collaboration [1]

Group 2
- A significant highlight of the policy is industry empowerment, with Jiangsu focusing on developing digital product trade in the cultural industry, strengthening cultural trade bases in cities like Nanjing, Wuxi, and Suzhou, and promoting exports in sectors such as animation and film [2]
- The province aims to expand digital technology trade in advantageous fields, advance high-end software development, and implement an "Artificial Intelligence+" action plan to upgrade service outsourcing and promote enterprise transformation [2]
- Jiangsu will enhance international transportation service capabilities, optimize international route networks, and accelerate the development of smart ports and waterways, while also improving the international competitiveness of tourism services and supporting international education services [2]
Huasheng Co., Ltd. (600156.SH) Plans to Acquire 100% of Yixin Technology; Shares Resume Trading on June 24
Zhitong Finance · 2025-06-23 08:57
Group 1
- The company plans to acquire 100% of Yixin Technology through a combination of share issuance and cash payment, with the transaction price yet to be determined [1]
- Yixin Technology focuses on the AIDC field, providing lifecycle services for green computing infrastructure, including planning, construction, operation management, and energy-saving product development [1]
- The transaction aligns with national strategies to promote new information infrastructure and cultivate new productive forces [1]

Group 2
- Yixin Technology has established and operates multiple high-performance intelligent computing centers in various locations, including Shenzhen, Huizhou, Guangzhou, and Haikou, and is currently building a green computing center in Hunan [2]
- The company aims to enhance regional coordination and overall operational efficiency of intelligent computing infrastructure, catering to high-demand scenarios such as the low-altitude economy, artificial intelligence, the industrial internet, and fintech [2]
- This acquisition is expected to deepen the company's integration into the national computing network layout, supporting high-quality development of new productive forces [2]
Founded Less Than Five Years Ago, This GPU Maker Is About to List on the A-Share Market
Sohu Finance · 2025-06-19 10:54
Core Viewpoint
- The domestic GPU company Moore Threads has completed its IPO counseling, marking a significant step towards a public listing in the competitive semiconductor industry [2][4].

Company Overview
- Moore Threads was founded in October 2020 by Zhang Jianzhong, a former NVIDIA executive with over 20 years of experience in the GPU field [7].
- The company has launched multiple generations of GPU chips and had obtained 425 authorized patents as of October 2024 [7].
- Moore Threads has developed a comprehensive product line that includes AI chips, gaming graphics cards, and cluster computing solutions, catering to both B-end and C-end markets [7].

Product Development
- Moore Threads has released three generations of fully functional GPU chips: "Sudi," "Chunxiao," and "Quyuan" [7].
- The "Sudi" chip is the first to support AV1 encoding and features capabilities for modern graphics rendering, AI computation acceleration, and scientific computing [8].
- The "Chunxiao" chip integrates 22 billion transistors and shows significant performance improvements over "Sudi," including a 3x increase in graphics rendering and a 4x increase in encoding capability [8].
- The "Quyuan" chip, the third generation, offers a performance enhancement of 3 to 5 times compared to "Chunxiao" [8].

Technological Advancements
- Moore Threads' "KUAE" intelligent computing cluster solution has expanded from thousand-card scale to ten-thousand-card scale, enabling high-performance computing systems for training large models [9].
- The ten-thousand-card cluster supports various precision formats, including FP8, and is compatible with mainstream large models such as GPT and DeepSeek [9].

Financial Background
- Since its establishment, Moore Threads has undergone six rounds of financing, raising several billion yuan in total [10].
- Notable funding rounds include a 2 billion yuan Series A in November 2021 and a B+ round exceeding 2 billion yuan in November 2023 [11].

Corporate Structure
- In 2024, Moore Threads underwent a shareholding reform, increasing its registered capital from 24.41 million yuan to 330 million yuan in preparation for its IPO [12].
No GPUs: A Large Model Digests an Advanced-Math Problem Every 2 Seconds. This Is Huawei's Strength
Leiphone · 2025-05-30 09:48
Core Viewpoint
- Huawei defines the benchmark for domestic large model training through technological innovation, achieving breakthroughs in computing power utilization and post-training throughput [1][4].

Group 1: Technological Innovations
- Huawei's "Ascend + Pangu Ultra MoE" combination has unlocked a fully controllable training loop for domestic computing power and models, achieving industry-leading performance in cluster training systems [4][5].
- In the pre-training phase, the Ascend Atlas 800T A2 cluster's model FLOPs utilization (MFU) increased to 41%, while the post-training phase achieved a throughput of 35K tokens/s on a single CloudMatrix 384 super node [5][36].
- Huawei disclosed key technologies in its technical report, highlighting the efficient integration of sparse MoE reinforcement learning post-training frameworks [6][7].

Group 2: Challenges in Current Training Processes
- Six main challenges were identified in current MoE pre-training and reinforcement learning post-training processes: difficulties in parallel strategy configuration, communication bottlenecks, uneven system load distribution, excessive operator scheduling overhead, complex training process management, and limits on large-scale expansion [10][11].

Group 3: Solutions to Enhance Training Efficiency
- Huawei proposed a complete end-to-end solution to address these challenges, focusing on enhancing training cluster utilization through intelligent parallel strategy selection, deep integration of computation and communication, and global dynamic load balancing [12][14].
- The first strategy involved optimizing parallel configurations, arriving at a deployment of 16-way pipeline parallelism, 8-way tensor parallelism, and 32-way expert parallelism [15][16].
- The second strategy focused on releasing computing power at the single-node level, doubling the micro-batch size (MBS) and optimizing operator scheduling to fully utilize Ascend node capabilities [20][21].

Group 4: Reinforcement Learning Innovations
- Huawei introduced the RL Fusion training-inference co-location technology, which supports flexible deployment modes and doubles cluster utilization in post-training [28][29].
- The StaleSync semi-asynchronous mechanism allows different tasks to execute in parallel while maintaining model accuracy, resulting in a 50% increase in overall training throughput [30].

Group 5: Performance Metrics and Future Prospects
- The Pangu Ultra MoE model, with 718 billion parameters, demonstrated high performance during training, achieving a model utilization rate of 41% and a post-training throughput of 35K tokens/s [35][36].
- The system is designed to support ultra-large-scale clusters and models, with future iterations expected to achieve even higher utilization rates [35][36].
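The parallel deployment cited above (16-way pipeline, 8-way tensor, 32-way expert parallelism) is the output of a strategy search. Below is a minimal sketch of such a search in Python, with toy divisibility constraints and an illustrative function name (`candidate_configs`); the real simulation-driven framework also models memory and communication costs, which this sketch omits:

```python
from itertools import product

def candidate_configs(total_devices, n_layers, n_experts):
    """Enumerate (pp, tp, ep) parallel configurations that tile a cluster.

    Toy constraints: pp * tp * dp must equal the device count, experts
    must split evenly across the expert-parallel group, and layers must
    split evenly across pipeline stages.
    """
    configs = []
    for pp, tp in product([2, 4, 8, 16], [1, 2, 4, 8]):
        if total_devices % (pp * tp):
            continue
        dp = total_devices // (pp * tp)  # implied data-parallel degree
        for ep in [8, 16, 32, 64]:
            if ep <= dp * tp and n_experts % ep == 0 and n_layers % pp == 0:
                configs.append((pp, tp, ep))
    return configs
```

On a hypothetical 8192-device cluster with 64 layers and 256 experts, the (16, 8, 32) point reported in the article appears among the feasible configurations; a simulator would then rank such candidates by predicted step time.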
An Advanced-Math Problem Digested Every 2 Seconds! Huawei Finally Reveals the Full Pipeline of Its Near-Trillion-Parameter MoE Ascend Training System
Wallstreetcn · 2025-05-30 09:38
Core Viewpoint
- Huawei has achieved significant advancements in training large models through its "Ascend + Pangu Ultra MoE" system, demonstrating a fully domestic, GPU-free training process that enhances computational efficiency and model performance [3][4][38].

Group 1: Technical Innovations
- Huawei's training system achieved a model FLOPs utilization (MFU) of 41% during the pre-training phase using the Ascend Atlas 800T A2 cluster [4][38].
- The Pangu Ultra MoE model consists of 718 billion parameters, featuring a unique architecture with 61 layers, including 58 MoE layers, and is designed for high performance and scalability [38][39].
- The system supports a throughput of 35K tokens/s during the reinforcement learning (RL) post-training phase, showcasing its capability to process complex tasks rapidly [39].

Group 2: Challenges Addressed
- The report identifies six key challenges in current MoE pre-training and RL post-training processes, including difficulties in parallel strategy configuration, communication bottlenecks, and uneven system load distribution [7][10][12][13].
- Huawei has developed a comprehensive end-to-end solution to address these challenges, focusing on optimizing training cluster utilization and enhancing communication efficiency [14][16][25].

Group 3: Specific Solutions
- The first strategy improves training cluster utilization through intelligent parallel strategy selection and global dynamic load balancing, significantly enhancing overall training efficiency [16][23].
- The second strategy releases computational power at the single-node level by optimizing training operators and improving memory management, achieving a twofold increase in micro-batch size [26][30].
- The third strategy introduces high-performance, scalable RL post-training technologies, allowing flexible deployment modes and doubling the utilization rate of RL post-training clusters [33][34].
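Global dynamic load balancing for MoE training is, at its core, a placement problem: distribute experts so that no device becomes a hotspot. A toy greedy (longest-processing-time) placement is sketched below, assuming per-expert token loads have been measured; `balance_experts` is an illustrative name, not Huawei's API:

```python
import heapq

def balance_experts(loads, n_devices):
    """Assign experts (by descending token load) to the currently
    lightest device — classic LPT greedy scheduling. Returns, per
    device, the list of expert indices placed on it."""
    placement = [[] for _ in range(n_devices)]
    heap = [(0, d) for d in range(n_devices)]  # (current load, device id)
    heapq.heapify(heap)
    for e in sorted(range(len(loads)), key=lambda i: -loads[i]):
        load, d = heapq.heappop(heap)  # lightest device so far
        placement[d].append(e)
        heapq.heappush(heap, (load + loads[e], d))
    return placement
```

A production balancer would re-run placement periodically as routing statistics drift during training; the greedy pass above only illustrates the objective of evening out per-device load.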
Huawei's AI Strength: No GPUs, and a Large Model Digests an Advanced-Math Problem Every 2 Seconds!
Yicai · 2025-05-30 09:32
Core Viewpoint
- Huawei has achieved significant advancements in training large models through its "Ascend + Pangu Ultra MoE" combination, enabling a fully controllable training process without GPUs and showcasing industry-leading performance in cluster training systems [2][3].

Group 1: Technical Innovations
- Huawei's training system significantly improved model training efficiency, with pre-training model FLOPs utilization (MFU) reaching 41% and a post-training throughput of 35K tokens/s on the CloudMatrix 384 super node [3][34].
- The company introduced a series of innovative solutions to address challenges in MoE pre-training and reinforcement learning (RL) post-training, including intelligent parallel strategy selection and global dynamic load balancing [11][17].
- The training system uses a hierarchical All-to-All communication architecture to reduce communication overhead to nearly zero, enhancing the efficiency of expert-parallel communication [14][15].

Group 2: Training Process Optimization
- The training cluster's utilization has been optimized through a simulation-driven intelligent parallel optimization framework, which automates the selection of optimal deployment configurations [12][13].
- The team implemented a memory optimization framework that saves over 70% of activation memory, ensuring reliable long-term training even under increased memory pressure [25].
- The RL Fusion technology allows flexible deployment modes, significantly improving resource scheduling during the inference phase and doubling utilization in RL post-training [27][28].

Group 3: Model Specifications
- The Pangu Ultra MoE model features 718 billion parameters, with a structure of 61 Transformer layers, designed for high sparsity and performance [32].
- The model was trained on a cluster of 6K to 10K Ascend 800T A2 cards, achieving a high model utilization rate during the pre-training phase [32].
- The architecture supports efficient scaling to larger parameter models and clusters, with expectations of achieving an MFU greater than 50% in future iterations [32].
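The hierarchical All-to-All architecture mentioned above splits one global token exchange into an intra-node stage and an inter-node stage, so each device sends fewer but larger cross-node messages. Below is a pure-Python simulation of that idea, assuming ranks are grouped into equal-sized nodes; function names are illustrative, and real systems run these stages as collective operations over fast intra-node links:

```python
def flat_alltoall(send):
    """Reference single-stage all-to-all: recv[dst] collects (src, token) pairs."""
    n = len(send)
    recv = [[] for _ in range(n)]
    for src in range(n):
        for dst in range(n):
            recv[dst].extend((src, t) for t in send[src][dst])
    return recv

def hierarchical_alltoall(send, group):
    """Two-stage all-to-all over nodes of `group` ranks each.

    Stage 1 (intra-node): rank (node, l) collects everything its node
    sends to any destination whose local index is l.
    Stage 2 (inter-node): ranks sharing a local index exchange across
    nodes, delivering each item to its final destination.
    """
    n = len(send)
    stage = [[] for _ in range(n)]
    for src in range(n):
        node = src // group
        for dst in range(n):
            holder = node * group + dst % group  # intra-node hop
            stage[holder].extend((src, dst, t) for t in send[src][dst])
    recv = [[] for _ in range(n)]
    for items in stage:  # inter-node hop among same-local-index ranks
        for src, dst, t in items:
            recv[dst].append((src, t))
    return recv
```

In the second stage each rank talks only to its same-local-index peers in other nodes, which is where the cross-node traffic consolidation comes from; the two-stage result matches the flat exchange item for item.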
Pangu Ultra Near-Trillion-Parameter MoE Model: Industry-Leading, Built on Ascend-Native Long-Term Stable Training
Yicai · 2025-05-29 10:50
Core Viewpoint
- The article discusses advancements in the Pangu Ultra MoE model, a near-trillion-parameter MoE model trained on Ascend NPUs, focusing on its architecture, training methods, and performance improvements [1][3].

Group 1: Model Architecture and Training Innovations
- Pangu Ultra MoE has a total parameter count of 718 billion, with 39 billion activated parameters, utilizing 256 routing experts where each token activates 8 experts [5][6].
- The model employs Depth-Scaled Sandwich-Norm (DSSN) and TinyInit methods to enhance training stability, achieving a 51% reduction in gradient spikes [7][11].
- The training process incorporates a dropless training strategy, allowing long-term stable training on over 10 trillion tokens [1][7].

Group 2: Performance and Efficiency
- The architecture is designed to optimize performance on the Ascend NPU platform by jointly considering computation, communication, and memory metrics, resulting in superior training and inference throughput [3][5].
- Pangu Ultra MoE demonstrates robust performance across various authoritative open-source evaluation sets, outperforming several mainstream models in multiple benchmarks [6][4].

Group 3: Load Balancing and Expert Specialization
- The EP-group loss method is introduced to maintain load balancing among experts while allowing for expert specialization, enhancing overall training efficiency [12][15].
- The model's design allows flexible routing choices, promoting expert specialization by data domain, as evidenced by significant differences in expert selection across languages [16][17].

Group 4: Multi-Token Prediction and Reinforcement Learning
- The Multi-Token Prediction (MTP) strategy enhances inference efficiency by predicting multiple candidate tokens before the main model generates them, achieving a 38% increase in acceptance length [20][22].
- The reinforcement learning system implemented in Pangu Ultra MoE addresses challenges in training stability and inference performance by iteratively mining difficult examples and employing a multi-capability reward system [24][27].
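The routing scheme described for Pangu Ultra MoE (256 experts, 8 activated per token) is standard top-k gating. Here is a minimal pure-Python sketch for a single token's expert scores; this is not Huawei's implementation, and renormalizing only the selected scores with a softmax is one common variant among several:

```python
import math

def topk_route(logits, k=8):
    """For one token's per-expert scores, pick the k highest-scoring
    experts and renormalize the selected scores with a softmax.
    Returns a list of (expert_index, gate_weight) pairs, best first."""
    top = sorted(range(len(logits)), key=lambda i: -logits[i])[:k]
    m = max(logits[i] for i in top)  # subtract max for numerical stability
    exp = [math.exp(logits[i] - m) for i in top]
    z = sum(exp)
    return [(i, e / z) for i, e in zip(top, exp)]
```

Each selected expert then processes the token, and its output is scaled by the gate weight before summation; the EP-group balance loss mentioned above would be an auxiliary term computed over how evenly these assignments spread across experts.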
Training Large Models: You Can Finally Have It All
Huxiu · 2025-05-29 10:34
Core Insights
- The article discusses advancements in the MoE (Mixture of Experts) model architecture, particularly Huawei's Pangu Ultra MoE, which aims to balance model performance and efficiency while addressing challenges in training large-scale models [1][6][33]

Group 1: MoE Model Innovations
- Huawei's Pangu Ultra MoE model has a parameter scale of 718 billion, designed to optimize the performance and efficiency of large-scale MoE architectures [6][9]
- The model incorporates advanced components such as MLA (Multi-head Latent Attention) and MTP (Multi-Token Prediction), enhancing its training and inference capabilities [6][7]
- The Depth-Scaled Sandwich-Norm (DSSN) and TinyInit methods improve training stability, reducing gradient spikes by 51% and enabling long-term stable training on over 10 trillion tokens [11][12][14]

Group 2: Load Balancing and Efficiency
- The EP (Expert Parallelism) group load-balancing method ensures efficient token distribution among experts, enhancing training efficiency without compromising model specialization [19][20]
- The Pangu Ultra MoE model employs an EP-Group load-balancing loss that allows flexible routing choices, promoting expert specialization while maintaining computational efficiency [20][21]

Group 3: Training Techniques and Performance
- The model's pre-training phase uses dropless training and achieves a long-sequence capability of 128K, enhancing learning efficiency on target data [8][14]
- The introduction of MTP enables speculative inference, improving acceptance length by 38% compared to single-token prediction [24][27]
- The reinforcement learning system designed for post-training focuses on iterative hard-example mining and multi-capability collaboration, ensuring comprehensive performance across various tasks [28][31]

Group 4: Future Implications
- The advancements presented in Pangu Ultra MoE provide a viable path for deploying sparse large models at scale, pushing the performance limits and engineering applicability of MoE architectures [33]
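MTP-style speculative inference drafts several tokens ahead and lets the main model verify them in a single pass; the committed output is the agreeing prefix plus one main-model token. A greedy-acceptance sketch under that assumption (the 38% figure above refers to how much longer the accepted drafts become, not to this toy):

```python
def speculative_step(draft, verified):
    """One speculative decoding step with greedy acceptance.

    `draft` holds the MTP head's proposed tokens; `verified[i]` is the
    token the main model would emit after accepting the first i drafts,
    so len(verified) == len(draft) + 1. Returns the tokens committed
    this step: the matching prefix plus one main-model token.
    """
    committed = []
    for d, v in zip(draft, verified):
        if d != v:
            committed.append(v)  # main model's correction ends the step
            return committed
        committed.append(d)
    return committed + [verified[len(draft)]]  # all drafts accepted; bonus token
```

Because verification of all drafts happens in one forward pass of the main model, every accepted draft token saves a sequential decoding step, which is why longer acceptance lengths translate directly into higher inference throughput.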
Guangzhou Nansha Goes All Out to Build a New Highland for the AI Industry
China Securities Journal · 2025-05-28 20:35
Group 1
- The "Bay Area Artificial Intelligence Industry Innovation Alliance" was officially established in Nansha District, Guangzhou, aiming to create a new high ground for the AI industry in the Guangdong-Hong Kong-Macao Greater Bay Area and globally [1][2]
- The alliance is initiated by Hong Kong University of Science and Technology (Guangzhou) and Huawei, focusing on integrating resources from international, Hong Kong, Macao, and mainland research institutions to empower Nansha and promote it as a leading area for AI innovation [1][2]
- Nansha aims to upgrade its AI industry ecosystem by focusing on three core tasks: technological innovation, industrial aggregation, and ecosystem construction, with the goal of forming a trillion-level industrial cluster [2]

Group 2
- Nansha's AI-related industry scale is projected to reach approximately 10 billion yuan in 2024, with year-on-year growth of 12%, establishing itself as a significant AI application demonstration area in China [2]
- Over 100 AI-related companies have gathered in Nansha, including CloudWalk Technology and Pony.ai, covering fields such as AI chips, basic software and algorithms, biometric recognition, and natural language processing [2]
- The establishment of the alliance is expected to strengthen support for AI companies, with financial backing of up to 10 million yuan for key elements like computing power, data, and algorithms [2]

Group 3
- Pony.ai, which settled in Nansha in 2017, has launched China's first autonomous taxi service and reported revenue of 12.3 million yuan for its autonomous taxi business in Q1 2025, a 200% year-on-year increase [3]
- The company has formed a global strategic partnership with Uber, planning to integrate its autonomous taxi services into Uber's platform starting in the Middle East [3]
- Guangdong Province has introduced policies to promote the integration of AI and robotics across sectors including education, healthcare, and finance [4]

Group 4
- CloudWalk Technology, established in 2015 and listed on the Sci-Tech Innovation Board in 2022, focuses on AI technology and its application in key industries, aiming to deepen technology research and scenario implementation [4]
- Nansha's fully automated terminal saw a 41.42% year-on-year increase in container throughput in Q1, showcasing the successful integration of technologies like BeiDou navigation and AI [4][5]
- The terminal is recognized as the world's first fully automated terminal for multimodal transport, capable of continuous operation with a large fleet of autonomous guided vehicles [5]
Bay Area Artificial Intelligence Industry Innovation Alliance Established
China Economic Net · 2025-05-27 03:32
Group 1
- The establishment of the Bay Area Artificial Intelligence Industry Innovation Alliance aims to promote collaborative innovation in the Guangdong-Hong Kong-Macao Greater Bay Area, involving over 400 representatives from government, academia, and international expert communities [1][2]
- The alliance focuses on three core tasks: technological breakthroughs in key areas such as large model training and intelligent chips, the formation of a trillion-level industrial cluster, and the construction of a comprehensive industrial service system [2][3]
- The alliance is positioned to make Nansha a leading hub for AI innovation, a national benchmark for "AI+" industry development, and a global talent aggregation area for artificial intelligence [2]

Group 2
- The establishment of the alliance is a significant step in implementing the national "New Generation Artificial Intelligence Development Plan" and advancing the Greater Bay Area as a globally influential international science and technology innovation center [3]
- Multiple AI projects were signed during the event, including a collaboration between Huawei and Hong Kong University of Science and Technology (Guangzhou) to launch a "Science and Education Innovation Incubation Center" [3]
- Huawei is also collaborating with China Railway Tunnel Group to plan a "Tunnel Intelligence" model system for the tunnel engineering industry, focusing on digital talent cultivation and the digital transformation of the entire tunnel engineering process [3]