Pangu

5 Biggest Crypto Gainers of September! | Crypto Book
Crypto Book · 2025-09-26 16:30
Hold on to your digital wallets, folks. September's crypto market is heating up, and I've got the inside scoop on five tokens that are making waves. First up, Wormhole's W token is bridging gaps in the blockchain world. Next, Pudgy Penguins' PENGU is waddling its way to the top. Avalanche's AVAX is causing a storm in the crypto scene. Sui's SUI token is bringing some serious speed to the game. And last but not least, NEAR Protocol's NEAR is proving it's anything but far from success. These five are turning heads ...
From Open-Source Co-Building to a Thriving Ecosystem: MindSpore Supports Day 0 Migration and One-Click Deployment
Cailian Press (财联社) · 2025-06-12 10:59
Core Viewpoint
- The article highlights the rapid development of large models and the resulting need for efficient migration and deployment in the AI ecosystem. MindSpore aims to meet that need by giving developers seamless integration and performance optimization [1][2].

Group 1: Migration Challenges
- The first challenge is fast migration: moving models from third-party frameworks at zero cost while keeping model accuracy fully aligned. MindSpore addresses this with a threefold compatibility approach that allows zero-code migration of mainstream models and improves training performance by 5% while keeping the distributed parallel strategy unchanged [4].
- The second challenge is rapid deployment: automating the entire training-to-inference process so that deploying a large model is as simple as running a single command [2].

Group 2: Training and Inference Solutions
- For training, MindSpore supports Day 0 migration, providing a "seamless intelligent translation" capability across frameworks. It uses tools such as MindSpeed/Megatron to migrate PyTorch models seamlessly, with near-zero migration loss for popular models [4].
- For inference deployment, the vLLM-MindSpore plugin allows HuggingFace models to be deployed in under 30 minutes and cuts weight loading time for large models by 80% [5][6].

Group 3: Open Source and Community Engagement
- Since it was open-sourced on March 28, 2020, MindSpore has built an active developer community, with over 1.2 million downloads and contributions from more than 46,000 developers across 2,400 cities [7].
- MindSpore promotes a collaborative ecosystem through community governance, providing free computing resources and sharing knowledge across 20+ technical special interest groups (SIGs) [8].
How Stable Is the Ascend AI Computing Cluster? 98% Availability at the 10,000-Card Scale, with Second-Level Fault Recovery
Yicai (第一财经) · 2025-06-10 11:25
Core Viewpoint
- The article stresses the importance of high availability in AI computing clusters, likening them to a "digital engine" that must run continuously, without interruption, to support business innovation and efficiency [1][12].

Group 1: High Availability and Fault Management
- Locating faults in AI computing clusters is difficult because of their large scale and intricate technology stack; fault diagnosis currently takes anywhere from hours to days [2].
- Huawei's team has developed a comprehensive observability capability to improve fault detection and management, including cluster operation views, alarm views, and network link monitoring [2][12].
- The average AI cluster experiences multiple faults per day, which significantly reduces training efficiency and wastes computing resources [2].

Group 2: Reliability and Performance Enhancements
- Huawei's reliability analysis model aims to raise the mean time between failures (MTBF) of large-scale clusters to over 24 hours [3].
- A multi-layer protection system and software fault-tolerance solutions have achieved a fault tolerance rate of over 99% for optical modules [3].
- Training efficiency has improved, with linearity reaching 96% for dense models and 95.05% for sparse models under specific configurations [6].

Group 3: Fast Recovery Mechanisms
- Huawei has implemented a multi-tiered fault recovery system that cuts training recovery time to under 10 minutes, with process-level recovery taking as little as 30 seconds [9][10].
- Instance-level recovery techniques have compressed recovery time to under 5 minutes, minimizing the impact of faults on users [10].

Group 4: Future Directions and Innovations
- Huawei's six innovative solutions for high availability include fault perception and diagnosis, fault management, and optical link fault tolerance; together they have brought cluster availability to 98% [12].
- Future work will focus on diverse application scenarios, heterogeneous integration, and intelligent autonomous maintenance to drive further innovation in AI computing clusters [12].
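
The summary above quotes an MTBF target of over 24 hours, recovery times of under 10 minutes, and 98% cluster availability, but it does not say how availability is defined. As a rough illustration only, the sketch below uses the textbook steady-state formula, availability = MTBF / (MTBF + MTTR), to show how such figures relate. The function names and sample inputs are assumptions made for this example and do not come from the article; the article's 98% figure may well be computed differently.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Textbook steady-state availability: uptime / (uptime + downtime).

    Assumption: each failure costs exactly `mttr_hours` of downtime and
    failures arrive on average every `mtbf_hours` of operation.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_for_target(mtbf_hours: float, target: float) -> float:
    """Largest mean recovery time (hours) that still meets `target` availability."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    return mtbf_hours * (1 - target) / target


if __name__ == "__main__":
    # Illustrative inputs taken from the quoted figures: MTBF of 24 hours,
    # recovery time of 10 minutes per fault.
    availability = steady_state_availability(24.0, 10 / 60)
    print(f"availability with 24 h MTBF and 10 min recovery: {availability:.4f}")

    # How much recovery time per fault a 98% target would tolerate at that MTBF.
    budget_minutes = max_mttr_for_target(24.0, 0.98) * 60
    print(f"recovery budget for 98% availability: {budget_minutes:.1f} min")
```

Under this simplified model, shorter recovery times raise availability directly, which is consistent with the article's emphasis on fast recovery alongside a longer MTBF.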