Yuntian Lifei's self-developed AI inference chips land in a thousand-card cluster
半导体芯闻· 2026-03-12 10:31
Core Viewpoint
- The article discusses the establishment of an AI inference cluster in Zhanjiang by Yuntian Lifei, which aims to enhance AI capabilities across various applications through self-developed domestic AI inference acceleration cards, with a project budget of 420 million yuan [1].

Group 1: AI Inference Shift
- AI computing power is transitioning from a "training-first" approach to an "inference-first" model, with inference computing becoming crucial for AI applications [2].
- Gartner predicts that by 2026, approximately 55% of AI-specific cloud infrastructure spending will be allocated to inference workloads [2].
- The Zhanjiang cluster is designed specifically for inference tasks, supporting various industry applications and facilitating the AI transformation of traditional industries [2].

Group 2: Cluster Architecture for the Inference Era
- The inference cluster's architecture is designed to meet high-concurrency, high-throughput, and low-latency requirements, using a "Prefill-Decode separation" approach for resource optimization [4].
- The Prefill phase focuses on understanding long contexts, while the Decode phase generates tokens, necessitating efficient resource allocation between the two stages [4].
- Future performance bottlenecks in inference systems are expected to arise from data-access efficiency rather than raw computational power, highlighting the importance of co-design across computing, storage, and networking [4].

Group 3: Self-Developed Chip for Cost-Effective Inference
- The AI inference cluster will be built in three phases, all using Yuntian Lifei's self-developed domestic AI inference acceleration cards, with the first phase deploying the X6000 inference acceleration card [7].
- The company plans to release three generations of AI inference chips over the next three years, focusing on optimizing performance for long-context scenarios and low-latency decoding [7].
- The long-term goal includes achieving a cost of "one cent per hundred billion tokens" through continuous optimization of chips and systems [7].

Group 4: Industry Implications
- The Zhanjiang project represents a shift in AI computing strategy from merely increasing GPU scale to focusing on cost efficiency and stable large-scale inference capabilities [8].
- The establishment of a thousand-card inference cluster not only meets current AI application demands but also serves as a technical deployment platform for larger-scale computing systems [8].
- The collaboration between domestic models and chips is expected to drive AI infrastructure from technical exploration to large-scale application, opening new avenues for the development of the AI industry [9].
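The "Prefill-Decode separation" described above splits each request into a compute-bound prompt pass (building the KV cache once over the whole context) and a bandwidth-bound generation loop (one token per step). A minimal sketch of the idea in Python, with hypothetical names; the article does not describe Yuntian Lifei's actual scheduler:

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch only: prefill workers process the full prompt once
# to build the KV cache; decode workers then generate tokens step by step
# against that cache. All names and sizes here are invented for clarity.

@dataclass
class Request:
    req_id: int
    prompt_tokens: int      # long-context prompts dominate prefill cost
    max_new_tokens: int

@dataclass
class KVCache:
    req_id: int
    cached_tokens: int      # tokens whose keys/values are already stored

def prefill(req: Request) -> KVCache:
    """Compute-bound phase: one pass over the entire prompt."""
    return KVCache(req.req_id, req.prompt_tokens)

def decode(req: Request, kv: KVCache) -> List[int]:
    """Memory-bandwidth-bound phase: one token per step, reusing the cache."""
    out = []
    for step in range(req.max_new_tokens):
        out.append(step)        # placeholder for a sampled token id
        kv.cached_tokens += 1   # each new token extends the KV cache
    return out

def serve(batch: List[Request]) -> Dict[int, List[int]]:
    # Separating the phases lets a scheduler batch prefills aggressively
    # for throughput while keeping decode steps small for low latency.
    caches = {r.req_id: prefill(r) for r in batch}
    return {r.req_id: decode(r, caches[r.req_id]) for r in batch}

reqs = [Request(0, prompt_tokens=4096, max_new_tokens=3)]
print(serve(reqs))   # {0: [0, 1, 2]}
```

Running the two phases on separate worker pools is what allows the resource allocation between them to be tuned independently, as the article describes.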
China's first domestic AI inference thousand-card cluster lands, built on Yuntian Lifei's fully self-developed AI inference chips
IPO早知道· 2026-03-12 05:38
Core Viewpoint
- The article discusses the establishment of an AI inference cluster by Yuntian Lifei in Zhanjiang, which aims to create a "national model and national chip" ecosystem, leveraging domestic AI technologies to support various industry applications and enhance local digital transformation [3][14].

Group 1: Project Overview
- Yuntian Lifei won the bid for the Zhanjiang AI penetration support project with a contract amount of 420 million yuan, focusing on building a domestic AI inference cluster based on self-developed AI inference acceleration cards [3].
- The cluster will use domestic large models such as DeepSeek to provide AI capabilities for government and industry applications, aiming to serve as a model for the "national model and national chip" ecosystem [3][14].

Group 2: Technical Architecture
- The AI inference cluster is designed to meet high-concurrency, high-throughput, and low-latency requirements, employing a "Prefill-Decode separation" architecture to optimize resource allocation across the different processing stages [6].
- The system architecture prioritizes optimizing the Prefill phase while balancing the Decode phase, ensuring high throughput efficiency even in long-context inference scenarios [7].

Group 3: Chip Development and Cost Efficiency
- The AI inference cluster will be built in three phases, all using Yuntian Lifei's self-developed AI inference acceleration cards, with the first phase deploying the X6000 inference acceleration card [10].
- Future plans include launching three generations of AI inference chips over the next three years, focusing on optimizing both the Prefill and Decode phases to achieve millisecond-level inference latency [11].

Group 4: Industry Implications
- The establishment of the Zhanjiang AI inference cluster represents a shift in AI infrastructure development logic, moving from merely pursuing computational scale to emphasizing efficiency and cost [13].
- The cluster is expected to provide a significant computational foundation for local industry digital transformation and to facilitate the collaborative development of domestic models and chips [14].
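The long-term cost target cited in the first article, "one cent per hundred billion tokens", can be converted into a per-token unit cost. A back-of-envelope sketch, taking the stated figure at face value (one cent read as 0.01 yuan):

```python
# Unit conversion of the stated cost target; the figure itself comes
# from the article, the breakdown below is just arithmetic.
TARGET_COST_YUAN = 0.01            # "one cent"
TARGET_TOKENS = 100_000_000_000    # "hundred billion" tokens

cost_per_token = TARGET_COST_YUAN / TARGET_TOKENS
cost_per_million = cost_per_token * 1_000_000

print(f"{cost_per_token:.1e} yuan/token")       # 1.0e-13 yuan/token
print(f"{cost_per_million:.1e} yuan/M tokens")  # 1.0e-07 yuan per million tokens
```

Framing the target this way makes it directly comparable with the per-million-token pricing commonly used for inference services.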