Yuntian Lifei Announces Three-Year Large-Scale Computing Power Chip Strategy: Targeting a 100x+ Reduction in Inference Cost per Million Tokens
Ge Long Hui · 2026-02-03 12:49
Core Viewpoint
- Yuntian Lifei has announced a three-year strategic focus on AI inference chips, aiming to cut the cost of large-model inference by more than 100 times and move AI from experimental technology to widespread productivity [1][10]. (A back-of-the-envelope cost sketch follows this summary.)

Group 1: Industry Changes
- The global computing power industry is shifting from competition over parameter counts to efficiency in inference, with emphasis on lower latency and lower cost [3].
- Major players such as Google and NVIDIA are making strategic moves to strengthen their inference capabilities, signaling a trend toward optimizing for efficiency rather than simply scaling up model size [3].

Group 2: Architectural Breakthroughs
- Yuntian Lifei has adopted the GPNPU technology route, which combines GPGPU, NPU, and 3D stacked memory to deliver both general-purpose programmability and high efficiency [4].
- The GPNPU architecture is designed to lower the migration cost from mainstream software ecosystems, allowing existing CUDA programs to be integrated with little rework [4].
- The company is also developing 3D stacked memory and advanced interconnect technologies to break through the "memory wall" bottleneck, raising bandwidth and efficiency (see the roofline sketch below) [5].

Group 3: Competitive Advantages
- The CEO of Yuntian Lifei identified five core elements of the company's competitive moat: technology, production capacity, ecosystem, market, and capital [8].
- The company is among the few in China with sufficient domestic production capacity, giving it high certainty for large-scale chip production and delivery [8].
- Yuntian Lifei's "1+4" structure centers on AI inference chips, with four business units addressing challenges from research and production through to market promotion [8].

Group 4: Future Plans
- The company plans to invest heavily in developing the DeepVerse chip, focusing on optimizing inference cost, latency, and throughput (the batching sketch below illustrates how these pull against each other) [10].
- The roadmap is benchmarked against international platforms, targeting the key optimization phases of inference to deliver cheaper, more stable, and easier-to-deploy solutions [10].
- The ultimate goal is to make inference affordable and reliable, turning AI from visible capability into accessible productivity [10].
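The headline 100x figure is a stated target, not a reported benchmark. The sketch below is purely illustrative: all prices and throughput numbers are hypothetical assumptions, chosen only to show how "cost per million tokens" is commonly derived from hardware cost and sustained throughput, and how a 100x reduction can decompose across both levers.

```python
# Illustrative only: how "cost per million tokens" is typically derived.
# All numbers are hypothetical assumptions, not Yuntian Lifei figures.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost to generate 1M tokens, given hourly hardware cost and sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical baseline: a $4/hour accelerator sustaining 500 tokens/s.
baseline = cost_per_million_tokens(hourly_cost_usd=4.0, tokens_per_second=500)

# A 100x reduction can come from any mix of cheaper hardware and higher
# throughput; here, 10x cheaper and 10x faster.
improved = cost_per_million_tokens(hourly_cost_usd=0.4, tokens_per_second=5000)

print(f"baseline:  ${baseline:.3f} per 1M tokens")   # ~$2.222
print(f"improved:  ${improved:.4f} per 1M tokens")   # ~$0.0222
print(f"reduction: {baseline / improved:.0f}x")      # 100x
```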
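Group 2's "memory wall" point is the standard roofline argument: autoregressive decoding at small batch sizes must stream the full weight set from memory for every generated token, so throughput is bounded by memory bandwidth rather than compute. A minimal sketch, assuming hypothetical hardware numbers (the bandwidth figures are illustrative, not specifications of any Yuntian Lifei product):

```python
# Minimal roofline-style estimate of why decode-phase inference is
# memory-bandwidth bound. All hardware numbers are assumptions.

def decode_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed: each token generated
    requires reading the full weight set from memory at least once."""
    return bandwidth_bytes_per_s / weight_bytes

WEIGHTS = 70e9 * 2   # 70B-parameter model at FP16: ~140 GB of weights
HBM_BW  = 2e12       # hypothetical 2 TB/s with conventional memory
STACKED = 8e12       # hypothetical 8 TB/s with 3D-stacked memory

print(f"conventional: {decode_tokens_per_second(WEIGHTS, HBM_BW):.1f} tok/s")   # ~14.3
print(f"3D-stacked:   {decode_tokens_per_second(WEIGHTS, STACKED):.1f} tok/s")  # ~57.1
```

Raising bandwidth raises this ceiling directly, which is why 3D stacked memory targets the memory wall rather than raw compute.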
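Group 4 frames DeepVerse optimization around cost, latency, and throughput; in inference serving these three interact mainly through batching. The toy model below, with assumed timing constants, shows the usual tension: larger batches raise aggregate throughput (and so lower cost per token) while worsening each user's per-token latency.

```python
# Toy model of the batching trade-off behind "cost vs latency vs throughput".
# Timing constants are illustrative assumptions.

def serve_stats(batch_size: int, per_token_ms: float = 10.0,
                per_stream_overhead_ms: float = 2.0) -> tuple[float, float]:
    """Larger batches amortize the fixed per-step cost across more streams:
    aggregate throughput rises, but each stream's per-token latency grows."""
    step_ms = per_token_ms + per_stream_overhead_ms * batch_size  # one decode step
    per_stream_tps = 1000 / step_ms               # tokens/s seen by one user
    aggregate_tps = per_stream_tps * batch_size   # tokens/s across the batch
    return per_stream_tps, aggregate_tps

for b in (1, 8, 32):
    stream, total = serve_stats(b)
    print(f"batch={b:2d}  per-stream {stream:5.1f} tok/s  aggregate {total:7.1f} tok/s")
# batch= 1: per-stream ~83.3, aggregate  ~83.3
# batch= 8: per-stream ~38.5, aggregate ~307.7
# batch=32: per-stream ~13.5, aggregate ~432.4
```

Cost per token falls roughly in proportion to aggregate throughput, which is why a serving roadmap has to optimize cost, latency, and throughput jointly rather than one at a time.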