From Competing on Models to Computing Costs: Xiwang's S3 GPU Delivers the Best Answer
Semiconductor Chip News (半导体芯闻) · 2026-01-29 10:10

Core Viewpoint

The AI industry is shifting its focus from training to inference, with inference requests becoming the primary demand for computational power as model training stabilizes [1][2]

Group 1: Industry Trends

- The share of demand for inference compute is projected to reach 66% by 2026, surpassing training compute and indicating a structural change in the industry [2]
- Inference currently accounts for 70% of AI application costs, making its reduction critical for AI companies to achieve profitability [2]
- The emergence of intelligent agents and complex AI applications is accelerating the need for real-time interaction and high-frequency responses [2]

Group 2: Company Developments

- Xiwang Technology launched its new inference GPU chip, Qihang S3, and the Huanshi SC3 supernode solution at its first product release event, following a strategic financing round of nearly 3 billion yuan [1][3]
- The Qihang S3 chip features significant advancements, including a fivefold increase in inference performance over comparable products, and is the first domestic GPGPU inference chip to use LPDDR6 memory [6][4]
- The Huanshi SC3 solution is designed for large-model inference scenarios, supporting high system utilization and stability, and reduces the cost of equivalent inference capacity from hundreds of millions of yuan to tens of millions [6][4]

Group 3: Software and Infrastructure

- Xiwang has developed a comprehensive self-researched software platform that is compatible with the CUDA ecosystem, enabling seamless migration for users [7]
- The company has achieved compatibility with over 90% of the major models on the ModelScope platform, enhancing its service offerings [7]
- Xiwang's AI-native intelligent computing platform addresses industry pain points, including high GPU resource idleness and complex operational management [9][12]

Group 4: Business Model Innovation

- Xiwang's business model is structured around a "Token as a Service" approach, providing token services tailored to different customer needs [14]
- The company emphasizes the importance of power costs in large computing centers and has developed strategies to improve energy efficiency and reduce operational costs [14]
- Strategic partnerships with industry leaders aim to build a collaborative ecosystem that accelerates the deployment of extreme inference computing capabilities [16][17]
