Inference Demand Explodes: Domestic Chips Shift from "Stacking Compute" to System-Level Coordination
Di Yi Cai Jing·2026-01-27 12:00

Group 1
- Domestic computing power is well positioned, with industry demand shifting the focus toward high-performance, cost-effective chips [1][5]
- Xiwang launched its third-generation inference GPU chip, the S3, aiming to cut the cost of one million tokens to one cent, reflecting the industry's transition from training to inference [3]
- By 2030, inference chips are expected to account for 80% of the company's resource allocation, indicating a strategic focus on optimizing inference capabilities [3]

Group 2
- Integrated training-and-inference chips face high costs, unstable supply, and complex deployment, underscoring the need for a reasonable ratio of compute to memory access [4]
- The "memory wall" has become a major bottleneck in chip performance: compute units are improving faster than memory bandwidth, a gap that hits inference chips especially hard [4]
- Companies such as DeepSeek are driving innovation across the full technology chain, from model architecture to inference systems, aiming to reduce dependence on NVIDIA's CUDA ecosystem [4]

Group 3
- Falling AI application costs are significantly expanding the number of applications on the market, and domestic computing power is well positioned to capitalize on this trend [5]
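The "memory wall" point above can be made concrete with a simple roofline calculation. The sketch below is illustrative only: the hardware figures and arithmetic intensities are hypothetical assumptions, not specifications of any chip mentioned in the article. It shows why a low-arithmetic-intensity workload like LLM decode-phase inference is capped by memory bandwidth rather than peak compute.

```python
# Illustrative roofline model of the "memory wall".
# All hardware numbers below are hypothetical, chosen only to show
# why inference-like workloads become memory-bandwidth-bound.

def attainable_tflops(peak_tflops: float, mem_bw_tbps: float,
                      arithmetic_intensity: float) -> float:
    """Roofline model: achievable throughput is the lesser of peak
    compute and memory bandwidth x arithmetic intensity (FLOPs/byte)."""
    return min(peak_tflops, mem_bw_tbps * arithmetic_intensity)

# Hypothetical accelerator: 400 TFLOPs peak, 2 TB/s memory bandwidth.
PEAK, BW = 400.0, 2.0

# Decode-phase inference often runs near ~1 FLOP/byte (each weight is
# read once per generated token), while training batches reuse weights
# heavily and can reach hundreds of FLOPs/byte.
inference = attainable_tflops(PEAK, BW, 1.0)    # memory-bound
training = attainable_tflops(PEAK, BW, 300.0)   # compute-bound

print(f"inference-like workload: {inference:.0f} TFLOPs")  # 2
print(f"training-like workload: {training:.0f} TFLOPs")    # 400
```

Under these assumed numbers the inference workload achieves only 0.5% of peak compute, which is why the article's emphasis on a balanced compute-to-memory-access ratio, rather than raw compute, matters for inference chips.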
