"Storage Power China Tour" and Advanced Storage for AI Inference Workshop Successfully Held in Beijing
Zheng Quan Ri Bao Wang · 2025-11-07 07:29
Shi Youkang, chief expert of the China Academy of Information and Communications Technology (CAICT), attended the workshop and delivered an address, and Guo Liang, chief engineer of CAICT's Institute of Cloud Computing and Big Data, presided. Keynote speeches were given by Zhou Yu, project chief engineer at the China Mobile (600941) Cloud Capability Center; Wang Xudong, president of strategy and business development for Huawei's data storage product line; and Tang Anbo, solutions director at Beijing SiliconFlow Technology Co., Ltd.

At the workshop, Shi Youkang observed that as AI applications scale up, the cost, efficiency, and quality problems of the inference stage are becoming acute, and advanced storage has become key to raising AI inference performance and controlling costs. The state attaches great importance to developing advanced storage: policies such as the Action Plan for the High-Quality Development of Computing Infrastructure explicitly call for "accelerating the R&D and application of storage technology," "continuously strengthening storage industry capabilities," and "promoting coordinated development of storage, compute, and networks," setting the direction for the industry. CAICT has carried out extensive work on policy research, standards development, and testing services, and has joined with companies across the industry chain to establish the "Advanced Storage AI Inference Working Group" under the Computing Power Industry Development Formation. Shi also offered three recommendations: encourage R&D and innovation in frontier storage technologies, promote deep integration of storage, compute, and network transport, and strengthen the storage-compute collaborative industry ecosystem, calling on industry peers to build consensus and jointly advance storage-compute synergy in China.

Tang Anbo's talk addressed the "can't run it, runs too slowly, runs too expensively" problems of large-model inference: SiliconFlow's AI infra toolchain focuses on raising compute utilization, and its core inference framework has been adapted to more than 100 open ...
In the Token Economy Era, Is "Storage Power" the Bottleneck Keeping AI Inference From Running Fast?
Tai Mei Ti APP · 2025-11-07 04:08
Core Insights
- The AI industry is undergoing a structural shift, from a focus on GPU scaling to storage capability as the lever for AI performance and cost efficiency [1][10]
- Demand for advanced storage solutions is expected to rise with the growing requirements of AI applications, and storage prices are projected to remain bullish through Q4 2025 [1][10]
- A transition from a "parameter scale" arms race to an "inference efficiency" commercial competition is anticipated to begin in 2025, making token usage central to AI inference economics [2][10]

Storage and Inference Changes
- Inference loads are changing fundamentally for three reasons: the rapid growth of KVCache capacity as contexts lengthen (a sizing sketch appears at the end of this summary), multi-modal data whose complexity demands stronger I/O, and the need for consistent performance under high load [4][10]
- The bottleneck in inference systems is increasingly storage rather than GPU compute: GPUs often sit idle waiting for data rather than lacking the power to compute [5][10]
- Raising GPU utilization by 20% can cut overall costs by roughly 15%-18%, so efficient data supply matters more than simply adding GPUs (a worked reading of this figure follows below) [5][10]

New Storage Paradigms
- Storage is evolving from a passive repository into an active component of AI inference, managing data flow rather than just capacity [6][10]
- Traditional storage architectures struggle with the combined demands of high throughput, low latency, and heterogeneous data integration, which holds back AI application deployment [7][10]
- New technologies such as CXL and multi-level caching are being developed to optimize data flow and raise the efficiency of AI inference systems (a toy tiered-cache sketch appears below) [6][10]

Future Directions
- Over the next three years, consensus is expected around four directions: scarcity will shift from GPUs themselves to the ability to feed GPUs with data efficiently, data management will become central to AI systems, real-time storage capability will become essential, and CXL architecture will redefine the boundary between memory and storage [10][11][12]
- Competition in AI will extend beyond model performance to the underlying infrastructure, with effective data management and data flow as the differentiators [12]
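As a concrete illustration of the KVCache pressure described above, here is a minimal sizing sketch. The formula (two tensors, K and V, per layer, each batch x kv_heads x seq_len x head_dim) is the standard accounting for decoder-only transformers; the model dimensions below are hypothetical round numbers chosen for illustration, not figures from the article.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Estimate KV cache size in bytes: 2 tensors (K and V) per layer,
    each of shape [batch, kv_heads, seq_len, head_dim], at dtype_bytes
    per element (2 for FP16/BF16)."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Illustrative 70B-class model (hypothetical dimensions): 80 layers,
# 8 KV heads (grouped-query attention), head_dim 128, FP16.
for ctx in (8_192, 128_000, 1_000_000):
    gib = kv_cache_bytes(80, 8, 128, ctx, batch=1) / 2**30
    print(f"context {ctx:>9,d}: {gib:8.1f} GiB per sequence")
```

Even at these modest assumptions, a single million-token sequence needs hundreds of GiB of KV state, and concurrent long-context sessions multiply that, which is why the cache spills out of HBM and into the storage hierarchy.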
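On the "20% utilization gain yields a 15%-18% cost cut" figure: one back-of-envelope reading, assuming GPU-hour spend scales inversely with utilization and dominates serving cost (reading "20%" as a relative gain is our assumption, not stated in the summary):

$$C_{\mathrm{GPU}} \propto \frac{N_{\mathrm{tokens}}}{u}, \qquad \frac{\Delta C}{C} = 1 - \frac{u_0}{u_1} = 1 - \frac{1}{1.2} \approx 16.7\%$$

With GPUs at roughly 90%-100% of serving cost this lands in the quoted band; if "20%" instead means 20 percentage points (say 50% to 70% utilization), the GPU-hour saving rises to about 28.6%, and a smaller GPU cost share gives the same range.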
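Finally, a toy illustration of the "multi-level caching" idea: the sketch below demotes cold KV entries from a small fast tier to a larger slow tier instead of discarding them, so a returning session pays a promotion cost rather than a full prefill recompute. Class and method names are invented for illustration; production engines manage fixed-size blocks and asynchronous transfers rather than whole sessions.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a capacity-limited 'HBM' tier with LRU
    eviction that demotes entries to a larger 'DRAM/SSD' tier instead
    of dropping them."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # hot tier, capacity-limited
        self.slow = {}              # cold tier, assumed much larger
        self.fast_capacity = fast_capacity

    def put(self, seq_id, kv_blocks):
        self.fast[seq_id] = kv_blocks
        self.fast.move_to_end(seq_id)
        while len(self.fast) > self.fast_capacity:
            victim, blocks = self.fast.popitem(last=False)  # evict LRU entry
            self.slow[victim] = blocks                      # demote, don't recompute

    def get(self, seq_id):
        if seq_id in self.fast:                 # hit in hot tier
            self.fast.move_to_end(seq_id)
            return self.fast[seq_id]
        if seq_id in self.slow:                 # promote from cold tier
            self.put(seq_id, self.slow.pop(seq_id))
            return self.fast[seq_id]
        return None                             # miss: full prefill needed

cache = TieredKVCache(fast_capacity=2)
cache.put("session-a", ["kv0", "kv1"])
cache.put("session-b", ["kv0"])
cache.put("session-c", ["kv0"])                  # demotes session-a to slow tier
assert cache.get("session-a") == ["kv0", "kv1"]  # re-promoted, not recomputed
```

The design point this is meant to surface: once storage participates actively in the cache hierarchy, an eviction costs a data movement instead of a GPU recompute, which is exactly the trade the article says new storage paradigms are built around.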