FlexKV Multi-Level Caching Technology

Tencent's Qiu Yuepeng: As Inference Demand Surges, Cloud Infrastructure Must Upgrade in Step
Hua Er Jie Jian Wen· 2025-09-16 08:04
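By Huang Yu
In 2025, with AI applications taking off and the year being hailed as the first year of the Agent era, inference demand has surged. To seize this opportunity, cloud service providers are actively upgrading their cloud infrastructure to meet market demand.
On September 16, at the 2025 Tencent Global Digital Ecosystem Summit, Qiu Yuepeng, Vice President of Tencent Group and President of Tencent Cloud, said that the shift in the large-model industry's focus from training to inference has become an industry consensus. At the same time, customers have shown strong enthusiasm for using large models and building Agents, and both trends have driven the surge in inference demand. This also means that AI infrastructure must be upgraded in step.
In recent years, Tencent Cloud has continued to upgrade its cloud infrastructure to support the large-scale deployment of Agents and the global expansion of enterprises. According to Qiu, Tencent Cloud has made breakthroughs in inference acceleration, Agent Infra, and international expansion, and will take a more open stance in helping enterprises seize the opportunities of the era.
On inference acceleration, Tencent Cloud has been deeply involved in open-source contributions, submitting multiple optimizations to the DeepSeek, vLLM, and SGLang communities. In addition, to address the memory bottleneck in large-model inference, Tencent Cloud developed and open-sourced the FlexKV multi-level caching technology, which significantly reduces KVCache usage and cuts time to first token by up to 70%.
Qiu also revealed that Tencent Cloud uses its heterogeneous computing platform to integrate a variety of chip resources and offer cost-effective AI compute to external customers. The platform is now fully adapted to mainstream Chinese-made chips.
Reportedly, full-stack hardware-software co-optimization ...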
Tencent's Qiu Yuepeng: Fully Upgrading Cloud Infrastructure for Agents and Globalization
Zheng Quan Shi Bao Wang· 2025-09-16 06:02
Core Insights
- The widespread application of AI is driving a surge in inference demand and cloud infrastructure upgrades [2][3]
Group 1: Cloud Infrastructure Upgrades
- Tencent Cloud is continuously upgrading its cloud infrastructure to support the large-scale deployment of AI agents and global business development [2]
- The company has made breakthroughs in inference acceleration, agent infrastructure, and internationalization [2]
- Tencent Cloud has developed and open-sourced FlexKV multi-level caching technology, significantly reducing KVCache usage and cutting first-token latency (time to first token) by up to 70% [2]
Group 2: AI Agent Applications
- Tencent Cloud has launched the Agent Runtime solution, which integrates execution engines, cloud sandboxes, and security observability to provide a stable operating environment for AI agents [2]
- The Cloud Mate intelligent agent has improved architecture governance and fault diagnosis efficiency, achieving a 95% interception rate for risky SQL and reducing troubleshooting time from 30 hours to as little as 3 minutes [3]
Group 3: Global Market Performance
- Tencent Cloud's self-developed products have enhanced performance and reliability, with over 200 million cores deployed on Star Sea servers and the flagship SA9 reaching 768 cores per machine [3]
- The proprietary cloud TCE has achieved a recovery time objective (RTO) of 2 minutes, meeting near-financial-grade disaster recovery standards [3]
- The new TDSQL Boundless database combines ease of use with high concurrency, and its AI optimizer reduces latency by over 80% in complex queries [3]
Group 4: International Expansion
- Tencent Cloud's infrastructure covers 55 global availability zones with over 3,200 acceleration nodes, providing security protection for thousands of games and defending against a 183% year-on-year increase in DDoS attacks [3]
- The company is accelerating its internationalization efforts, planning to establish new availability zones in Osaka, Japan, and Saudi Arabia, and has set up 9 technical support centers globally [3][4]
- Tencent Cloud completed a large-scale migration for an Indonesian version of "Didi + Meituan" in just 5 months, establishing the third availability zone in Indonesia [4]
Group 5: Future Investments
- Tencent Cloud will continue to increase investments in technological innovation and global expansion to assist Chinese enterprises in stable overseas operations while providing secure, reliable, and intelligent cloud services to global businesses [5]