GPU Resource Utilization

Alibaba Cloud AI Infrastructure Research Accepted at Top Academic Conference, Significantly Improving GPU Utilization
Yang Zi Wan Bao Wang · 2025-10-16 08:29
Core Insights
- The top academic conference SOSP 2025, held in Seoul, South Korea, accepted only 66 papers; Alibaba Cloud's research on GPU pooling for multi-model serving was among them, proposing Aegaeon, a multi-model hybrid serving system that significantly improves GPU resource utilization [1][2]
- The conference highlighted the trend of integrating system software with large AI model technology as the number of models worldwide continues to grow, with Hugging Face now hosting over 1 million models [1]

Group 1
- Alibaba Cloud's Aegaeon system schedules at the token level, switching models based on precise execution-time predictions and a novel token-level scheduling algorithm, reducing model-switching overhead by 97% [2] (a minimal illustrative sketch of the idea follows after this summary)
- Aegaeon serves up to 7 different models concurrently on a single GPU, improving effective throughput by 1.5 to 9 times and handling 2 to 2.5 times as many requests as existing mainstream solutions [2]
- Aegaeon's core technology has been deployed on Alibaba Cloud's Bailian platform, cutting the number of GPUs required to serve multiple models by 82% [2]

Group 2
- The Alibaba Cloud Bailian platform hosts over 200 leading industry models, including Qwen, Wan, and DeepSeek, with model invocations increasing 15-fold over the past year [2]
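To make the token-level scheduling idea concrete, below is a minimal Python sketch of how a scheduler might interleave decoding for requests targeting different models on one GPU, making a decision after every generated token and paying a model-switch cost only when the next most urgent request belongs to a different model. This is a sketch under stated assumptions, not Aegaeon's actual design or API: the names (TokenLevelScheduler, Request, predicted_token_time, switch_overhead), the 50 ms per-token target, and all timing numbers are illustrative inventions.

```python
import heapq
from dataclasses import dataclass, field

# Illustrative sketch only: names and numbers are assumptions, not Aegaeon's real API.

@dataclass(order=True)
class Request:
    deadline: float                      # latency target for the next token (seconds)
    req_id: int = field(compare=False)
    model: str = field(compare=False)
    tokens_left: int = field(compare=False)

class TokenLevelScheduler:
    """Interleaves decoding of requests for different models on one GPU,
    making a scheduling decision after every generated token instead of
    only after a whole request finishes."""

    def __init__(self, predicted_token_time, switch_overhead):
        self.predicted_token_time = predicted_token_time  # model -> est. seconds per token
        self.switch_overhead = switch_overhead            # model -> est. switch-in cost (seconds)
        self.queue = []                                   # min-heap ordered by deadline
        self.active_model = None
        self.clock = 0.0

    def submit(self, req: Request):
        heapq.heappush(self.queue, req)

    def step(self):
        """Decode one token of the most urgent request; if it belongs to a
        different model than the one currently resident, pay the switch cost first."""
        if not self.queue:
            return None
        req = heapq.heappop(self.queue)
        if req.model != self.active_model:
            self.clock += self.switch_overhead[req.model]
            self.active_model = req.model
        self.clock += self.predicted_token_time[req.model]
        req.tokens_left -= 1
        if req.tokens_left > 0:
            req.deadline = self.clock + 0.05  # hypothetical 50 ms per-token target
            heapq.heappush(self.queue, req)
        return req.req_id

# Example: two models sharing one GPU, scheduled token by token.
sched = TokenLevelScheduler(
    predicted_token_time={"model-a": 0.02, "model-b": 0.03},
    switch_overhead={"model-a": 0.10, "model-b": 0.10},
)
sched.submit(Request(deadline=0.05, req_id=1, model="model-a", tokens_left=3))
sched.submit(Request(deadline=0.06, req_id=2, model="model-b", tokens_left=3))
while (served := sched.step()) is not None:
    print(f"t={sched.clock:.2f}s  served one token of request {served}")
```

The sketch only shows the shape of a token-granularity scheduling loop; the reported 97% reduction in switching overhead comes from techniques in the paper itself (precise execution-time prediction and the token-level scheduling algorithm), which this toy example does not reproduce.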