Alibaba Cloud AI Achievement Accepted at a Top Conference; GPU Usage Cut by 82%

Core Viewpoint - Alibaba Cloud's Aegaeon solution addresses the widespread problem of GPU resource waste in AI model serving, significantly improving GPU utilization, and has been recognized at the prestigious SOSP 2025 conference [2][4].

Group 1: Aegaeon Solution
- Aegaeon was accepted at SOSP 2025, highlighting its innovative approach to reducing GPU resource waste in AI model serving [2][4].
- During a beta test lasting more than three months, Aegaeon cut the number of NVIDIA H20 GPUs needed to serve large models from 1,192 to 213, an 82% reduction [5].
- By pooling GPU resources, the system breaks the inefficient one-model-per-GPU binding and enables more effective resource allocation [8].

Group 2: Technical Innovations
- Aegaeon's core innovation is token-level scheduling: after generating each token, the system dynamically decides whether to switch models, enabling fine-grained resource management [8].
- A single GPU can serve up to seven different models simultaneously, improving effective throughput by 1.5 to 9 times and handling 2 to 2.5 times as many requests as existing solutions [9].
- Through a range of optimizations, Aegaeon cuts model-switching overhead by 97%, keeping model switching responsive in real time [8].

Group 3: Industry Implications
- The integration of system software and AI model technology is emerging as a new trend, with a focus on optimizing underlying systems to better support AI applications [9].
- Future AI progress will depend not only on hardware advances but also on software innovations that extract the full potential of existing hardware [9].
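To make the token-level scheduling idea concrete: instead of binding one model to one GPU for the lifetime of a request, the scheduler reconsiders, after every generated token, which model should occupy the GPU. The article gives no implementation details, so the sketch below is purely illustrative — the class names, the backlog-based switching policy, and the single simulated GPU are all assumptions, not Aegaeon's actual design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    model: str        # which model this request needs
    tokens_left: int  # tokens still to generate


class TokenLevelScheduler:
    """Toy token-level scheduler for a single simulated GPU.

    After emitting each token it re-evaluates which model should hold
    the GPU, rather than dedicating the GPU to one model per request.
    """

    def __init__(self):
        self.queues = {}         # model name -> deque of pending Requests
        self.active_model = None # model currently loaded on the "GPU"
        self.switches = 0        # how many model swaps occurred
        self.steps = 0           # total tokens generated

    def submit(self, req: Request) -> None:
        self.queues.setdefault(req.model, deque()).append(req)

    def _pick_model(self):
        # Hypothetical policy: serve the model with the longest backlog.
        pending = {m: len(q) for m, q in self.queues.items() if q}
        if not pending:
            return None
        return max(pending, key=pending.get)

    def step(self) -> bool:
        """Generate one token, then decide whether to switch models."""
        target = self._pick_model()
        if target is None:
            return False  # nothing left to serve
        if target != self.active_model:
            self.switches += 1  # real systems pay a swap cost here
            self.active_model = target
        queue = self.queues[self.active_model]
        head = queue[0]
        head.tokens_left -= 1
        self.steps += 1
        if head.tokens_left == 0:
            queue.popleft()  # request finished
        return True

    def run(self) -> None:
        while self.step():
            pass
```

A short usage example: submitting requests for two different models and running the loop serves both from the one simulated GPU, with the switch counter showing how often the model was swapped mid-stream.

```python
sched = TokenLevelScheduler()
sched.submit(Request("model-A", tokens_left=3))
sched.submit(Request("model-B", tokens_left=2))
sched.run()
print(sched.steps, sched.switches)  # 5 tokens generated across 2 model loads
```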