Workflow
79% Speedup! Shanghai Jiao Tong University's New Method Optimizes Enterprise-Grade AI Workflow Scheduling | IEEE ICDCS '25
量子位 (QbitAI) · 2025-07-24 07:28

Core Viewpoint
- The article introduces LLMSched, a scheduling framework designed to improve the efficiency of compound LLM applications by addressing uncertainty in task duration and structure through new modeling and scheduling strategies [1][2][3].

Group 1: Uncertainties in Compound LLM Applications
- Two main uncertainties are identified in compound LLM applications: duration uncertainty, with task durations fluctuating by as much as 300 seconds, and structural uncertainty, where the number of task steps and the execution structure can vary at runtime [3][4].
- These uncertainties significantly degrade traditional scheduling methods, as shown by the inefficiency of Shortest Job First scheduling compared with uncertainty-aware scheduling [5].

Group 2: DAG Model Reconstruction
- A new Directed Acyclic Graph (DAG) modeling framework is proposed to address structural uncertainty, introducing three types of nodes: Regular Stage, LLM Stage, and Dynamic Stage [6][8].
- The reconstructed DAG model gives compound LLM applications a fixed topological representation and provides the foundation for the subsequent scheduling design (see the first sketch below) [8].

Group 3: Bayesian Analysis and Entropy Reduction Mechanism
- The research team found significant correlations between certain stages of compound LLM applications: completing particular upstream stages can substantially reduce the uncertainty of downstream stages [9][11].
- A Bayesian network (BN) trained on runtime data captures stage duration distributions and inter-stage correlations, enabling more accurate scheduling decisions (see the second sketch below) [11].

Group 4: Scheduling Algorithm and Experimental Results
- An efficient scheduling algorithm combines an ε-greedy strategy with shortest-remaining-time-first and maximum-entropy-reduction priorities, balancing the reduction of task uncertainty against completion time (see the third sketch below) [13].
- Experimental results indicate that LLMSched reduces average job completion time (JCT) by up to 79% compared with existing schedulers [15].
- LLMSched also shows scalability and adaptability across workloads, maintaining significant reductions in average JCT as the number of jobs increases [16].

Group 5: Ablation Study
- An ablation study confirms the importance of the Bayesian network and the uncertainty-aware strategy, with LLMSched outperforming the alternative variants in average JCT across different workload types [19][22].
- The findings suggest that LLMSched opens new avenues for optimizing LLM services, particularly in multi-module collaborative agent systems and LLM inference cluster resource scheduling [22].
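First sketch: to make the three-node-type DAG concrete, here is a minimal Python sketch of how a compound LLM application could be represented with Regular, LLM, and Dynamic stages. It is an illustrative assumption of the data structure, not the authors' implementation; all class and field names are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of a compound LLM job
# represented as a DAG over Regular / LLM / Dynamic stages.
from dataclasses import dataclass, field
from enum import Enum, auto


class StageType(Enum):
    REGULAR = auto()   # deterministic, non-LLM work (e.g. parsing, retrieval)
    LLM = auto()       # an LLM call whose duration is uncertain
    DYNAMIC = auto()   # a placeholder whose internal structure is resolved at runtime


@dataclass
class Stage:
    stage_id: str
    stage_type: StageType
    successors: list = field(default_factory=list)  # downstream stage ids


@dataclass
class CompoundJobDAG:
    """Fixed-topology DAG view of a compound LLM application."""
    stages: dict = field(default_factory=dict)  # stage_id -> Stage

    def add_stage(self, stage: Stage) -> None:
        self.stages[stage.stage_id] = stage

    def add_edge(self, src: str, dst: str) -> None:
        self.stages[src].successors.append(dst)


# Example: retrieval -> LLM summarization -> a dynamic refinement step
dag = CompoundJobDAG()
dag.add_stage(Stage("retrieve", StageType.REGULAR))
dag.add_stage(Stage("summarize", StageType.LLM))
dag.add_stage(Stage("refine", StageType.DYNAMIC))
dag.add_edge("retrieve", "summarize")
dag.add_edge("summarize", "refine")
```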
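Second sketch: the entropy-reduction idea can be illustrated with a toy example in which observing a completed upstream stage's duration narrows the conditional distribution over a correlated downstream stage's duration, lowering its Shannon entropy. The probabilities below are invented for illustration and stand in for what the paper's Bayesian network would learn from runtime profiles.

```python
# Toy illustration (assumed numbers) of entropy reduction after observing
# an upstream stage's duration bucket.
import math


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Prior over the downstream LLM stage's duration bucket.
prior = {"short": 1 / 3, "medium": 1 / 3, "long": 1 / 3}

# Hypothetical conditionals P(downstream duration | observed upstream duration).
conditional = {
    "short": {"short": 0.80, "medium": 0.15, "long": 0.05},
    "long":  {"short": 0.05, "medium": 0.15, "long": 0.80},
}

observed_upstream = "short"          # the upstream stage just finished quickly
posterior = conditional[observed_upstream]

print(f"entropy before observation: {entropy(prior):.3f} bits")
print(f"entropy after observation:  {entropy(posterior):.3f} bits")
# The drop is the "entropy reduction" a scheduler can credit to running
# stages whose completion reveals the most information about what remains.
```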
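Third sketch: the scheduling policy described in Group 4 can be outlined as an ε-greedy choice between two priorities: with small probability, pick the job whose next stage promises the largest expected entropy reduction; otherwise, pick the job with the shortest expected remaining time. The selection rule and field names below are simplified assumptions, not the exact LLMSched algorithm.

```python
# Simplified ε-greedy scheduler sketch (assumed, not LLMSched's exact rule):
# explore by maximum expected entropy reduction, exploit by SRTF.
import random
from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: str
    expected_remaining_time: float      # estimate from the duration model
    expected_entropy_reduction: float   # estimate from the Bayesian network


def pick_next_job(jobs, epsilon=0.1):
    if random.random() < epsilon:
        # Uncertainty-reducing branch: maximum expected entropy reduction.
        return max(jobs, key=lambda j: j.expected_entropy_reduction)
    # Default branch: shortest expected remaining time first.
    return min(jobs, key=lambda j: j.expected_remaining_time)


if __name__ == "__main__":
    queue = [
        PendingJob("rag-pipeline", 42.0, 0.2),
        PendingJob("agent-loop", 120.0, 1.4),
        PendingJob("summarizer", 15.0, 0.1),
    ]
    print("next job to schedule:", pick_next_job(queue).job_id)
```

The ε parameter trades off finishing short jobs quickly against occasionally running stages that shrink the scheduler's uncertainty about the rest of the workload, which is the balance the article attributes to LLMSched.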