DeepSeek Quietly Open-Sources LPLB: Using Linear Programming to Tackle MoE Load Imbalance
机器之心 · 2025-11-20 15:13

Core Insights
- DeepSeek has released a new repository on GitHub called LPLB (Linear-Programming-Based Load Balancer), which aims to optimize workload distribution in Mixture-of-Experts (MoE) models [2][5].
- The project is at an early research stage, and its performance gains are still being evaluated [8][15].

Project Overview
- LPLB addresses dynamic load imbalance during MoE training by applying linear programming [5][9].
- Load balancing proceeds in three steps: dynamically reordering experts based on workload statistics, constructing expert replicas, and solving for the optimal token assignment for each batch of data (a minimal sketch of this last step is given at the end of the article) [5][6].

Technical Mechanism
- Expert reordering is assisted by EPLB (Expert Parallelism Load Balancer), and real-time workload statistics can be collected from several sources [6][11].
- LPLB ships a lightweight solver built on NVIDIA's cuSolverDx and cuBLASDx libraries for efficient linear algebra, keeping the resource cost of the optimization step minimal [6][11].

Limitations
- LPLB focuses on dynamic fluctuations in workload, while EPLB addresses static imbalance [11][12].
- Known limitations include ignoring nonlinear computation costs and possible latency in solving the optimization problem, which can degrade performance under certain conditions [11][12].

Application and Value
- LPLB aims to mitigate the "bottleneck effect" in large-model training, where overall training speed is limited by the slowest GPU (see the toy numbers below) [15].
- It introduces linear programming as a mathematical tool for real-time optimal assignment and leverages NVSHMEM to reduce communication bottlenecks, making it a useful reference for developers studying MoE training acceleration [15].
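
To make the "slowest GPU sets the pace" point concrete, here is a toy illustration; the numbers are made up and are not taken from the LPLB repository. In synchronous training, every rank waits at the next synchronization point for the most-loaded rank, so step time tracks the maximum per-GPU load rather than the mean.

```python
# Toy numbers only (not from LPLB): in synchronous MoE training, the step is
# gated by the most-loaded GPU, not by the average load across GPUs.
loads = [900, 600, 200, 100]   # hypothetical tokens handled per GPU in one step

print("max load (sets step time):", max(loads))               # 900
print("mean load (ideal balance):", sum(loads) / len(loads))  # 450.0
# A perfect rebalance would cut the critical-path load from 900 to 450 tokens,
# roughly halving the expert-compute time for this step.
```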
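
As a sketch of the per-batch optimization step described above, the snippet below sets up a small linear program that splits each expert's tokens across its replicas so that the most-loaded GPU carries as little work as possible. This is only an illustration of the idea: the replica layout, the token counts, and the use of SciPy's CPU solver are assumptions made for the example; LPLB's actual solver runs on the GPU via cuSolverDx and cuBLASDx, and its exact formulation may differ.

```python
"""Illustrative sketch only: a CPU-side linear program that redistributes each
expert's tokens across its replicas to minimize the load of the busiest GPU.
Not LPLB's actual solver; all numbers below are hypothetical."""
import numpy as np
from scipy.optimize import linprog

# Hypothetical setup: 3 experts, each replicated on 2 of 4 GPUs.
# replicas[e] lists the GPUs holding a copy of expert e.
replicas = {0: [0, 1], 1: [1, 2], 2: [2, 3]}
tokens = {0: 900, 1: 300, 2: 600}   # tokens routed to each expert in this batch
num_gpus = 4

# Decision variables: one x_{e,g} per (expert, hosting GPU) pair, plus an
# auxiliary makespan variable M that upper-bounds every GPU's load.
pairs = [(e, g) for e, gpus in replicas.items() for g in gpus]
n = len(pairs)

# Objective: minimize M (the last variable); the token splits carry zero cost.
c = np.zeros(n + 1)
c[-1] = 1.0

# Inequality constraints: for every GPU g, (tokens assigned to g) - M <= 0.
A_ub = np.zeros((num_gpus, n + 1))
for j, (e, g) in enumerate(pairs):
    A_ub[g, j] = 1.0
A_ub[:, -1] = -1.0
b_ub = np.zeros(num_gpus)

# Equality constraints: each expert's replicas must receive all of its tokens.
A_eq = np.zeros((len(replicas), n + 1))
b_eq = np.zeros(len(replicas))
for i, e in enumerate(replicas):
    for j, (e2, _g) in enumerate(pairs):
        if e2 == e:
            A_eq[i, j] = 1.0
    b_eq[i] = tokens[e]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + 1), method="highs")

for j, (e, g) in enumerate(pairs):
    print(f"expert {e} -> GPU {g}: {res.x[j]:.0f} tokens")
print(f"max GPU load (makespan): {res.x[-1]:.0f} tokens")
```

With these made-up numbers, the LP spreads the 1,800 tokens evenly across the four GPUs, bringing the per-step makespan down to 450 tokens per GPU.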