DeepSeek Quietly Open-Sources LPLB: Using Linear Programming to Tackle MoE Load Imbalance

Core Insights
- DeepSeek has released a new code repository on GitHub called LPLB, which aims to address correctness and throughput bottlenecks in model training [1][4]
- The project has drawn little attention so far, with fewer than 200 stars on GitHub [1]

Project Overview
- LPLB stands for Linear-Programming-Based Load Balancer and is designed to optimize load balancing in Mixture-of-Experts (MoE) models [3]
- The project is still at an early research stage, with its performance gains under evaluation [7]

Mechanism of LPLB
- LPLB performs dynamic load balancing in three main steps: dynamically reordering experts, constructing replicas, and solving for the optimal token assignment for each batch (see the LP sketch at the end of this summary) [4]
- The mechanism relies on a built-in linear programming solver together with NVIDIA's cuSolverDx and cuBLASDx libraries for efficient linear algebra operations [4][10]

Comparison with EPLB
- LPLB extends the capabilities of EPLB (Expert Parallel Load Balancer): EPLB primarily addresses static imbalance, while LPLB focuses on dynamic fluctuations in load (a contrasting static-placement sketch appears below) [8]

Key Features
- LPLB introduces redundant experts and edge-capacity definitions so that tokens can be redistributed to minimize load imbalance among experts [9]
- Communication is optimized over NVLink and NVSHMEM, reducing overhead compared with traditional methods [10]

Limitations
- Current limitations include ignoring nonlinear computation costs and potential delays in solving the optimization problem, particularly for small batch sizes [11][12]
- In extreme load-imbalance scenarios, LPLB may not perform as well as EPLB because of its allocation strategy [12]

Typical Topologies
- LPLB allows various topological configurations, such as Cube, Hypercube, and Torus, to define how expert replicas are distributed (see the topology sketch below) [13]

Conclusion
- The LPLB library aims to relieve the "bottleneck effect" in large-model training, where overall training speed is limited by the slowest GPU [14]
- It employs linear programming for real-time optimal allocation and uses NVSHMEM to reduce communication overhead, making it a valuable resource for developers working on accelerating MoE training [14]
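The per-batch token assignment described under "Mechanism of LPLB" can be pictured as a small linear program. The sketch below is a hypothetical illustration, not LPLB's code or API: it minimizes the maximum per-GPU load subject to per-edge replica capacities using SciPy's HiGHS-backed `linprog`, and the loads, edge list, and capacities are made-up example values.

```python
# Hypothetical sketch: posing per-batch token redistribution as a linear program.
# This is NOT LPLB's actual code or API; the edge format and values are assumptions.
import numpy as np
from scipy.optimize import linprog

# Per-GPU token load for the current batch (one expert per GPU for simplicity).
load = np.array([900.0, 300.0, 500.0, 700.0])
# Edges (src_gpu, dst_gpu, capacity): a replica of src's expert lives on dst,
# and at most `capacity` tokens may be rerouted along that edge.
edges = [(0, 1, 400.0), (3, 2, 300.0), (0, 2, 200.0)]

n_gpu, n_edge = len(load), len(edges)

# Variables: x_0..x_{m-1} = tokens moved along each edge, plus t = max per-GPU load.
# Objective: minimize t.
c = np.zeros(n_edge + 1)
c[-1] = 1.0

A_ub, b_ub = [], []

# (1) For every GPU g: load[g] + inflow(g) - outflow(g) <= t.
for g in range(n_gpu):
    row = np.zeros(n_edge + 1)
    for j, (src, dst, _) in enumerate(edges):
        if src == g:
            row[j] -= 1.0   # tokens leaving g
        if dst == g:
            row[j] += 1.0   # tokens arriving at g
    row[-1] = -1.0
    A_ub.append(row)
    b_ub.append(-load[g])

# (2) For every GPU g: total outflow(g) <= load[g] (cannot ship more than it has).
for g in range(n_gpu):
    row = np.zeros(n_edge + 1)
    for j, (src, _, _) in enumerate(edges):
        if src == g:
            row[j] = 1.0
    if row[:-1].any():
        A_ub.append(row)
        b_ub.append(load[g])

# Per-edge bounds: 0 <= x_j <= capacity_j; t is left free.
bounds = [(0.0, cap) for _, _, cap in edges] + [(None, None)]

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds, method="highs")
flows, t = res.x[:-1], res.x[-1]
print("edge flows:", flows)    # tokens rerouted along each replica edge
print("max per-GPU load:", t)  # balanced bottleneck load for this batch
```

Token counts are integers in practice, so a real implementation would round the flows or solve an integer variant; the relaxation here is only meant to show the "minimize the slowest GPU" objective behind the per-batch solve.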
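For contrast with the dynamic per-batch LP approach, static rebalancing in the spirit of EPLB can be approximated by a greedy placement over historical expert loads. This is an illustrative assumption for the "Comparison with EPLB" point, not EPLB's actual algorithm.

```python
# Minimal sketch of *static* rebalancing: place historically heavy experts first
# on the least-loaded GPU (longest-processing-time heuristic). Illustrative only;
# EPLB's real policy differs.
import heapq

def static_placement(expert_hist_load: list[float], n_gpu: int) -> list[int]:
    """Return a GPU index per expert, balancing historical (static) load."""
    heap = [(0.0, g) for g in range(n_gpu)]   # (accumulated load, gpu)
    heapq.heapify(heap)
    placement = [0] * len(expert_hist_load)
    for e in sorted(range(len(expert_hist_load)),
                    key=lambda i: -expert_hist_load[i]):
        acc, g = heapq.heappop(heap)
        placement[e] = g
        heapq.heappush(heap, (acc + expert_hist_load[e], g))
    return placement

print(static_placement([9.0, 3.0, 5.0, 7.0, 2.0, 6.0], n_gpu=3))
```

A static placement like this only changes between training phases, which is why it cannot follow batch-to-batch load fluctuations the way a per-batch LP can.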
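The Cube, Hypercube, and Torus layouts mentioned under "Typical Topologies" can be described as directed edge lists between GPUs. The helpers below are hypothetical and reuse the (src_gpu, dst_gpu, capacity) edge format from the LP sketch; LPLB's real configuration schema may differ.

```python
# Hypothetical generators for replica-placement topologies; not LPLB's config API.
from itertools import product

def hypercube_edges(dim: int, capacity: float):
    """Each GPU hosts a replica for every neighbor differing in one address bit.

    dim=3 gives the 8-GPU 'Cube' layout; larger dim gives a Hypercube.
    """
    n = 1 << dim
    return [(g, g ^ (1 << b), capacity) for g in range(n) for b in range(dim)]

def torus_edges(rows: int, cols: int, capacity: float):
    """2-D torus: each GPU links to its right and down neighbors with wraparound."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        g = r * cols + c
        right = r * cols + (c + 1) % cols
        down = ((r + 1) % rows) * cols + c
        edges += [(g, right, capacity), (g, down, capacity)]
    return edges

print(len(hypercube_edges(3, 256.0)))  # 24 directed edges on an 8-GPU cube
print(len(torus_edges(4, 4, 256.0)))   # 32 directed edges on a 4x4 torus
```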