月之暗面 (Moonshot AI) makes its reinforcement-learning training acceleration method public: training speed surges 97%, long-tail latency drops 93%
量子位· 2025-11-27 04:34
**Core Viewpoint**
- The article introduces Seer, a new acceleration engine developed by Moonshot AI (月之暗面) and Tsinghua University that significantly speeds up reinforcement learning (RL) training of large language models (LLMs) without changing the core training algorithm [1][8].

**Summary by Sections**

**Performance Improvement**
- Seer improves the rollout efficiency of synchronous RL by 74% to 97% and cuts long-tail latency by 75% to 93% [3][23].

**Technical Architecture**
- Seer consists of three main modules:
  1. **Inference Engine Pool**: built on DRAM/SSD, it holds multiple inference instances and a global KVCache pool for load balancing and data reuse [9].
  2. **Request Buffer**: serves as the unified entry point for all rollout requests, managing metadata and request states for fine-grained resource scheduling [10].
  3. **Context Manager**: maintains context views for all requests and produces scheduling decisions from context signals [11].

**Key Technologies** (toy sketches of the first and third technique appear after this summary)
- **Divided Rollout**: splits responses into independent requests and segments, reducing memory fluctuation and load imbalance [12][13].
- **Context-Aware Scheduling**: uses a "speculative request" strategy that prioritizes obtaining length features for requests, easing delays caused by long requests [17].
- **Adaptive Grouped Speculative Decoding**: exploits similar response patterns within a group to build a dynamic reference library for drafting tokens, improving decoding efficiency [19].

**Experimental Validation**
- In experiments with Moonlight, Qwen2-VL-72B, and Kimi-K2, Seer achieved 74% to 97% higher throughput than the baseline system veRL, with much shorter long-tail latency [21][23].
- For instance, on the Moonlight task the last 10% of requests took 3984 seconds under veRL versus 364 seconds under Seer, a reduction of roughly 91% in long-tail latency [23].

**Financing and Future Plans**
- Moonshot AI is reportedly close to completing a new funding round that could raise several hundred million dollars and lift its valuation to about $4 billion [32][33].
- The company is in talks with investors including IDG Capital and existing shareholder Tencent, aiming to close the round by year end and start an IPO process the following year [36][37].
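To make the divided-rollout and context-aware-scheduling ideas above concrete, here is a minimal Python sketch built around a priority-queue request buffer. All class and function names are hypothetical illustrations rather than Seer's actual API, and the "generation" step is a stand-in for a real inference-engine call.

```python
# Toy sketch of divided rollout with context-aware prioritisation.
# Hypothetical names throughout; not the actual Seer implementation.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class RolloutRequest:
    priority: int                                  # lower value = scheduled earlier
    prompt: str = field(compare=False)
    generated: str = field(default="", compare=False)
    done: bool = field(default=False, compare=False)

class RequestBuffer:
    """Unified entry point holding every pending rollout request."""
    def __init__(self) -> None:
        self._heap: list[RolloutRequest] = []

    def push(self, req: RolloutRequest) -> None:
        heapq.heappush(self._heap, req)

    def pop(self) -> RolloutRequest | None:
        return heapq.heappop(self._heap) if self._heap else None

def generate_segment(req: RolloutRequest, max_tokens: int) -> None:
    """Stand-in for one bounded decoding call on some inference instance."""
    req.generated += f"<{max_tokens} tokens>"
    req.done = len(req.generated) > 60             # toy stopping condition

def divided_rollout(prompts: list[str], segment_tokens: int = 256) -> list[str]:
    """Generate each response in bounded segments so long requests can be rescheduled."""
    buf = RequestBuffer()
    for p in prompts:
        buf.push(RolloutRequest(priority=0, prompt=p))
    finished: list[str] = []
    while (req := buf.pop()) is not None:
        generate_segment(req, segment_tokens)
        if req.done:
            finished.append(req.generated)
        else:
            # Context-aware step: requests that already look long get a smaller
            # priority value, so they are resumed earlier and do not pile up
            # into a long tail at the end of the rollout.
            req.priority -= len(req.generated)
            buf.push(req)
    return finished

print(len(divided_rollout(["prompt-a", "prompt-b"])))   # 2
```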
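Similarly, the grouped speculative decoding idea can be sketched as an n-gram reference library built from responses already generated within a group. The draft-and-verify loop against the target model is omitted, and all names here are assumptions rather than the paper's implementation.

```python
# Toy sketch of a group-level draft library for speculative decoding.
# Hypothetical names; the verification step against the target model is omitted.
from collections import defaultdict

class GroupDraftLibrary:
    """N-gram lookup table built from responses generated within one group."""
    def __init__(self, ngram: int = 4) -> None:
        self.ngram = ngram
        self.tokens: list[int] = []                      # concatenated group responses
        self.table: dict[tuple, list[int]] = defaultdict(list)

    def add_response(self, token_ids: list[int]) -> None:
        """Index a finished (or partial) response so later requests can draft from it."""
        offset = len(self.tokens)
        self.tokens.extend(token_ids)
        for i in range(len(token_ids) - self.ngram):
            key = tuple(token_ids[i : i + self.ngram])
            self.table[key].append(offset + i + self.ngram)

    def propose(self, context: list[int], draft_len: int = 8) -> list[int]:
        """Propose draft tokens by matching the current suffix against the library."""
        if len(context) < self.ngram:
            return []
        starts = self.table.get(tuple(context[-self.ngram:]))
        if not starts:
            return []
        start = starts[-1]                               # use the most recent match
        return self.tokens[start : start + draft_len]

lib = GroupDraftLibrary()
lib.add_response([5, 6, 7, 8, 9, 10, 11, 12])            # a sibling response in the group
print(lib.propose([1, 2, 5, 6, 7, 8]))                   # -> [9, 10, 11, 12]
# In a real system the target model would verify this draft and keep only the
# longest accepted prefix, falling back to normal decoding when nothing matches.
```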
Kimi Linear first author Yu Zhang (张宇): some reflections on model training
自动驾驶之心· 2025-11-06 00:04
**Core Insights**
- The post recounts the development and key features of the Kimi Linear model, focusing on its architecture and training process [4][5][10].

**Model Architecture**
- Kimi Linear adopts a hybrid design that interleaves linear attention (KDA) with full attention (MLA) at a 3:1 KDA:MLA ratio, found to be the best trade-off between efficiency and quality [5] (see the layer-stacking sketch after this summary).
- The architecture builds on Moonlight's design, with MoE sparsity increased substantially, from 8 to 32 [4].

**Training Process**
- The model was trained on 5.7 trillion tokens, a significant scale-up from previous models, with much of the effort going into overcoming distributed-training challenges [10][12].
- Training required close monitoring and adjustment, including switching key parameters from bf16 to fp32 to keep optimization stable [12][13] (a minimal mixed-precision sketch follows after this summary).

**Performance and Benchmarking**
- Despite its smaller size, Kimi Linear showed substantial gains in benchmark comparisons, often outperforming larger models on specific tasks [7][14].
- Decoding efficiency improved by roughly 6x, driven by the reduced KV cache footprint of KDA [8].

**Future Directions**
- The post indicates that this line of work is intended to feed into a flagship Kimi model, with ongoing efforts to refine its architecture and performance metrics [17][19].
- Hybrid models and efficient attention mechanisms are highlighted as a key direction for future research and development in the industry [19].
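As a rough illustration of the 3:1 hybrid pattern described above, the PyTorch sketch below interleaves three placeholder linear-attention blocks per full-attention block. The block internals are stand-ins (a real KDA block implements Kimi Delta Attention, and MLA is multi-head latent attention), so only the layer-interleaving logic reflects the post.

```python
# Rough illustration of a 3:1 hybrid attention stack. The two block classes are
# placeholders, not Kimi Linear's actual KDA/MLA modules; only the interleaving
# pattern is the point here.
import torch
import torch.nn as nn

class LinearAttentionBlock(nn.Module):       # stand-in for a KDA-style block
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.proj(x)

class FullAttentionBlock(nn.Module):         # stand-in for an MLA-style block
    def __init__(self, dim: int, heads: int = 8) -> None:
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)
        return x + out

class HybridStack(nn.Module):
    """Interleave linear and full attention at a fixed ratio (here 3:1)."""
    def __init__(self, dim: int, num_layers: int = 24, ratio: int = 3) -> None:
        super().__init__()
        layers = []
        for i in range(num_layers):
            # every (ratio + 1)-th layer is full attention, the rest are linear
            if (i + 1) % (ratio + 1) == 0:
                layers.append(FullAttentionBlock(dim))
            else:
                layers.append(LinearAttentionBlock(dim))
        self.layers = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 512)
print(HybridStack(512)(x).shape)             # torch.Size([2, 16, 512])
```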
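The post does not spell out which parameters were promoted from bf16 to fp32, so the keyword set below ("norm", "gate") is purely an assumed example; the sketch only shows the general pattern of keeping a few sensitive parameter groups in full precision while the rest of the model stays in bf16.

```python
# Sketch of keeping a few sensitive parameter groups in fp32 while the rest of
# the model runs in bf16. Which parameters were actually promoted is not stated
# in the post; the keyword list below is an assumed example.
import torch
import torch.nn as nn

def promote_sensitive_params(model: nn.Module,
                             fp32_keywords: tuple = ("norm", "gate")) -> nn.Module:
    """Cast most parameters to bf16 but keep keyword-matched ones in fp32."""
    for name, param in model.named_parameters():
        if any(k in name for k in fp32_keywords):
            param.data = param.data.float()              # promote to fp32
        else:
            param.data = param.data.to(torch.bfloat16)
    return model

class ToyBlock(nn.Module):
    def __init__(self, dim: int = 8) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(self.proj(x))

m = promote_sensitive_params(ToyBlock())
print({n: p.dtype for n, p in m.named_parameters()})
# proj.* -> torch.bfloat16, norm.* -> torch.float32
```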