Routing Replay Mechanism
Xiaomi's Latest Large-Model Result! Luo Fuli Makes an Appearance
自动驾驶之心· 2025-10-18 16:03
Core Insights
- Xiaomi's AI team, in collaboration with Peking University, has published a paper on MoE (Mixture of Experts) and reinforcement learning, presenting new advances in large-model training [2][8].

Group 1: Research Findings
- The paper proposes a novel approach to improve the stability and efficiency of reinforcement learning for large models within the MoE framework [8][10].
- Current reinforcement-learning methods struggle to balance efficiency and stability, often suffering catastrophic failures during training [14][24].
- The research introduces a method called Rollout Routing Replay (R3), which locks the routing distribution during inference and reuses it during training, ensuring consistency between the two phases [30][31].

Group 2: Experimental Results
- Experiments on the Qwen3-30B-A3B model show that R3 consistently outperforms other methods across various metrics, achieving higher scores in multiple scenarios [41][42].
- R3 significantly reduces the occurrence of training crashes, maintaining a stable performance curve even after extended training periods [44][48].
- R3 not only stabilizes the model but also accelerates optimization, allowing effective strategies to be identified sooner [50].

Group 3: Team and Contributors
- The research team includes notable contributors such as Wenhan Ma, a researcher on Xiaomi's LLM-Core team, and Luo Fuli, who has a strong academic background and has previously worked on significant AI projects [52][59].
- The paper also acknowledges the contributions of Professor Sui Zhifang of Peking University, who has extensive experience in computational linguistics and AI research [62][66].
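The core R3 idea described above — lock the top-k expert selection made at rollout (inference) time and reuse it at training time, while still recomputing gate weights from the training-time router logits so gradients flow — can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation; the function names `topk_routing` and `replayed_routing` are hypothetical.

```python
import numpy as np

def topk_routing(logits, k):
    """Standard MoE routing: pick the top-k experts per token from router
    logits and softmax-normalize the gates over the selected experts."""
    idx = np.argsort(logits, axis=-1)[:, -k:]          # top-k expert indices per token
    mask = np.zeros_like(logits, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)        # boolean routing mask
    gated = np.where(mask, logits, -np.inf)            # drop unselected experts
    gates = np.exp(gated - gated.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return mask, gates

def replayed_routing(train_logits, cached_mask):
    """R3-style replay: reuse the rollout-time expert mask instead of
    recomputing top-k from the (drifted) training-time logits, but recompute
    the gate weights from the current logits."""
    gated = np.where(cached_mask, train_logits, -np.inf)
    gates = np.exp(gated - gated.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return gates

rng = np.random.default_rng(0)
rollout_logits = rng.normal(size=(4, 8))               # 4 tokens, 8 experts
mask, _ = topk_routing(rollout_logits, k=2)
# Training-time logits drift slightly (e.g. engine/precision mismatch):
train_logits = rollout_logits + 0.1 * rng.normal(size=rollout_logits.shape)
gates = replayed_routing(train_logits, mask)           # same experts as rollout
```

Without the replay, even a small drift between the inference and training logits can flip which experts cross the top-k threshold, so the two phases effectively run different networks; replaying the mask removes that mismatch by construction.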
Xiaomi's Latest Large-Model Result! Luo Fuli Makes an Appearance
量子位· 2025-10-17 04:58
Core Viewpoint
- Xiaomi's latest AI research paper, co-authored with Peking University, focuses on improving stability and efficiency in large-model reinforcement learning using a new method called Rollout Routing Replay (R3) [2][7][49].

Group 1: Research Background
- The collaboration between Xiaomi's AI team and Peking University has led to significant advances in AI, particularly in reinforcement learning [2][4].
- The paper addresses challenges in the Mixture of Experts (MoE) architecture, whose routing mechanism can cause instability during training [8][25].

Group 2: Methodology
- The proposed R3 method stabilizes training by locking the routing distribution during inference and replaying it during training, ensuring consistency between the two phases [28][30].
- The research also introduces a routing mask that caches routing decisions alongside the context, improving computational efficiency [34][35].

Group 3: Experimental Results
- Experiments on the Qwen3-30B-A3B model show that R3 consistently outperforms other methods across various metrics, indicating improved overall performance [40][41].
- Training stability improves significantly, with R3 maintaining a smoother performance curve than traditional methods [43][46].

Group 4: Authors and Contributions
- The first author, Wenhan Ma, is a researcher on Xiaomi's LLM-Core team; the two corresponding authors are Luo Fuli and Professor Sui Zhifang of Peking University, both of whom have made notable contributions to the field [51][56][61].
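The routing-mask caching mentioned above — storing each token's expert selection alongside the context so it can be looked up later, much like a KV cache — can be sketched as follows. `RoutingCache` and its `record`/`replay` methods are illustrative names, not the paper's actual data structures.

```python
import numpy as np

class RoutingCache:
    """Per-sequence cache of MoE routing masks, kept alongside the context
    (analogous to a KV cache) so a later training step can replay exactly
    the expert choices made during the rollout."""

    def __init__(self):
        self._store = {}  # seq_id -> ordered list of per-token boolean masks

    def record(self, seq_id, token_pos, mask):
        """Append the routing mask for one generated token, in order."""
        masks = self._store.setdefault(seq_id, [])
        assert token_pos == len(masks), "masks must be recorded in token order"
        masks.append(np.asarray(mask, dtype=bool))

    def replay(self, seq_id):
        """Return the full (num_tokens, num_experts) mask for one rollout."""
        return np.stack(self._store[seq_id])

# Usage: record masks token by token during generation, replay at training time.
cache = RoutingCache()
cache.record("rollout-0", 0, [True, False, True, False])   # token 0 -> experts {0, 2}
cache.record("rollout-0", 1, [False, True, False, True])   # token 1 -> experts {1, 3}
replayed = cache.replay("rollout-0")
```

Keying the cache by sequence keeps the replay aligned with the sampled trajectory, which is what lets the training pass route every token through the same experts the rollout used.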