Refreshing the NAVSIM SOTA! Masked Diffusion, a new framework for end-to-end autonomous driving
自动驾驶之心· 2025-12-26 03:32
Source | 机器之心. Original article: "Refreshing the NAVSIM SOTA: Fudan and Yinwang propose Masked Diffusion, a new end-to-end autonomous driving framework." With the rise of VLA (Vision-Language-Action) models, end-to-end autonomous driving is undergoing a paradigm shift from "modular" to "unified". However, once perception, reasoning, and planning are compressed into a single model, the mainstream autoregressive generation paradigm begins to show its limits. Existing autoregressive models are forced to follow a "left-to-right" temporal generation order, which differs fundamentally from a human driver's intuition: when handling complex road conditions, an experienced driver often plans "with the end in mind", first establishing a long-term driving intention (e.g., merging onto a ramp, yielding to a pedestrian, pulling over) and then working backward to the immediate short-term maneuvers. In addition, imitation-learning-based models easily fall into the "average driver" trap: they tend to fit the mean of the data distribution, flattening the policy into mediocrity and making it hard to switch flexibly between assertive maneuvering and conservative avoidance. To address these pain points, Fudan University and Yinwang Intelligent jointly proposed the WAM-Diff framework. The study innovatively ...
Refreshing the NAVSIM SOTA: Fudan proposes a new end-to-end autonomous driving framework
具身智能之心· 2025-12-26 00:55
Core Insights
- The article discusses the transition in end-to-end autonomous driving from a modular approach to a unified paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive models in mimicking human driving intuition [1][2].

Group 1: WAM-Diff Framework
- The WAM-Diff framework, developed by Fudan University and Yinwang Intelligent, introduces a discrete masked diffusion model for VLA autonomous driving planning, integrating a sparse mixture-of-experts (MoE) architecture and online reinforcement learning (GSPO) [2][4].
- WAM-Diff achieved state-of-the-art (SOTA) performance on the NAVSIM benchmark, scoring 91.0 PDMS and 89.7 EPDMS, demonstrating the potential of non-autoregressive generation in complex driving scenarios [2][16][18].

Group 2: Technical Innovations
- WAM-Diff employs Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, allowing a shared vocabulary with driving commands [5].
- The framework uses masked diffusion for generation, enabling parallel prediction of all token positions, which improves inference efficiency and allows global optimization [5][9].

Group 3: Decoding Strategies
- WAM-Diff explores three decoding strategies: causal, reverse-causal, and random. The reverse-causal strategy yields the best closed-loop metrics, aligning with the "end-to-begin" planning intuition [9][20].
- This result confirms that establishing long-term driving intentions before detailing immediate actions significantly improves planning consistency and safety [9][20].

Group 4: MoE and GSPO Integration
- The MoE architecture within WAM-Diff includes 64 lightweight experts, dynamically activated based on the driving context, enhancing model capacity and adaptability while controlling computational cost [12].
- The GSPO algorithm bridges the gap between open-loop training and closed-loop execution, optimizing trajectory sequences based on safety, compliance, and comfort metrics [12][14].

Group 5: Experimental Results
- In extensive experiments on the NAVSIM benchmark, WAM-Diff outperformed several leading models, achieving a PDMS score of 91.0 and an EPDMS score of 89.7, indicating its robustness in balancing safety and compliance [16][18].
- On NAVSIM-v2, which applies stricter metrics for traffic-rule adherence and comfort, the model improved by 5.2 points over the previous best, showcasing its capability in real-world driving scenarios [18].

Group 6: Conclusion
- WAM-Diff represents a significant advancement in autonomous driving planning, moving toward a discrete, structured, and closed-loop approach and emphasizing the importance of both "how to generate" and "what to generate" in the VLA era [25].
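The tokenization idea in Group 2 can be sketched as plain uniform quantization of ego-centric waypoints. The article elsewhere reports a quantization error within 0.005, but it does not give the coordinate range or vocabulary size, so `COORD_MIN`, `COORD_MAX`, and `NUM_BINS` below are illustrative assumptions, not values from the paper; this is a minimal sketch of the general technique, not the authors' implementation.

```python
import numpy as np

# Assumed constants for illustration only: a bounded ego-centric
# coordinate range and a per-axis vocabulary large enough that the
# half-bin rounding error stays below 0.005 m.
COORD_MIN, COORD_MAX = -64.0, 64.0   # assumed range in meters
NUM_BINS = 32768                      # assumed tokens per axis

def tokenize(traj):
    """Map continuous (x, y) waypoints to discrete token ids per axis."""
    t = np.clip(traj, COORD_MIN, COORD_MAX)
    scaled = (t - COORD_MIN) / (COORD_MAX - COORD_MIN)   # -> [0, 1]
    return np.round(scaled * (NUM_BINS - 1)).astype(np.int64)

def detokenize(tokens):
    """Invert tokenize(); recovers coordinates up to bin resolution."""
    scaled = tokens.astype(np.float64) / (NUM_BINS - 1)
    return scaled * (COORD_MAX - COORD_MIN) + COORD_MIN

traj = np.array([[1.23, -0.45], [2.71, 0.83]])
recon = detokenize(tokenize(traj))
# Round-trip error is at most half a bin width (~0.002 m here),
# which is within the 0.005 bound the article reports.
assert np.max(np.abs(recon - traj)) < 0.005
```

With these assumed constants the bin width is 128/32767 ≈ 0.004 m, so the worst-case rounding error of half a bin comfortably meets the reported bound; the integer ids can then live in the same vocabulary as discrete driving-command tokens.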
Refreshing the NAVSIM SOTA: Fudan and Yinwang propose Masked Diffusion, a new end-to-end autonomous driving framework
机器之心· 2025-12-25 03:12
Core Insights
- The article discusses the transition in end-to-end autonomous driving from a "modular" approach to a "unified" paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive generation paradigms [2].
- It introduces the WAM-Diff framework, which innovatively incorporates discrete masked diffusion models into VLA autonomous driving planning, addressing the challenges of single-direction temporal generation [2][6].

Group 1: WAM-Diff Framework
- WAM-Diff uses Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, keeping quantization error within 0.005 [6].
- The framework employs masked diffusion as its backbone, allowing parallel prediction of all token positions, which significantly improves inference efficiency and enables global optimization [6].
- WAM-Diff explores decoding strategies, revealing that the reverse-causal strategy outperforms the others on closed-loop metrics, validating the "end-to-begin" planning logic [9][20].

Group 2: Performance Metrics
- On the authoritative NAVSIM benchmark, WAM-Diff achieved state-of-the-art (SOTA) scores of 91.0 PDMS in NAVSIM-v1 and 89.7 EPDMS in NAVSIM-v2, demonstrating its potential in complex autonomous driving scenarios [3][18].
- The model surpassed competitors such as DiffusionDrive and ReCogDrive, indicating its robustness in balancing safety and compliance under real-world driving conditions [18].

Group 3: Technical Innovations
- WAM-Diff integrates a Low-Rank Adaptation Mixture-of-Experts (LoRA-MoE) architecture with 64 lightweight experts for dynamic routing and sparse activation, enhancing model capacity and adaptability [11].
- The Group Sequence Policy Optimization (GSPO) algorithm is introduced to bridge the gap between open-loop training and closed-loop execution, optimizing trajectory sequences based on safety, compliance, and comfort metrics [14].

Group 4: Conclusion
- The emergence of WAM-Diff marks a significant step toward discrete, structured, and closed-loop autonomous driving planning, emphasizing the importance of both "how to generate" and "what to generate" in the VLA era [25].
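The reverse-causal decoding result can be made concrete with a toy sketch: masked diffusion starts from a fully masked token sequence and fills positions in over a few parallel steps, and a reverse-causal schedule commits the far-horizon tokens first, working back toward the present. Everything here is invented for illustration: the "model" is a random stub, and the sequence length, vocabulary size, and step count are arbitrary, not values from the paper.

```python
import numpy as np

MASK = -1                      # sentinel id for a still-masked position
SEQ_LEN, VOCAB, STEPS = 8, 16, 4

rng = np.random.default_rng(0)

def model_predict(tokens):
    """Stand-in for the diffusion network: random logits per position.
    A real model would condition on the partially unmasked sequence."""
    return rng.normal(size=(SEQ_LEN, VOCAB))

def reverse_causal_decode():
    tokens = np.full(SEQ_LEN, MASK)
    # Reverse-causal schedule: unmask from the end of the planning
    # horizon (long-term intent) back toward the immediate action.
    order = list(range(SEQ_LEN - 1, -1, -1))
    per_step = SEQ_LEN // STEPS
    for step in range(STEPS):
        logits = model_predict(tokens)            # parallel prediction
        for pos in order[step * per_step:(step + 1) * per_step]:
            tokens[pos] = int(np.argmax(logits[pos]))
    return tokens

out = reverse_causal_decode()
assert (out != MASK).all()     # every position committed after STEPS passes
```

A causal or random schedule would only change the `order` list, which is what makes the three strategies in the article directly comparable under one decoder.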
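The LoRA-MoE idea, 64 lightweight low-rank experts with sparse activation on top of a frozen base layer, can be sketched as top-k gated routing. All dimensions, the value of k, the gating scheme, and the weight initialization below are assumptions for illustration; the article does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R, NUM_EXPERTS, TOP_K = 32, 4, 64, 2   # assumed sizes; 64 experts as in the article

W_base = rng.normal(size=(D, D)) * 0.02          # frozen base projection
A = rng.normal(size=(NUM_EXPERTS, D, R)) * 0.02  # per-expert LoRA down-projections
B = np.zeros((NUM_EXPERTS, R, D))                # LoRA up-projections (zero-init)
W_gate = rng.normal(size=(D, NUM_EXPERTS)) * 0.02

def moe_forward(x):
    """x: (D,) token feature. Base output plus top-k expert LoRA deltas."""
    gate = x @ W_gate
    top = np.argsort(gate)[-TOP_K:]              # sparse activation: k of 64 experts
    w = np.exp(gate[top]); w /= w.sum()          # softmax over the selected experts
    y = x @ W_base
    for weight, e in zip(w, top):
        y = y + weight * (x @ A[e] @ B[e])       # low-rank expert update
    return y

x = rng.normal(size=D)
y = moe_forward(x)
```

With the conventional zero-initialized up-projection, the layer starts out identical to the frozen base, so expert capacity is added only as training moves `B` away from zero; routing cost stays low because only `TOP_K` of the 64 experts run per token.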