Refreshing the NAVSIM SOTA! Masked Diffusion, a new framework for end-to-end autonomous driving
自动驾驶之心· 2025-12-26 03:32
Source | 机器之心. Original article: "Refreshing the NAVSIM SOTA: Fudan and Yinwang propose Masked Diffusion, a new end-to-end autonomous driving framework." With the rise of VLA (Vision-Language-Action) models, end-to-end autonomous driving is undergoing a paradigm shift from "modular" to "unified". However, once perception, reasoning, and planning are compressed into a single model, the mainstream autoregressive generation paradigm begins to show its limits. Existing autoregressive models are forced to follow a "left-to-right" temporal generation order, which differs fundamentally from a human driver's intuition: when handling complex road conditions, an experienced driver often plans "with the end in mind", first establishing a long-term driving intention (e.g., merging onto a ramp, yielding to a pedestrian, pulling over) and then working backward to the immediate short-term maneuvers. In addition, imitation-learning-based models easily fall into the "average driver" trap: they tend to fit the mean of the data distribution, flattening the policy into mediocrity and making it hard to switch flexibly between assertive maneuvering and conservative avoidance. To address these pain points, Fudan University and Yinwang Intelligent jointly proposed the WAM-Diff framework. The study innovatively ...
Refreshing the NAVSIM SOTA: Fudan proposes a new end-to-end autonomous driving framework
具身智能之心· 2025-12-26 00:55
Core Insights
- The article discusses the transition in end-to-end autonomous driving from a modular approach to a unified paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive models in mimicking human driving intuition [1][2].

Group 1: WAM-Diff Framework
- The WAM-Diff framework, developed by Fudan University and Yinwang Intelligent, introduces a discrete masked diffusion model for VLA autonomous driving planning, integrating a sparse mixture-of-experts (MoE) architecture and online reinforcement learning (GSPO) [2][4].
- WAM-Diff achieved state-of-the-art (SOTA) performance on the NAVSIM benchmark, scoring 91.0 PDMS and 89.7 EPDMS, demonstrating the potential of non-autoregressive generation in complex driving scenarios [2][16][18].

Group 2: Technical Innovations
- WAM-Diff employs Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, allowing a shared vocabulary with driving commands [5].
- The framework uses masked diffusion for generation, enabling parallel prediction of all token positions, which improves inference efficiency and allows global optimization [5][9].

Group 3: Decoding Strategies
- WAM-Diff explores three decoding strategies: causal, reverse-causal, and random. The reverse-causal strategy yields the best closed-loop metrics, aligning with the "end-to-begin" planning intuition [9][20].
- This result confirms that establishing long-term driving intentions before detailing immediate actions significantly improves planning consistency and safety [9][20].

Group 4: MoE and GSPO Integration
- The MoE architecture within WAM-Diff includes 64 lightweight experts, dynamically activated based on the driving context, enhancing model capacity and adaptability while controlling computational cost [12].
- The GSPO algorithm bridges the gap between open-loop training and closed-loop execution, optimizing trajectory sequences based on safety, compliance, and comfort metrics [12][14].

Group 5: Experimental Results
- In extensive experiments on the NAVSIM benchmark, WAM-Diff outperformed several leading models, achieving a PDMS score of 91.0 and an EPDMS score of 89.7, indicating its robustness in balancing safety and compliance [16][18].
- On NAVSIM-v2, which applies stricter metrics for traffic-rule adherence and comfort, the model improved by 5.2 points over the previous best, showcasing its capability in real-world driving scenarios [18].

Group 6: Conclusion
- WAM-Diff represents a significant advancement in autonomous driving planning, moving toward a discrete, structured, and closed-loop approach and emphasizing the importance of both "how to generate" and "what to generate" in the VLA era [25].
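The tokenization idea in Group 2 can be sketched as plain uniform quantization of ego-centric waypoints. The article elsewhere reports a quantization error within 0.005, but it does not give the coordinate range or vocabulary size, so `COORD_MIN`, `COORD_MAX`, and `NUM_BINS` below are illustrative assumptions, not values from the paper; this is a minimal sketch of the general technique, not the authors' implementation.

```python
import numpy as np

# Assumed constants for illustration only: a bounded ego-centric
# coordinate range and a per-axis vocabulary large enough that the
# half-bin rounding error stays below 0.005 m.
COORD_MIN, COORD_MAX = -64.0, 64.0   # assumed range in meters
NUM_BINS = 32768                      # assumed tokens per axis

def tokenize(traj):
    """Map continuous (x, y) waypoints to discrete token ids per axis."""
    t = np.clip(traj, COORD_MIN, COORD_MAX)
    scaled = (t - COORD_MIN) / (COORD_MAX - COORD_MIN)   # -> [0, 1]
    return np.round(scaled * (NUM_BINS - 1)).astype(np.int64)

def detokenize(tokens):
    """Invert tokenize(); recovers coordinates up to bin resolution."""
    scaled = tokens.astype(np.float64) / (NUM_BINS - 1)
    return scaled * (COORD_MAX - COORD_MIN) + COORD_MIN

traj = np.array([[1.23, -0.45], [2.71, 0.83]])
recon = detokenize(tokenize(traj))
# Round-trip error is at most half a bin width (~0.002 m here),
# which is within the 0.005 bound the article reports.
assert np.max(np.abs(recon - traj)) < 0.005
```

With these assumed constants the bin width is 128/32767 ≈ 0.004 m, so the worst-case rounding error of half a bin comfortably meets the reported bound; the integer ids can then live in the same vocabulary as discrete driving-command tokens.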
Refreshing the NAVSIM SOTA: Fudan and Yinwang propose Masked Diffusion, a new end-to-end autonomous driving framework
机器之心· 2025-12-25 03:12
Core Insights
- The article discusses the transition in end-to-end autonomous driving from a "modular" approach to a "unified" paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive generation paradigms [2].
- It introduces the WAM-Diff framework, which innovatively incorporates discrete masked diffusion models into VLA autonomous driving planning, addressing the challenges of single-direction temporal generation [2][6].

Group 1: WAM-Diff Framework
- WAM-Diff uses Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, keeping quantization error within 0.005 [6].
- The framework employs masked diffusion as its backbone, allowing parallel prediction of all token positions, which significantly improves inference efficiency and enables global optimization [6].
- WAM-Diff explores decoding strategies, revealing that the reverse-causal strategy outperforms the others on closed-loop metrics, validating the "end-to-begin" planning logic [9][20].

Group 2: Performance Metrics
- On the authoritative NAVSIM benchmark, WAM-Diff achieved state-of-the-art (SOTA) scores of 91.0 PDMS in NAVSIM-v1 and 89.7 EPDMS in NAVSIM-v2, demonstrating its potential in complex autonomous driving scenarios [3][18].
- The model surpassed competitors such as DiffusionDrive and ReCogDrive, indicating its robustness in balancing safety and compliance under real-world driving conditions [18].

Group 3: Technical Innovations
- WAM-Diff integrates a Low-Rank Adaptation Mixture-of-Experts (LoRA-MoE) architecture with 64 lightweight experts for dynamic routing and sparse activation, enhancing model capacity and adaptability [11].
- The Group Sequence Policy Optimization (GSPO) algorithm is introduced to bridge the gap between open-loop training and closed-loop execution, optimizing trajectory sequences based on safety, compliance, and comfort metrics [14].

Group 4: Conclusion
- The emergence of WAM-Diff marks a significant step toward discrete, structured, and closed-loop autonomous driving planning, emphasizing the importance of both "how to generate" and "what to generate" in the VLA era [25].
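The reverse-causal decoding result can be made concrete with a toy sketch: masked diffusion starts from a fully masked token sequence and fills positions in over a few parallel steps, and a reverse-causal schedule commits the far-horizon tokens first, working back toward the present. Everything here is invented for illustration: the "model" is a random stub, and the sequence length, vocabulary size, and step count are arbitrary, not values from the paper.

```python
import numpy as np

MASK = -1                      # sentinel id for a still-masked position
SEQ_LEN, VOCAB, STEPS = 8, 16, 4

rng = np.random.default_rng(0)

def model_predict(tokens):
    """Stand-in for the diffusion network: random logits per position.
    A real model would condition on the partially unmasked sequence."""
    return rng.normal(size=(SEQ_LEN, VOCAB))

def reverse_causal_decode():
    tokens = np.full(SEQ_LEN, MASK)
    # Reverse-causal schedule: unmask from the end of the planning
    # horizon (long-term intent) back toward the immediate action.
    order = list(range(SEQ_LEN - 1, -1, -1))
    per_step = SEQ_LEN // STEPS
    for step in range(STEPS):
        logits = model_predict(tokens)            # parallel prediction
        for pos in order[step * per_step:(step + 1) * per_step]:
            tokens[pos] = int(np.argmax(logits[pos]))
    return tokens

out = reverse_causal_decode()
assert (out != MASK).all()     # every position committed after STEPS passes
```

A causal or random schedule would only change the `order` list, which is what makes the three strategies in the article directly comparable under one decoder.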
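The LoRA-MoE idea, 64 lightweight low-rank experts with sparse activation on top of a frozen base layer, can be sketched as top-k gated routing. All dimensions, the value of k, the gating scheme, and the weight initialization below are assumptions for illustration; the article does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R, NUM_EXPERTS, TOP_K = 32, 4, 64, 2   # assumed sizes; 64 experts as in the article

W_base = rng.normal(size=(D, D)) * 0.02          # frozen base projection
A = rng.normal(size=(NUM_EXPERTS, D, R)) * 0.02  # per-expert LoRA down-projections
B = np.zeros((NUM_EXPERTS, R, D))                # LoRA up-projections (zero-init)
W_gate = rng.normal(size=(D, NUM_EXPERTS)) * 0.02

def moe_forward(x):
    """x: (D,) token feature. Base output plus top-k expert LoRA deltas."""
    gate = x @ W_gate
    top = np.argsort(gate)[-TOP_K:]              # sparse activation: k of 64 experts
    w = np.exp(gate[top]); w /= w.sum()          # softmax over the selected experts
    y = x @ W_base
    for weight, e in zip(w, top):
        y = y + weight * (x @ A[e] @ B[e])       # low-rank expert update
    return y

x = rng.normal(size=D)
y = moe_forward(x)
```

With the conventional zero-initialized up-projection, the layer starts out identical to the frozen base, so expert capacity is added only as training moves `B` away from zero; routing cost stays low because only `TOP_K` of the 64 experts run per token.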