Mamba Architecture
What Meta didn't do, NVIDIA did: a brand-new architecture with 6x throughput, trained on 20 trillion tokens
36Kr · 2025-08-19 02:33
Core Insights
- NVIDIA has launched a new 9B-parameter model, NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6x higher inference throughput than the comparable open-source model Qwen3-8B, while matching or exceeding it on complex reasoning tasks [1][23].

Group 1: Model Architecture and Performance
- Nemotron Nano 2 is based on the Mamba-2 architecture, which replaces most of the self-attention layers of a traditional Transformer and yields significant speedups on complex reasoning tasks [10][15]; a minimal sketch of this hybrid layering follows this summary.
- The model posts competitive accuracy across mathematics, code generation, and general reasoning benchmarks, performing on par with or better than similar open-source models such as Qwen3-8B and Gemma3-12B [23][24].
- On individual benchmarks, it scores 97.8% on MATH500 and 72.1% on AIME25, demonstrating strong mathematical reasoning [24].

Group 2: Training and Data Utilization
- Training involved a massive 20-trillion-token dataset and FP8 training techniques to build a 12B-parameter base model, which was later distilled to 9B parameters [17][22].
- Pre-training drew on high-quality data from varied sources, with an emphasis on mathematics, code, and multilingual question answering [18][25].
- NVIDIA has also released a comprehensive pre-training dataset, Nemotron-Pre-Training-Dataset-v1, comprising 6.6 trillion tokens from diverse domains [25][27].

Group 3: Open Source Commitment
- NVIDIA is open-sourcing the Nemotron models on the Hugging Face platform, providing the 9B model, its base version, and the larger 12B base model, along with the associated datasets [25][30].
- The move continues NVIDIA's contributions to the open-source community, contrasting with companies shifting toward more closed-source strategies [27].
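The hybrid pattern described above (mostly Mamba-style state-space mixers with a few interleaved self-attention layers) can be illustrated with a minimal PyTorch sketch. Everything here is an assumption for illustration: `SimpleSSMBlock` is a toy gated linear recurrence standing in for a real Mamba-2 layer, and the layer counts and attention placement (`attn_every`) do not reflect Nemotron Nano 2's actual configuration.

```python
# Hedged sketch of a Mamba-Transformer hybrid stack; not Nemotron's code.
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Toy linear-recurrence block standing in for a Mamba-2 layer.

    Real Mamba-2 uses selective state-space updates with hardware-aware
    scans; this gated cumulative recurrence only mimics the O(L) shape.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, D)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)           # per-channel decay in (0, 1)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.size(1)):              # O(L) sequential scan
            h = a * h + (1 - a) * u[:, t]
            states.append(h)
        y = torch.stack(states, dim=1) * torch.sigmoid(gate)
        return self.out_proj(y)

class HybridBlock(nn.Module):
    """One residual block: either an SSM mixer or self-attention."""
    def __init__(self, d_model: int, use_attention: bool):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.use_attention = use_attention
        if use_attention:
            self.mixer = nn.MultiheadAttention(d_model, num_heads=8,
                                               batch_first=True)
        else:
            self.mixer = SimpleSSMBlock(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        if self.use_attention:
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        return x + h

class HybridStack(nn.Module):
    """Mostly-SSM stack with sparse attention layers (here 1 in 6)."""
    def __init__(self, d_model: int = 512, n_layers: int = 12,
                 attn_every: int = 6):
        super().__init__()
        self.blocks = nn.ModuleList([
            HybridBlock(d_model, use_attention=(i % attn_every == attn_every - 1))
            for i in range(n_layers)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = blk(x)
        return x

x = torch.randn(2, 128, 512)           # (batch, seq, d_model)
print(HybridStack()(x).shape)          # torch.Size([2, 128, 512])
```

The intuition the article points at: the O(L) recurrent scan replaces O(L^2) attention over long reasoning traces, while the few remaining attention layers retain global token mixing, which is where the throughput gain on long generations comes from.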
Comprehensively surpassing DiffusionDrive: GMF-Drive, the world's first Mamba-based end-to-end SOTA solution
理想TOP2 · 2025-08-18 12:43
Core Insights
- The article reviews advances in end-to-end autonomous driving, emphasizing the importance of multi-modal fusion architectures and introducing GMF-Drive, a new framework that improves on existing methods [3][4][44].

Group 1: End-to-End Autonomous Driving
- End-to-end autonomous driving has gained wide acceptance because it maps raw sensor inputs directly to driving actions, reducing reliance on intermediate representations and the information loss they introduce [3].
- Recent models such as DiffusionDrive and GoalFlow demonstrate strong capability in generating diverse, high-quality driving trajectories [3].

Group 2: Multi-Modal Fusion Challenges
- A key bottleneck in current systems is integrating heterogeneous inputs from different sensors; existing methods often rely on simple feature concatenation rather than structured information integration [4][6].
- Current multi-modal fusion architectures such as TransFuser show limited performance gains over single-modal baselines, indicating a need for more sophisticated integration methods [6].

Group 3: GMF-Drive Overview
- GMF-Drive, developed by teams from the University of Science and Technology of China and China University of Mining and Technology, comprises three modules aimed at strengthening multi-modal fusion for autonomous driving [7].
- The framework combines a gated Mamba fusion approach with a spatially aware BEV representation, addressing the limitations of traditional Transformer-based methods [7][44].

Group 4: Innovations in Data Representation
- The paper introduces a 14-dimensional pillar representation that retains critical 3D geometric features, enhancing the model's perception capabilities [16][19].
- This representation captures local surface geometry and height variation, allowing the model to differentiate objects with similar point densities but different structures [19].

Group 5: GM-Fusion Module
- The GM-Fusion module integrates multi-modal features through gated channel attention, a BEV state-space model (BEV-SSM), and hierarchical deformable cross-attention, achieving linear complexity while preserving long-range dependency modeling [19][20]; a toy sketch of the gating idea appears after this summary.
- The module's design enables effective spatial dependency modeling and improved feature alignment between camera and LiDAR data [19][40].

Group 6: Experimental Results
- GMF-Drive achieves a PDMS of 88.9 on the NAVSIM benchmark, outperforming the previous best model, DiffusionDrive, by 0.8 points and demonstrating the effectiveness of the GM-Fusion architecture [29][30].
- The framework also improves key sub-metrics such as drivable area compliance and ego progress, indicating gains in both safety and efficiency [30][31].

Group 7: Conclusion
- GMF-Drive marks a significant advance in end-to-end autonomous driving frameworks by effectively combining geometric representations with spatially aware fusion, setting new performance benchmarks [44].
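As a rough illustration of the gated fusion idea in GM-Fusion, the sketch below learns a per-channel sigmoid gate from pooled camera and LiDAR BEV features and blends the two modalities accordingly. The module name, tensor shapes, and gating formulation are assumptions for illustration, not the released GMF-Drive code (which additionally uses a BEV-SSM and hierarchical deformable cross-attention on top of the gate).

```python
# Hedged sketch of gated channel fusion over BEV features; not GMF-Drive code.
import torch
import torch.nn as nn

class GatedChannelFusion(nn.Module):
    """Fuse two BEV feature maps with a learned per-channel gate.

    A global descriptor pooled from both modalities drives a sigmoid gate
    g in [0, 1]^C; output = g * camera + (1 - g) * lidar, so the network
    can lean on LiDAR geometry or camera semantics channel by channel.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, cam_bev: torch.Tensor,
                lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV grids from each sensor branch
        ctx = torch.cat(
            [cam_bev.mean(dim=(2, 3)), lidar_bev.mean(dim=(2, 3))], dim=-1
        )                                                # (B, 2C) descriptor
        g = self.gate(ctx).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return g * cam_bev + (1.0 - g) * lidar_bev

cam = torch.randn(2, 64, 100, 100)
lidar = torch.randn(2, 64, 100, 100)
print(GatedChannelFusion(64)(cam, lidar).shape)  # torch.Size([2, 64, 100, 100])
```

The design point this mirrors from the article is that the gate performs structured, content-dependent integration rather than the simple feature concatenation it criticizes in earlier fusion stacks.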