What Meta Didn't Do, NVIDIA Did! New Architecture's Throughput Soars 6x, Trained on 20 Trillion Tokens
具身智能之心· 2025-08-20 00:03
Core Viewpoint
- NVIDIA has released a new 9B model, the NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6 times the inference throughput of its competitor Qwen3-8B while matching or exceeding its accuracy on complex reasoning tasks [1][6][41].

Group 1: Model Architecture and Performance
- The Nemotron Nano 2 model is based on the Mamba-Transformer hybrid architecture, which improves both inference speed and accuracy [5][6].
- In complex reasoning benchmarks, the model matches or exceeds the accuracy of Qwen3-8B while delivering up to 6 times higher throughput [6][41].
- The Mamba architecture is designed for efficient modeling of long sequences, reportedly running 3-5 times faster than traditional Transformer models, with linear complexity that supports extremely long contexts [28][29].

Group 2: Training and Development Process
- Nemotron-Nano-9B-v2 was trained on a massive dataset of 20 trillion tokens, using FP8 training techniques to produce a 12B-parameter base model [32][34].
- The model then underwent aggressive compression and distillation, shrinking the 12B-parameter model to 9B while keeping 128k-context inference within a single A10G GPU [39][40].
- The training data included high-quality web pages, multilingual content, mathematics, and code, with a focus on building a high-fidelity dataset for mathematical and coding tasks [34][38].

Group 3: Benchmarking and Open Source
- Nemotron-Nano-9B-v2 performs on par with or better than comparable models across benchmarks in mathematics, code generation, and general reasoning [41][43].
- NVIDIA has open-sourced several models and datasets on the HuggingFace platform, including Nemotron-Pre-Training-Dataset-v1, which contains 6.6 trillion tokens of high-quality data [44].
- The open-source release aims to support robust multilingual reasoning and general-knowledge pre-training, with an emphasis on high-quality mathematical content [44].
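The linear-complexity claim for Mamba layers follows from the state-space recurrence: each token updates a fixed-size hidden state rather than attending to every previous token. A minimal NumPy sketch of a diagonal SSM scan (shapes and parameters are illustrative, not Nemotron's actual layer):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time recurrence at the heart of state-space models.

    x: (T, d) input sequence; A, B, C: (d, n) per-channel parameters of a
    diagonal SSM (hypothetical shapes for illustration). Each step updates
    a fixed-size hidden state h, so cost is O(T) in sequence length,
    versus O(T^2) for self-attention.
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))          # fixed-size state, independent of T
    ys = np.empty((T, d))
    for t in range(T):
        # h <- A * h + B * x_t   (elementwise: diagonal state matrix)
        h = A * h + B * x[t][:, None]
        # y_t = <C, h> per channel
        ys[t] = (C * h).sum(axis=1)
    return ys
```

Because the state has constant size, the memory footprint at inference time does not grow with context length, which is what makes very long contexts (e.g. 128k) tractable.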
What Meta Didn't Do, NVIDIA Did: New Architecture's Throughput Soars 6x, Trained on 20 Trillion Tokens
36Ke · 2025-08-19 02:33
Core Insights
- NVIDIA has launched a new 9B model, the NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6 times higher inference throughput than the industry benchmark Qwen3-8B, while matching or exceeding it on complex reasoning tasks [1][23].

Group 1: Model Architecture and Performance
- The Nemotron Nano 2 model is based on the Mamba-2 architecture, which replaces most of the self-attention layers of a traditional Transformer, yielding significant speedups on complex reasoning tasks [10][15].
- The model posts competitive accuracy across benchmarks in mathematics, code generation, and general reasoning, performing on par with or better than comparable open-source models such as Qwen3-8B and Gemma3-12B [23][24].
- On specific benchmarks it achieved notable scores, including 97.8% on MATH500 and 72.1% on AIME25, demonstrating strong mathematical reasoning and general knowledge [24].

Group 2: Training and Data Utilization
- Training involved a massive dataset of 20 trillion tokens and FP8 training techniques to build a 12-billion-parameter base model, which was later distilled down to 9 billion parameters [17][22].
- The training drew on high-quality data spanning mathematics, code, and multilingual question answering, ensuring a robust pre-training corpus [18][25].
- NVIDIA has also released a comprehensive pre-training dataset, Nemotron-Pre-Training-Dataset-v1, comprising 6.6 trillion tokens from diverse domains [25][27].

Group 3: Open Source Commitment
- NVIDIA has committed to open-sourcing the Nemotron models on the HuggingFace platform, providing the 9B model, its base version, and the larger 12B model, along with the associated datasets [25][30].
- The move reflects NVIDIA's ongoing contribution to the open-source community, in contrast with other companies shifting toward more closed-source strategies [27].
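Both articles describe distilling the 12B teacher into the 9B student, but neither spells out the objective. A hedged sketch of the generic logit-distillation loss such pipelines typically use (the temperature and KL direction here are conventional choices, not confirmed details of NVIDIA's recipe):

```python
import numpy as np

def softmax(z, temp=1.0):
    """Numerically stable softmax with temperature."""
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temp=2.0, eps=1e-9):
    """KL(teacher || student) on temperature-softened distributions,
    averaged over positions and scaled by temp^2 (the standard
    Hinton-style knowledge-distillation objective)."""
    p = softmax(teacher_logits, temp)  # soft targets from the teacher
    q = softmax(student_logits, temp)  # student predictions
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)
    return float(kl.mean() * temp * temp)
```

Minimizing this pushes the student's output distribution toward the teacher's, letting a smaller model inherit behavior learned at larger scale.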
Zhejiang University's MambaMap: Online Vectorized HD Map Construction Based on State Space Models
自动驾驶之心· 2025-08-04 23:33
Core Insights
- The article introduces MambaMap, a novel framework for online vectorized high-definition (HD) map construction based on state space models; the precise road information such maps provide is crucial for downstream autonomous-driving tasks [4][5].

Summary by Sections

Key Contributions
- The MambaMap framework efficiently integrates long-range temporal information for online vectorized HD map construction using state space models [5].
- An effective gating mechanism in the state space selects and integrates information at both the BEV-feature and instance-query levels, and multiple scanning strategies exploit spatiotemporal dependencies [5].
- Extensive experiments on the nuScenes and Argoverse2 datasets show that MambaMap outperforms state-of-the-art methods across a range of settings [5].

Experimental Results
- On nuScenes, MambaMap achieved a mean average precision (mAP) of 40.1, outperforming methods such as StreamMapNet and SQD-MapNet [12].
- On Argoverse2, MambaMap likewise led with an mAP of 61.0, indicating strong robustness and generalization [12].
- The article reports detailed performance metrics across methods and datasets, highlighting MambaMap's advantages in various scenarios [11][12].

Methodology
- MambaMap uses a dynamic memory mechanism and a gated state space model to fuse BEV features and instance-level features across multiple time steps, capturing long-range dependencies with minimal computational overhead [18].
- Multi-directional and spatiotemporal scanning strategies further enhance feature extraction and temporal consistency [18].

Future Directions
- Future work aims to extend MambaMap to other BEV perception tasks, such as 3D object detection and motion prediction, broadening its applicability in robotics [18].
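The gating mechanism described above can be pictured as a learned per-element blend between the current frame's features and a temporal memory. A minimal sketch under assumed shapes (the gate parameters `W_g`, `b_g` and this exact formulation are illustrative, not MambaMap's published equations):

```python
import numpy as np

def gated_temporal_fusion(curr_feat, memory_feat, W_g, b_g):
    """Blend current-frame features with temporal memory via a sigmoid gate.

    curr_feat, memory_feat: (N, d) feature vectors (e.g. flattened BEV cells
    or instance queries); W_g: (2d, d) and b_g: (d,) are hypothetical gate
    parameters. The gate g in (0, 1) decides, per element, how much of the
    current frame to keep versus how much past information to retain.
    """
    z = np.concatenate([curr_feat, memory_feat], axis=-1) @ W_g + b_g
    g = 1.0 / (1.0 + np.exp(-z))               # sigmoid gate
    return g * curr_feat + (1.0 - g) * memory_feat
```

Because the output is a convex combination of the two inputs at every element, the fusion can suppress stale memory in dynamic regions while propagating stable map evidence across time steps.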