State Space Models

What Meta didn't do, NVIDIA did! New architecture with 6x higher throughput, trained on 20 trillion tokens
具身智能之心· 2025-08-20 00:03
Core Viewpoint
- NVIDIA has released a new 9B model, the NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6 times higher inference throughput than its competitor Qwen3-8B while maintaining comparable or superior performance on complex reasoning tasks [1][6][41].

Group 1: Model Architecture and Performance
- The Nemotron Nano 2 model is based on the Mamba-Transformer hybrid architecture, which improves both inference speed and accuracy [5][6].
- In complex reasoning benchmarks, the model matches or exceeds the accuracy of Qwen3-8B while achieving up to a 6-fold throughput increase [6][41].
- The Mamba architecture is designed for efficient modeling of long sequences, reportedly 3-5 times faster than traditional Transformer models, with linear complexity that supports extremely long contexts (an illustrative sketch of the hybrid layer pattern follows this summary) [28][29].

Group 2: Training and Development Process
- Training of Nemotron-Nano-9B-v2 used a massive dataset of 20 trillion tokens and FP8 training techniques to build a 12B-parameter base model [32][34].
- The model then went through aggressive compression and distillation, reducing the 12B-parameter model to 9B while keeping 128k-context inference within a single A10G GPU [39][40].
- The training data included high-quality web pages, multilingual content, mathematics, and code, with an emphasis on building a high-fidelity dataset for mathematical and coding tasks [34][38].

Group 3: Benchmarking and Open Source
- Nemotron-Nano-9B-v2 has shown superior or equivalent performance across benchmarks covering mathematics, code generation, and general reasoning [41][43].
- NVIDIA has announced the open-sourcing of several models and datasets on the HuggingFace platform, including Nemotron-Pre-Training-Dataset-v1, which contains 6.6 trillion tokens of high-quality data [44].
- The open-source release aims to support robust multilingual reasoning and general-knowledge pre-training, with a focus on high-quality mathematical content [44].
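The summary above describes the hybrid design only at a high level, so the following is a minimal PyTorch sketch of the general idea: most layers are linear-time state-space (Mamba-style) blocks, with a small number of self-attention layers interleaved. The ToySSMBlock recurrence, the 1-in-6 layer ratio, and all hyperparameters are illustrative assumptions, not NVIDIA's Nemotron Nano 2 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySSMBlock(nn.Module):
    """Toy gated linear recurrence standing in for a Mamba-2 layer:
    a per-channel state with input-dependent decay, O(seq_len) time and
    a fixed-size state per step, unlike quadratic self-attention."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (batch, seq, d_model)
        residual = x
        u, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay_proj(x))      # input-dependent retention in (0, 1)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.size(1)):                  # linear-time recurrent scan
            h = a[:, t] * h + (1.0 - a[:, t]) * u[:, t]
            states.append(h)
        y = torch.stack(states, dim=1) * F.silu(gate)
        return residual + self.out_proj(y)


class AttentionBlock(nn.Module):
    """Plain pre-norm self-attention block (causal masking omitted for brevity)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class HybridBackbone(nn.Module):
    """Mostly SSM blocks, with one attention block every `attn_every` layers.
    The ratio here is an arbitrary placeholder, not the Nemotron layout."""

    def __init__(self, d_model: int = 512, n_layers: int = 12, attn_every: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else ToySSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


# usage sketch: a batch of 2 sequences, 1024 tokens, 512-dim hidden states
hidden = torch.randn(2, 1024, 512)
print(HybridBackbone()(hidden).shape)  # torch.Size([2, 1024, 512])
```

The design intuition is that the recurrent blocks keep per-token cost and memory flat as context grows, while the occasional attention layers retain exact token-to-token lookups where they matter.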
What Meta didn't do, NVIDIA did: new architecture with 6x higher throughput, trained on 20 trillion tokens
36Ke· 2025-08-19 02:33
Core Insights
- NVIDIA has launched a new 9B model, the NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6 times higher inference throughput than the industry benchmark Qwen3-8B while matching or exceeding its performance on complex reasoning tasks [1][23].

Group 1: Model Architecture and Performance
- Nemotron Nano 2 is a hybrid in which Mamba-2 layers replace most of the self-attention layers of a traditional Transformer, yielding significant speed-ups on complex reasoning tasks [10][15].
- The model shows competitive accuracy across benchmarks covering mathematics, code generation, and general reasoning, performing on par with or better than comparable open-source models such as Qwen3-8B and Gemma3-12B [23][24].
- In specific benchmarks it posted notable scores, such as 97.8% on MATH500 and 72.1% on AIME25, demonstrating its mathematical reasoning and general-knowledge capabilities [24].

Group 2: Training and Data Utilization
- Training of Nemotron Nano 2 used a massive dataset of 20 trillion tokens and FP8 training techniques to build a 12-billion-parameter base model, which was later distilled down to 9 billion parameters (a minimal distillation sketch follows this summary) [17][22].
- Training drew on high-quality data from varied sources, focusing on mathematics, code, and multilingual question answering to ensure a robust pre-training dataset [18][25].
- NVIDIA has also released a comprehensive pre-training dataset, Nemotron-Pre-Training-Dataset-v1, which includes 6.6 trillion tokens from diverse domains, further strengthening the model's training foundation [25][27].

Group 3: Open Source Commitment
- NVIDIA has committed to open-sourcing the Nemotron models on the HuggingFace platform, providing access to the 9B model, its base version, and the larger 12B model, along with the associated datasets [25][30].
- This move reflects NVIDIA's continued contribution to the open-source community, in contrast with companies that are shifting toward more closed-source strategies [27].
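Both write-ups mention distilling the 12B FP8-trained base model down to 9B but give no recipe. Below is a minimal, generic sketch of logit-level knowledge distillation, the standard technique such a step typically builds on; the temperature, loss weighting, and the toy teacher/student tensors are illustrative assumptions, not NVIDIA's actual compression procedure.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft-target KL term (teacher -> student) with ordinary
    cross-entropy on the ground-truth labels."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, log_p_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# usage sketch with toy shapes: a batch of 8 tokens over a 32k vocabulary
vocab = 32_000
teacher_logits = torch.randn(8, vocab)                       # frozen large teacher
student_logits = torch.randn(8, vocab, requires_grad=True)   # smaller student
labels = torch.randint(0, vocab, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In practice the student here would be a pruned or re-initialized smaller model, and the teacher the full 12B checkpoint; this sketch only shows the loss that transfers the teacher's output distribution.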
Zhejiang University's MambaMap: Online Vectorized HD Map Construction Based on State Space Models
自动驾驶之心· 2025-08-04 23:33
Author | 自动驾驶专栏  Source | 自动驾驶专栏

Abstract: This article introduces MambaMap, online vectorized high-definition map construction based on state space models. High-definition (HD) maps are essential for autonomous driving because they provide downstream tasks with precise road information. Recent advances have highlighted the potential of temporal modeling for handling challenges such as occlusion and an extended perception range. However, existing methods either fail to fully exploit temporal information or incur heavy computational overhead when processing long sequences. To address these challenges, this paper proposes MambaMap, a novel framework that efficiently fuses long-range temporal features in the state space to construct online vectorized HD maps. Specifically, MambaMap incorporates a memory bank to store and exploit information from historical frames, dynamically updating BEV features and instance queries to improve robustness against noise and occlusion. In addition, a gating mechanism is introduced in the state space to selectively integrate dependencies among map elements in a computationally efficient manner. Innovatively, multi-directional and spatiotemporal scanning strategies are designed to enhance feature extraction at the BEV level and the instance level, respectively. These strategies significantly improve the prediction accuracy of the proposed method ...
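To make the memory-bank-plus-gating idea in the abstract concrete, here is a minimal sketch of gated temporal fusion of BEV features with a small memory bank. This is only my reading of the general mechanism: the GatedBEVMemory module, its 1x1-convolution gate, and the simple mean aggregation of history are illustrative placeholders, not MambaMap's actual design, which operates in the state space and also updates instance queries.

```python
import torch
import torch.nn as nn


class GatedBEVMemory(nn.Module):
    """Keep the last few BEV feature maps in a queue and fold them into the
    current frame through a learned, input-dependent gate."""

    def __init__(self, channels: int, bank_size: int = 4):
        super().__init__()
        self.bank_size = bank_size
        self.bank = []                                     # plain Python-side queue of past BEV maps
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, bev: torch.Tensor) -> torch.Tensor:  # bev: (B, C, H, W)
        if not self.bank:                                  # first frame: nothing to fuse yet
            self.bank.append(bev.detach())
            return bev
        memory = torch.stack(self.bank, dim=0).mean(0)     # aggregate stored history
        g = torch.sigmoid(self.gate(torch.cat([bev, memory], dim=1)))   # per-pixel gate
        fused = self.fuse(torch.cat([bev, g * memory], dim=1))          # gated fusion
        self.bank.append(fused.detach())                   # update the memory bank
        if len(self.bank) > self.bank_size:
            self.bank.pop(0)
        return fused


# usage sketch: fuse a stream of BEV feature maps frame by frame
mem = GatedBEVMemory(channels=64)
for _ in range(3):
    bev_t = torch.randn(1, 64, 100, 50)   # (batch, channels, H, W) BEV grid
    fused_t = mem(bev_t)
print(fused_t.shape)  # torch.Size([1, 64, 100, 50])
```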