NVIDIA Nemotron Nano 2 Model

NVIDIA's New Models Go Live: 4B Inference Speeds Up 53x, New Attention Architecture Surpasses Mamba 2
36Kr · 2025-08-27 02:03
Core Insights
- NVIDIA has launched a new series of small models called Jet-Nemotron, developed by an all-Chinese team, featuring innovations such as Post Neural Architecture Search (PostNAS) and a new linear attention module called JetBlock [1][2][8]
- Jet-Nemotron models (2B and 4B) outperform leading open-source models like Qwen3, Gemma3, and Llama3.2 across dimensions including math, code, commonsense, retrieval, and long-context accuracy [2][20]
- Inference throughput on H100 GPUs is significantly higher, with gains of up to 53.6 times [4][20]

Model Performance
- Jet-Nemotron-2B and Jet-Nemotron-4B demonstrate superior performance in benchmark tests, with Jet-Nemotron-4B achieving 65.2% accuracy on MMLU compared to Qwen3's 60.3% [21]
- In long-context scenarios, Jet-Nemotron shows a dramatic throughput increase, reaching up to 50 times improvement over Qwen3-1.7B [5][20]
- The models are also faster overall, with Jet-Nemotron-2B 21 times faster and Jet-Nemotron-4B 47 times faster than Qwen3-1.7B-Base [20]

Innovations
- PostNAS enables efficient architecture exploration and adaptation starting from pre-trained Transformer models, significantly reducing the cost and risk of developing new language model architectures [9][10][14]
- JetBlock, a new linear attention module, combines dynamic convolution with hardware-aware architecture search, yielding substantial accuracy improvements while maintaining training and inference throughput similar to previous linear-attention designs (see the illustrative sketch after this summary) [18][20]

Technical Specifications
- Jet-Nemotron models have been optimized across parameters such as cache size and throughput, with configurations achieving a maximum throughput of 2,885 tokens per second [21]
- The models use a flexible design for attention blocks, allowing for improved performance in long-context and complex reasoning tasks [16][18]
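The article describes JetBlock only at a high level (linear attention combined with dynamic convolution). The sketch below is a minimal, hypothetical PyTorch illustration of that general idea: a causal linear-attention layer whose value path is filtered by a depthwise convolution with input-dependent kernels. The class name `JetBlockSketch`, the kernel-generation scheme, and all dimensions are assumptions for illustration, not NVIDIA's released implementation.

```python
# Hypothetical sketch of a JetBlock-style layer: causal linear attention plus a
# dynamically generated depthwise convolution on the value path. Names, shapes,
# and the kernel predictor are illustrative assumptions, not NVIDIA's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JetBlockSketch(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.dim = dim
        self.kernel_size = kernel_size
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        # Small predictor that generates one depthwise conv kernel per channel
        # from the input -- the "dynamic convolution" idea the article mentions.
        self.kernel_gen = nn.Linear(dim, dim * kernel_size)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = F.elu(self.q_proj(x)) + 1.0  # positive feature maps so the
        k = F.elu(self.k_proj(x)) + 1.0  # normalizer below stays positive
        v = self.v_proj(x)

        # Dynamic depthwise convolution over the value sequence; kernels are
        # predicted from the mean token representation of each sequence.
        kernels = self.kernel_gen(x.mean(dim=1)).view(b * d, 1, self.kernel_size)
        v_flat = F.pad(v.transpose(1, 2).reshape(1, b * d, t), (self.kernel_size - 1, 0))
        v = F.conv1d(v_flat, kernels, groups=b * d).view(b, d, t).transpose(1, 2)

        # Causal linear attention: running sums replace the softmax over all
        # token pairs, so cost grows linearly in t. (A production kernel would
        # use a chunked recurrence instead of materializing the b*t*d*d state.)
        kv = torch.einsum("btd,bte->btde", k, v).cumsum(dim=1)
        z = 1.0 / (torch.einsum("btd,btd->bt", q, k.cumsum(dim=1)) + 1e-6)
        out = torch.einsum("btd,btde,bt->bte", q, kv, z)
        return self.out_proj(out)
```

A quick shape check: `JetBlockSketch(dim=64)(torch.randn(2, 16, 64))` returns a `(2, 16, 64)` tensor, and unlike softmax attention the per-token cost does not grow with sequence length squared.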
What Meta Didn't Do, NVIDIA Did! New Architecture's Throughput Soars 6x, Trained on 20 Trillion Tokens
具身智能之心 · 2025-08-20 00:03
Core Viewpoint
- NVIDIA has released a new 9B model, the NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6 times higher inference throughput than its competitor Qwen3-8B while maintaining comparable or superior performance on complex reasoning tasks [1][6][41].

Group 1: Model Architecture and Performance
- The Nemotron Nano 2 model is based on the Mamba-Transformer hybrid architecture, which improves both inference speed and accuracy (a minimal illustrative sketch of such a hybrid stack follows this summary) [5][6].
- In complex reasoning benchmarks, the model matches or exceeds the accuracy of Qwen3-8B while delivering up to 6 times higher throughput [6][41].
- The Mamba architecture is designed for efficient modeling of long sequences, reportedly 3-5 times faster than traditional Transformer models, with linear complexity that supports extremely long contexts [28][29].

Group 2: Training and Development Process
- Training of Nemotron-Nano-9B-v2 used a massive dataset of 20 trillion tokens and FP8 training techniques to produce a 12B-parameter base model [32][34].
- The model then went through aggressive compression and distillation, reducing the 12B-parameter model to 9B while fitting 128k-context support on a single A10G GPU [39][40].
- The training data included high-quality web pages, multilingual content, mathematics, and code, with a focus on building a high-fidelity dataset for mathematical and coding tasks [34][38].

Group 3: Benchmarking and Open Source
- Nemotron-Nano-9B-v2 has demonstrated superior or equivalent performance across benchmarks covering mathematics, code generation, and general reasoning [41][43].
- NVIDIA has announced the open-sourcing of several models and datasets on the HuggingFace platform, including Nemotron-Pre-Training-Dataset-v1, which contains 6.6 trillion tokens of high-quality data [44].
- The open-source release aims to support robust multilingual reasoning and general-knowledge pre-training, with an emphasis on high-quality mathematical content [44].
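The hybrid design is described only qualitatively above (most layers are Mamba-2 style, with a handful of self-attention layers retained). Below is a minimal, hypothetical sketch of such a hybrid stack in PyTorch. The `SequenceMixer` is a simple gated causal-convolution stand-in for a real Mamba-2 selective-state-space layer, and the layer ratio, depth, and dimensions are illustrative assumptions rather than the released Nemotron Nano 2 configuration.

```python
# Hypothetical Mamba-Transformer hybrid stack: most blocks use a linear-time
# sequence mixer, and only a few blocks keep full self-attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceMixer(nn.Module):
    """Linear-time token mixer used here as a stand-in for a Mamba-2 layer."""
    def __init__(self, dim: int, conv_kernel: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.conv = nn.Conv1d(dim, dim, conv_kernel, groups=dim,
                              padding=conv_kernel - 1)  # left context only after slicing
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        u = self.conv(u.transpose(1, 2))[..., :x.size(1)].transpose(1, 2)  # causal conv
        return self.out_proj(F.silu(u) * torch.sigmoid(gate))


class AttentionBlock(nn.Module):
    """Standard causal self-attention, retained at only a few depths."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


class HybridStack(nn.Module):
    def __init__(self, dim: int = 256, depth: int = 12, attn_every: int = 6):
        super().__init__()
        # With depth=12 and attn_every=6, only 2 of 12 blocks are attention;
        # the actual ratio in the released model is an assumption here.
        self.blocks = nn.ModuleList([
            AttentionBlock(dim) if (i + 1) % attn_every == 0 else SequenceMixer(dim)
            for i in range(depth)
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])

    def forward(self, x):
        for norm, block in zip(self.norms, self.blocks):
            x = x + block(norm(x))  # pre-norm residual connections
        return x
```

The intended point of such a layout is that only the few attention blocks pay the quadratic cost and hold a growing KV cache, while the linear-time mixers dominate the depth, which is where the claimed long-context throughput gains would come from.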
What Meta Didn't Do, NVIDIA Did: New Architecture's Throughput Soars 6x, Trained on 20 Trillion Tokens
36Kr · 2025-08-19 02:33
Core Insights
- NVIDIA has launched a new 9B model, the NVIDIA Nemotron Nano 2, built on a Mamba-Transformer hybrid architecture that achieves up to 6 times higher inference throughput than the industry benchmark Qwen3-8B while maintaining or exceeding its performance on complex reasoning tasks [1][23].

Group 1: Model Architecture and Performance
- The Nemotron Nano 2 model is based on the Mamba-2 architecture, which replaces most self-attention layers in a traditional Transformer, resulting in significant speed improvements on complex reasoning tasks [10][15].
- The model demonstrates competitive accuracy across benchmarks covering mathematics, code generation, and general reasoning, performing on par with or better than similar open-source models such as Qwen3-8B and Gemma3-12B [23][24].
- In specific benchmarks the model achieved notable scores, such as 97.8% on MATH500 and 72.1% on AIME25, showcasing its mathematical reasoning and general knowledge capabilities [24].

Group 2: Training and Data Utilization
- Training involved a massive dataset of 20 trillion tokens and FP8 training techniques to build a 12-billion-parameter foundational model, which was later distilled down to 9 billion parameters (a generic sketch of the distillation step follows this summary) [17][22].
- The training mix drew on high-quality data from varied sources, focusing on mathematics, code, and multilingual question-answering to ensure a robust pre-training dataset [18][25].
- NVIDIA has also released a comprehensive pre-training dataset, Nemotron-Pre-Training-Dataset-v1, which includes 6.6 trillion tokens from diverse domains, documenting the model's training foundation [25][27].

Group 3: Open Source Commitment
- NVIDIA has committed to open-sourcing the Nemotron models on the HuggingFace platform, providing access to the 9B model, its base version, and the larger 12B model, along with the associated datasets [25][30].
- This move reflects NVIDIA's ongoing contributions to the open-source community, in contrast with other companies shifting toward more closed strategies [27].
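The summary states only that the 12B base model was compressed and distilled down to 9B. As a generic illustration of the distillation step, here is a minimal logit-distillation loss in PyTorch: a KL term against the teacher's next-token distribution blended with the usual cross-entropy. The temperature, loss weight, and the commented training loop are assumptions; the actual compression pipeline combines pruning with distillation and is more involved than this sketch.

```python
# Generic logit-distillation sketch: a pruned 9B "student" is trained to match
# the 12B "teacher"'s next-token distribution. Hyperparameters are placeholders.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft KL term against the teacher with hard cross-entropy on data."""
    # Soft targets: KL divergence to the teacher, computed at a raised temperature.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.log_softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (temperature ** 2)
    # Hard targets: standard next-token cross-entropy on the training corpus.
    hard = F.cross_entropy(student_logits.reshape(-1, student_logits.size(-1)),
                           targets.reshape(-1))
    return alpha * soft + (1.0 - alpha) * hard


# Hypothetical training step with a frozen teacher and a trainable student:
#   with torch.no_grad():
#       teacher_logits = teacher(input_ids).logits   # 12B teacher (frozen)
#   student_logits = student(input_ids).logits       # pruned 9B student
#   loss = distillation_loss(student_logits, teacher_logits, labels)
#   loss.backward()
```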