NVIDIA Strikes Again: New Hybrid-Architecture Model Debuts, Two Key Innovations Deliver a 53.6× Throughput Speedup
机器之心· 2025-08-26 09:38
Core Insights
- The article introduces Jet-Nemotron, a new hybrid-architecture language model developed by researchers from NVIDIA, which achieves state-of-the-art (SOTA) accuracy while significantly improving efficiency compared to existing full-attention models [2][8][9].

Model Performance
- Jet-Nemotron-2B outperforms several leading open-source full-attention models, including Qwen3, Qwen2.5, Gemma3, and Llama3.2, while achieving a throughput acceleration of up to 53.6 times on H100 GPUs with a context length of 256K and maximum batch size [2][9].
- In benchmark tests such as MMLU and MMLU-Pro, Jet-Nemotron's accuracy surpasses that of advanced MoE full-attention models, despite those models having larger parameter counts [2][5].

Innovations and Techniques
- Jet-Nemotron is built on two core innovations: Post Neural Architecture Search (PostNAS) and JetBlock, a new linear attention module that significantly outperforms previous designs such as Mamba2 [6][21].
- PostNAS enables efficient architecture exploration and adaptation on top of pre-trained Transformer models, reducing the cost and risk of developing new language model architectures [12][16].

Efficiency and Accuracy
- The Jet-Nemotron architecture yields immediate gains in both efficiency and accuracy, translating into better service quality and lower operational costs [17].
- The hardware-aware search conducted by PostNAS identifies architectures that maintain similar throughput while achieving higher accuracy with more parameters [18].

Comparative Results
- Jet-Nemotron-2B and Jet-Nemotron-4B demonstrate competitive accuracy against leading efficient language models, with Jet-Nemotron-4B being 21 times faster and Jet-Nemotron-2B being 47 times faster than Qwen3-1.7B-Base [23][24].
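The throughput and cache-size gains summarized above stem from swapping full softmax attention, whose KV cache grows linearly with context length, for linear attention, whose recurrent state has a fixed size. The following NumPy sketch contrasts the two; it is a generic illustration of linear attention (using the common elu+1 feature map), not NVIDIA's actual JetBlock design:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Full attention: materializes an (n, n) score matrix -> O(n^2) time,
    # and the KV cache grows with the number of tokens n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Linear attention: replace softmax with a feature map phi, then
    # reassociate (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V).
    # The state phi(K)^T V is a fixed (d, d) matrix, so the "cache" is
    # constant-size regardless of context length -- the source of the
    # large throughput gains at 256K context.
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always > 0
    S = phi(K).T @ V                  # (d, d) recurrent state
    z = phi(K).sum(axis=0)            # (d,) normalizer
    return (phi(Q) @ S) / (phi(Q) @ z)[:, None]

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
print(linear_attention(Q, K, V).shape)  # (8, 4), same shape as full attention
```

Per token, full attention pays cost proportional to the context length, while the linear variant only updates and reads the fixed-size state, which is why decoding throughput diverges so sharply at long contexts.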
New Work from NVIDIA's Song Han Team: Efficient Language Models with Post Neural Architecture Search
量子位· 2025-08-26 08:11
Shiling, reporting from Aofeisi — QbitAI (WeChat official account: QbitAI)

NVIDIA's open-source effort has delivered another big move: Song Han's team has released Jet-Nemotron, a new efficient language model built on Post Neural Architecture Search.

Across a suite of benchmarks, the model matches or exceeds the accuracy of Qwen3, Qwen2.5, Gemma 3, and Llama 3.2, while delivering up to 53.6× faster generation throughput and a 6.1× speedup in the prefilling stage.

Notably, on the MMLU, MMLU-Pro, and BBH benchmarks, Jet-Nemotron-2B achieves 47× the throughput of Qwen3-1.7B-Base while shrinking the cache to 1/47 of its size. It also attains higher accuracy than DeepSeek-V3-Small and Moonlight (15B total parameters, 2.2B activated).

Both the code and the pretrained models will be open-sourced. Let's first look at how Jet-Nemotron is built.

Jet-Nemotron: Built on Post Neural Architecture Search

First, Jet-Nemotron is constructed on the basis of Post Neural Architecture Search (PostNAS).

PostNAS is an architecture-search approach that "stands on the shoulders of large models" to retrofit them ...
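The core idea of PostNAS described above is to search on top of a pretrained Transformer rather than train from scratch: the expensive weights (e.g., the MLPs) are kept frozen, and the search decides which layers retain full attention and which are converted to a linear-attention block. The toy sketch below illustrates only the placement-search step under a hardware budget; the per-layer importance scores and the budget are invented for illustration and are not numbers from the paper:

```python
from itertools import combinations

# Hypothetical per-layer importance: how much accuracy is lost when that
# layer's full-attention block is swapped for linear attention.
# (Illustrative values only, not measurements from Jet-Nemotron.)
full_attn_importance = [0.9, 0.1, 0.05, 0.8, 0.15, 0.05]
FULL_ATTN_BUDGET = 2  # hardware budget: keep at most 2 full-attention layers

def search_placement(importance, budget):
    """Pick which layers keep full attention so that retained importance is
    maximized under the budget; every other layer is converted to linear
    attention. MLP weights stay frozen throughout the search."""
    best = max(combinations(range(len(importance)), budget),
               key=lambda keep: sum(importance[i] for i in keep))
    return sorted(best)

keep = search_placement(full_attn_importance, FULL_ATTN_BUDGET)
print(keep)  # [0, 3] -- the two highest-importance layers stay full attention
```

A real search would score candidates by training or evaluating the hybrid model, and would explore far larger spaces than this exhaustive toy loop, but the budgeted-placement structure is the same.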