Llama Nemotron Super v1.5
NVIDIA's New Open-Source Model: 3x Throughput, Runs on a Single GPU, and Takes the Reasoning SOTA
量子位· 2025-07-29 05:05
Core Viewpoint
- NVIDIA has released Llama Nemotron Super v1.5, an open-source model designed for complex reasoning and agent tasks. It achieves state-of-the-art performance while tripling throughput compared to its predecessor and runs efficiently on a single GPU [2][11].

Model Introduction
- Llama Nemotron Super v1.5 is an upgraded version of Llama-3.3-Nemotron-Super-49B-V1, tailored specifically for complex reasoning and intelligent agent tasks [3].

Model Architecture
- The model uses Neural Architecture Search (NAS) to balance accuracy and efficiency, converting throughput gains directly into lower operating costs [4].
- NAS produces non-standard, non-repetitive network modules that introduce two key changes relative to a traditional Transformer (see the sketch after this summary):
  - A skip-attention mechanism that bypasses the attention layer in certain modules [6].
  - A variable feed-forward network (FFN), where different modules use different expansion/compression ratios [7].

Efficiency Improvements
- By skipping attention or altering FFN widths, the model reduces FLOPs and runs more efficiently under resource constraints [8].
- Block-wise distillation was applied to the original Llama model: multiple variants were constructed for each module, and the optimal combination was found by search [9].

Training and Dataset
- The model was trained on 40 billion tokens drawn from three datasets (FineWeb, Buzz-V1.2, and Dolma), with a focus on English single-turn and multi-turn conversations [10].
- Post-training combined supervised fine-tuning and reinforcement learning to strengthen performance on key tasks such as coding, mathematics, reasoning, and instruction following [10].

Deployment and Ecosystem
- NVIDIA's AI models are optimized for NVIDIA GPU-accelerated systems, delivering significant speedups over CPU-only solutions [12].
- Llama Nemotron Super v1.5 is now open source and available to developers on build.nvidia.com or via Hugging Face (a hedged loading sketch follows below) [13].

Ecosystem and Model Series
- The Llama Nemotron ecosystem integrates large language models, training and inference frameworks, optimization tools, and enterprise deployment solutions for building high-performance AI applications [14].
- NVIDIA offers three series of large language models: Nano, Super, and Ultra, catering to different deployment needs and user profiles [16].
- The Super series, which includes Llama Nemotron Super v1.5, balances accuracy and computational efficiency for single-GPU deployment [17].

Enterprise Support
- The Nemotron models have gained support from major enterprises such as SAP, Microsoft, and Deloitte, which are building AI agent platforms for enterprise-level process automation and complex problem-solving [17].
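The two NAS-derived changes described above can be illustrated with a minimal, hypothetical PyTorch sketch. This is not NVIDIA's implementation; the class and parameter names are invented for illustration. It shows a Transformer-style block whose attention sub-layer may be skipped entirely and whose FFN hidden width is chosen per block.

```python
# Illustrative sketch only (not NVIDIA's code): a block with optional
# attention skipping and a per-block FFN expansion ratio, mirroring the
# two NAS-derived changes summarized above.
import torch
import torch.nn as nn


class NASBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, ffn_ratio: float, skip_attention: bool):
        super().__init__()
        self.skip_attention = skip_attention
        if not skip_attention:
            self.attn_norm = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Variable FFN: hidden width is d_model * ffn_ratio, chosen per block.
        hidden = int(d_model * ffn_ratio)
        self.ffn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.skip_attention:
            h = self.attn_norm(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
        return x + self.ffn(self.ffn_norm(x))


# A toy "searched" stack: some blocks drop attention, others narrow or widen the FFN.
blocks = nn.Sequential(
    NASBlock(512, 8, ffn_ratio=4.0, skip_attention=False),
    NASBlock(512, 8, ffn_ratio=1.5, skip_attention=True),   # cheaper: no attention, narrow FFN
    NASBlock(512, 8, ffn_ratio=2.5, skip_attention=False),
)
out = blocks(torch.randn(2, 16, 512))  # (batch, seq_len, d_model)
print(out.shape)
```

In a real block-wise distillation setup, many such variants would be trained to match the original Llama block's outputs, and the search would pick the combination that best trades accuracy against FLOPs.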
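For deployment via Hugging Face, a minimal loading sketch is shown below. The repo ID, dtype choice, and the need for trust_remote_code are assumptions not stated in the article; check the official model card on Hugging Face or build.nvidia.com before use.

```python
# A minimal sketch of loading the open weights with transformers.
# The repo ID and trust_remote_code are assumptions; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # reduced precision so the 49B model can fit on a single large GPU
    device_map="auto",
    trust_remote_code=True,       # assumed: the NAS-modified architecture may ship custom modeling code
)

messages = [{"role": "user", "content": "Explain what skip attention does in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The device_map="auto" setting lets Accelerate place the bfloat16 weights on available GPU memory, which is the single-GPU scenario the Super series is positioned for.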