推理效率 - filings, earnings calls, financial reports, news

推理效率

Search documents

公开模型一切，优于DeepSeek-R1，英伟达开源Llama-Nemotron家族

机器之心· 2025-05-06 08:04

Core Viewpoint - The rapid development of large models has made reasoning ability a key indicator of model intelligence, with inference efficiency becoming a critical limiting factor for model deployment and performance [2][3]. Group 1: Model Overview - NVIDIA has launched the Llama-Nemotron series, an open family of large models designed for efficient reasoning, featuring excellent inference capabilities and an enterprise-friendly open license [3][5]. - The series includes three model sizes: Nano (8B), Super (49B), and Ultra (253B), along with an independent variant UltraLong (8B) that supports long context [4][5]. - The models are the first open-source models to support dynamic inference switching, allowing users to toggle between standard chat mode and reasoning mode, enhancing interaction flexibility [6]. Group 2: Model Training and Optimization - The Llama-Nemotron models utilize a multi-stage post-training process to enhance performance on reasoning and non-reasoning tasks, employing supervised fine-tuning and reinforcement learning techniques [9]. - The Puzzle framework is used for efficient reasoning optimization, transforming large language models into hardware-efficient variants while maintaining performance [12][15]. - LN-Super and LN-Ultra models achieve significant throughput improvements, with LN-Super showing a 5x increase in inference throughput compared to Llama 3.3-70B-Instruct [19]. Group 3: Performance Metrics - LN-Ultra demonstrates superior performance in key benchmarks, achieving scores such as 88.1 in MMLU and 80.4 in MATH500, surpassing its predecessors [25][24]. - The models are designed to meet specific deployment constraints, such as supporting up to 3 million cached tokens in FP8 precision for LN-Ultra [21]. Group 4: Reinforcement Learning and Instruction Following - The models incorporate a "detailed thinking on/off" instruction mechanism to enhance flexibility in reasoning depth and response style, improving user interaction [27]. - LN-Ultra's performance is further enhanced through large-scale reinforcement learning, allowing it to exceed the capabilities of its teacher model [31][39]. - The training process for LN-Ultra involved approximately 140,000 H100 GPU hours, focusing on optimizing reasoning capabilities and instruction-following abilities [32][41].