Does the Algorithm Behind Musk's New Model Come From NVIDIA???
Sou Hu Cai Jing · 2025-09-26 00:19
Core Insights
- Grok-4-fast has delivered exceptional gains in cost reduction and efficiency, outperforming GPT-5 in reasoning efficiency; the gains are attributed to algorithmic advances rather than hardware scaling alone [1][22]
- The underlying technology, Jet-Nemotron, developed by NVIDIA, addresses long-standing inference-cost problems, achieving a generation speedup of approximately 53 times over leading open-source models [1][3]

Algorithmic Innovations
- Jet-Nemotron-2B leads in both accuracy and speed, generating up to 47 times faster than Qwen3-1.7B-Base on MMLU-Pro [3]
- The key to these advances is a new framework, PostNAS (post neural architecture search), which starts from a pre-trained model, keeps its MLP weights fixed, and searches only over the attention design, rather than training candidate architectures from scratch [4][10]
- PostNAS selectively places a small number of full-attention layers where they contribute most, conserving computational resources while maintaining accuracy; a minimal sketch of this placement search appears at the end of this summary [4][14]

Model Architecture
- The framework comprises four core components: placement of full-attention layers, selection of the best existing linear attention module, design of a superior new linear attention module, and hardware-aware architecture search [4][10]
- Of six linear attention modules evaluated, Gated DeltaNet proved the most accurate and became the starting point for an even more advanced module, JetBlock; a reference loop for the underlying gated delta rule is sketched below [5][8]

Performance Metrics
- JetBlock adds dynamic convolution (convolution kernels generated conditionally on the input) to improve accuracy on mathematical reasoning and retrieval tasks while preserving generation efficiency; see the dynamic-convolution sketch below [7][8]
- The architecture search optimizes key hyperparameters against measured throughput rather than parameter count, since parameter count is a poor proxy for real inference speed; a sketch of such a throughput objective closes this summary [10][12]

Industry Impact
- PostNAS is expected to cut GPU usage time during inference by up to 47 times, lower the memory requirements for hardware deployment, and raise throughput, allowing model vendors to serve more users on the same hardware [14][15][16]
- Because Jet-Nemotron is open source, vendors can apply the technology to existing pre-trained models without retraining them from scratch, potentially reducing costs significantly while maintaining accuracy [18][19]

Research Contributions
- The research behind Jet-Nemotron is led by a team of Chinese scholars; first author Gu Yuxian, of Tsinghua University, focuses on improving efficiency across the full lifecycle of large language models (LLMs) [25][27]
- The work emphasizes efficient model-architecture design and knowledge-distillation techniques for language-model compression [25]
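
To make the full-attention placement idea concrete, here is a minimal Python sketch of a beam search over which layers keep full attention. The function name `search_full_attention_placement`, the `evaluate` callback, and the plain beam-search framing are illustrative assumptions, not the paper's exact once-for-all training procedure.

```python
from typing import Callable, FrozenSet

def search_full_attention_placement(
    n_layers: int,
    budget: int,
    evaluate: Callable[[FrozenSet[int]], float],
    beam_width: int = 4,
) -> FrozenSet[int]:
    """Beam search over which transformer layers keep full attention.

    `evaluate` is a hypothetical callback: it builds the hybrid model
    (full attention at the given layer indices, linear attention
    everywhere else, MLP weights frozen) and returns validation accuracy.
    """
    beam = [frozenset()]  # start with no full-attention layers
    for _ in range(budget):  # add one full-attention layer per round
        candidates = {
            cfg | frozenset({layer})
            for cfg in beam
            for layer in range(n_layers)
            if layer not in cfg
        }
        # Keep only the `beam_width` most accurate partial placements.
        beam = sorted(candidates, key=evaluate, reverse=True)[:beam_width]
    return beam[0]
```

The design intuition is that only a handful of layers need full attention for accuracy-critical abilities such as retrieval, so the search spends a fixed budget of expensive layers where they pay off most and leaves cheap linear attention everywhere else.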
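Gated DeltaNet itself is prior published work; the following is a minimal, unoptimized reference loop for the gated delta rule it builds on, for a single attention head. The tensor shapes and plain-PyTorch style are assumptions for readability; production kernels use chunked, hardware-efficient forms.

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Sequential gated delta rule for one attention head.

    q, k, v: (T, d) tensors; alpha, beta: (T,) gates in (0, 1).
    A (d, d) fast-weight state S is updated per token:
        S_t = alpha_t * S_{t-1} @ (I - beta_t * k_t k_t^T) + beta_t * v_t k_t^T
        o_t = S_t @ q_t
    """
    T, d = q.shape
    S = torch.zeros(d, d, dtype=q.dtype)
    outputs = []
    for t in range(T):
        kt, vt = k[t], v[t]
        # Decay old memory, erase the component stored along k_t
        # (the "delta rule"), then write the new key-value association.
        S = alpha[t] * (S - beta[t] * torch.outer(S @ kt, kt)) \
            + beta[t] * torch.outer(vt, kt)
        outputs.append(S @ q[t])
    return torch.stack(outputs)  # (T, d)
```

Because the state is a fixed-size (d, d) matrix rather than a growing KV cache, each decoding step costs the same regardless of sequence length, which is where the throughput advantage over full attention comes from.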
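The report describes JetBlock's dynamic convolution only at a high level; the module below is a hypothetical simplification showing the core idea of input-conditioned depthwise causal convolution. The class name, the mean-pooled kernel generator, and the shapes are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class DynamicCausalConv(torch.nn.Module):
    """Depthwise causal convolution whose kernels are generated from the
    input itself (a simplified stand-in for JetBlock-style dynamic conv)."""

    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        # Small generator: one length-`kernel_size` kernel per channel,
        # predicted here from the mean token (an illustrative choice).
        self.kernel_gen = torch.nn.Linear(d_model, d_model * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T, d = x.shape                      # x: (T, d) value tokens
        kernels = self.kernel_gen(x.mean(dim=0)).view(d, 1, self.kernel_size)
        # Left-pad so each position sees only past tokens (causality).
        xpad = F.pad(x.t().unsqueeze(0), (self.kernel_size - 1, 0))
        out = F.conv1d(xpad, kernels, groups=d)   # depthwise: (1, d, T)
        return out.squeeze(0).t()                 # back to (T, d)
```

In a static convolution the kernels are parameters fixed after training; generating them from the input lets the filter adapt per sequence at negligible cost, which is how an accuracy gain can come without sacrificing generation efficiency.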
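Finally, on searching against throughput rather than parameter count: the snippet below shows the kind of measured objective such a hardware-aware search could rank candidate architectures by. The harness (greedy decoding, random token ids, no KV cache) is deliberately crude and purely illustrative.

```python
import time
import torch

@torch.no_grad()
def decode_throughput(model, vocab_size=32000, prompt_len=1024,
                      gen_len=256, batch=8):
    """Tokens/second under a decoding workload: the quantity a
    hardware-aware search ranks candidates by, instead of parameter
    count (which correlates poorly with real inference speed)."""
    x = torch.randint(0, vocab_size, (batch, prompt_len))
    model(x)  # warm-up / prefill (no KV cache in this crude harness)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(gen_len):
        logits = model(x)                       # (batch, seq, vocab)
        nxt = logits[:, -1].argmax(-1, keepdim=True)
        x = torch.cat([x, nxt], dim=1)          # greedy decode step
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return batch * gen_len / elapsed
```

Two candidates with identical parameter counts can differ widely on this number, because attention type and cache traffic, not weight count alone, dominate decoding cost; that is the rationale for keying the search on measured throughput.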