The Algorithm Behind Musk's New Model Comes from NVIDIA???
量子位 (QbitAI) · 2025-09-25 23:54
Core Viewpoint
- Grok-4-fast delivers exceptional cost efficiency, surpassing even GPT-5, which ships with a router [1][38].

Group 1: Performance and Efficiency
- Grok-4-fast's impressive inference efficiency has been attributed to scaling up compute [2].
- Grok's underlying technology is linked to NVIDIA's algorithmic work, in particular a new model called Jet-Nemotron [3][4].
- Jet-Nemotron-2B performs comparably to leading open-source models while running approximately 53 times faster [7].

Group 2: Technological Innovations
- The key innovation behind Jet-Nemotron is a new framework called PostNAS, which significantly reduces training costs and allows a more comprehensive exploration of model architectures [10][11].
- PostNAS produces a hybrid architecture that retains the essential full-attention layers while eliminating redundant ones to improve efficiency [13][14].
- The framework has four core components: full-attention layer placement, selection of the best existing linear attention module, design of a better linear attention module, and hardware-aware architecture search [12].

Group 3: Attention Mechanisms
- The NVIDIA team evaluated six state-of-the-art linear attention modules; Gated DeltaNet achieved the highest accuracy thanks to its data-dependent gating mechanism and delta rule [18][19].
- JetBlock, a more advanced linear attention module, uses dynamic convolution, generating convolution kernels adaptively from the input features, and outperforms Gated DeltaNet in accuracy on mathematical reasoning and retrieval tasks [21][24].

Group 4: Hardware Optimization
- NVIDIA's hardware-aware architecture search optimizes key architectural parameters rather than relying solely on parameter count, which does not accurately reflect efficiency on real hardware [27][28].
- The team found that the size of the key-value (KV) cache is the crucial factor for throughput in long-context and long-text generation, which led to a targeted optimization approach [30][31].

Group 5: Industry Impact
- PostNAS is expected to influence the AI industry by providing a low-cost, high-efficiency method of architecture exploration that applies to any pre-trained transformer [34].
- Jet-Nemotron is open-source, so vendors can integrate it without retraining, significantly reducing costs while maintaining accuracy [36][42].
- If adopted by major AI companies such as OpenAI and Google, Jet-Nemotron could bring widespread improvements in model performance and cost efficiency [43].
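
To make the KV cache point under Group 4 concrete, here is a minimal sketch of the standard KV cache size arithmetic for a decoder-only transformer. It is an illustration only: the function name `kv_cache_bytes` is hypothetical, and the layer count, head count, head dimension, and context length are assumed values, not Jet-Nemotron's published configuration.

```python
def kv_cache_bytes(full_attn_layers, kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes held by the KV cache of the full-attention layers.

    Each full-attention layer stores one key vector and one value
    vector (hence the factor 2) per KV head, per token, per sequence.
    Linear-attention layers keep a fixed-size state instead of a
    per-token cache, so they are not counted here.
    """
    return (2 * full_attn_layers * kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical configuration, chosen only for illustration.
KV_HEADS, HEAD_DIM, SEQ_LEN = 8, 128, 65_536

# Assumed baseline: all 28 layers use full attention (16-bit cache).
dense = kv_cache_bytes(28, KV_HEADS, HEAD_DIM, SEQ_LEN)
# Assumed hybrid: only 2 layers keep full attention.
hybrid = kv_cache_bytes(2, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"dense : {dense / 2**30:.2f} GiB")
print(f"hybrid: {hybrid / 2**30:.2f} GiB")
print(f"ratio : {dense / hybrid:.0f}x")
```

Under these assumed numbers the cache shrinks in direct proportion to the number of full-attention layers retained, which suggests why a hybrid design can fit larger batches or longer contexts into the same memory budget.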