Core Viewpoint
- The article discusses the launch of Qwen3-Next, a next-generation large language model architecture from Alibaba's Tongyi team, highlighting its significant gains in computational efficiency and performance over previous models [2][20].

Model Architecture and Innovations
- Qwen3-Next has 80 billion total parameters, of which only 3 billion are activated per token, yet it matches the performance of the 235-billion-parameter Qwen3 flagship model and surpasses Gemini-2.5-Flash-Thinking [2][20].
- The model is designed around the coming trends of context-length scaling and total-parameter scaling, incorporating several technical upgrades over the previous Qwen3 generation, including a hybrid attention mechanism and a highly sparse MoE structure [5][11].
- Gated DeltaNet and Gated Attention layers improve efficiency when processing long contexts, with a 3:1 mix ratio (three Gated DeltaNet layers per Gated Attention layer) yielding the best performance [9][10].

Training and Stability Enhancements
- The high-sparsity MoE architecture activates only about 3.7% of parameters during inference, maximizing resource utilization without sacrificing performance [11].
- Design features that improve training stability include Zero-Centered RMSNorm and normalized initialization of the MoE router parameters [12][13].

Performance Metrics
- In throughput, Qwen3-Next shows clear advantages: nearly seven times the prefill throughput of Qwen3-32B at a 4K-token context length, and more than ten times beyond 32K tokens [17][20].
- Across evaluations, including programming and reasoning tasks, it outperforms previous models and achieves high scores on mathematical reasoning benchmarks [21].

Availability and Deployment
- Qwen3-Next is available on multiple third-party platforms, making it more accessible to developers and researchers [24].
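The two most concrete numbers above (the 3:1 layer mix and the ~3.7% activation ratio) can be illustrated with a minimal sketch. The figures come from the summary; the helper names (`hybrid_layer_pattern`, `activation_ratio`) are hypothetical and not part of any official Qwen3-Next API.

```python
# Illustrative sketch only: layer names and helpers are hypothetical,
# but the 3:1 ratio and parameter counts are taken from the article.

def hybrid_layer_pattern(num_layers: int) -> list[str]:
    """Repeat the reported 3:1 mix: three Gated DeltaNet
    (linear-attention) layers per Gated Attention (standard) layer."""
    block = ["GatedDeltaNet", "GatedDeltaNet", "GatedDeltaNet", "GatedAttention"]
    return [block[i % len(block)] for i in range(num_layers)]

def activation_ratio(total_params_b: float, active_params_b: float) -> float:
    """Fraction of parameters active per token in the sparse MoE."""
    return active_params_b / total_params_b

print(hybrid_layer_pattern(8))         # every 4th layer is standard attention
print(f"{activation_ratio(80, 3):.2%}")  # 3.75%, i.e. the ~3.7% quoted above
```

The exact 3/80 ratio is 3.75%, so the article's 3.7% figure is simply a truncation of the same arithmetic.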
A brand-new MoE architecture! Alibaba open-sources Qwen3-Next, slashing training costs by 90%