Alibaba Open-Sources the New Qwen3-Next Architecture: Training Costs Drop Sharply, Hybrid Attention Mechanism Introduced

Core Insights
- Alibaba's Tongyi team released Qwen3-Next, its next-generation foundation model architecture, and open-sourced the Qwen3-Next-80B-A3B series of models [1]
- The series comes in two versions: an instruction model optimized for understanding and executing commands, and a reasoning model built for multi-step reasoning and deep thinking [1]

Summary by Categories

Model Architecture
- Qwen3-Next introduces significant changes over the previous Qwen3 architecture, including a hybrid attention mechanism, a highly sparse MoE structure, and a series of training-stability optimizations [1] (see the layer-layout sketch after this summary)
- The model employs a multi-token prediction (MTP) mechanism to raise inference efficiency [1] (an MTP sketch follows below)

Performance Metrics
- The model has 80 billion total parameters but activates only about 3 billion per token, delivering performance comparable to the flagship Qwen3 model with 235 billion parameters [1]
- Training cost for Qwen3-Next fell by over 90% relative to the dense Qwen3-32B model, and inference throughput on long texts improved by more than 10x [1]
- Qwen3-Next supports ultra-long contexts of up to one million tokens [1]

MoE Architecture
- The highly sparse MoE structure is a forward-looking exploration for next-generation models: Qwen3-Next activates experts at a ratio of roughly 1:50, versus roughly 1:16 in the earlier Qwen3 series [2] (a routing sketch follows below)
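The article names a hybrid attention mechanism but gives no layout details. Below is a minimal Python sketch of one common form of hybridization: interleaving cheap linear-attention layers with periodic full-attention layers. The 3:1 ratio and 48-layer depth are illustrative assumptions, not figures from the article.

```python
# Illustrative layer layout for a hybrid-attention stack. The ratio and
# depth below are assumptions for the sketch, not Qwen3-Next's actual config.

LAYERS_PER_FULL_ATTENTION = 4  # hypothetical: 1 full-attention layer per 4 layers
NUM_LAYERS = 48                # hypothetical depth

def layer_kind(layer_idx: int) -> str:
    """Linear attention keeps per-token cost constant in sequence length;
    a periodic full-attention layer restores exact global recall."""
    if (layer_idx + 1) % LAYERS_PER_FULL_ATTENTION == 0:
        return "full_attention"
    return "linear_attention"

layout = [layer_kind(i) for i in range(NUM_LAYERS)]
print(layout[:8])
# ['linear_attention', 'linear_attention', 'linear_attention', 'full_attention', ...]
```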
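The MTP bullet is equally terse, so here is a minimal sketch of the general multi-token prediction idea: besides the usual next-token head, an extra head predicts the token after next, letting one forward pass propose several tokens (e.g., to seed speculative decoding). The single extra head and the sizes are illustrative assumptions, not Qwen3-Next's actual MTP design.

```python
import torch

HIDDEN, VOCAB = 1024, 32000  # hypothetical sizes for the sketch

class MTPHeads(torch.nn.Module):
    """Two output heads over the same backbone states: one for token t+1,
    one for token t+2 (the multi-token prediction part)."""
    def __init__(self):
        super().__init__()
        self.next_head = torch.nn.Linear(HIDDEN, VOCAB)   # predicts token t+1
        self.next2_head = torch.nn.Linear(HIDDEN, VOCAB)  # predicts token t+2

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, HIDDEN) final-layer states from the backbone
        return self.next_head(hidden), self.next2_head(hidden)

heads = MTPHeads()
hidden = torch.randn(2, 16, HIDDEN)
logits_t1, logits_t2 = heads(hidden)
# Training gives each head a cross-entropy loss against suitably shifted
# targets; at inference the t+2 guess can be verified by the main model,
# so accepted guesses cut the number of sequential decode steps.
print(logits_t1.shape, logits_t2.shape)
```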
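To make the sparsity ratios concrete: a top-k router activating 10 of 512 experts gives 10/512 ≈ 1:51, close to the 1:50 the article cites, while 8 of 128 gives exactly 1:16. The expert counts here are assumptions chosen to reproduce those ratios, and the routing logic is a generic top-k MoE gate, not Qwen3-Next's actual router.

```python
import torch

# Hypothetical expert counts matching the activation ratios in the article:
# 10/512 ≈ 1:51 (Qwen3-Next) versus 8/128 = 1:16 (earlier Qwen3 series).
NUM_EXPERTS, TOP_K, HIDDEN = 512, 10, 2048

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)

def route(tokens: torch.Tensor):
    """Pick the top-k experts per token and renormalize their gate weights.
    Only TOP_K / NUM_EXPERTS of the expert parameters run for each token."""
    logits = router(tokens)                           # (batch, NUM_EXPERTS)
    weights, expert_ids = logits.topk(TOP_K, dim=-1)  # keep the k best experts
    weights = torch.softmax(weights, dim=-1)          # gates sum to 1 per token
    return weights, expert_ids

tokens = torch.randn(4, HIDDEN)
weights, expert_ids = route(tokens)
print(expert_ids.shape, f"activation ratio ~ 1:{NUM_EXPERTS // TOP_K}")
```

This is how a highly sparse MoE keeps total capacity (all experts' parameters) large while per-token compute scales only with the handful of experts the router selects.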