Alibaba (09988) open-sources the new Qwen3-Next architecture: training costs fall sharply, hybrid attention mechanism introduced

Core Insights
- Alibaba's Tongyi team released the next-generation foundation model architecture Qwen3-Next on September 12, along with the open-source Qwen3-Next-80B-A3B series of models [1]
- The series comes in two versions: an instruction model that excels at understanding and executing commands, and a reasoning model that is better at multi-step reasoning and deep thinking [1]

Model Improvements
- Qwen3-Next introduces significant enhancements over the previous Qwen3 architecture, including a hybrid attention mechanism, a high-sparsity MoE structure, a series of training-stability optimizations, and a multi-token prediction (MTP) mechanism that improves inference efficiency (a toy MTP sketch appears at the end of this note) [1]
- The model has 80 billion total parameters but activates only 3 billion of them, matching the performance of the flagship 235-billion-parameter Qwen3 model while significantly improving computational efficiency [1]

Cost and Performance Metrics
- Training costs for Qwen3-Next are more than 90% lower than for the dense Qwen3-32B model, and long-context inference throughput is more than ten times higher [1]
- The model supports ultra-long contexts of up to one million tokens, strengthening its ability to handle very long texts [1]

MoE Architecture
- The high-sparsity MoE architecture represents the latest exploration toward next-generation models: Qwen3-Next reaches an activation ratio of about 1:50, compared with roughly 1:16 in the earlier Qwen3 series (a toy routing sketch follows below) [2]
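To make the activation ratio concrete, here is a minimal PyTorch sketch of top-k sparse MoE routing. The numbers are illustrative assumptions, not Qwen3-Next's published configuration: 100 experts with 2 routed per token yields the same 1:50 activation ratio cited above, and the naive per-token loop stands in for the fused dispatch kernels a real implementation would use.

```python
import torch
import torch.nn as nn

class ToySparseMoE(nn.Module):
    """Toy MoE layer: each token is processed by only top_k of num_experts experts."""
    def __init__(self, d_model=64, d_ff=256, num_experts=100, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                              # x: (num_tokens, d_model)
        scores = self.router(x)                        # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top_k experts per token
        weights = weights.softmax(dim=-1)              # mixing weights over the selected experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                     # naive per-token dispatch, kept simple for clarity
            for j in range(self.top_k):
                expert = self.experts[int(idx[t, j])]
                out[t] = out[t] + weights[t, j] * expert(x[t])
        return out

moe = ToySparseMoE()                                   # 2 of 100 experts per token -> 1:50 activation ratio
tokens = torch.randn(4, 64)
print(moe(tokens).shape)                               # torch.Size([4, 64])
```

The same bookkeeping is behind the headline parameter counts: because most expert weights sit idle for any given token, a model with 80 billion total parameters can run with only about 3 billion active per forward pass.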
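The multi-token prediction (MTP) mechanism mentioned under Model Improvements can be sketched in the same spirit. The toy model below assumes the generic MTP formulation of auxiliary heads trained to predict tokens several positions ahead; the article does not describe Qwen3-Next's actual MTP design, and the tiny GRU trunk and two-head layout here are stand-in assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMTPModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_future=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in trunk; a real model would use a transformer stack here.
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)
        # Head k predicts the token k+1 positions ahead of position t.
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab_size) for _ in range(n_future)])

    def forward(self, tokens):                         # tokens: (B, T)
        h, _ = self.trunk(self.embed(tokens))          # hidden states, (B, T, d_model)
        return [head(h) for head in self.heads]        # n_future logit tensors, each (B, T, vocab)

def mtp_loss(logits_list, tokens):
    """Average cross-entropy of each head against targets shifted by 1, 2, ... positions."""
    total = 0.0
    for k, logits in enumerate(logits_list, start=1):
        pred = logits[:, :-k, :]                       # positions that still have a target k steps ahead
        tgt = tokens[:, k:]
        total = total + F.cross_entropy(pred.reshape(-1, pred.size(-1)), tgt.reshape(-1))
    return total / len(logits_list)

tokens = torch.randint(0, 1000, (2, 16))
model = TinyMTPModel()
loss = mtp_loss(model(tokens), tokens)
loss.backward()                                        # trains the trunk and all future-token heads jointly
print(float(loss))
```

At inference time, heads like these can propose draft tokens for the main head to verify, which is one way an MTP-trained model raises decoding throughput.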