Low-Activation-Overhead Model

Alibaba pulled off a big move overnight: costs plunge 90%
36Kr · 2025-09-12 02:45
Core Insights
- Alibaba's Tongyi Lab has officially released the next-generation foundation model architecture Qwen3-Next, including the Qwen3-Next-80B-A3B-Base model with 80 billion total parameters, of which only 3 billion are activated per token [1][21]
- The new architecture is designed to raise performance while sharply cutting training costs, delivering more than 10x the inference throughput of the previous Qwen3-32B model in long-context scenarios [1][8][21]

Model Performance
- The Qwen3-Next-80B-A3B instruction model performs on par with the much larger Qwen3-235B-A22B-Instruct-2507 model, while the thinking model outperforms Google's closed-source Gemini-2.5-Flash-Thinking [2][12]
- Across benchmark tests, Qwen3-Next-80B-A3B-Base performs similarly to Qwen3-32B-Base, at a training cost below 10% of Qwen3-32B-Base's [6][21]

Architectural Innovations
- Qwen3-Next introduces several architectural innovations, including a hybrid attention mechanism, a high-sparsity MoE structure, and a multi-token prediction (MTP) mechanism, which together enhance inference efficiency and model stability [5][16][19]
- The hybrid attention mechanism combines Gated DeltaNet and Gated Attention to improve context modeling over long sequences; the MoE layers reach a 1:50 activation ratio, significantly reducing FLOPs per token [18][19]

Training Efficiency
- The model is trained on a 15-trillion-token subset of the Qwen3 36T pre-training corpus, requiring only 9.3% of the GPU resources of Qwen3-32B while delivering superior performance [16][21]
- The MTP mechanism optimizes multi-step inference performance, raising the acceptance rate of speculative decoding in practical applications [19]

Future Developments
- Alibaba plans to continue optimizing the Qwen3-Next architecture and is developing Qwen3.5, alongside launching models across different domains, thereby increasing its technical influence in the open-source community [21]
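To make the 1:50 activation ratio concrete, here is a minimal sketch of top-k expert routing in a sparse MoE layer. The expert count, top-k value, and hidden size below are hypothetical toy values chosen only to reproduce a 1:50 ratio; they are not Qwen3-Next's actual configuration, and the router is a bare softmax-over-top-k, not the production design.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 500   # hypothetical total expert count (toy value)
top_k = 10        # hypothetical active experts per token -> 1:50 activation ratio
d_model = 64      # toy hidden size

# Toy expert weight matrices and router projection.
experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.02
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector x through only top_k of n_experts experts."""
    logits = x @ router_w
    topk_idx = np.argsort(logits)[-top_k:]   # indices of the selected experts
    gates = np.exp(logits[topk_idx])
    gates /= gates.sum()                     # softmax over the selected experts only
    # Only top_k expert matmuls execute: per-token FLOPs scale with top_k,
    # not with n_experts, which is the point of high-sparsity MoE.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, topk_idx))

y = moe_forward(rng.standard_normal(d_model))
print(y.shape, f"activation ratio = 1:{n_experts // top_k}")
```

The total parameter count grows with `n_experts`, but compute per token grows only with `top_k`, which is how an 80B-parameter model can activate roughly 3B parameters per token.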
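The article says MTP raises the acceptance rate of speculative decoding. The sketch below shows the acceptance loop that metric refers to: a cheap drafter proposes several tokens, the full model verifies them, and the longest agreeing prefix is kept. Both "models" here are toy deterministic rules standing in for real networks, purely to illustrate the accept/reject mechanics.

```python
def draft_propose(prefix, n):
    """Hypothetical cheap draft model: toy rule that increments the last token."""
    out, last = [], prefix[-1]
    for _ in range(n):
        last = (last + 1) % 100
        out.append(last)
    return out

def target_next(prefix):
    """Hypothetical full model: same rule, except it emits 0 after token 5."""
    last = prefix[-1]
    return 0 if last == 5 else (last + 1) % 100

def speculative_step(prefix, n_draft=4):
    """Accept the longest prefix of the draft that the target model agrees with."""
    draft = draft_propose(prefix, n_draft)
    accepted = []
    for tok in draft:
        if target_next(prefix + accepted) == tok:
            accepted.append(tok)  # target agrees: draft token accepted for free
        else:
            # Disagreement: take the target's own token and stop this round.
            accepted.append(target_next(prefix + accepted))
            break
    return accepted

print(speculative_step([3]))  # drafts 4,5,6,7; target diverges after 5 -> [4, 5, 0]
```

The higher the drafter's acceptance rate, the more tokens each verification pass yields, so a multi-token prediction head trained alongside the base model can cut decoding latency without changing the output distribution.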
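For intuition on why Gated DeltaNet-style layers help in long-context scenarios, here is a toy gated delta-rule recurrence, the linear-attention family that Gated DeltaNet belongs to. The shapes, fixed gate values, and the choice of query are illustrative assumptions; Qwen3-Next's actual parameterization is not described in this article.

```python
import numpy as np

rng = np.random.default_rng(1)
d_k, d_v, T = 8, 8, 16

S = np.zeros((d_v, d_k))   # recurrent state: O(d_k * d_v) memory,
outputs = []               # independent of sequence length T (unlike a KV cache)
for t in range(T):
    k = rng.standard_normal(d_k)
    k /= np.linalg.norm(k)             # unit-norm key
    v = rng.standard_normal(d_v)
    alpha = 0.95                       # hypothetical decay gate in (0, 1)
    beta = 0.5                         # hypothetical write-strength gate
    # Delta rule: overwrite the component of the state aligned with k
    # with (part of) v, while the gate decays everything else.
    S = alpha * S + beta * np.outer(v - S @ k, k)
    outputs.append(S @ k)              # read out with a toy query (here: q = k)

print(len(outputs), outputs[0].shape)
```

Because the state is a fixed-size matrix updated per token, cost per token is constant in sequence length, which is the property that lets a hybrid of such layers with standard attention sustain high throughput on long contexts.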