Alibaba Qwen Launches Native Vision-Language Model Qwen3.5-397B-A17B

Core Insights
- Alibaba's Qwen3.5 has officially launched; its first model, Qwen3.5-397B-A17B, ships with open weights and shows strong performance in reasoning, programming, agent tasks, and multimodal understanding [1]

Group 1: Model Specifications
- Qwen3.5-397B-A17B uses a hybrid architecture combining Gated Delta Networks with Mixture of Experts (MoE): of its 397 billion total parameters, only 17 billion are activated per forward pass, reducing latency and cost while preserving capability [1]
- Language and dialect support has expanded from 119 to 201 languages, improving global usability [1]

Group 2: Performance Enhancements
- The gains over the Qwen3 series are attributed primarily to a broad expansion of reinforcement learning (RL) tasks and environments, with emphasis on the difficulty and generalizability of RL environments rather than optimization for specific metrics or narrow task categories [1]
- Qwen3.5 trains natively on multimodal data over heterogeneous infrastructure that decouples the visual and language components, avoiding the inefficiencies of a unified pipeline [2]
- Sparse activation allows computation across modules to overlap, so training throughput on mixed text-image-video data reaches nearly 100% of the pure-text baseline [2]
- A native FP8 pipeline uses low precision for activations, MoE routing, and GEMM operations, cutting activation memory by roughly 50% and accelerating training by over 10% while remaining stable across trillions of tokens [2]
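The sparse-activation figure (17B active of 397B total) reflects standard top-k Mixture-of-Experts routing: a router scores all experts per token but runs only the top few. The sketch below illustrates the mechanism; the expert count, top-k value, and dimensions are illustrative placeholders, not Qwen3.5's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64          # hidden size (illustrative, not the model's real value)
n_experts = 16        # total experts held in memory
top_k = 2             # experts actually executed per token

# Router and expert weights; each expert is a simple linear layer here.
router_w = rng.standard_normal((d_model, n_experts))
expert_w = rng.standard_normal((n_experts, d_model, d_model))

def moe_forward(x):
    """Route a single token vector x through its top_k experts."""
    logits = x @ router_w                  # one score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the selected experts
    # Softmax over the selected logits only.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Weighted sum of the chosen experts' outputs; the remaining
    # n_experts - top_k experts contribute no compute at all.
    out = sum(wi * (x @ expert_w[i]) for wi, i in zip(w, top))
    return out, top

x = rng.standard_normal(d_model)
y, used = moe_forward(x)
print(f"experts used: {sorted(used.tolist())} of {n_experts}")
print(f"active fraction: {top_k / n_experts:.1%}")
```

At the reported scale the analogous ratio is 17B / 397B, roughly 4.3% of parameters active per forward pass.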
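The ~50% activation-memory reduction from the FP8 pipeline follows directly from element width: FP8 stores one byte per value versus two for BF16/FP16. A back-of-the-envelope check (tensor shape is illustrative, and this is not Qwen's training code):

```python
# Activation memory for one hypothetical tensor, BF16 vs FP8.
batch, seq_len, hidden = 8, 4096, 8192
n_elements = batch * seq_len * hidden

bf16_bytes = n_elements * 2   # 2 bytes per BF16 value
fp8_bytes = n_elements * 1    # 1 byte per FP8 (e4m3/e5m2) value

saving = 1 - fp8_bytes / bf16_bytes
print(f"BF16 activations: {bf16_bytes / 2**30:.2f} GiB")
print(f"FP8 activations:  {fp8_bytes / 2**30:.2f} GiB")
print(f"memory reduction: {saving:.0%}")   # 50%
```

In practice the realized saving depends on which tensors stay in higher precision (per-tensor scales, master weights, optimizer state), which is consistent with the article's "approximately 50%" hedge.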