Large Model Architecture
Alibaba (09988) Open-Sources New Architecture Qwen3-Next: Training Costs Drop Sharply, Hybrid Attention Mechanism Introduced
Zhitong Finance · 2025-09-12 06:13
Core Insights
- Alibaba's Tongyi team released the next-generation foundation model architecture Qwen3-Next on September 12, along with the open-source Qwen3-Next-80B-A3B series models [1]
- The series includes two versions: an instruction model that excels at understanding and executing commands, and a reasoning model better suited to multi-step reasoning and deep thinking [1]

Model Improvements
- Qwen3-Next introduces significant enhancements over the previous Qwen3 architecture, including a hybrid attention mechanism, a high-sparsity MoE structure, a series of training-stability optimizations, and a multi-token prediction (MTP) mechanism that improves inference efficiency [1]
- The model has 80 billion total parameters but activates only 3 billion per token, matching the performance of the 235-billion-parameter flagship Qwen3 model while substantially improving computational efficiency [1]

Cost and Performance Metrics
- Training costs for Qwen3-Next are down more than 90% compared with the denser Qwen3-32B model, and long-context inference throughput is more than ten times higher [1]
- The model supports ultra-long context windows of up to one million tokens [1]

MoE Architecture
- The high-sparsity MoE design represents the team's latest exploration toward next-generation models: Qwen3-Next reaches an activation ratio of 1:50, versus roughly 1:16 in the earlier Qwen3 series [2]
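The parameter figures above can be put in perspective with simple arithmetic: in a sparse MoE, per-token compute scales with the activated parameters rather than the total. The helper below is a hypothetical illustration using the article's numbers, not code from Qwen3-Next itself.

```python
# Illustration of why high-sparsity MoE cuts per-token compute.
# Figures come from the article; the helper is a hypothetical sketch.

def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Fraction of total parameters activated per token (params in billions)."""
    return active_params_b / total_params_b

# Qwen3-Next-80B-A3B: 80B total parameters, ~3B activated per token.
frac = active_fraction(3.0, 80.0)
print(f"Active fraction: {frac:.2%}")  # -> Active fraction: 3.75%

# Per-token FLOPs scale roughly with the active parameters, so the model
# does per-token work comparable to a ~3B dense model while retaining
# the capacity of an 80B parameter pool.
```

Note that the article's 1:50 figure describes the activation ratio inside the MoE layers; the overall 3B/80B fraction can differ because it also counts components shared across all tokens (my assumption, not stated in the article).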
Baidu's 2026 Campus Recruitment Bets Big on AI: Over 4,000 Offers, New Graduates Go Straight into Core R&D
Sohu Finance · 2025-07-12 00:03
Group 1: Core Insights
- Baidu has launched its 2026 campus recruitment at an unprecedented scale, offering more than 4,000 positions, 90% of them AI-related, underscoring the company's focus on AI talent [1]
- The recruitment spans seven major cities, including Beijing, Shanghai, Shenzhen, and Chengdu, and adds 90 new AI positions focused on cutting-edge areas such as multimodal and large-model architectures [1]
- New graduates will have the opportunity to work on core products such as Baidu's Wenxin large model, the PaddlePaddle platform, and digital-human projects, offering a strong career starting point [1]

Group 2: AI Job Categories
- The AI positions cover four core layers: computing power, framework, model, and application, aiming to build a robust computational foundation and support model and application development [3]
- Roles include AI heterogeneous computing, cloud-native AI, deep learning, and algorithm engineers, with an emphasis on building intelligent systems [3]
- Novel roles such as "AI large model evaluation product manager" require a blend of technical expertise and business understanding, particularly in designing AI recommendation systems that protect consumer privacy [3]

Group 3: Industry Context
- Competition among internet giants in AI is intensifying; Baidu has performed strongly in the intelligent cloud market, winning 48 bidding projects worth 510 million yuan in the first half of 2025 [5]
- Baidu has built a computing foundation of 30,000-chip Kunlun clusters, providing efficient infrastructure support to enterprises such as China Merchants Bank and improving application results across scenarios [5]
- Alibaba Cloud has also posted strong AI results, with annual revenue of 118 billion yuan in fiscal 2025 and triple-digit growth in AI-related products for seven consecutive quarters [5]
Huatai Securities: Computing-Power Chain Remains Buoyant; AI Glasses May Reach an Inflection Point in H2
News flash · 2025-07-02 00:01
Group 1
- Huatai Securities expects the electronics sector to remain in a high-prosperity cycle, driven by continuous iteration of large-model architectures and inference demand that may accelerate under scaling laws [1]
- On domestic self-sufficiency, China's manufacturing sector is advancing in advanced-process capacity; as new capacity comes online, domestic equipment makers stand to benefit from rising localization rates [1]
- In AI hardware, AI glasses are expected to reach an inflection point in the second half of the year, while continued price cuts should accelerate the industry trend in smart driving [1]