Large Model Architecture
Alibaba (09988) Open-Sources New Qwen3-Next Architecture: Training Costs Drop Sharply, Hybrid Attention Mechanism Introduced
Zhi Tong Cai Jing · 2025-09-12 06:13
Core Insights
- On September 12, Alibaba's Tongyi team released Qwen3-Next, its next-generation foundation model architecture, and open-sourced the Qwen3-Next-80B-A3B series of models built on it [1]
- The series comes in two versions: an Instruct model that is better at understanding and executing instructions, and a Thinking model that is better at multi-step reasoning and deep thinking [1]

Model Improvements
- Compared with the MoE structure of Qwen3, Qwen3-Next adds a hybrid attention mechanism, a high-sparsity MoE structure, a series of training-stability optimizations, and a multi-token prediction (MTP) mechanism that improves inference efficiency [1]
- The new model has 80 billion total parameters but activates only 3 billion, delivering performance comparable to the 235-billion-parameter flagship Qwen3 model at much higher computational efficiency [1]

Cost and Performance Metrics
- Training cost for Qwen3-Next is more than 90% lower than for the dense Qwen3-32B model, and long-context inference throughput is more than ten times higher [1]
- The model supports ultra-long contexts of up to one million tokens [1]

MoE Architecture
- The high-sparsity MoE architecture is the team's latest exploration toward next-generation models: Qwen3-Next reaches an expert activation ratio of 1:50, compared with roughly 1:16 for the earlier Qwen3 series [2]
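To make the sparsity figures concrete, here is a minimal sketch of generic top-k expert routing, the usual way a MoE layer activates only a few experts per token. It is not Qwen3-Next's actual router: the names `route_top_k` and `moe_forward`, the toy scalar "experts", and the choice of 50 experts with 1 active (which gives a 1:50 ratio) are illustrative assumptions. In a real model the router scores come from a learned gate applied to each token; here they are simply sampled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: 50 experts, 1 active per token (a 1:50 activation ratio).
NUM_EXPERTS, TOP_K = 50, 1
# Toy expert i just multiplies its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)  # assumption: random scores stand in for a learned gate
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_top_k(router_logits, TOP_K)
output = moe_forward(2.0, experts, router_logits, TOP_K)

# Parameter-level sparsity reported in the article: 3B active out of 80B total.
active_param_fraction = 3 / 80
print(f"experts used: {len(selected)}/{NUM_EXPERTS}, "
      f"active parameters: {active_param_fraction:.2%}")
```

The parameter-level figure (3B of 80B, or 3.75%) and the expert-level ratio (1:50) measure different things: parts of the network other than the routed experts presumably run for every token, so the two numbers need not match.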
Alibaba Open-Sources New Qwen3-Next Architecture: Training Costs Drop Sharply, Hybrid Attention Mechanism Introduced
Zhi Tong Cai Jing· 2025-09-12 06:12
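On September 12, Alibaba (09988) Tongyi released Qwen3-Next, its next-generation foundation model architecture, and open-sourced the Qwen3-Next-80B-A3B series of models built on it. The series comes in two versions: an Instruct model that is better at understanding and executing instructions, and a Thinking model that is better at multi-step reasoning and deep thinking.
According to the announcement, compared with the MoE (Mixture of Experts) structure of Qwen3, Qwen3-Next makes the following core improvements: a hybrid attention mechanism, a high-sparsity MoE structure, a series of optimizations that make training more stable, and a multi-token prediction mechanism (MTP, Multiple-Token Prediction) that improves inference efficiency.
In terms of performance, the new model has 80B total parameters but activates only 3B, matches the 235B flagship Qwen3 model, and substantially improves computational efficiency. Qwen3-Next's training cost is more than 90% lower than that of the dense model Qwen3-32B, its long-context inference throughput is more than 10 times higher, and it supports million-token ultra-long contexts (a token being the smallest unit of text processing).
The Alibaba Tongyi team noted that the high-sparsity MoE architecture is Qwen3-Next's latest exploration toward next-generation models. MoE is currently the architecture adopted by mainstream large models; it completes inference tasks by activating a small subset of experts out of a large pool of parameters. Previously, the MoE experts of the Qwen3 series ...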
Baidu Bets Heavily on AI in 2026 Campus Recruitment: Over 4,000 Offers, New Graduates Work Directly on Core R&D!
Sou Hu Cai Jing· 2025-07-12 00:03
Group 1: Core Insights
- Baidu has launched its 2026 campus recruitment at an unprecedented scale, with more than 4,000 offers, 90% of them AI-related, underscoring the company's focus on AI talent [1]
- The recruitment spans seven major cities, including Beijing, Shanghai, Shenzhen, and Chengdu, and adds 90 new AI positions focused on cutting-edge areas such as multimodal models and large model architectures [1]
- Graduates will be able to work on core products such as Baidu's Wenxin large model, the PaddlePaddle platform, and digital human projects [1]

Group 2: AI Job Categories
- The AI positions cover four layers: computing power, framework, model, and application, with the lower layers providing the computational foundation for model and application development [3]
- Positions include AI heterogeneous computing, cloud-native AI, deep learning, and algorithm engineers, with an emphasis on building intelligent systems [3]
- Newer roles such as "AI large model evaluation product manager" call for both technical expertise and business understanding, particularly in designing AI recommendation systems that protect consumer privacy [3]

Group 3: Industry Context
- Competition among internet giants in AI is intensifying. Baidu performed strongly in the intelligent cloud market, winning 48 bid projects worth 510 million yuan in the first half of 2025 [5]
- Baidu has built a computing foundation on a 30,000-card Kunlun chip cluster, providing infrastructure to enterprises such as China Merchants Bank across various application scenarios [5]
- Alibaba Cloud has also posted strong AI results, with annual revenue of 118 billion yuan in fiscal year 2025 and triple-digit growth in AI-related products for seven consecutive quarters [5]
Huatai Securities: Computing-Power Supply Chain Stays Strong; AI Glasses May Reach an Inflection Point in the Second Half
news flash· 2025-07-02 00:01
Group 1
- Huatai Securities expects the electronics sector to remain strong, driven by the continued iteration of large model architectures and by inference demand that scaling laws may accelerate [1]
- On supply-chain self-sufficiency, domestic manufacturers are expanding advanced-process capacity; as new capacity comes online, domestic equipment makers stand to benefit and localization rates should rise [1]
- On AI devices, AI glasses are expected to reach an inflection point in the second half of the year, while continued price cuts should accelerate adoption of smart driving [1]