Mixture-of-Experts (MoE) Architecture
AI upstream price hikes continue; call volume for Chinese large models surpasses US models for the first time
SINOLINK SECURITIES· 2026-03-05 00:45
Industry Research Center, March 3, 2026 | Industry Research Weekly | Technology | Securities Research Report
Analysts: Meng Can (SAC: S1130522050001, mengcan@gjzq.com.cn), Li Zhongyu (SAC: S1130524100002, lizhongyu01@gjzq.com.cn)
Core Points
Industry Frontier - The US Stargate project is progressing slowly and global supply of high-end GPUs is tight, with Chinese and US cloud computing vendors raising prices. Apple's renewed high-priced storage purchases point to severe supply shortages in the NAND and DRAM markets, and SMIC plans to expand capacity. OpenAI raised 110 billion USD in financing, signed a strategic cooperation agreement with Amazon, and secured next-generation inference compute through NVIDIA. In the week of February 9-15, 2026, call volume for Chinese models surpassed US models for the first time. Ahead of a major model update, DeepSeek did not show its upcoming flagship model to US chipmakers. Anthropic launched agentic AI tools.
Capital Trends - NVIDIA reported strong 25Q4 results, and NVIDIA GTC 2026 is set to open on March 15 in San Jose, California. AMD signed a multi-year agreement with Meta, and Meta reached an agree ... with Google
Chinese Large Model Benchmark Evaluation 2025 Annual Report - SuperCLUE
Sou Hu Cai Jing· 2026-02-05 07:35
Core Insights
- The Chinese large model sector is experiencing accelerated development in 2025, with the SuperCLUE annual evaluation covering 23 representative models from both domestic and international sources, focusing on general capabilities, specialized tasks, and application scenarios [1][2].

Group 1: Model Performance
- The top-ranking closed-source model is Anthropic's Claude-Opus-4.5-Reasoning, scoring 68.25, followed by Google Gemini-3-Pro-Preview and OpenAI GPT-5.2 (high) [1][23].
- Domestic models are transitioning from "catching up" to "running alongside," with Kimi-K2.5-Thinking (61.50) and Qwen3-Max-Thinking (60.61) ranking fourth and sixth globally, excelling in code generation and mathematical reasoning tasks [1][2][23].
- The performance gap in precise instruction adherence and hallucination control remains significant, with average score differences exceeding 7 points and nearly 2 points, respectively [2].

Group 2: Technological Evolution
- The evolution of technology is characterized by three stages: early competition among numerous models and the emergence of multimodal capabilities, a mid-stage explosion of multimodal applications and reasoning breakthroughs, and the rise of intelligent agents and ecosystem reconstruction by 2025 [1][2].
- The Mixture-of-Experts (MoE) architecture has become mainstream, with domestic open-source models capturing a significant share of the global market, led by DeepSeek and Qwen3 [1][2].

Group 3: Application and Cost-Effectiveness
- In application scenarios, general intelligent agents are still in their foundational stages, lacking in complex task handling capabilities; however, domestic models excel in multimodal areas such as image-to-video generation and Chinese adaptation [2].
- Domestic models demonstrate significant cost-effectiveness, with Kimi-K2.5-Thinking priced at only one-third of similar overseas models, although overseas models outperform in reasoning efficiency [2].

Group 4: Future Directions
- The Chinese large model sector has made significant advancements in technological innovation, application deployment, and ecosystem construction, establishing core competitive advantages in open-source ecosystems, vertical applications, and cost-effectiveness [2].
- Future efforts should focus on overcoming shortcomings in precise instruction adherence and hallucination control to drive technology towards more efficient and reliable outcomes [2].
Doubao tops 100 million daily active users; monetization is likely next
Sou Hu Cai Jing· 2025-12-27 19:41
Core Insights
- The domestic AI product "Doubao" has achieved over 100 million daily active users, marking a significant milestone in its growth and influence in the market [1][3]
- Doubao reached this milestone with relatively low user-acquisition and marketing costs compared to other ByteDance products that have also surpassed 100 million daily active users [1][3]

Group 1: User Engagement and Growth
- Doubao's daily active user count has surpassed 100 million, indicating its successful penetration into the market [1]
- The product's user engagement is expected to lead to a shift towards commercialization, as seen with other successful internet products [3]

Group 2: Operational Costs and Model Efficiency
- Doubao's large model has a daily token usage exceeding 50 trillion, which has increased by over 10 times year-on-year [3]
- The cost of operating Doubao is estimated at approximately 2.5 million yuan per day, although optimizations may reduce this to around 2 million yuan [6][8]
- The model's architecture allows for significant cost savings, activating only 10% of parameters during inference, which can theoretically save 90% of computational resources [6]

Group 3: Commercialization Strategies
- Doubao's next step is commercialization, with potential methods including subscription services or advertising, similar to other AI products [10][12]
- The advertising model may involve subtle product placements within user interactions, making it both effective and unobtrusive [12]
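The sparse-activation saving claimed above is simple arithmetic. The sketch below makes it explicit using the common rough cost model of about 2 forward-pass FLOPs per token per *active* parameter; the token volume comes from the article, while the total parameter count is a made-up illustrative figure, not Doubao's actual size.

```python
# Illustrative estimate of how sparse MoE activation cuts serving compute.
# Cost model: ~2 FLOPs per token per ACTIVE parameter (a standard rough rule).

def daily_flops(tokens_per_day: float, total_params: float, active_fraction: float) -> float:
    """Rough daily forward-pass FLOPs when only a fraction of parameters run."""
    return 2 * tokens_per_day * total_params * active_fraction

TOKENS_PER_DAY = 50e12   # >50 trillion tokens/day, per the article
TOTAL_PARAMS = 200e9     # hypothetical parameter count, for illustration only

dense = daily_flops(TOKENS_PER_DAY, TOTAL_PARAMS, 1.0)
sparse = daily_flops(TOKENS_PER_DAY, TOTAL_PARAMS, 0.10)  # ~10% of params activated
print(f"compute saved vs dense: {1 - sparse / dense:.0%}")  # → 90%
```

Whatever the true parameter count, the ratio only depends on the activated fraction, which is why the article can quote "90% saved" without disclosing model size.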
2025 AI Large Model Resource Compendium
Sou Hu Cai Jing· 2025-12-24 10:45
Group 1: Core Insights
- The AI large model industry is undergoing a structural transformation in 2025, shifting competition from mere capability to sustainability across four dimensions: technological paradigms, market structure, application forms, and global governance [1]
- Significant breakthroughs in technology include a shift from RLHF to RLVR training paradigms, enabling models to achieve leaps in reasoning capabilities through self-verification [1]
- The Mixture-of-Experts (MoE) architecture is making a strong comeback, balancing parameter scale and computational costs through sparse activation modes, thus achieving extreme cost-effectiveness [1]

Group 2: Market Dynamics
- The market is experiencing a dual tension of centralization and democratization, with Google's Gemini 3 ending OpenAI's long-standing lead, while Chinese models achieve competitive advantages through cost-effectiveness [2]
- The market is concentrating towards leading players, with top startups like Anthropic receiving significant funding, while second and third-tier players face elimination [2]
- Open-source models, led by Chinese firms, are approaching the performance of closed-source products, creating a counterbalance in the market [2]

Group 3: Application Evolution
- Applications are evolving into a new stage of deep integration, transitioning from general chat assistants to specialized tools and autonomous agents embedded in professional workflows [2]
- The rise of "AI-native application layers" is transforming software development, with developers shifting roles from coders to system designers and AI trainers [2]
- Deployment models are trending towards "cloud + edge collaboration," with local deployments gaining traction due to privacy compliance needs [2]

Group 4: Global Governance
- Global governance is entering a phase of differentiated competition, with the EU prioritizing safety through strict regulations, the US focusing on industry self-regulation, and China advocating a balanced approach to development and safety [3]
- The regulatory competition is driven by the struggle for technological standard-setting authority, emerging as a new battleground in tech competition [3]
- The societal impact of AI is beginning to show through employment structure adjustments and educational model transformations, with human-AI collaboration becoming a new trend [3]

Group 5: Future Outlook
- The AI large model industry is transitioning from a scale competition to a new phase emphasizing efficiency, depth, and integration [3]
- Future winners will need to navigate the complex interactions of four forces: technological efficiency, scenario integration, ecological positioning, and compliance adaptation [3]
- Key opportunities include "cloud + edge collaboration," parallel tracks of open-source and closed-source development, and the evolution of the agent ecosystem [3]
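The "sparse activation" behind the MoE comeback can be sketched in a few lines: a router scores all experts per token, but only the top-k actually run, so compute scales with k rather than with the total expert count. Everything below (dimensions, expert count, random weights) is a generic illustration, not any specific production model.

```python
import numpy as np

# Minimal Mixture-of-Experts forward pass with top-k sparse activation.
rng = np.random.default_rng(0)
d, num_experts, k = 16, 8, 2
W_router = rng.normal(size=(d, num_experts))              # routing scores
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ W_router
    top = np.argsort(logits)[-k:]                          # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over selected only
    # Only k expert matmuls execute; the other num_experts - k are skipped entirely.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.normal(size=d))
print(y.shape)  # → (16,)
```

With k = 2 of 8 experts, per-token expert compute is a quarter of the dense equivalent while the parameter budget stays at full size, which is the "parameter scale vs. computational cost" balance described above.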
Goodbye to "expert monopoly": AdaMoE cracks the VLA model's efficiency-accuracy dilemma
具身智能之心· 2025-10-21 00:03
Core Viewpoint
- The article discusses the AdaMoE architecture, which enhances the performance of Vision-Language-Action (VLA) models in robotic control by decoupling expert selection and weight distribution, leading to improved success rates in both simulation and real-world tasks [1][24].

Summary by Sections

Research Background: The Three Dilemmas of VLA Models
- Traditional VLA models face three main dilemmas:
1. Difficulty in improving performance due to high training costs, as collecting precise robotic data is resource-intensive [2].
2. The challenge of real-time control, where dense models require all parameters to be activated, slowing down response times [3].
3. The inefficiency of using Mixture of Experts (MoE) due to conflicts among experts, which hinders effective task execution [5].

Core Design: The Decoupling Magic of AdaMoE
- AdaMoE's innovation lies in its ability to separate the roles of expert selection and performance evaluation, allowing each component to focus on its strengths rather than trying to solve all problems simultaneously [6].

Key Designs of AdaMoE
- **Design 1**: Utilizes pre-trained weights to significantly reduce training costs by focusing on fine-tuning specialized skills rather than relearning basic actions [8].
- **Design 2**: Implements "sparse activation" and dual-module decoupling to balance capacity and efficiency while preventing conflicts among experts [9][10].

Key Findings: Advantages of Decoupling
- The research team conducted extensive experiments revealing four key conclusions that highlight the superiority of AdaMoE:
1. Experts can effectively specialize in their tasks without interference, leading to improved performance [13].
2. Decoupling responsibilities enhances performance compared to traditional coupling methods [15].
3. Fewer, more specialized experts yield better results than a larger number of overlapping experts [19].
4. Real-world scenarios benefit more from decoupling than simulated environments, with significant improvements in task success rates [22].

Experimental Results: Validation of AdaMoE
- AdaMoE demonstrated superior performance across various benchmarks, achieving an average success rate of 96.0%, outperforming traditional models and other architectures [23].

Conclusion: The Breakthrough Significance of AdaMoE
- AdaMoE not only improves performance but also provides a pathway for VLA models to operate effectively without excessive resource demands, emphasizing the importance of clear task specialization for both robots and humans [24][26].
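The decoupling idea described above, which experts run versus how much each contributes, can be sketched as two independent heads instead of the single set of router logits used in standard MoE. The two-head layout, shapes, and weights here are assumptions for illustration, not AdaMoE's actual implementation.

```python
import numpy as np

# Sketch of decoupled routing: a selection head picks WHICH experts run,
# a separate weighting head decides HOW MUCH each selected expert contributes.
rng = np.random.default_rng(1)
d, num_experts, k = 16, 8, 2
W_select = rng.normal(size=(d, num_experts))   # selection head (hypothetical)
W_weight = rng.normal(size=(d, num_experts))   # weighting head (hypothetical)
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]

def decoupled_forward(x: np.ndarray) -> np.ndarray:
    chosen = np.argsort(x @ W_select)[-k:]       # top-k by selection scores
    w_logits = (x @ W_weight)[chosen]            # weights come from the OTHER head
    gates = np.exp(w_logits) / np.exp(w_logits).sum()
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = decoupled_forward(rng.normal(size=d))
print(y.shape)  # → (16,)
```

In a standard MoE both jobs share one logit vector, so an expert that should be selected often but weighted lightly creates the kind of inter-expert conflict the article describes; giving each job its own parameters removes that coupling.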
Huawei's Pangu large models and the Ascend AI computing platform jointly build an integrated software-hardware AI technology stack
Investment Rating
- The report does not explicitly state an investment rating for the AI industry or Huawei's AI initiatives.

Core Insights
- Huawei is exploring a full-stack AI competitive strategy through the integration of software and hardware, transitioning from merely catching up with state-of-the-art (SOTA) models to customizing model architectures to better leverage its self-developed Ascend hardware [6][20].
- The evolution of the Pangu model series reflects a shift from dense models to sparse architectures, addressing systemic issues in large-scale distributed systems and enhancing efficiency [6][22].
- The introduction of the CloudMatrix infrastructure supports the optimization of AI inference, enabling high throughput and low latency through a unified bus network and various operator-level optimizations [6][20].

Summary by Sections

1. Evolution of Pangu Models
- The Pangu model series began with PanGu-α, a 200 billion parameter autoregressive Chinese language model, which established a technical route based on Ascend hardware [6][8].
- PanGu-Σ, launched in 2023, marked an exploration into trillion-parameter models, introducing a sparse architecture to reduce computational costs [8][10].
- Pangu 3.0 introduced a "5+N+X" architecture, focusing on industry-specific applications and enabling rapid deployment of AI capabilities across various sectors [15][16].

2. Maximizing Ascend Hardware Efficiency
- Pangu Pro MoE and Pangu Ultra MoE are designed to maximize the efficiency of Ascend hardware, with Pangu Pro MoE addressing load imbalance through a grouped expert mixture architecture [25][26].
- Pangu Ultra MoE employs a system-level optimization strategy, utilizing simulation-driven design to enhance performance on Ascend hardware [46][47].

3. CloudMatrix Infrastructure
- CloudMatrix serves as the physical foundation for AI inference, addressing new challenges posed by large language models and enabling high-performance computing through a distributed memory pool [6][20].
- The infrastructure supports various software innovations, allowing for efficient communication and optimization of AI models [6][20].

4. Full-Stack Collaboration Strategy
- Huawei's strategy emphasizes open-source models to build an ecosystem around Ascend hardware, integrating architecture, systems, and operators for comprehensive collaboration [6][20].
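The grouped expert mixture mentioned for Pangu Pro MoE can be illustrated as routing constrained within groups: experts are partitioned (e.g. one group per device) and the router takes the same top-k from every group, so per-device load is balanced by construction rather than by an auxiliary loss. Group sizes and scores below are illustrative assumptions, not Pangu's actual configuration.

```python
import numpy as np

# Grouped top-k routing sketch: exactly k_per_group experts fire in each group,
# so every group (device) carries an identical expert load per token.
rng = np.random.default_rng(2)
num_groups, experts_per_group, k_per_group = 4, 4, 1
logits = rng.normal(size=(num_groups, experts_per_group))   # one row per group

chosen = np.argsort(logits, axis=1)[:, -k_per_group:]        # top-k within each group
global_ids = chosen + np.arange(num_groups)[:, None] * experts_per_group
print(global_ids.ravel())   # one expert id drawn from each group of 4
```

Compare this with unconstrained global top-k, where a token can route all its experts to one device and leave the others idle, which is exactly the load imbalance the grouped design avoids.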
Built for agent applications: Zhipu's new flagship model GLM-4.5 is here!
硬AI· 2025-07-29 15:50
Core Viewpoint
- The article discusses the launch of the new flagship model GLM-4.5 by Zhipu AI, which is designed for intelligent agent applications and has been released on HuggingFace and ModelScope platforms [2][3].

Group 1: Model Architecture and Performance
- GLM-4.5 utilizes a Mixture-of-Experts (MoE) architecture with a total parameter count of 355 billion and 32 billion active parameters, while GLM-4.5-Air has 106 billion total parameters and 12 billion active parameters [4][6].
- The model integrates reasoning, coding, and intelligent agent capabilities, achieving a comprehensive performance ranking in the global top three, and is the leading domestic and open-source model [3][4].
- In comparative tests against models like Claude Code and Kimi-K2, GLM-4.5 demonstrated superior task completion and tool reliability, although it slightly lagged behind Claude-4-Sonnet in some dimensions [8].

Group 2: Cost and Efficiency
- The API call pricing for GLM-4.5 is set at 0.8 yuan per million tokens for input and 2 yuan per million tokens for output, making it a cost-effective option [10].
- The high-speed version of the model supports a generation rate of up to 100 tokens per second, catering to high concurrency deployment needs [12].

Group 3: Training Data and Fine-tuning
- The training data for GLM-4.5 encompasses 15 trillion tokens of general corpus, supplemented by 8 trillion tokens specifically fine-tuned for coding, reasoning, and agent tasks, enhanced through reinforcement learning [7].

Group 4: Agent Capabilities and Demonstrations
- Zhipu AI has released multiple real-world scenario demos to showcase the agent capabilities of GLM-4.5, including a simulated search engine, a video platform simulator, a playable Flappy Bird game, and an automated PPT tool [14].
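A quick sanity check on the quoted GLM-4.5 API rates (0.8 yuan per million input tokens, 2 yuan per million output tokens): the helper below just applies those two prices; the request volumes in the example are made up for illustration.

```python
# Cost estimate from the per-million-token prices quoted in the article.
INPUT_YUAN_PER_M = 0.8    # yuan per 1M input tokens
OUTPUT_YUAN_PER_M = 2.0   # yuan per 1M output tokens

def call_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Total API cost in yuan for one workload."""
    return (input_tokens * INPUT_YUAN_PER_M
            + output_tokens * OUTPUT_YUAN_PER_M) / 1e6

# Hypothetical daily workload: 3M input tokens, 1M output tokens.
print(f"{call_cost_yuan(3_000_000, 1_000_000):.2f} yuan")  # → 4.40 yuan
```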
MiniMax takes the fight to DeepSeek
Jing Ji Guan Cha Wang· 2025-06-18 11:32
Core Viewpoint
- MiniMax has launched its self-developed MiniMax M1 model, which competes directly with DeepSeek R1 and Google's Gemini 2.5 Pro in terms of key technical specifications, architecture design, context processing capabilities, and training costs [1][2].

Group 1: Model Specifications
- MiniMax M1 supports a context length of 1 million tokens, which is 8 times larger than DeepSeek R1's 128,000 tokens and only slightly behind Google's Gemini 2.5 Pro [1].
- The total parameter count for MiniMax M1 is 456 billion, with 45.9 billion parameters activated per token, while DeepSeek R1 has a total of 671 billion parameters but activates only 37 billion per token [1].

Group 2: Cost Efficiency
- MiniMax M1 consumes only 25% of the floating-point operations compared to DeepSeek R1 when generating 100,000 tokens, and requires less than half the computational power for inference tasks of 64,000 tokens [2].
- The training cost for MiniMax M1 was only $535,000, significantly lower than the initial expectations and much less than the $5-6 million GPU cost for training DeepSeek R1 [2].

Group 3: Pricing Strategy
- MiniMax M1 has a tiered pricing model for its API services based on the number of input or output tokens, with the first tier charging 0.8 yuan per million input tokens and 8 yuan per million output tokens, which is lower than DeepSeek R1's pricing [3].
- The pricing for the first two tiers of MiniMax M1 is lower than that of DeepSeek R1, and the third tier for long text is currently not covered by DeepSeek [3].

Group 4: Technology Innovations
- MiniMax M1's capabilities are supported by two core technologies: the linear attention mechanism (Lightning Attention) and the reinforcement learning algorithm CISPO, which enhances efficiency and stability in training [2].
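The tiered pricing described above can be sketched as a simple tier lookup keyed on input length. The article only quotes the first tier's rates (0.8 yuan per million input tokens, 8 yuan per million output tokens) and the 32K/128K tier boundaries; the upper-tier prices are deliberately left as `None` rather than invented.

```python
# Tier lookup sketch for MiniMax M1's input-length-based API pricing.
# Only the first tier's prices appear in the article; others are unknown here.
TIERS = [
    (32_000,       (0.8, 8.0)),    # 0-32K input tokens: rates from the article
    (128_000,      (None, None)),  # 32K-128K: rates not quoted in this summary
    (float("inf"), (None, None)),  # long-text tier: rates not quoted either
]

def tier_prices(input_tokens: int):
    """Return (yuan per 1M input tokens, yuan per 1M output tokens) for a request."""
    for upper_bound, prices in TIERS:
        if input_tokens <= upper_bound:
            return prices
    raise ValueError("unreachable: last tier is unbounded")

print(tier_prices(10_000))  # → (0.8, 8.0)
```

Tiering by input length lets long-context requests, which are costlier to serve under attention scaling, carry higher rates without penalizing short prompts.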
A 20-billion-yuan AI unicorn fights back: MiniMax's first reasoning model benchmarks against DeepSeek, with a compute cost of only $530,000
Hua Er Jie Jian Wen· 2025-06-17 11:57
Core Insights
- MiniMax, a Chinese AI startup valued at 20 billion RMB, has launched its first inference model, M1, which challenges leading models like DeepSeek and others with significantly lower training costs and superior efficiency [1][6].

Performance and Efficiency
- M1 outperforms domestic closed-source models and approaches the performance of the best overseas models, surpassing DeepSeek, Alibaba, ByteDance, OpenAI, Google, and Anthropic in certain tasks [1].
- In terms of efficiency, M1 consumes less than 50% of the computational power of DeepSeek R1 when generating 64K tokens, and only 25% for 100K tokens [7].
- The model has a total of 456 billion parameters and supports context inputs of up to 1 million tokens, which is eight times that of DeepSeek R1 [3].

Cost Efficiency
- The entire training process for M1 utilized 512 NVIDIA H800 GPUs over three weeks, with a rental cost of approximately 537,400 USD (around 3.8 million RMB), which is an order of magnitude lower than initially expected [6].
- MiniMax has developed a new reinforcement learning algorithm named CISPO, which achieved double the speed of ByteDance's recent DAPO algorithm, requiring only 50% of the training steps to reach similar performance [6].

Market Positioning
- MiniMax has adopted a tiered pricing strategy for its API, making M1 more cost-effective compared to DeepSeek R1, especially in the input length ranges of 0-32K and 32K-128K tokens [8].
- M1 is positioned as a "price killer" in the market, receiving positive feedback from developers for its cost-performance ratio [8].

Future Developments
- M1 is just the first product in a series of releases planned by MiniMax, which aims to introduce intelligent agent applications and further updates in video and music model capabilities [9].
- The company believes that M1's efficient architecture will provide unique advantages in future intelligent agent applications that require extensive reasoning and integration of long-context information [9].