Mixture of Experts (MoE) Models
OpenAI Chief Reveals GPT-6 Bottleneck, Answers Jensen Huang's Question: Nearly "Mortgaging the Future" for Compute
36Kr · 2025-08-16 04:04
Group 1
- Greg Brockman's core observation is that as compute and data scale rapidly, foundational research is making a comeback, and algorithms are once again a key bottleneck for future AI development [1][21][22]
- Brockman emphasizes that engineering and research are equally important in driving AI progress, and that OpenAI has always held a philosophy of treating both disciplines with equal respect [3][6][8]
- OpenAI has faced challenges in allocating resources between product development and research, sometimes having to "mortgage the future" by diverting computational resources originally intended for research to support product launches [8][9][10]

Group 2
- The concept of "vibe coding" is discussed as a shift toward serious software engineering practice, with AI expected to help transform existing applications rather than merely produce flashy projects [11][12]
- Brockman highlights the need for robust AI infrastructure that can handle diverse workloads, from long-running computational tasks to real-time processing demands, which he describes as a complex design challenge [16][18][19]
- The future economic landscape is expected to be driven by AI, with a diverse model library emerging that will create numerous opportunities for engineers to build systems that improve productivity and efficiency [24][25][27]
Huawei's Pangu Model Enters the Rankings for the First Time: Ascend-Native 72B MoE Model Tops the SuperCLUE Leaderboard for Models Under 100 Billion Parameters
第一财经· 2025-05-28 13:36
Core Viewpoint
- The article highlights the Mixture of Grouped Experts (MoGE) model developed by Huawei's Pangu team as a significant innovation in the AI field, particularly for large language models (LLMs), addressing the challenges of traditional Mixture of Experts (MoE) architectures and achieving efficient training and performance [1][10][31]

Group 1: MoGE Architecture and Innovations
- The MoGE model introduces a dynamic grouping mechanism in the expert selection phase, optimizing load distribution and enabling balanced resource allocation across devices, thereby overcoming the engineering bottlenecks of traditional MoE architectures (see the routing sketch after this summary) [1][10]
- The Pangu Pro MoE model, built on the MoGE architecture, has 72 billion total parameters with 16 billion active parameters, and achieves industry-leading inference throughput on Ascend 300I Duo and 800I A2 chips of 321 tokens/s and 1528 tokens/s respectively [2][22]
- Compared with models such as DeepSeek-R1, which has 671 billion parameters, Pangu Pro MoE achieves comparable performance with roughly one tenth the parameter count, setting a new benchmark for computational efficiency and model effectiveness [3][29]

Group 2: Performance and Benchmarking
- Pangu Pro MoE scored 59 points on the SuperCLUE benchmark, ranking first among domestic models with fewer than 100 billion parameters and demonstrating that it can rival larger models [2][25]
- The model performs strongly on a range of complex reasoning tasks, outperforming other leading models on benchmarks such as MMLU and DROP and showing versatility across domains [26][27]

Group 3: Industry Implications and Future Directions
- The MoGE architecture signals a shift from a parameter-centric mindset toward practical efficiency, allowing smaller enterprises to use large models without prohibitive costs and democratizing access to advanced AI technologies [31][32]
- Huawei's integrated approach, combining architecture, chips, and inference engines, eases the deployment of large models in real-world applications and dispels the misconception that large models necessarily carry exorbitant deployment costs [31][32]
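The grouping idea described above (experts selected per group so that load stays balanced across devices) can be made concrete with a small routing sketch. The following is a minimal, illustrative reading of grouped top-k routing, not Huawei's published implementation: the class name, hidden size, expert count, group count, and per-group top-k are all assumptions chosen for clarity.

```python
# Minimal sketch of grouped top-k expert routing, assuming experts are split
# into G groups (e.g. one group per accelerator) and each token activates the
# same number of experts in every group, so per-group load is balanced by
# construction. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedTopKRouter(nn.Module):
    """Routes each token to the same number of experts in every expert group."""

    def __init__(self, hidden_dim=1024, num_experts=64, num_groups=8, k_per_group=1):
        super().__init__()
        assert num_experts % num_groups == 0
        self.num_groups = num_groups
        self.experts_per_group = num_experts // num_groups
        self.k_per_group = k_per_group
        # Linear gate that scores every expert for every token.
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)

    def forward(self, x):
        # x: [num_tokens, hidden_dim]
        scores = self.gate(x)  # [T, E]
        # Reshape so experts are grouped, e.g. one group per device.
        scores = scores.view(x.size(0), self.num_groups, self.experts_per_group)
        # Select the top-k experts inside *each* group, so every group
        # (and hence every device hosting a group) sees the same load.
        topk_scores, topk_idx = scores.topk(self.k_per_group, dim=-1)  # [T, G, k]
        # Convert (group, local-expert) indices back to global expert ids.
        offsets = (torch.arange(self.num_groups, device=x.device)
                   * self.experts_per_group).view(1, -1, 1)
        expert_ids = (topk_idx + offsets).flatten(1)  # [T, G*k]
        # Normalize routing weights over the selected experts of each token.
        weights = F.softmax(topk_scores.flatten(1), dim=-1)  # [T, G*k]
        return expert_ids, weights


if __name__ == "__main__":
    router = GroupedTopKRouter()
    tokens = torch.randn(4, 1024)
    ids, w = router(tokens)
    print(ids.shape, w.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

Because the number of experts selected per group is fixed, a device hosting one group never receives more routed tokens than any other, which is the load-balancing property the summary attributes to MoGE.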
Major Release | Fudan's "Large Language Models: From Theory to Practice (2nd Edition)" Fully Upgraded, Focused on the AI Frontier
机器之心· 2025-04-28 01:26
Published by 机器之心 | 机器之心 Editorial Department
"Large Language Models: From Theory to Practice (2nd Edition)" is a professional technical book that gives equal weight to theory and practice, and an indispensable reference for the AI era. Anyone can find their own growth path in it.
As the wave of artificial intelligence sweeps the globe, large language models are driving technological progress and industrial transformation at an unprecedented pace. From ChatGPT to industry applications of every kind, LLMs have not only reshaped how humans interact with machines but have also become a key technology driving academic research and industrial innovation. Faced with this rapidly evolving body of technology, systematically understanding its theoretical foundations and mastering its core algorithms and engineering practices has become required study for every AI practitioner, researcher, and university student.
In September 2023, the Fudan University research team of Zhang Qi, Gui Tao, Zheng Rui, and Huang Xuanjing officially released "Large Language Models: From Theory to Practice" to the global academic and industrial communities. In just two years, large language models have made important advances in theoretical research, pre-training methods, post-training techniques, and interpretability. Industry research into large language models has deepened, gradually revealing many characteristics that differ from traditional deep learning and natural language processing paradigms. For example, a large language model can learn from as few as 60 data samples and exhibit strong question-answering ability, demonstrating remarkable generalization. However, the book's authors have also found that large language models exhibit a certain fragility. For example, in a model with 13 billion parameters ...
In the Post-DeepSeek Era, Chinese AI Startups Overhaul Their Business Models
硬AI· 2025-03-25 12:41
Core Viewpoint
- The rise of DeepSeek is reshaping China's AI industry, prompting startups to shift their strategies toward application-focused development rather than foundational model training [1][2]

Group 1: Strategic Adjustments of Chinese AI Startups
- Startups such as Kimi, Zero One Universe, Baichuan Intelligence, and Zhipu AI are redirecting resources toward application development and cutting spending [1][3]
- Zero One Universe, founded by former Google China head Kai-Fu Lee, has stopped pre-training its own models and now focuses on selling customized AI solutions built on DeepSeek [4]
- Kimi is cutting marketing expenses to strengthen model training and replicate DeepSeek's success, while also exploring monetization through user engagement [5]
- Baichuan Intelligence is concentrating on healthcare applications, specifically developing AI tools that assist hospitals with diagnostics [5]

Group 2: Company Performance and Financials
- Zhipu AI is attempting to build up its enterprise sales business, reporting revenue of 300 million RMB (approximately 41 million USD) for 2024 alongside a loss of 2 billion RMB [6]
- Zhipu AI has around 800 employees, making it the largest LLM startup by headcount, compared with DeepSeek's roughly 160 employees [6]
- There are indications that Zhipu AI is aiming for an IPO by the end of the year, though DeepSeek's rise may affect that goal [6]