Mixture of Experts Models
NVIDIA Unveils Next-Generation AI Platform Vera Rubin
Sina Finance· 2026-01-06 15:48
AI giant NVIDIA (NVDA) officially unveiled its next-generation superchip, "Vera Rubin", on Monday at CES 2026 in Las Vegas. The chip, one of six in NVIDIA's new Rubin platform, integrates one Vera central processing unit (CPU) and two Rubin graphics processing units (GPUs) in a single processor.

NVIDIA says the Rubin platform is the ideal compute foundation for agentic AI, advanced reasoning models, and Mixture of Experts (MoE) models. An MoE model combines a set of "specialist" AI models and routes each query to the matching expert model based on the user's question (a minimal routing sketch follows this article).

NVIDIA CEO Jensen Huang said in a statement: "The Rubin platform arrives at just the right moment, as compute demand for AI model training and inference is growing explosively." "We have kept a steady cadence of one new generation of AI supercomputer per year, and through the deep co-design of six brand-new chips, the Rubin platform takes a leap toward the next frontier of AI."

Besides the Vera CPU and Rubin GPU, the Rubin platform includes four more chips for networking and storage: the NVIDIA NVLink 6 switch, the NVIDIA ConnectX-9 SuperNIC, the NVIDIA BlueField-4 data processor, and the NVIDIA Spectrum-6 ...
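The routing step described above can be pictured with a minimal sketch in PyTorch. Everything here is an illustrative assumption (a generic learned top-2 gate; arbitrary shapes and names), not NVIDIA's or any particular model's design: a gate scores every expert for each token and dispatches the token to the highest-scoring ones.

```python
import torch
import torch.nn.functional as F

# Minimal top-k MoE routing sketch (all shapes and the top-2 choice are
# illustrative assumptions, not a specific production design).
def route_tokens(x, gate_weight, num_active=2):
    """Score each token against every expert and keep the top-k experts."""
    logits = x @ gate_weight                    # [tokens, num_experts]
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(num_active, dim=-1)
    # Renormalize so the selected experts' weights sum to 1 per token.
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_idx, topk_probs

tokens = torch.randn(8, 64)                     # 8 tokens, hidden size 64
gate = torch.randn(64, 16)                      # gate over 16 experts
expert_ids, weights = route_tokens(tokens, gate)
print(expert_ids.shape, weights.shape)          # [8, 2] and [8, 2]
```

Only the selected experts run for a given token, which is why MoE models can grow total parameter count without growing per-token compute.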
DeepSeek Quietly Open-Sources LPLB: Using Linear Programming to Fix MoE Load Imbalance
Jiqizhixin· 2025-11-20 15:13
Core Insights
- DeepSeek has launched a new code repository called LPLB (Linear-Programming-Based Load Balancer) on GitHub, which aims to optimize workload distribution in Mixture of Experts (MoE) models [2][5].
- The project is still at an early research stage, and its performance gains are under evaluation [8][15].

Project Overview
- LPLB addresses dynamic load imbalance during MoE training by applying linear programming [5][9].
- Load balancing proceeds in three main steps: dynamically reordering experts based on workload statistics, constructing replicas of experts, and solving for the optimal token distribution for each batch (a toy version of this final solve follows this summary) [5][6].

Technical Mechanism
- Expert reordering is assisted by EPLB (Expert Parallel Load Balancer), and real-time workload statistics can be collected from several sources [6][11].
- LPLB employs a lightweight solver built on NVIDIA's cuSolverDx and cuBLASDx libraries for its linear-algebra kernels, keeping resource consumption during optimization minimal [6][11].

Limitations
- LPLB targets dynamic fluctuations in workload, while EPLB addresses static imbalance [11][12].
- Known limitations include ignoring nonlinear computation costs and potential delays in solving the optimization problem, which may hurt performance under some conditions [11][12].

Application and Value
- LPLB aims at the "bottleneck effect" in large-model training, where overall training speed is gated by the slowest GPU [15].
- It introduces linear programming as a mathematical tool for real-time optimal allocation and leverages NVSHMEM to overcome communication bottlenecks, making it a useful reference for developers researching MoE training acceleration [15].
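The per-batch solve can be pictured with a toy linear program. This sketch uses SciPy's `linprog` on the CPU rather than LPLB's cuSolverDx-based GPU solver, and the expert placement and token counts are made-up assumptions: given per-expert token loads and one replica of each expert on the other GPU, choose how many tokens to reroute so that the maximum per-GPU load is minimized.

```python
import numpy as np
from scipy.optimize import linprog

# Toy per-batch balancing LP (illustrative; not LPLB's actual formulation).
# Experts 0,1 live on GPU 0 and experts 2,3 on GPU 1; each expert has a
# replica on the other GPU. x[i] = tokens of expert i rerouted to its
# replica; t = maximum per-GPU load, which we minimize.
loads = np.array([900.0, 300.0, 100.0, 200.0])   # made-up token counts

c = np.array([0, 0, 0, 0, 1.0])                  # objective: minimize t
A_ub = np.array([
    [-1, -1,  1,  1, -1],                        # GPU 0 load <= t
    [ 1,  1, -1, -1, -1],                        # GPU 1 load <= t
])
b_ub = np.array([-(loads[0] + loads[1]), -(loads[2] + loads[3])])
bounds = [(0, loads[i]) for i in range(4)] + [(0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x, t = res.x[:4], res.x[4]
print("tokens rerouted per expert:", x)          # a net 450 tokens leave GPU 0
print("max per-GPU load after balancing:", t)    # 750.0, perfectly balanced
```

With these numbers GPU 0 starts at 1200 tokens and GPU 1 at 300; the LP reroutes a net 450 tokens to the replicas on GPU 1, equalizing both devices at 750.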
OpenAI's Chief Reveals the GPT-6 Bottleneck, Answers Jensen Huang's Question: Nearly "Mortgaging the Future" for Compute
36Kr· 2025-08-16 04:04
Group 1
- Greg Brockman's core observation is that as computational power and data scale rapidly expand, foundational research is making a comeback, and algorithms are once again the key bottleneck for future AI development [1][21][22]
- Brockman emphasizes that engineering and research are equally important in driving AI advances, and that OpenAI has always maintained a philosophy of treating both disciplines with equal respect [3][6][8]
- OpenAI has faced hard trade-offs in allocating resources between product development and research, sometimes "mortgaging the future" by reallocating compute originally intended for research to support product launches [8][9][10]

Group 2
- "Vibe coding" is discussed as giving way to serious software engineering practice, with AI expected to help transform existing applications rather than just produce flashy projects [11][12]
- Brockman highlights the need for robust AI infrastructure that can handle diverse workloads, spanning long-running computational tasks and real-time processing demands, which is a complex design challenge [16][18][19]
- The future economy is anticipated to be driven by AI, with a diverse model library emerging that will create numerous opportunities for engineers to build systems that enhance productivity and efficiency [24][25][27]
Kimi K2 Didn't Just Take First Place Among Open-Source Models, It Also Took an Author Credit on Its Own Paper: "I 'Praise' Myself"
36Kr· 2025-07-22 11:07
Core Insights
- The article discusses the release of Kimi K2, the world's first open-source model with over one trillion parameters, which has sparked significant discussion in the industry [1][2]
- Kimi K2 has reached the top of the LMSYS open-source model leaderboard, surpassing other models across a range of evaluation benchmarks [2][27]
- The Kimi team acknowledges that Kimi K2 builds upon DeepSeek V3, addressing concerns about potential plagiarism [3][5]

Technical Aspects
- Kimi K2 is a mixture-of-experts model with 1.04 trillion total parameters and 32 billion active parameters, using a sparse architecture with a sparsity of 48 [12]
- The training corpus comprises 15.5 trillion tokens covering various domains, cleaned for quality and expanded with data-augmentation techniques [12]
- The MuonClip optimizer is introduced to keep training stable, preventing large fluctuations in the loss during training (a hedged sketch of the clipping idea follows this summary) [13][16]

Data Strategy
- Kimi K2 employs a dual data strategy combining synthetic and real-world data, generating over 100,000 tool-use trajectories for community use [11][20]
- The model draws on a library of over 3,000 real tools and 20,000 synthetic tools across various fields, ensuring a comprehensive training environment [20][23]
- Data-rewriting strategies enhance the diversity of the training data, improving model performance and reducing overfitting [17][19]

Performance Metrics
- Kimi K2 has achieved or approached state-of-the-art results in key areas such as coding, mathematics, tool use, and long-context tasks, outperforming several proprietary models [27][29]
- The model's performance is validated through various benchmarks, demonstrating its capabilities in complex reasoning and task execution [29][30]
- Future iterations of Kimi K2 will focus on improving reasoning efficiency and self-evaluation in tool use [31]
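Public descriptions of MuonClip attribute its stability to a QK-Clip step: when the largest attention logit observed in a step exceeds a threshold, the query and key projection weights are scaled down so their product falls back below it. Below is a hedged sketch of that idea; the threshold value, the even split of the shrink factor between Q and K, and all tensor shapes are assumptions for illustration, not Moonshot AI's implementation.

```python
import torch

def qk_clip_(w_q, w_k, max_logit, tau=100.0, alpha=0.5):
    """Rescale query/key projection weights in place when the observed
    maximum attention logit exceeds tau (sketch of the QK-Clip idea;
    tau and alpha here are illustrative assumptions)."""
    if max_logit > tau:
        gamma = tau / max_logit           # shrink factor, in (0, 1)
        w_q.mul_(gamma ** alpha)          # split the shrinkage between Q and K
        w_k.mul_(gamma ** (1.0 - alpha))  # so their product scales by gamma

# Toy usage: one attention head with hidden size 64.
w_q, w_k = torch.randn(64, 64), torch.randn(64, 64)
x = torch.randn(16, 64)                   # 16 token activations
logits = (x @ w_q.T) @ (x @ w_k.T).T / 64 ** 0.5
qk_clip_(w_q, w_k, logits.abs().max().item())
```

Because the logits are bilinear in the two weight matrices, scaling them by `gamma**alpha` and `gamma**(1-alpha)` shrinks every attention logit by exactly `gamma`, capping the maximum without touching the rest of the optimizer update.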
Huawei's Pangu Model Hits the Leaderboards for the First Time: Ascend-Native 72B MoE Model Tops SuperCLUE Among Models Under 100B Parameters
Yicai· 2025-05-28 13:36
Core Viewpoint
- The article highlights the Mixture of Grouped Experts (MoGE) model developed by Huawei's Pangu team as a significant innovation in the AI field, particularly for large language models (LLMs), addressing the load-balancing challenges of traditional Mixture of Experts (MoE) architectures while achieving efficient training and strong performance [1][10][31]

Group 1: MoGE Architecture and Innovations
- MoGE introduces a grouping mechanism in the expert-selection phase that optimizes load distribution and enables balanced resource allocation across devices, overcoming an engineering bottleneck of traditional MoE architectures (see the routing sketch after this summary) [1][10]
- The Pangu Pro MoE model, built on the MoGE architecture, has 72 billion total parameters with 16 billion active parameters, achieving industry-leading inference throughput on Ascend 300I Duo and 800I A2 chips: 321 tokens/s and 1528 tokens/s respectively [2][22]
- Compared with DeepSeek-R1's 671 billion parameters, Pangu Pro MoE achieves comparable performance with roughly one-tenth the parameter count, setting a new benchmark for the trade-off between computational efficiency and model quality [3][29]

Group 2: Performance and Benchmarking
- Pangu Pro MoE scored 59 points on the SuperCLUE benchmark, ranking first among domestic models under 100 billion parameters and demonstrating its ability to rival much larger models [2][25]
- The model performs strongly on complex reasoning tasks, outperforming other leading models on benchmarks such as MMLU and DROP and showcasing its versatility across domains [26][27]

Group 3: Industry Implications and Future Directions
- The MoGE architecture signals a shift from a parameter-centric race to a focus on practical efficiency, enabling smaller enterprises to leverage large models without prohibitive costs and democratizing access to advanced AI [31][32]
- Huawei's integrated approach, combining architecture, chips, and engines, eases the deployment of large models in real-world applications, countering the misconception that large models require exorbitant deployment costs [31][32]
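The balancing property of grouped expert selection can be shown with a small routing sketch. The group count, experts per group, and shapes here are illustrative assumptions, not the Pangu implementation: experts are split into equal groups, one group per device, and each token activates the same number of experts in every group, so each device receives an identical share of work by construction.

```python
import torch
import torch.nn.functional as F

def grouped_topk_route(x, gate_weight, num_groups=4, k_per_group=2):
    """Grouped routing sketch: pick top-k experts inside each group,
    so every token sends equal work to every group/device."""
    tokens, _ = x.shape
    logits = x @ gate_weight                       # [tokens, num_experts]
    num_experts = logits.shape[-1]
    group_size = num_experts // num_groups
    grouped = logits.view(tokens, num_groups, group_size)
    probs = F.softmax(grouped, dim=-1)             # normalize within each group
    topk_probs, topk_idx = probs.topk(k_per_group, dim=-1)
    # Convert within-group indices back to global expert ids.
    offsets = torch.arange(num_groups).view(1, num_groups, 1) * group_size
    return topk_idx + offsets, topk_probs

x = torch.randn(8, 64)                             # 8 tokens, hidden size 64
gate = torch.randn(64, 16)                         # 16 experts -> 4 groups of 4
expert_ids, weights = grouped_topk_route(x, gate)
print(expert_ids.shape)                            # [8, 4, 2]: 2 experts per group
```

Contrast this with a free top-k gate, where a batch of tokens can pile onto experts that all happen to live on one device; grouped selection pins each device's load at exactly tokens × k_per_group.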
Major Release | Fudan's "Large Language Models: From Theory to Practice (2nd Edition)" Fully Upgraded, Focusing on the AI Frontier
Jiqizhixin· 2025-04-28 01:26
Published by Jiqizhixin · Jiqizhixin Editorial Department

Large Language Models: From Theory to Practice (2nd Edition) is a professional technical book that gives equal weight to theory and practice, and an indispensable reference for the AI era; anyone can find their own learning path in it.

With the AI wave sweeping the globe, large language models are driving technological progress and industrial transformation at unprecedented speed. From ChatGPT to industry applications of every kind, LLMs have not only reshaped how people interact with machines but also become a key technology powering academic research and industrial innovation. Faced with such a fast-evolving field, systematically understanding its theoretical foundations and mastering its core algorithms and engineering practice has become essential for every AI practitioner, researcher, and university student.

In September 2023, the Fudan University research team of Zhang Qi, Gui Tao, Zheng Rui, and Huang Xuanjing released Large Language Models: From Theory to Practice to the global academic and industrial communities. In just two years, large language models have made important advances in theoretical research, pre-training methods, post-training techniques, and interpretability. Research on them has deepened, gradually revealing many traits that differ from traditional deep learning and natural language processing paradigms. For example, a large language model can learn from as few as 60 training examples and display strong question-answering ability, showing astonishing generalization. Yet the book's authors also found that large language models exhibit a certain fragility; for example, in a model with 13 billion parameters ...
In the Post-DeepSeek Era, Chinese AI Startups Overhaul Their Business Models
Hard AI· 2025-03-25 12:41
Core Viewpoint
- The rise of DeepSeek is reshaping China's AI industry, prompting startups to pivot toward application-focused development rather than foundational model training [1][2]

Group 1: Strategic Adjustments of Chinese AI Startups
- Startups such as Kimi, Zero One Universe (01.AI), Baichuan Intelligence, and Zhipu AI are shifting resources toward application development and cutting spending [1][3]
- Zero One Universe, founded by former Google China head Kai-Fu Lee, has stopped pre-training its own models and now focuses on selling customized AI solutions built on DeepSeek [4]
- Kimi is cutting marketing expenses to fund model training in hopes of replicating DeepSeek's success, while also exploring monetization through user engagement [5]
- Baichuan Intelligence is concentrating on healthcare applications, specifically developing AI tools to assist hospitals with diagnostics [5]

Group 2: Company Performance and Financials
- Zhipu AI is building out its enterprise sales business, reporting 2024 revenue of 300 million RMB (approximately 41 million USD) against a loss of 2 billion RMB [6]
- With around 800 employees, Zhipu AI is the largest LLM startup by headcount, compared with DeepSeek's roughly 160 [6]
- Zhipu AI reportedly aims for an IPO by the end of the year, though DeepSeek's rise may complicate that goal [6]