A 10,000-Word Analysis of the DeepSeek MoE Architecture!
自动驾驶之心· 2025-08-14 23:33
Core Viewpoint
- The article provides a comprehensive overview of the Mixture of Experts (MoE) architecture, focusing on the evolution and implementation of DeepSeek's MoE models (V1, V2, V3) and their optimizations for token distribution and load balancing [2][21][36].

Group 1: MoE Architecture Overview
- MoE (Mixture of Experts) is a model architecture that combines multiple expert networks to enhance performance; its sparse activation makes it well suited to large-scale deployment [2][3].
- Interest in the MoE architecture surged with the release of Mistral AI's Mixtral model, which highlighted the potential of sparse architectures in AI [2][3].
- The Switch Transformer introduced a routing mechanism in which each token selects its top-K experts, letting different experts specialize in different kinds of knowledge [6][10].

Group 2: DeepSeek V1 Innovations
- DeepSeek V1 addresses two main issues in existing MoE practice, knowledge hybridity and knowledge redundancy, both of which hinder expert specialization [22][24].
- The model introduces fine-grained expert segmentation and shared experts to enhance specialization and reduce redundancy, allowing knowledge to be captured more efficiently [25][26].
- The architecture includes a load-balancing mechanism that keeps tokens evenly distributed across experts and avoids wasted training capacity [32] (a minimal sketch of such a layer follows this summary).

Group 3: DeepSeek V2 Enhancements
- DeepSeek V2 builds on V1's design and adds three optimizations focused on load balancing [36].
- The model limits the number of devices a token's routed experts may span, reducing communication overhead during training and inference [37] (see the device-limited routing sketch below).
- A communication load-balancing loss is introduced to keep token traffic evenly distributed across devices, further improving efficiency [38].

Group 4: DeepSeek V3 Developments
- DeepSeek V3 changes the MoE layer's gating computation, replacing the softmax function with a sigmoid function to improve computational efficiency [44].
- The model drops the auxiliary load-balancing losses and instead uses a learnable bias term to steer routing, achieving load balance during training without an extra loss [46] (see the sigmoid-routing sketch below).
- A sequence-level auxiliary loss is still added to prevent extreme imbalance within individual sequences, keeping training stable [49].
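To make the fine-grained plus shared-expert design described in Group 2 concrete, here is a minimal PyTorch sketch of a DeepSeekMoE-style layer. All dimensions, expert counts, and the dense dispatch are illustrative assumptions rather than DeepSeek's actual implementation; the point is only to show shared experts that every token passes through, routed experts chosen by softmax top-K gating, and an expert-level auxiliary balance loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSeekMoESketch(nn.Module):
    """Toy MoE layer: shared experts (always on) + fine-grained routed experts (top-K)."""

    def __init__(self, d_model=64, d_ff=128, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        self.top_k, self.n_routed = top_k, n_routed
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.routed = nn.ModuleList(ffn() for _ in range(n_routed))   # many small routed experts
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared))   # common-knowledge experts
        self.router = nn.Linear(d_model, n_routed, bias=False)

    def forward(self, x):                                   # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)          # token-to-expert affinities
        topk_w, topk_idx = scores.topk(self.top_k, dim=-1)  # each token keeps its top-K experts
        # Dense dispatch for clarity only; real systems send tokens to experts sparsely.
        expert_out = torch.stack([e(x) for e in self.routed], dim=1)   # (n_tokens, n_routed, d)
        gate = torch.zeros_like(scores).scatter(-1, topk_idx, topk_w)  # zero weight if not selected
        out = sum(e(x) for e in self.shared) + torch.einsum("te,ted->td", gate, expert_out)
        # Expert-level auxiliary balance loss: f_i = fraction of tokens routed to expert i,
        # p_i = mean affinity of expert i; the loss is smallest when both are uniform.
        f = torch.bincount(topk_idx.flatten(), minlength=self.n_routed).float()
        f = f / (x.shape[0] * self.top_k)
        aux_loss = self.n_routed * torch.sum(f * scores.mean(dim=0))
        return out, aux_loss

layer = DeepSeekMoESketch()
y, aux = layer(torch.randn(8, 64))   # `aux` (times a small coefficient) is added to the training loss
```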
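The device-limited routing of V2 can be sketched in the same toy setting. Assuming routed experts are laid out contiguously across devices and that a device is rated by the best expert it hosts for a given token, a token first keeps only its M best devices and then takes the usual top-K among their experts; the layout and scoring rule here are assumptions, not DeepSeek's exact scheme.

```python
import torch

def device_limited_topk(scores, experts_per_device=4, max_devices=3, top_k=4):
    """scores: (n_tokens, n_experts) routing affinities; experts grouped contiguously by device."""
    n_tokens, n_experts = scores.shape
    n_devices = n_experts // experts_per_device
    # Rate each device by the best-scoring expert it hosts for each token.
    dev_scores = scores.view(n_tokens, n_devices, experts_per_device).max(dim=-1).values
    keep_dev = dev_scores.topk(max_devices, dim=-1).indices          # allowed devices per token
    allowed = torch.zeros(n_tokens, n_devices, dtype=torch.bool)
    allowed.scatter_(1, keep_dev, True)
    # Mask out experts on disallowed devices, then take the ordinary top-K.
    expert_allowed = allowed.repeat_interleave(experts_per_device, dim=1)
    masked = scores.masked_fill(~expert_allowed, float("-inf"))
    return masked.topk(top_k, dim=-1)   # selected experts span at most `max_devices` devices

# usage: 16 experts spread over 4 devices, each token confined to 3 devices
topk_w, topk_idx = device_limited_topk(torch.rand(8, 16))
```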
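Finally, the V3 changes in Group 4 can be sketched as follows: sigmoid affinities replace softmax, and a per-expert bias adjusted between steps enters only the top-K selection, not the output weights. The fixed-step update rule, function names, and dimensions below are assumptions for illustration.

```python
import torch

def route_v3_style(hidden, w_router, expert_bias, top_k=4):
    """hidden: (n_tokens, d_model); w_router: (n_experts, d_model); expert_bias: (n_experts,)."""
    scores = torch.sigmoid(hidden @ w_router.t())             # per-expert affinities, no softmax
    # The bias affects which experts are *selected*, not how their outputs are weighted.
    _, topk_idx = (scores + expert_bias).topk(top_k, dim=-1)
    topk_scores = scores.gather(-1, topk_idx)                 # gate weights use the raw scores
    gate = topk_scores / topk_scores.sum(dim=-1, keepdim=True)
    return topk_idx, gate

def update_bias(expert_bias, topk_idx, n_experts, step=1e-3):
    # Aux-loss-free balancing: nudge the bias of overloaded experts down and of
    # underloaded experts up, so later tokens drift toward the idle experts.
    load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    return expert_bias - step * torch.sign(load - load.mean())

# usage
n_experts, d_model = 16, 64
w, bias = torch.randn(n_experts, d_model), torch.zeros(n_experts)
idx, gate = route_v3_style(torch.randn(8, d_model), w, bias)
bias = update_bias(bias, idx, n_experts)
```

The small sequence-level auxiliary loss mentioned in the summary would sit on top of this, penalizing imbalance measured within each sequence rather than across the whole batch.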
DeepSeek's First 100 Days of Explosive Popularity: Liang Wenfeng "Conceals His Edge"
36氪· 2025-05-16 09:21
Core Viewpoint
- The article discusses the significant impact of DeepSeek and its founder Liang Wenfeng on the AI industry, particularly following the release of the DeepSeek R1 model, which shifted attention from GPT-style models to Reasoner models and marked a new era in AI development [3][4].

Group 1: DeepSeek's Impact on the AI Industry
- The release of DeepSeek's R1 model has led to a paradigm shift in AI research, with many companies now focusing on reasoning models instead of traditional GPT-style models [3][4].
- The low-cost training strategy advocated by Liang Wenfeng has positioned DeepSeek as a major player in the AI landscape and raised questions about the sustainability of spending on high-end computing resources, epitomized by Nvidia [4][5].
- Following the R1 launch, Nvidia's market value dropped by nearly $600 billion, highlighting the market's reaction to DeepSeek's advances [5][6].

Group 2: Industry Reactions and Developments
- Nvidia CEO Jensen Huang has publicly addressed concerns about DeepSeek's effect on computing-power requirements, emphasizing that DeepSeek has not reduced the demand for computational resources [6][7].
- Demand in China for H20 chips, which are crucial for AI applications, has surged because of DeepSeek's influence, despite new U.S. export restrictions [7][8].
- Liang Wenfeng's approach has triggered a broader industry shift, with major Chinese tech companies adjusting their strategies to compete with DeepSeek's cost-effective models [9][40].

Group 3: Future Prospects and Innovations
- Anticipation for DeepSeek's upcoming R2 model is high, as the industry expects further innovations from Liang Wenfeng [11][43].
- DeepSeek has maintained a focus on open-source development and has not pursued external financing, distinguishing itself from other AI startups [30][32].
- Liang Wenfeng's commitment to innovation is evident in recent updates to DeepSeek's models, which have significantly improved performance across a range of tasks [35][36].
Take a Look: This Is the Company Behind DeepSeek
梧桐树下V· 2025-01-29 03:16
| 企查查 company profile | |
| --- | --- |
| Company name | 杭州深度求索人工智能基础技术研究有限公司 (存续 / in existence) |
| Unified social credit code | 91330105MACPN4X08Y |
| Profile | DeepSeek, founded in 2023, is a general artificial intelligence mo… |
| Registered capital | RMB 10 million |
| Date of establishment | 2023-07-17 |
| Industry (企查查 classification) | Information system integration services |
| Scale | Micro (XS), 4 employees (2023) |
| Phone | 0571-85377238 |
| Address | Room 1201, West Building 1, Huijin International Building, 169 Huancheng North Road, Gongshu District, Hangzhou, Zhejiang |
| Shareholders | 宁波程…企业管理咨询合伙… (largest shareholder, 99.00%); 梁文锋 (1.00%) |
| Outbound investment | 2 invested companies, 15 affiliated companies |
| Key personnel | 裴活 (executive director and …), 3 affiliated companies; 王南军 (supervisor), 2 affiliated companies |

文/梧桐晓驴

With DeepSeek suddenly everywhere, 晓驴 got curious and looked up the company that develops and operates DeepSeek. According to 企查查: 杭州深度求索人工智能基础技术研究有限公司, English name Hangz ...