Behind DeepSeek's Two Back-to-Back Papers Lies an Academic Relay
机器之心 · 2026-01-16 00:42
Core Insights

- The article traces the evolution of deep learning architectures, focusing on the advances DeepSeek and ByteSeed have made in residual connections and knowledge-retrieval mechanisms [1][4].

Group 1: Deep Learning Architecture Evolution

- The residual connection, introduced by He et al. in 2015, addressed the loss of information in deep neural networks and became a foundational element of deep learning [6][15].
- ByteSeed's Hyper-Connections (HC), introduced in 2024, substantially enrich network topology without increasing computational cost, marking a departure from the traditional residual connection [8][9].
- DeepSeek's mHC builds on HC by addressing its scalability issues, improving stability and memory-access efficiency in large-scale training [11][12].

Group 2: Knowledge Retrieval Mechanisms

- DeepSeek's "Conditional Memory" paper proposes efficient knowledge retrieval through an "Engram" system: the model consults a large phrase dictionary for common queries, saving the compute a full forward pass would otherwise spend [18][21].
- The research examines how parameters should be split between the MoE (Mixture of Experts) and static storage modules, finding that allocating 20%–25% of parameters to Engram yields better performance [22].
- DeepSeek integrates the Engram module into the model's intermediate layers, improving the efficiency of both storage access and deep computation [22][23].

Group 3: Collaborative Research Impact

- The exchange between DeepSeek and ByteSeed exemplifies the value of open research in advancing AI, with each team building on the other's findings [28][29].
- The article stresses the importance of continued exploration of foundational technologies, which may not yield immediate commercial applications but drives long-term industry progress [31].
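The residual connection mentioned above can be shown in a minimal sketch (toy dimensions and function names are ours, not from either paper): the layer only has to learn a correction F(x), while the identity path guarantees that information keeps flowing even when the layer contributes nothing.

```python
import numpy as np

def layer(x, W):
    """A toy nonlinear layer: ReLU(W @ x)."""
    return np.maximum(0.0, W @ x)

def residual_block(x, W):
    """Residual connection (He et al., 2015): output = x + F(x).
    The identity path carries x through unchanged."""
    return x + layer(x, W)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = np.zeros((4, 4))          # an uninformative (all-zero) layer...
y = residual_block(x, W)
assert np.allclose(y, x)      # ...still passes the input through intact
```

This is exactly why very deep stacks of such blocks remain trainable: the worst a block can do is approximate the identity.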
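Hyper-Connections generalize this single residual stream into several parallel streams with learnable read and write weights. The sketch below illustrates only that core idea under our own simplifications: the variable names are ours, and the full method also mixes the streams with a learnable matrix, which we omit here.

```python
import numpy as np

def layer(h, W):
    """A toy nonlinear layer standing in for attention/FFN."""
    return np.maximum(0.0, W @ h)

def hyper_connection_block(streams, W, alpha, beta):
    """Schematic Hyper-Connections block (simplified; names are ours).

    streams: (n, d) array holding n parallel residual streams.
    alpha:   (n,) learnable weights that read the layer input
             as a weighted sum of the streams.
    beta:    (n,) learnable weights that write the layer output
             back into each stream.
    """
    h_in = alpha @ streams                   # read: (d,) layer input
    h_out = layer(h_in, W)                   # ordinary layer compute
    return streams + np.outer(beta, h_out)   # write back per stream

rng = np.random.default_rng(0)
streams = rng.standard_normal((2, 4))        # n=2 streams, width d=4
W = rng.standard_normal((4, 4))
alpha = np.array([0.5, 0.5])
beta = np.array([1.0, 0.0])
out = hyper_connection_block(streams, W, alpha, beta)
assert out.shape == (2, 4)
```

With alpha = (1,) and beta = (1,) and a single stream, this reduces to the plain residual block, which is why HC can be seen as a strict generalization; the extra topology comes almost entirely from cheap vector weights rather than extra layer compute.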
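The trade-off behind "Conditional Memory" can be illustrated with a toy lookup-before-compute routine. This is only a caricature: the paper's Engram module sits inside the model's intermediate layers and operates on learned representations, whereas everything below (the table, the function names, the string keys) is a hypothetical stand-in to show why a static phrase store saves compute on common queries.

```python
import numpy as np

# Hypothetical static phrase store (the real Engram is a learned,
# in-model module, not a Python dict of strings).
PHRASE_TABLE = {
    "capital of france": np.array([1.0, 0.0]),  # toy stored vector
}

def expensive_compute(query):
    """Stand-in for a full forward pass through the MoE layers."""
    return np.ones(2) * len(query)

def conditional_memory(query):
    """Consult static storage first; fall through to compute on a miss."""
    hit = PHRASE_TABLE.get(query.lower().strip())
    if hit is not None:
        return hit, "memory"                 # cheap static lookup
    return expensive_compute(query), "compute"

vec, path = conditional_memory("Capital of France")
assert path == "memory"                      # common query: no deep compute
```

The 20%–25% parameter split the article cites is then a capacity question: how much of the model's budget to spend on this static store versus on the experts that handle the misses.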