Hyper-Connections (HC)
DeepSeek Opens the New Year with a Bang! Paper Co-Signed by Liang Wenfeng Released
Yicai · 2026-01-01 14:49
Core Viewpoint
- DeepSeek has introduced a new network architecture called mHC (Manifold-Constrained Hyper-Connections) aimed at addressing instability issues in large-scale model training, potentially guiding the evolution of next-generation infrastructure [3][6]

Group 1: Technical Innovations
- The mHC architecture improves upon traditional hyper-connection frameworks by stabilizing information transmission in neural networks, akin to adding "traffic rules" to information channels, thus enhancing model training efficiency and scalability [7]
- The paper suggests that mHC opens up numerous promising research avenues, potentially reigniting academic interest in macro-architecture design and deepening understanding of how topological structures affect optimization and representation learning [8]

Group 2: Industry Implications
- mHC may enable companies to reduce hardware investment and shorten training cycles when developing larger foundation models, lowering the barrier for small and medium AI enterprises to build more complex models [8]
- Enhanced training stability and scalability could facilitate the deployment of large models in more complex scenarios, such as multi-modal models requiring extensive parameters and industrial-grade intelligent decision systems [8]
- Industry experts view DeepSeek's research as foundational innovation, predicting significant updates in the upcoming V4 version based on this architecture [8]

Group 3: Recent Developments
- Although DeepSeek did not launch major versions such as R2 or V4 in 2025, it has continued to iterate and open-source its models, releasing DeepSeek-V3.2 and DeepSeek-Math-V2, the latter being the first mathematical model to reach international Olympiad gold-medal standard [9]
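To make the "multiple residual streams with cross-stream mixing" idea concrete, the toy sketch below (not DeepSeek's code; all names such as `expansion_rate` and `mixing` are hypothetical) shows why unconstrained hyper-connection mixing can destabilize training: repeatedly applying a mixing matrix whose rows do not sum to one lets activation norms drift across depth.

```python
# Illustrative sketch only, NOT DeepSeek's implementation: a toy
# hyper-connection update with an unconstrained cross-stream mixing matrix.
import numpy as np

rng = np.random.default_rng(0)
expansion_rate, dim, depth = 4, 8, 50   # hypothetical toy sizes

# n parallel residual streams instead of a single residual path
streams = rng.normal(size=(expansion_rate, dim))

# Unconstrained mixing: identity plus noise. Its rows/columns need not
# sum to 1, so repeated application can grow or shrink activations.
mixing = np.eye(expansion_rate) + 0.05 * rng.normal(
    size=(expansion_rate, expansion_rate)
)

norms = []
x = streams.copy()
for _ in range(depth):
    x = mixing @ x                      # cross-stream mixing (the HC step)
    norms.append(float(np.linalg.norm(x)))

print(f"norm after {depth} layers: {norms[-1]:.2f} "
      f"(started at {np.linalg.norm(streams):.2f})")
```

The drift in the printed norm is the kind of instability the summaries attribute to plain hyper-connections; constraining `mixing` to a well-behaved manifold is the remedy mHC is described as providing.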
DeepSeek's Latest Release!
Quanshang Zhongguo (Securities China) · 2026-01-01 12:40
Core Viewpoint
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections) to address the instability of traditional hyper-connections in large-scale model training while retaining their significant performance gains [1][3]

Summary by Sections

Research and Development
- The paper notes that recent advancements in hyper-connections (HC) have broadened the residual-stream width and diversified connection patterns, extending the residual-connection paradigm established over the past decade. However, these improvements weaken the identity mapping inherent to residual connections, leading to severe training instability and limited scalability, along with significant memory-access overhead [3]
- To tackle these challenges, DeepSeek proposed the mHC framework, which projects the HC residual connection space onto a specific manifold, restoring the identity mapping property, and integrates strict infrastructure optimizations to ensure operational efficiency [3]

Experimental Results
- Internal large-scale training results indicate that mHC supports scalable training effectively, with an additional time overhead of only 6.7% at an expansion rate of 4 [4]

Conclusion and Future Directions
- The paper concludes that empirical results demonstrate mHC's ability to restore the identity mapping property, achieving stable large-scale training with superior scalability compared to traditional HC. Importantly, mHC delivers these improvements with negligible computational overhead through efficient infrastructure-level optimizations [6]
- As a generalized extension of the HC paradigm, mHC opens up several important research directions. While this study used a doubly stochastic matrix to ensure stability, the framework is compatible with various manifold constraints tailored to specific learning objectives; in-depth research on differentiated geometric constraints may lead to new methods that better balance plasticity and stability [6]
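The doubly stochastic matrix mentioned above can be illustrated with a short sketch. This is a toy example under assumptions, not DeepSeek's code: it projects an arbitrary mixing matrix toward the doubly stochastic manifold via Sinkhorn normalization (one standard way to obtain such matrices; whether the paper uses this exact procedure is not stated in the summaries).

```python
# Illustrative sketch: projecting a mixing matrix toward the doubly
# stochastic manifold (rows AND columns sum to 1) with Sinkhorn iteration.
import numpy as np

def sinkhorn(mat, iters=50):
    """Alternately normalize rows and columns until both sum to ~1."""
    m = np.abs(mat) + 1e-9                      # keep entries positive
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)       # rows sum to 1
        m /= m.sum(axis=0, keepdims=True)       # columns sum to 1
    return m

rng = np.random.default_rng(1)
raw = rng.normal(size=(4, 4))                   # unconstrained mixing
ds = sinkhorn(raw)                              # constrained to the manifold

# Because every column sums to 1, the uniform signal across residual
# streams is preserved exactly: ones @ ds == ones. This is the sense in
# which such a constraint restores an identity-mapping-like property.
print(np.ones(4) @ ds)
```

Intuitively, a doubly stochastic mixing step averages rather than amplifies: it can neither blow up nor collapse the mean signal across the residual streams, which matches the stability role the summaries attribute to the manifold constraint.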
DeepSeek Opens the Year with a New Paper: All-New mHC Architecture Proposed, Liang Wenfeng Appears on the Author List
Xin Lang Cai Jing (Sina Finance) · 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections) aimed at addressing the instability of traditional hyper-connections in large-scale model training while maintaining significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyper-connections onto a specific manifold to restore the identity mapping property [6]
- The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to deepen understanding of topological architecture design and offer promising directions for the evolution of foundation models [6]