DeepSeek: Latest Release!
券商中国·2026-01-01 12:40

Core Viewpoint
- DeepSeek has introduced a new architecture, mHC (Manifold-Constrained Hyper-Connections), to address the training instability of conventional hyper-connections (HC) in large-scale model training while preserving their significant performance gains [1][3].

Summary by Sections

Research and Development
- The paper notes that recent work on hyper-connections (HC) has widened the residual stream and diversified connection patterns, extending the residual connection paradigm that has been standard for the past decade. These changes, however, weaken the identity mapping property inherent to residual connections, which leads to severe training instability, limited scalability, and significant memory access overhead [3].
- To tackle these challenges, DeepSeek proposes the mHC framework, which projects the HC residual connection space onto a specific manifold to restore the identity mapping property, and pairs this with rigorous infrastructure optimization to keep it efficient in practice [3].

Experimental Results
- Internal large-scale training results indicate that mHC supports scalable training, adding only 6.7% time overhead when the expansion rate is set to 4 [4].

Conclusion and Future Directions
- The paper concludes that, empirically, mHC restores the identity mapping property and achieves stable large-scale training with better scalability than conventional HC. Through efficient infrastructure-level optimizations, it does so with negligible computational overhead [6].
- As a generalized extension of the HC paradigm, mHC opens up several research directions. Although this study uses doubly stochastic matrices to ensure stability, the framework is compatible with other manifold constraints designed for specific learning objectives. Further study of different geometric constraints may yield new methods that better balance plasticity and stability [6].
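
The doubly stochastic constraint mentioned in the conclusion can be made concrete with a small sketch. A doubly stochastic matrix has nonnegative entries, and every row and every column sums to 1. The identity matrix is one such matrix, and the product of doubly stochastic matrices is again doubly stochastic, so stacking many layers of such mixing neither amplifies nor shrinks the total signal. This summary does not say how the paper enforces the constraint. The example below assumes Sinkhorn-Knopp normalization, a standard technique, purely for illustration. The function names, the 4 streams, and the 8 layers are hypothetical choices and are not taken from the paper.

```python
import math
import random


def sinkhorn(scores, iters=50):
    """Approximately map a square matrix of real scores to a doubly
    stochastic matrix by alternating row and column normalization.
    Illustrative assumption, not necessarily the paper's procedure."""
    n = len(scores)
    # Exponentiate so that every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in scores]
    for _ in range(iters):
        # Scale each row so it sums to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Scale each column so it sums to 1.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_sum_error(m):
    """Largest deviation of any row sum or column sum from 1."""
    n = len(m)
    rows = [abs(sum(row) - 1.0) for row in m]
    cols = [abs(sum(m[i][j] for i in range(n)) - 1.0) for j in range(n)]
    return max(rows + cols)


rng = random.Random(0)
n = 4        # hypothetical: 4 residual streams (expansion rate 4)
depth = 8    # hypothetical: number of stacked mixing matrices

# One constrained mixing matrix per layer.
layers = [sinkhorn([[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)])
          for _ in range(depth)]

# Compose all layers: the product stays doubly stochastic.
product = layers[0]
for w in layers[1:]:
    product = matmul(product, w)

print("max row/column sum error of the product:", max_sum_error(product))
```

Because each row sums to 1 with nonnegative entries, every output stream is a weighted average of the input streams. This gives an intuition for why such a constraint keeps deep stacks of mixing matrices well behaved, whereas unconstrained matrices can compound into very large or very small values.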
