The Residual Connection Paradigm
DeepSeek Makes Another Big Move: New Paper Co-authored by Liang Wenfeng Draws Attention
21st Century Business Herald · 2026-01-02 11:12
Core Insights
- DeepSeek has introduced a new framework called "Manifold-Constrained Hyperconnection" (mHC), aimed at enhancing scalability while reducing the computational power and energy required to train advanced AI systems [1][14][19]
- The next flagship system, R2, is expected to launch around the Chinese New Year in February [1][14]

Summary of Key Points

Introduction of the mHC Framework
- DeepSeek published a paper detailing the mHC framework, which addresses the instability of traditional hyperconnections in large-scale model training while preserving their significant performance gains [1][15][16]
- The paper lists three primary authors, including DeepSeek founder Liang Wenfeng [1][17]

Performance and Scalability
- The mHC framework projects the residual connection space of hyperconnections onto a specific manifold, restoring the identity-mapping property, and integrates strict infrastructure optimizations for operational efficiency [3][19]
- Empirical experiments indicate that mHC effectively supports large-scale training, delivering notable performance improvements with better scalability; at an expansion rate of 4, it incurs only 6.7% additional time overhead [3][19][21]

Future Research Directions
- The paper positions mHC as a flexible, practical extension of the hyperconnection paradigm that may deepen the understanding of topological architecture design and guide the evolution of foundation models [3][21]
- It opens several important research directions, including compatibility with manifold constraints tailored to specific learning objectives and the exploration of differentiated geometric constraints to better balance plasticity and stability [3][21]
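The summaries describe hyperconnections as widening the residual stream into several parallel streams mixed by a learned matrix, which breaks the strict identity mapping of a plain residual connection. A minimal NumPy sketch of that contrast, assuming a toy tanh sublayer, a mean read-out, and an expansion rate of 4 (all illustrative choices, not the paper's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 4  # hidden width, expansion rate (n=4 matches the overhead figure cited)

def sublayer(x):
    """Stand-in for a transformer sublayer (hypothetical toy function)."""
    return np.tanh(x)

# Plain residual connection: y = x + f(x). Setting f(x) = 0 returns x
# exactly, which is the identity-mapping property.
x = rng.standard_normal(d)
y_residual = x + sublayer(x)

# Hyperconnection sketch: n parallel residual streams mixed by a learned
# matrix H. With an unconstrained H, the input is no longer recovered even
# when the sublayer outputs zero, which is the instability source the
# summaries attribute to traditional hyperconnections.
H = rng.standard_normal((n, n))       # unconstrained mixing matrix
streams = np.tile(x, (n, 1))          # n copies of the hidden state, (n, d)
mixed = H @ streams                   # learned mixing across streams
y_hyper = mixed.mean(axis=0) + sublayer(x)
```

Per the summaries, mHC's fix is to constrain H to a specific manifold so the identity-mapping property is restored rather than discarding the wider residual stream.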
DeepSeek's Latest Release!
券商中国 · 2026-01-01 12:40
Core Viewpoint
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) to address the instability of traditional hyperconnections in large-scale model training while preserving their significant performance gains [1][3].

Summary by Sections

Research and Development
- The paper notes that recent advances in hyperconnections (HC) have broadened the residual-stream width and diversified connection patterns, extending the residual connection paradigm established over the past decade. However, these improvements weaken the inherent identity-mapping property of residual connections, leading to severe training instability and limited scalability, along with significant memory-access overhead [3].
- To tackle these challenges, DeepSeek proposed the mHC framework, which projects the HC residual connection space onto a specific manifold, restoring the identity-mapping property, and integrates strict infrastructure optimizations to ensure operational efficiency [3].

Experimental Results
- Internal large-scale training results indicate that mHC effectively supports scalable training, with only 6.7% additional time overhead at an expansion rate of 4 [4].

Conclusion and Future Directions
- The paper concludes that empirical results demonstrate mHC's ability to restore the identity-mapping property, achieving stable large-scale training with better scalability than traditional HC. Importantly, mHC delivers these improvements with negligible computational overhead through efficient infrastructure-level optimizations [6].
- As a generalized extension of the HC paradigm, mHC opens several important research directions. While this study used a doubly stochastic matrix to ensure stability, the framework is compatible with various manifold constraints designed for specific learning objectives. In-depth research on differentiated geometric constraints may yield new methods that better balance plasticity and stability [6].
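The summary above mentions that the study used a doubly stochastic matrix (rows and columns each summing to 1) to ensure stability, but does not say how the projection onto that manifold is computed. As an illustrative assumption only, a standard way to push an arbitrary matrix toward the doubly stochastic manifold is Sinkhorn-Knopp normalization:

```python
import numpy as np

def sinkhorn(M, iters=200):
    """Project a matrix toward the doubly stochastic manifold (rows and
    columns each sum to 1) via Sinkhorn-Knopp: exponentiate for positivity,
    then alternately normalize rows and columns until convergence.
    Illustrative only; the mHC paper's exact projection is not specified
    in these summaries."""
    M = np.exp(M)  # ensure strictly positive entries
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)  # row-normalize
        M = M / M.sum(axis=0, keepdims=True)  # column-normalize
    return M

rng = np.random.default_rng(0)
H = sinkhorn(rng.standard_normal((4, 4)))  # 4x4 matches expansion rate 4
row_sums = H.sum(axis=1)  # ≈ 1 after convergence
col_sums = H.sum(axis=0)  # ≈ 1 after convergence
```

One intuition for why this constraint aids stability: each row of a doubly stochastic matrix is a convex combination, so mixing residual streams with such a matrix cannot amplify the signal's total mass across layers.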
DeepSeek's Latest Release!
Zheng Quan Shi Bao (Securities Times) · 2026-01-01 10:56
Group 1
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) to address the instability of traditional hyperconnections in large-scale model training while preserving their significant performance gains [1][3]
- The research highlights that while hyperconnections improve performance by diversifying connection patterns, they also weaken the inherent identity-mapping property of residual connections, leading to training instability and limited scalability [3]
- Empirical results indicate that mHC effectively supports large-scale training with only 6.7% additional time overhead at an expansion rate of 4, demonstrating its efficiency [3][5]

Group 2
- DeepSeek recently launched two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale; V3.2 achieves performance comparable to GPT-5 on reasoning benchmarks and is suitable for everyday tasks [6][7]
- The V3.2-Speciale model strengthens long-form reasoning and adds theorem-proving capability, performing on par with Gemini-3.0-Pro on mainstream reasoning benchmarks [7]
- DeepSeek has also cut API costs by more than 50%, making the models more accessible to developers [7]

Group 3
- DeepSeek's research paper on the R1 reasoning model was featured on the cover of the prestigious journal Nature, a significant achievement for Chinese AI technology in the international scientific community [8]
- The publication is notable as the first mainstream large-language-model research to undergo complete peer review and appear in a leading journal, closing a gap in the field [8]
DeepSeek's Latest Release!
证券时报 (Securities Times) · 2026-01-01 10:53
Core Viewpoint
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection), aimed at addressing the instability of traditional hyperconnections in large-scale model training while preserving their significant performance gains [1][3].

Summary by Sections

Introduction of mHC
- DeepSeek's new paper presents mHC, which projects the hyperconnection residual connection space onto a specific manifold to restore the identity-mapping property, and ensures operational efficiency through rigorous infrastructure optimization [3][4].

Performance and Scalability
- Empirical results indicate that mHC effectively supports large-scale training, with only 6.7% additional time overhead at an expansion rate of 4 [4][6].

Research Directions
- mHC opens several important research directions, including compatibility with manifold constraints tailored to specific learning objectives; in-depth study of differentiated geometric constraints may yield new methods for balancing plasticity and stability [7].

Recent Developments
- DeepSeek has been active, releasing two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, with the former achieving performance comparable to GPT-5 in benchmark tests [8].
- The DeepSeek-V3.2-Speciale model combines enhanced reasoning with mathematical proof capabilities, performing well on mainstream reasoning benchmarks [8].
- In addition, DeepSeek-V3.2-Exp introduces a sparse attention mechanism aimed at improving training and inference efficiency on long texts, accompanied by a significant reduction in API costs for developers [9].

Recognition in the Scientific Community
- DeepSeek's research paper on the DeepSeek-R1 reasoning model was featured on the cover of the prestigious journal Nature, a significant milestone for Chinese AI technology in the international scientific community [9][10].
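The V3.2-Exp summary mentions a sparse attention mechanism for long texts but gives no details of its design. As a generic illustration of the idea only, here is a toy top-k sparse attention in NumPy, where each query attends to just its highest-scoring keys; the function name, the top-k selection rule, and all parameters are illustrative assumptions, not DeepSeek's actual mechanism:

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=4):
    """Toy sparse attention: each query attends only to its `keep`
    highest-scoring keys; all other logits are masked to -inf before
    softmax, so most of the (Tq, Tk) attention map is zero."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (Tq, Tk) scaled logits
    # Per-query threshold: the keep-th largest score.
    thresh = np.sort(scores, axis=-1)[:, -keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    # Numerically stable softmax over the surviving keys.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 16, 8  # sequence length, head dimension (toy sizes)
out = topk_sparse_attention(rng.standard_normal((T, d)),
                            rng.standard_normal((T, d)),
                            rng.standard_normal((T, d)))
```

The efficiency claim in the summary rests on the same principle: restricting each query to a small subset of keys cuts the attention cost that otherwise grows quadratically with sequence length.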