DeepSeek's Latest Release!
Securities Times · 2026-01-01 10:53

Core Viewpoint
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection), aimed at addressing the instability of traditional hyperconnections in large-scale model training while retaining their significant performance gains [1][3].

Summary by Sections

Introduction of mHC
- DeepSeek's new paper presents mHC, which projects the hyperconnection's residual-connection space onto a specific manifold to restore the identity-mapping property, and ensures operational efficiency through rigorous infrastructure optimization [3][4]. (An illustrative sketch of this idea follows this summary.)

Performance and Scalability
- Empirical results indicate that mHC supports large-scale training effectively, adding a time overhead of only 6.7% when the expansion rate is set to 4 [4][6].

Research Directions
- mHC opens up several important research directions, including compatibility with manifold constraints tailored to specific learning objectives, and potential new methods for balancing plasticity and stability through deeper study of differential-geometric constraints [7].

Recent Developments
- DeepSeek has been active recently, releasing two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, with the former achieving performance comparable to GPT-5 in benchmark tests [8].
- DeepSeek-V3.2-Speciale combines enhanced reasoning capabilities with mathematical-proof abilities and performs well on mainstream reasoning benchmarks [8].
- Additionally, the release of DeepSeek-V3.2-Exp introduces a sparse attention mechanism aimed at improving training and inference efficiency on long texts, with a significant reduction in API costs for developers [9]. (A generic sparse-attention sketch appears after the mHC example below.)

Recognition in the Scientific Community
- DeepSeek's research paper on the DeepSeek-R1 reasoning model was featured on the cover of the journal Nature, a significant milestone for Chinese AI technology in the international scientific community [9][10].
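The article does not reproduce mHC's equations, so the following is only a minimal sketch of the general idea under stated assumptions: it hypothetically takes the manifold to be the doubly-stochastic matrices (reached via Sinkhorn normalization), a set that contains the identity matrix, so the constrained mixing of the expanded residual streams can always fall back to a plain identity mapping. The class name, the choice of manifold, and the aggregation step are all illustrative assumptions, not DeepSeek's published design.

```python
import torch
import torch.nn as nn

def sinkhorn(logits: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    """Approximately project a square logit matrix onto the manifold of
    doubly-stochastic matrices (rows and columns sum to 1) by alternating
    row/column normalization in log-space. The identity matrix lies on this
    manifold, so an identity mapping across streams remains representable."""
    log_p = logits
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)  # cols
    return log_p.exp()

class ManifoldConstrainedHyperConnection(nn.Module):
    """Toy hyperconnection at expansion rate n: the hidden state is carried as
    n parallel residual streams, recombined around each sub-layer by a learned
    n x n mixing matrix that is constrained to the assumed manifold."""
    def __init__(self, d_model: int, n_streams: int = 4):
        super().__init__()
        # Initialize mixing logits near the identity so training starts from
        # an ordinary residual connection.
        self.mix_logits = nn.Parameter(torch.eye(n_streams) * 4.0)
        self.sublayer = nn.Linear(d_model, d_model)  # stand-in for attention/FFN

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, d_model)
        mix = sinkhorn(self.mix_logits)                      # constrained mixing
        mixed = torch.einsum('ij,jbsd->ibsd', mix, streams)  # recombine streams
        update = self.sublayer(mixed.mean(dim=0))            # one aggregated view
        return mixed + update.unsqueeze(0)                   # residual add
```

The expansion rate of 4 mentioned in the article corresponds to n_streams = 4 here; constraining the mixing matrix to a manifold containing the identity is one plausible reading of how mHC "restores the identity mapping property" while keeping the extra streams trainable.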
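The article says only that DeepSeek-V3.2-Exp introduces a sparse attention mechanism for long-text efficiency, without specifying its form. Below is a generic top-k sparse-attention sketch for intuition: each query attends to its highest-scoring keys rather than the full sequence, which is one common way sparsity reduces long-context cost. It is a didactic illustration, not DeepSeek's actual mechanism or kernel.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                          k_top: int = 64) -> torch.Tensor:
    """Each query attends only to its k_top highest-scoring keys.
    Shapes: q, k, v are (batch, seq_len, d_model)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (batch, q_len, k_len)
    k_top = min(k_top, scores.size(-1))
    top_vals, top_idx = scores.topk(k_top, dim=-1)   # keep top-k per query
    sparse = torch.full_like(scores, float('-inf'))  # mask everything else
    sparse.scatter_(-1, top_idx, top_vals)
    return F.softmax(sparse, dim=-1) @ v             # (batch, q_len, d_model)
```

With k_top fixed, the softmax-and-weight step touches a constant number of keys per query regardless of context length, which is the kind of saving that would translate into the lower long-text inference costs the article attributes to V3.2-Exp.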