Core Insights - DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][6] Group 1: Research and Development - The paper presents mHC as a universal framework that projects the residual connection space of hyperconnections onto a specific manifold to restore the identity mapping property [6] - The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1] Group 2: Performance and Scalability - Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6] - The proposed architecture is expected to contribute to a deeper understanding of topological architecture design and offer promising directions for the evolution of foundational models [6]
DeepSeek 开年发布新论文:提出全新 mHC 架构,梁文锋现身作者名单