Manifold-Constrained Hyper-Connections (mHC) Architecture
Just in: DeepSeek drops a heavyweight paper with Liang Wenfeng as a named author, aggressively optimizing AI architecture
程序员的那些事· 2026-01-01 13:15
Core Insights
- DeepSeek introduced a new architecture called "Manifold-Constrained Hyper-Connections" (mHC), which improves performance at the cost of only a 6.7% increase in training time on a 27-billion-parameter model [3][36].
- The mHC architecture optimizes the residual connection space by projecting matrices onto constrained manifolds. This keeps training stable and significantly widens the residual stream without substantial computational cost [8][25].

Group 1: Performance Improvements
- In system-level benchmark tests, mHC consistently outperformed both the baseline models and Hyper-Connections (HC) across a range of tasks, demonstrating its effectiveness in large-scale pre-training [22][51].
- Compared with HC, mHC achieved a 2.1% improvement on the BBH benchmark and a 2.3% improvement on the DROP benchmark [52][54].

Group 2: Technical Details
- The core idea of mHC is to restore the identity mapping property under the Hyper-Connections topology, which makes the design practical for large-scale training and real-world foundation model tasks [25].
- mHC applies a doubly stochastic matrix constraint to maintain stability while still allowing interaction between residual streams, which is crucial for realizing the potential of multi-stream architectures [26][27].

Group 3: Engineering Optimizations
- The mHC implementation includes several engineering optimizations, such as reordering operations to improve efficiency and using mixed-precision strategies to preserve numerical accuracy without sacrificing computational speed [38][42].
- The DualPipe scheduling strategy was extended to overlap communication with computation, addressing the significant communication delays introduced by the n-stream residual structure [46][48].
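
To make the doubly stochastic constraint from the technical details more concrete, here is a minimal sketch in plain Python. It rests on an assumption: the summary does not say how the projection is computed, so the sketch uses Sinkhorn-Knopp normalization (alternating row and column rescaling), a standard way to obtain an approximately doubly stochastic matrix. The function names `sinkhorn_knopp` and `mix_streams`, the iteration count, and the 4-stream toy sizes are all hypothetical and not taken from the DeepSeek implementation.

```python
import math
import random


def sinkhorn_knopp(logits, n_iters=50):
    """Approximately map a square matrix of unconstrained values to a
    doubly stochastic matrix (all rows and all columns sum to 1).

    Assumption: Sinkhorn-Knopp is used here as a stand-in projection;
    the summary does not specify the method used for mHC.
    """
    n = len(logits)
    # Exponentiate so every entry is strictly positive; subtracting the
    # global maximum first avoids overflow in exp.
    peak = max(max(row) for row in logits)
    mat = [[math.exp(x - peak) for x in row] for row in logits]
    for _ in range(n_iters):
        # Rescale each row so that it sums to 1.
        mat = [[x / sum(row) for x in row] for row in mat]
        # Rescale each column so that it sums to 1.
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return mat


def mix_streams(mixing, streams):
    """Mix n residual streams: out[i] = sum_j mixing[i][j] * streams[j]."""
    n = len(streams)
    width = len(streams[0])
    return [
        [sum(mixing[i][j] * streams[j][k] for j in range(n)) for k in range(width)]
        for i in range(n)
    ]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical unconstrained parameters for 4 residual streams.
    logits = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
    mixing = sinkhorn_knopp(logits)
    print("row sums:", [round(sum(row), 6) for row in mixing])
    print("col sums:", [round(sum(mixing[i][j] for i in range(4)), 6) for j in range(4)])

    streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    mixed = mix_streams(mixing, streams)
    # Because every column sums to 1, the total across streams is preserved.
    print("total before:", [sum(s[k] for s in streams) for k in range(2)])
    print("total after: ", [round(sum(m[k] for m in mixed), 6) for k in range(2)])
```

Two general properties of doubly stochastic matrices (not claims from the summary) suggest why such a constraint helps stability. Each mixed stream is a convex combination of the input streams, so mixing cannot amplify the signal. The product of two doubly stochastic matrices is again doubly stochastic, so the property holds when the mixing is stacked across many layers.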