Sinkhorn-Knopp Algorithm
Liang Wenfeng's DeepSeek drops a new paper! Taking the baton from Kaiming He and ByteDance, it steadies AI's "foundation" once again
Xin Lang Cai Jing· 2026-01-02 05:27
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), which significantly improves the residual connection component of the Transformer architecture, a foundational element that has seen little change since its inception in 2015 [1][3]

Group 1: Historical Context
- The evolution of neural network architectures began with ResNet, introduced by Kaiming He in 2015, which addressed the vanishing gradient problem and enabled the training of very deep networks [3]
- The Transformer model, released in 2017, adopted residual connections as a standard feature, forming the basis for many leading models today [3]

Group 2: Technical Comparisons
- Hyper-Connections, proposed by ByteDance in 2024, expanded the single residual stream into multiple parallel streams, enhancing model performance but introducing stability issues during training [5][10]
- mHC aims to resolve the stability problems associated with Hyper-Connections by constraining the connection weight matrix within a specific mathematical space, ensuring that signal amplification does not occur [10][12]

Group 3: Mathematical Innovation
- The core innovation of mHC involves using a doubly stochastic matrix for the connection weights, which guarantees that the output does not exceed the maximum input value, thus preserving energy conservation [10][12]
- The implementation of mHC uses the Sinkhorn-Knopp algorithm to achieve the desired matrix properties efficiently, allowing end-to-end training without introducing new hyperparameters [11][12]

Group 4: Engineering Excellence
- DeepSeek's implementation of mHC demonstrates significant engineering capability, including custom CUDA kernels and operator-fusion techniques that minimize computational overhead [16]
- The ability to integrate innovative mathematical solutions into practical training environments highlights DeepSeek's competitive advantage in the AI research landscape [16]
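Per the summary, mHC's key move is constraining the connection weights to doubly stochastic matrices via the Sinkhorn-Knopp algorithm. As a minimal illustration (not DeepSeek's actual fused CUDA implementation), here is a NumPy sketch of Sinkhorn-Knopp normalization; the function name, the exponentiation of raw scores, and the iteration count are assumptions made for this example:

```python
import numpy as np

def sinkhorn_knopp(scores, n_iters=50, eps=1e-8):
    """Approximately project a square matrix of raw scores onto the set of
    doubly stochastic matrices by alternating row/column normalization."""
    M = np.exp(scores - scores.max())  # positive entries, as the algorithm requires
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True) + eps  # rows sum to ~1
        M /= M.sum(axis=0, keepdims=True) + eps  # columns sum to ~1
    return M

rng = np.random.default_rng(0)
H = sinkhorn_knopp(rng.normal(size=(4, 4)))
print(H.sum(axis=0), H.sum(axis=1))  # both approach [1, 1, 1, 1]

# Why this enforces the "no signal amplification" property claimed above:
# each row of H is a convex combination (nonnegative entries summing to 1),
# so every coordinate of y = H @ x is bounded by the largest |x_j|.
x = rng.normal(size=4)
assert np.max(np.abs(H @ x)) <= np.max(np.abs(x)) + 1e-3
```

Every step here is differentiable, which is consistent with the summary's claim that the constraint can be trained end to end without introducing new hyperparameters.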
DeepSeek reworks Kaiming He's residual connection! Liang Wenfeng personally signs the paper, the first major upgrade in ten years
Xin Lang Cai Jing· 2026-01-01 11:45
Source: QbitAI (WeChat official account: QbitAI)

Residual connections went unchanged for a decade; expanding them brought hidden risks.

On the first day of 2026, DeepSeek uploaded a new paper, giving the "residual connection", the foundational deep-learning component introduced in Kaiming He's landmark 2016 work ResNet, a new-era upgrade.

DeepSeek's Liang Wenfeng personally signed the paper; the co-first authors are Zhenda Xie, Yixuan Wei, and Huanqi Cao.

Residual connections have been a cornerstone of deep-learning architectures since ResNet appeared in 2016. The core mechanism is simple and clear: x_{l+1} = x_l + F(x_l, W_l), i.e., the next layer's output equals the current layer's input plus the output of the residual function.

The key to this design's success is the "identity mapping" property: signals can pass directly from shallow layers to deep layers without any modification.

With the rise of the Transformer architecture, this paradigm became the standard configuration of large language models such as GPT and LLaMA.

The recently proposed Hyper-Connections (HC) attempt to break this mold. HC widens the residual stream from C dimensions to n×C dimensions ...

DeepSeek's experiments show that among HC's three mappings, the H_res matrix, which is responsible for information exchange within the residual streams, contributes the most significant performance gain.
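The excerpt is truncated, but the two updates it contrasts can be sketched side by side. Below is a schematic NumPy comparison of the classic residual update x_{l+1} = x_l + F(x_l, W_l) with an HC-style widened stream; the placeholder branch f, the row-normalized H_res mixer, and the mean-aggregation of streams are illustrative assumptions, not the exact formulation of the HC or mHC papers:

```python
import numpy as np

def f(x, W):
    # Stand-in for the residual branch F(x, W); in a Transformer this
    # would be an attention or MLP block. tanh is only a placeholder.
    return np.tanh(x @ W)

C, n = 8, 4  # hidden width C; HC widens the stream to n parallel copies
rng = np.random.default_rng(0)
W = rng.normal(size=(C, C)) * 0.1

# Classic residual connection: x_{l+1} = x_l + F(x_l, W_l)
x = rng.normal(size=C)
x_next = x + f(x, W)

# HC-style update (schematic): n residual streams of width C, mixed by a
# learned n x n matrix H_res before the block output is added back.
X = rng.normal(size=(n, C))                # n parallel residual streams
H_res = rng.random(size=(n, n))
H_res /= H_res.sum(axis=1, keepdims=True)  # row-normalized mixer
X_next = H_res @ X + f(X.mean(axis=0), W)  # block output broadcast to all streams

print(x_next.shape, X_next.shape)  # (8,) and (4, 8)
```

In HC itself the mixing matrix is learned without this kind of normalization, which is plausibly where the stability risk described in the first article enters: repeated multiplication by a matrix whose rows can sum to more than 1 lets signal energy grow across layers, exactly what mHC's doubly stochastic constraint is designed to rule out.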