Hyper-Connections (HC)
DeepSeek: Latest Release!
券商中国 (Brokerage China) · 2026-01-01 12:40
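DeepSeek has published a new paper, with Liang Wenfeng among the authors. On January 1, DeepSeek released a paper proposing a new architecture called mHC (Manifold-Constrained Hyper-Connections). The research aims to resolve the instability of conventional hyper-connections in large-scale model training while preserving their significant performance gains. The paper has three first authors: Zhenda Xie (解振达), Yixuan Wei (韦毅轩), and Huanqi Cao. Notably, DeepSeek founder Liang Wenfeng is also on the author list.

DeepSeek says that DeepSeek-V3.2 is designed to balance reasoning ability against output length, making it suitable for everyday use such as Q&A scenarios and general Agent tasks. On public reasoning benchmarks, DeepSeek-V3.2 reached the level of GPT-5 and scored only slightly below Gemini-3.0-Pro; compared with Kimi-K2-Thinking, V3.2 produces much shorter outputs, significantly reducing compute overhead and user waiting time.

According to the paper's abstract, recent work typified by Hyper-Connections (HC) has extended the residual connection paradigm, established and widely adopted over the past decade, by widening the residual stream and diversifying connection patterns. While these changes deliver substantial performance gains, diversifying the connection patterns fundamentally undermines the identity mapping property inherent to residual connections, leading to severe training instability and ...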
DeepSeek Opens the Year with a New Paper: Proposes the New mHC Architecture, with Liang Wenfeng on the Author List
Sina Finance (Xin Lang Cai Jing) · 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyperconnections onto a specific manifold to restore the identity mapping property [6]
- The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to contribute to a deeper understanding of topological architecture design and offer promising directions for the evolution of foundational models [6]
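
The idea summarized above, that constraining how hyper-connections mix residual streams can restore the identity mapping property, can be illustrated with a small sketch. The digest does not say which manifold mHC uses, so the code below assumes one plausible choice purely for illustration: doubly stochastic matrices (non-negative entries, every row and column summing to 1), reached by Sinkhorn-style alternating normalization. The function names `sinkhorn_project` and `mix_streams` are hypothetical, and this should not be read as the paper's implementation.

```python
import math

# Illustrative sketch only. ASSUMPTION: the "specific manifold" is taken to be
# the set of doubly stochastic matrices; the article does not confirm this.

def sinkhorn_project(scores, iters=100):
    """Map an n x n matrix of real scores to an approximately doubly stochastic matrix."""
    n = len(scores)
    # Exponentiate so every entry is strictly positive.
    m = [[math.exp(v) for v in row] for row in scores]
    for _ in range(iters):
        # Normalize each row to sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Normalize each column to sum to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m

def mix_streams(mix, streams):
    """Mix n parallel residual streams (lists of floats) with an n x n matrix."""
    n, d = len(streams), len(streams[0])
    return [
        [sum(mix[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]

# Hypothetical unconstrained mixing scores for three residual streams.
raw = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [-2.0, 1.0, 0.3]]
mix = sinkhorn_project(raw)

streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mixed = mix_streams(mix, streams)

# The identity matrix is doubly stochastic, so plain pass-through is a special case.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
unchanged = mix_streams(identity, streams)
```

Because every column of `mix` sums to 1, the per-feature total across streams is the same before and after mixing, and because every row sums to 1 with non-negative entries, each output stream is a weighted average of the inputs and cannot grow in magnitude. Under this assumed constraint, that is one way a widened residual stream could keep a stable, identity-like signal path.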