Technology and Capital in Resonance: Domestic Large Models Safeguard the AI Application Wave
China Post Securities· 2026-01-05 11:14
Industry Investment Rating
- The industry investment rating is "Outperform the Market" and is maintained [2]

Core Insights
- The report highlights that the domestic large model industry has transitioned from a technology catch-up phase to a new stage of systematic layout and ecosystem construction, with breakthroughs in algorithms, collaborative computing power, data accumulation, capital support, and policy backing [9]
- The mHC architecture proposed by DeepSeek addresses three major pain points in large model training, significantly lowering the training threshold and costs while enhancing performance and efficiency [6][7]
- The report indicates robust growth in the application ecosystem, with notable user engagement in AI applications, reflecting strong market demand for quality AI application targets [8]

Summary by Relevant Sections

Industry Overview
- The closing index is at 5211.26, with a 52-week high of 5841.52 and a low of 3963.29 [2]

Performance Analysis
- The relative performance of the computer industry shows a positive trend, with a notable increase compared to the CSI 300 index [4]

Recent Developments
- Companies such as Zhipu and MiniMax are making significant strides toward IPOs, while Kimi has completed a $500 million Series C financing, indicating a strong capital influx into the industry [7]
- Kimi's paid user base grew more than 170% month over month from September to November 2025 [7]

Investment Recommendations
- The report suggests focusing on sectors including Hong Kong internet companies and domestic computing power firms, highlighting specific companies such as Alibaba, Tencent, and Cambricon [9]
DeepSeek Ships mHC: Is R2 Far Behind?
TMTPost APP · 2026-01-04 06:05
Last January, on the eve of the Spring Festival, the "DeepSeek shockwave" swept through the industry, breaking into the mainstream in both China and the US and becoming the phenomenon of the year. As 2026 opens, DeepSeek's moment of surprise has arrived even earlier.

On January 1, DeepSeek published a paper on the AI open-source community Hugging Face and the research-sharing platform arXiv, proposing a new neural network architecture optimization scheme named mHC (Manifold-Constrained Hyper-Connections). It has once again sparked a wave of discussion, and its potential impact on the AI industry, including large models and chips, is drawing close attention.

Image from the DeepSeek paper "mHC: Manifold-Constrained Hyper-Connections"

The mHC architecture makes large model training more stable, faster, and cheaper

The mHC architecture DeepSeek now proposes builds on Hyper-Connections (HC), released in November 2024 by the Foundation team behind ByteDance's Doubao large model.

At the time, the Doubao team claimed HC could replace the ResNet-style residual connection architecture that has dominated large model development for nearly a decade, widening the residual connections and diversifying connection patterns to improve model performance and flexibility.

However, HC had only proved itself in theoretical analysis and small-model experiments; in large-model training, the interactions between residual connection channels ...
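The widening idea described above can be sketched in a few lines (a minimal NumPy illustration, not the paper's exact formulation: the stand-in layer function, the expansion rate n = 4, and the random mixing matrix are all assumptions for demonstration):

```python
import numpy as np

C, n = 8, 4  # hidden width C; HC expands the residual stream to n parallel copies
rng = np.random.default_rng(0)
W = rng.standard_normal((C, C)) * 0.1

def layer(x):
    # Stand-in for an attention/FFN block (illustrative only)
    return np.tanh(x @ W)

x = rng.standard_normal(C)

# Classic residual connection: the input passes through unchanged, y = x + F(x)
y_res = x + layer(x)

# Hyper-Connections (schematic): n parallel residual streams, mixed by a
# learnable n x n matrix; here the mixing weights are random stand-ins.
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stream-mixing matrix
X = np.tile(x, (n, 1))                             # n x C residual streams
X_next = H @ X + layer(X.mean(axis=0))             # block output broadcast to all streams

print(y_res.shape, X_next.shape)  # (8,) (4, 8)
```

The point of the sketch is only the shapes: the classic path keeps one C-dimensional stream, while HC carries n of them and lets a small matrix exchange information between streams, which is where both the flexibility and the stability risk come from.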
DeepSeek Reworks Kaiming He's Residual Connection! Liang Wenfeng Personally Credited as Author; First Major Upgrade in a Decade
量子位· 2026-01-01 10:32
Core Viewpoint
- The article discusses the evolution and enhancement of the residual connection, a fundamental component of deep learning introduced by Kaiming He in ResNet, and presents a new approach called Hyper-Connections (HC) that aims to improve performance while addressing signal amplification and stability issues in deep architectures [2][7][11]

Group 1: Residual Connections and Their Evolution
- Residual connections have been a cornerstone of deep learning since the introduction of ResNet in 2016, allowing signals to pass directly from shallow to deep layers without modification [7][9]
- The rise of Transformer architectures has made residual connections a standard feature of large language models such as GPT and LLaMA [10]
- Hyper-Connections (HC) expand the residual stream width from C dimensions to n×C dimensions, introducing three learnable mapping matrices to manage information flow [11]

Group 2: Performance and Stability Challenges
- Experiments by the DeepSeek team indicate that the Hres matrix, responsible for internal information exchange in HC, contributes most of the performance gain [12]
- However, when HC is stacked across multiple layers, the composite mapping loses its identity property, leading to sudden loss spikes and gradient fluctuations during training [14]
- The peak signal amplification factor in HC can reach 3000, risking signal distortion during inter-layer propagation [16]

Group 3: Theoretical Framework and Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to the manifold of doubly stochastic matrices, which guarantees three key theoretical properties: norm preservation, closure under composition, and a clean geometric interpretation [17][19]
- The Sinkhorn-Knopp algorithm is employed to project an arbitrary matrix onto this manifold, effectively eliminating the signal amplification observed in HC [21]
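The Sinkhorn-Knopp step can be sketched as follows (a minimal NumPy illustration of alternating row/column normalization onto the doubly stochastic manifold; the iteration count and random input are arbitrary assumptions, and the paper's actual implementation is a fused GPU kernel):

```python
import numpy as np

def sinkhorn_knopp(M, iters=100):
    """Project a matrix toward the doubly stochastic manifold by
    alternately normalizing rows and columns (Sinkhorn-Knopp)."""
    A = np.abs(M) + 1e-9  # the algorithm requires strictly positive entries
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)  # rows sum to 1
        A /= A.sum(axis=0, keepdims=True)  # columns sum to 1
    return A

rng = np.random.default_rng(0)
A = sinkhorn_knopp(rng.random((4, 4)))
print(A.sum(axis=0))  # columns sum to ~1
print(A.sum(axis=1))  # rows sum to ~1 (up to convergence error)
```

Because a product of doubly stochastic matrices is itself doubly stochastic, the constraint survives when the mixing matrices of many layers compose, which is the closure property the paper relies on to keep deep signal propagation bounded instead of amplified.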
Group 4: Engineering Optimizations
- The paper details the memory access costs of widening the residual stream, noting significant increases in read and write traffic for HC relative to standard residual connections [24]
- To mitigate these costs, the team developed infrastructure optimizations, including operator fusion in the TileLang framework and specialized kernels for the Sinkhorn-Knopp algorithm [25][26]
- Pipeline parallelism enhancements overlap computation with communication, improving overall efficiency [27]

Group 5: Experimental Validation
- The paper validates the proposed methods on MoE models of 3B, 9B, and 27B parameters, with the expansion rate n set to 4 [30]
- On the 27B MoE model, the modified HC (mHC) shows a stable training curve, reducing loss by 0.021 relative to the baseline while maintaining gradient stability [31]
- Downstream task performance also improves, with mHC outperforming both the baseline and HC across benchmarks [32][35]
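The memory-traffic concern in Group 4 can be illustrated with back-of-envelope arithmetic (the hidden size, token count, and dtype below are hypothetical assumptions chosen for illustration; only the expansion rate n = 4 comes from the paper):

```python
# Widening the residual stream from C to n*C multiplies the bytes each
# layer must read and write just to move the residual state around.
hidden = 4096   # hidden width C (assumed)
n = 4           # HC/mHC expansion rate (the paper uses n = 4)
tokens = 8192   # tokens in flight per step (assumed)
bytes_per = 2   # bf16

baseline = tokens * hidden * bytes_per  # one residual stream
widened = n * baseline                  # n parallel streams

print(f"baseline residual state: {baseline / 2**20:.0f} MiB per read/write")  # 64 MiB
print(f"widened  residual state: {widened / 2**20:.0f} MiB per read/write")   # 256 MiB
```

A 4x increase in residual-state traffic at every layer is exactly the overhead that the operator fusion and specialized kernels described above are meant to hide.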