mHC (Manifold-Constrained Hyper-Connections)
Computer Industry Weekly: Xiaohongshu's Video-Thinker Breaks Tool Dependence, DeepSeek Launches mHC - 20260106
Huaxin Securities· 2026-01-06 12:34
Investment Rating
- The report maintains a "Buy" rating for several companies in the AI and computing sectors, including Weike Technology (301196.SZ), Nengke Technology (603859.SH), Hehe Information (688615.SH), and Maixinlin (688685.SH) [9].

Core Insights
- The report highlights the introduction of the Video-Thinker model by Xiaohongshu, which breaks the dependency on external tools for video reasoning, achieving state-of-the-art (SOTA) performance with a 7B-parameter version [3][22].
- DeepSeek's new architecture, mHC, shows significant performance improvements with only a 6.7% increase in training time, marking a breakthrough in model efficiency [31][32].
- Kimi, a Chinese AI startup, completed a $500 million Series C funding round at a post-money valuation of $4.3 billion, focusing on the development of its K3 model and talent incentives for 2026 [4][44].

Summary by Sections
1. Computing Dynamics
- The report notes stable pricing in computing-power leasing, with specific rates for various configurations [21].
- Xiaohongshu's Video-Thinker model integrates key capabilities such as temporal grounding and visual description, achieving new benchmarks in video reasoning [22][23].
- The model's training paradigm includes a two-stage process that enhances its reasoning capabilities while reducing reliance on external tools [26][27].
2. AI Application Dynamics
- Character.AI experienced an 8.32% increase in weekly traffic, indicating growing interest in AI applications [30].
- DeepSeek's mHC architecture addresses traditional bottlenecks in model efficiency, providing a robust framework for enhancing model capabilities [31][32].
3. AI Financing Trends
- Kimi's recent funding round will support the development of its K3 model and expansion of its talent pool, following significant technological advancements in 2025 [4][44].
- Meta's acquisition of Manus for $4-5 billion underscores the strategic importance of AI applications and the integration of advanced AI capabilities into its ecosystem [5][6].
4. Market Performance
- The report provides comparative performance metrics for various AI models, showcasing the advancements made by Video-Thinker over existing solutions [28][29].
- The overall market sentiment remains positive, with a focus on the long-term growth potential of AI applications and computing technologies [7].
Liang Wenfeng's New DeepSeek Paper! Picking Up the Baton from Kaiming He and ByteDance, It Steadies AI's "Foundation" Once Again
Xin Lang Cai Jing· 2026-01-02 05:27
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), which significantly improves the residual connection component of the Transformer architecture, a foundational element that has seen little change since its inception in 2015 [1][3]

Group 1: Historical Context
- The evolution of this line of neural network architectures began with ResNet, introduced by Kaiming He in 2015, which addressed the vanishing gradient problem and enabled the training of very deep networks [3]
- The Transformer model, released in 2017, adopted residual connections as a standard feature, forming the basis for many leading models today [3]

Group 2: Technical Comparisons
- Hyper-Connections, proposed by ByteDance in 2024, expanded the single residual stream into multiple parallel streams, enhancing model performance but introducing stability issues during training [5][10]
- mHC aims to resolve the stability problems associated with Hyper-Connections by constraining the connection weight matrix within a specific mathematical space, ensuring that signal amplification does not occur [10][12]

Group 3: Mathematical Innovation
- The core innovation of mHC involves using a doubly stochastic matrix for the connection weights, which guarantees that the output does not exceed the maximum input value, thus preserving energy conservation [10][12]
- The implementation of mHC utilizes the Sinkhorn-Knopp algorithm to achieve the desired matrix properties efficiently, allowing for end-to-end training without introducing new hyperparameters [11][12]

Group 4: Engineering Excellence
- DeepSeek's approach to implementing mHC demonstrates significant engineering capability, including the development of custom CUDA kernels and operator-fusion techniques to minimize computational delays [16]
- The ability to integrate innovative mathematical solutions into practical training environments highlights DeepSeek's competitive advantage in the AI research landscape [16]
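The Sinkhorn-Knopp projection mentioned above is a classical iterative scheme: alternately normalize the rows and columns of a positive matrix until both sum to 1. A minimal NumPy sketch of that idea follows; the function name, iteration count, and use of `exp` for positivity are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

def sinkhorn_knopp(scores: np.ndarray, n_iters: int = 50) -> np.ndarray:
    """Project unconstrained scores onto (an approximation of) the manifold
    of doubly stochastic matrices: positive entries, with every row and
    every column summing to 1."""
    m = np.exp(scores)                         # exponentiate to ensure positivity
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)   # normalize rows
        m = m / m.sum(axis=0, keepdims=True)   # normalize columns
    return m

# Example: project a random 4x4 score matrix.
rng = np.random.default_rng(0)
w = sinkhorn_knopp(rng.normal(size=(4, 4)))
```

Because every step is differentiable, such a projection can sit inside the forward pass and be trained end-to-end, which is consistent with the report's note that no new hyperparameters are introduced.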
Just In: Signed by Liang Wenfeng, DeepSeek's New Year's Day Paper Set to Open a New Chapter in Architecture
华尔街见闻· 2026-01-01 12:20
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) to address the instability of traditional hyper-connections during large-scale model training while retaining their significant performance gains [1][6][8].

Group 1: mHC Architecture
- The mHC architecture extends the single residual stream of traditional Transformers into a multi-stream parallel structure, utilizing the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][8].
- The core objective of mHC is to retain the performance improvements from widening the residual stream while resolving training instability and excessive memory consumption [8][9].
- Empirically, mHC not only addresses the stability issues but also scales well: on a 27-billion-parameter model it increased training time by only 6.7% while achieving significant performance improvements [8][32].

Group 2: Challenges with Traditional Hyper-Connections
- Traditional hyper-connections (HC) cause severe training instability and limited scalability because they fundamentally disrupt the inherent identity-mapping property, which is crucial for stable training [5][9].
- The widened information channels in HC also increase memory-access overhead, contributing to what is known as the "memory wall" problem [5][9].

Group 3: Implementation and Efficiency
- DeepSeek designed a tailored infrastructure for mHC, including kernel fusion, selective recomputation, and an extended DualPipe communication-overlap strategy to minimize memory usage and enhance efficiency [23][25][27].
- The Sinkhorn-Knopp algorithm keeps the residual connection matrix stable and doubly stochastic, which helps mitigate gradient explosion [16][21].

Group 4: Experimental Validation
- The research team validated mHC with language-model pre-training experiments, comparing it against baseline models and traditional HC [28][32].
- Across downstream benchmarks, mHC consistently outperforms the baseline models and often surpasses HC, demonstrating its effectiveness in large-scale pre-training [33][34].
- Scalability experiments reveal that mHC maintains its performance advantage even at higher computational budgets, showing only slight degradation [36][37].
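The "no signal amplification" argument above has a simple intuition: each row of a doubly stochastic matrix is a convex combination, so mixing the parallel residual streams can never increase the largest activation. A toy NumPy sketch, assuming a hypothetical `mix_streams` helper and a hand-picked doubly stochastic matrix (not DeepSeek's learned weights):

```python
import numpy as np

def mix_streams(streams: np.ndarray, mixing: np.ndarray) -> np.ndarray:
    """Mix n parallel residual streams with a doubly stochastic matrix.
    streams: (n, d) array of n residual streams of width d.
    mixing:  (n, n) doubly stochastic connection weights.
    Each output stream is a convex combination of the input streams,
    so the maximum absolute activation cannot grow."""
    return mixing @ streams

# A symmetric doubly stochastic matrix (rows and columns each sum to 1).
mixing = np.array([[0.50, 0.25, 0.25],
                   [0.25, 0.50, 0.25],
                   [0.25, 0.25, 0.50]])
streams = np.array([[1.0, -2.0],
                    [3.0,  0.5],
                    [-1.0, 4.0]])
mixed = mix_streams(streams, mixing)
print(np.abs(mixed).max() <= np.abs(streams).max())  # True: no amplification
```

This bounded-energy property is exactly what an unconstrained hyper-connection matrix lacks, which is why the unconstrained variant can amplify signals and destabilize training.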
DeepSeek's Latest Release!
Zheng Quan Shi Bao· 2026-01-01 10:56
Group 1
- DeepSeek has introduced a new architecture called mHC (manifold-constrained hyper-connections) to address instability in traditional hyperconnections during large-scale model training while maintaining their significant performance gains [1][3]
- The research highlights that while hyperconnections improve performance by diversifying connection patterns, they also weaken the inherent identity-mapping property of residual connections, leading to training instability and limited scalability [3]
- Empirical results indicate that mHC effectively supports large-scale training with only a 6.7% additional time overhead when the expansion rate is set to 4, demonstrating its efficiency [3][5]

Group 2
- DeepSeek recently launched two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale; V3.2 achieves performance comparable to GPT-5 on inference benchmarks and is suitable for everyday tasks [6][7]
- The V3.2-Speciale model enhances long-form reasoning and adds theorem-proving ability, performing on par with Gemini-3.0-Pro on mainstream inference benchmarks [7]
- DeepSeek has also reduced API costs by over 50%, making the models more accessible for developers [7]

Group 3
- DeepSeek's research paper on the R1 inference model was featured on the cover of the journal Nature, marking a significant achievement for Chinese AI technology in the international scientific community [8]
- The publication is notable as the first mainstream large-language-model research to undergo complete peer review and appear in a leading journal, closing a gap in the field [8]
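The identity-mapping point above can be made concrete: the identity matrix is itself doubly stochastic, so the constrained manifold always contains the plain residual skip connection as a special case. A toy sketch, where `hyper_residual` and its update rule are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def hyper_residual(streams, mixing, layer):
    """One simplified multi-stream residual update: mix the n parallel
    streams with a doubly stochastic matrix, then add the layer output
    back (broadcast across streams).
    streams: (n, d); mixing: (n, n); layer: maps (d,) -> (d,)."""
    mixed = mixing @ streams                   # cross-stream connection
    return mixed + layer(mixed.mean(axis=0))   # residual add

# With identity mixing and a zero layer, the update is a pure skip path,
# recovering the identity mapping that stable training relies on.
streams = np.random.default_rng(1).normal(size=(3, 8))
out = hyper_residual(streams, np.eye(3), lambda x: np.zeros_like(x))
print(np.allclose(out, streams))  # True
```

An unconstrained hyper-connection matrix has no such guarantee: its learned weights can drift arbitrarily far from the identity, which is the mechanism behind the instability described above.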
Just In: Signed by Liang Wenfeng, DeepSeek's New Year's Day Paper Set to Open a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 10:34
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) aimed at addressing the instability of traditional hyper-connections during large-scale model training while maintaining their significant performance gains [1][27][28].

Group 1: Architecture and Methodology
- The mHC architecture expands the traditional single residual stream of Transformers into a multi-stream parallel structure, utilizing the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][28].
- The core objective of mHC is to retain the performance improvements from widening the residual stream while resolving training instability and excessive memory consumption [4][34].
- The research team implemented infrastructure optimizations such as kernel fusion, selective recomputation, and an extended DualPipe communication strategy to offset the overhead caused by wider channels [31][34].

Group 2: Performance and Stability
- Empirically, mHC not only resolves the stability issues but also scales well: on a 27-billion-parameter model it increased training-time overhead by only 6.7% while achieving significant performance improvements [34][49].
- Evaluated against a baseline model, mHC reduced final loss by 0.021 and maintained a stable gradient-norm profile, indicating superior stability compared with traditional hyper-connections [49][50].

Group 3: Benchmarking and Results
- Across downstream benchmarks, mHC consistently outperformed the baseline model and surpassed traditional hyper-connections on most tasks, achieving gains of 2.1% and 2.3% on specific tasks [51][52].
- Scalability experiments indicate that mHC maintains its performance advantage even under higher computational budgets, demonstrating robust effectiveness in large-scale scenarios [52][53].