The mHC (Manifold-Constrained Hyper-Connections) Architecture
Big News in the Quant World! A Ten-Billion-Yuan Private Fund Makes Its Opening Move of the Year, Open-Sourcing a Brand-New Large Code Model!
Xin Lang Cai Jing· 2026-01-02 04:03
Core Insights
- The quant private equity sector is witnessing significant advancements in AI technology, with firms like Jiukun Investment launching new initiatives and models to enhance their capabilities in software engineering and competitive programming [1][3]
- The establishment of the Zhizhi Innovation Research Institute by Jiukun Investment marks a strategic move to accelerate AI application across fields, focusing on original contributions to cutting-edge AI research [2][3]
- The trend of quant firms forming AI labs and research institutes is accelerating, indicating a shift toward deeper integration of AI technologies into investment strategies and operations [3][5]

Group 1: New Developments in AI Models
- Jiukun Investment announced the open-source release of the IQuest-Coder-V1 series, a code intelligence model that excels at tasks such as automatic programming and bug fixing, positioning it among the leading open-source code models [1]
- DeepSeek introduced a new architecture called mHC, aimed at addressing instability issues in large-scale model training while maintaining performance gains, further igniting the competitive landscape in AI [1]

Group 2: Research and Development Focus
- The Zhizhi Innovation Research Institute has produced high-quality work in areas such as large language models and AI applications in healthcare, with notable recognition at the 2025 NeurIPS conference [2]
- The institute aims to leverage the complex financial scenarios faced by quantitative investment to enhance AI's practical applications, emphasizing the need for extreme performance in engineering and data capabilities [2]

Group 3: Industry Trends and Shifts
- Since the emergence of DeepSeek, many quant firms have established AI labs, indicating a rapid increase in investment and focus on AI technologies within the quant sector [3]
- The core competitive advantage in the quant industry is shifting from capital size to the speed of model and algorithm iteration, suggesting a deeper competition akin to that in the tech sector [5]
- The new AI initiatives are characterized by a foundational-research approach, increased openness in collaboration, and applications extending beyond traditional financial markets [5]
DeepSeek Drops a New Year Bombshell! Paper Co-Signed by Liang Wenfeng Released
Di Yi Cai Jing· 2026-01-01 13:44
Core Viewpoint
- DeepSeek has introduced a new network architecture called mHC (Manifold-Constrained Hyper-Connections), aimed at addressing instability issues in large-scale model training and potentially guiding the evolution of next-generation infrastructure [1][3][4]

Group 1: Technical Innovations
- The mHC architecture improves on traditional hyper-connection frameworks by balancing performance and efficiency, akin to adding "traffic rules" to information channels to ensure stable information flow during model training [4]
- The research highlights that mHC can enhance the stability and scalability of large models, making it easier to deploy in complex scenarios such as multi-modal models and industrial decision-making systems [5]

Group 2: Industry Implications
- mHC may reduce the hardware investment and training time needed to develop larger foundational models, lowering the barrier for small and medium AI enterprises to build more complex models [5]
- The innovation is seen as a fundamental advance on core issues within the Transformer architecture, with expectations of significant updates in DeepSeek's upcoming V4 version [5]

Group 3: Recent Developments
- Despite not launching major versions like R2 or V4 in 2025, DeepSeek has continued to innovate, releasing DeepSeek-V3.2 and DeepSeek-Math-V2, the latter being the first math model to reach international Olympiad gold-medal standard [6]
Today's Top 10 Financial News | January 1, 2026
Xin Lang Cai Jing· 2026-01-01 12:33
Group 1
- DeepSeek released a new paper on New Year's Day proposing a new architecture called mHC (Manifold-Constrained Hyper-Connections), aimed at addressing the instability issues that traditional hyper-connections face during large-scale model training while maintaining significant performance gains [1]
- The paper's first authors are Zhenda Xie, Yixuan Wei, and Huanqi Cao, with DeepSeek's founder and CEO Liang Wenfeng also listed as an author [1]

Group 2
- The EU's Carbon Border Adjustment Mechanism (CBAM) officially takes effect on January 1, 2026, with the EU recently releasing legislative proposals and implementation details [2]
- China has expressed concern over the EU's high default carbon-emission-intensity values for Chinese products, which it deems unfair and discriminatory; the EU plans to raise these default values further over the next three years [2]
- The EU plans to expand the CBAM's scope to cover approximately 180 steel- and aluminum-intensive downstream products by 2028, a move China views as unilateral and protectionist [2]

Group 3
- Multiple electric-vehicle manufacturers have reported delivery figures for December 2025 and the full year: Li Auto delivered 44,246 vehicles in December and 1,540,215 vehicles cumulatively since inception [6][16]
- NIO delivered 48,135 vehicles in December, a 54.6% year-on-year increase, and 326,028 vehicles for the year, up 46.9% [6][16]
- Xpeng Motors delivered 37,508 vehicles in December and 429,445 for the year, a 126% year-on-year increase [6][16]

Group 4
- Warren Buffett officially retired as CEO of Berkshire Hathaway on December 31, 2025 [7][18]
DeepSeek Opens the Year with a New Paper: All-New mHC Architecture Proposed, Liang Wenfeng Appears on the Author List
Xin Lang Cai Jing· 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), aimed at addressing the instability issues that traditional hyper-connections face during large-scale model training while maintaining significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual-connection space of hyper-connections onto a specific manifold to restore the identity-mapping property [6]
- The paper's authors include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to deepen the understanding of topological architecture design and offer promising directions for the evolution of foundational models [6]
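The identity-mapping property mentioned above can be illustrated with a toy sketch (an illustrative simplification, not DeepSeek's actual formulation; the scalar streams, the matrices, and the `mix` helper are all hypothetical): when parallel residual streams are mixed by a connection matrix whose rows and columns each sum to 1 (a doubly stochastic matrix), the total signal mass is preserved across layers, whereas an unconstrained matrix can amplify it without bound, which is the kind of instability the summaries attribute to unconstrained hyper-connections.

```python
def mix(streams, H):
    # One mixing step between parallel residual streams:
    # each output stream is a weighted sum of all input streams via H.
    n = len(streams)
    return [sum(H[i][j] * streams[j] for j in range(n)) for i in range(n)]

# Doubly stochastic H (every row and column sums to 1).
H_ds = [[0.5, 0.5],
        [0.5, 0.5]]
# Unconstrained H whose column sums exceed 1.
H_free = [[0.9, 0.8],
          [0.7, 0.6]]

s_ds, s_free = [1.0, 3.0], [1.0, 3.0]
for _ in range(10):              # ten stacked mixing layers
    s_ds = mix(s_ds, H_ds)
    s_free = mix(s_free, H_free)

print(sum(s_ds))    # total signal mass stays at 4.0: stable propagation
print(sum(s_free))  # total grows every layer: runaway amplification
```

Because the column sums of a doubly stochastic matrix are 1, the total over streams is invariant layer to layer; with the unconstrained matrix the total is multiplied by roughly 1.4-1.6 per layer and blows up after a few layers.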
Just In: With Liang Wenfeng's Name Attached, DeepSeek's New Year's Day Paper Opens a New Chapter in Architecture
Ji Qi Zhi Xin· 2026-01-01 08:22
Core Viewpoint
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) to address the instability issues that traditional hyper-connections face during large-scale model training while maintaining significant performance gains [1][3][4]

Group 1: Introduction of mHC
- The mHC framework extends the traditional Transformer's single residual stream into a multi-stream parallel architecture, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][4]
- The core objective of mHC is to retain the performance gains from widening the residual stream while eliminating training instability and excessive memory consumption [4][6]

Group 2: Challenges with Traditional Hyper-Connections
- Traditional residual connections ensure stable signal transmission through identity mapping, but the restricted width of their information channels limits performance [3][6]
- Recent methods such as Hyper-Connections (HC) improved performance but introduced significant training instability and increased memory-access overhead [3][6]

Group 3: Methodology of mHC
- mHC projects the residual-connection space onto a specific manifold to restore the identity-mapping property while optimizing the infrastructure for efficiency [4][9]
- The Sinkhorn-Knopp algorithm projects the connection matrix onto the Birkhoff polytope, ensuring stability in signal propagation [4][10]

Group 4: Experimental Validation
- Empirical results show that mHC not only resolves the stability issues but also scales well in large-scale training: on a 27-billion-parameter model, it increased training time by only 6.7% while delivering significant performance improvements [4][29]
- In benchmark tests, mHC consistently outperformed both baseline models and HC across various downstream tasks, indicating its effectiveness in large-scale pre-training [30][31]
Group 5: Infrastructure Design
- DeepSeek has tailored infrastructure for mHC, including kernel fusion, selective recomputation, and enhanced communication strategies, to minimize memory overhead and improve computational efficiency [17][21][23]
- Design choices such as optimizing the order of operations and using mixed-precision strategies contribute to mHC's overall efficiency [17][18]
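The Sinkhorn-Knopp projection described above can be sketched in a few lines (a minimal textbook version for intuition only, not DeepSeek's kernel-fused implementation; the matrix values and iteration count are arbitrary): alternately normalizing the rows and columns of a positive matrix drives it toward a doubly stochastic matrix, i.e. a point on the Birkhoff polytope.

```python
def sinkhorn_knopp(M, iters=50):
    """Project a positive square matrix toward the set of doubly
    stochastic matrices by alternating row/column normalization."""
    n = len(M)
    A = [row[:] for row in M]  # work on a copy
    for _ in range(iters):
        # Normalize each row to sum to 1.
        A = [[x / sum(row) for x in row] for row in A]
        # Normalize each column to sum to 1.
        col = [sum(A[i][j] for i in range(n)) for j in range(n)]
        A = [[A[i][j] / col[j] for j in range(n)] for i in range(n)]
    return A

H = sinkhorn_knopp([[2.0, 1.0, 1.0],
                    [1.0, 3.0, 1.0],
                    [1.0, 1.0, 4.0]])
# Every row and column of H now sums to (approximately) 1.
```

For strictly positive matrices this iteration converges to a unique doubly stochastic matrix, which is what makes it usable as a differentiable projection step inside a training loop.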