DeepSeek Opens the Year with a New Paper: the New mHC Architecture, with Liang Wenfeng on the Author List
Xin Lang Cai Jing· 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability of traditional hyperconnections in large-scale model training while maintaining their significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyperconnections onto a specific manifold to restore the identity mapping property [6] (a brief derivation of why this restores identity-like behavior is sketched after this summary)
- The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to contribute to a deeper understanding of topological architecture design and offer promising directions for the evolution of foundational models [6]
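A note on why a doubly stochastic constraint (described in the fuller summaries below) restores identity-like behavior: the claim follows from two standard linear-algebra facts. The sketch below is my own derivation consistent with these summaries, not an excerpt from the paper; n, H_l, and F_l are generic symbols rather than DeepSeek's notation.

```latex
% Plain residual blocks keep an identity path through the network:
\[
  x_{l+1} = x_l + F_l(x_l)
  \quad\Longrightarrow\quad
  x_L = x_0 + \sum_{l=0}^{L-1} F_l(x_l).
\]
% Hyper-connections replace the scalar skip with a learnable matrix
% H_l mixing n parallel residual streams:
\[
  x_{l+1} = H_l\, x_l + (\text{block output}),
  \qquad H_l \in \mathbb{R}^{n \times n}.
\]
% If each H_l is doubly stochastic, Birkhoff--von Neumann writes it as a
% convex combination of permutation matrices, H_l = \sum_i \theta_i P_i, so
\[
  \|H_l\|_2 \le \sum_i \theta_i \|P_i\|_2 = 1,
  \qquad
  H_{L-1} \cdots H_1 H_0 \ \text{is itself doubly stochastic}.
\]
% The skip path therefore cannot amplify signals, and its composition
% stays on the manifold at every depth: bounded, identity-like behavior.
```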
Just In: DeepSeek's New Year's Day Paper, with Liang Wenfeng's Name Attached, Aims to Open a New Chapter in Architecture
华尔街见闻· 2026-01-01 12:20
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) to address the instability issues of traditional hyper-connections in large-scale model training while maintaining their significant performance gains [1][6][8].

Group 1: mHC Architecture
- The mHC architecture extends the single residual flow of traditional Transformers into a multi-flow parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][8] (a minimal sketch of this projection follows this summary).
- The core objective of mHC is to retain the performance gains from widening the residual flow while resolving training instability and excessive memory consumption [8][9].
- Empirically, mHC not only addresses the stability issues but also scales exceptionally well in large-scale training: on a 27-billion-parameter model it increased training time by only 6.7% while achieving significant performance improvements [8][32].

Group 2: Challenges with Traditional Hyper-Connections
- Traditional hyper-connections (HC) cause severe training instability and limited scalability because they fundamentally disrupt the inherent identity mapping property that is crucial for stable training [5][9].
- Widening the information channels in HC also increases memory access overhead, contributing to what is known as the "memory wall" problem [5][9].

Group 3: Implementation and Efficiency
- DeepSeek designed tailored infrastructure for mHC, including kernel fusion, selective recomputation, and an extended DualPipe communication overlap strategy, to minimize memory usage and enhance efficiency [23][25][27].
- The Sinkhorn-Knopp algorithm keeps the residual connection matrix stable on the doubly stochastic manifold, which helps mitigate gradient explosion [16][21].

Group 4: Experimental Validation
- The research team validated mHC through language model pre-training experiments, comparing it against baseline models and traditional HC [28][32].
- Across various downstream benchmarks, mHC consistently outperforms the baseline models and often surpasses HC, demonstrating its effectiveness in large-scale pre-training [33][34].
- Scalability experiments reveal that mHC maintains its performance advantages even at higher computational budgets, showing only slight degradation in performance [36][37].
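For readers unfamiliar with the Sinkhorn-Knopp algorithm mentioned above: it alternately normalizes the rows and columns of a positive matrix until both sum to one, i.e., until the matrix is approximately doubly stochastic. The NumPy sketch below illustrates that generic procedure; the function name, iteration count, and the exp-based positivity step are my illustrative choices, not DeepSeek's implementation.

```python
import numpy as np

def sinkhorn_knopp(M, n_iters=20):
    """Project a matrix toward the doubly stochastic manifold by
    alternately normalizing rows and columns (Sinkhorn-Knopp)."""
    A = np.exp(M)  # exponentiate so every entry is strictly positive
    for _ in range(n_iters):
        A = A / A.sum(axis=1, keepdims=True)  # make each row sum to 1
        A = A / A.sum(axis=0, keepdims=True)  # make each column sum to 1
    return A

rng = np.random.default_rng(0)
H = sinkhorn_knopp(rng.normal(size=(4, 4)))
print(np.round(H.sum(axis=1), 6))  # row sums ~ [1. 1. 1. 1.]
print(np.round(H.sum(axis=0), 6))  # column sums ~ [1. 1. 1. 1.]
```

Each normalization step is differentiable, so a projection of this shape can sit inside the forward pass and be trained through, which is plausibly how mHC keeps the connection matrix on the manifold throughout training.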
The biggest startups raised a record amount in 2025, dominated by AI
Yahoo Finance· 2026-01-01 11:00
Core Insights
- Excitement around artificial intelligence made 2025 a record fundraising year for AI companies, with a total of $150 billion raised, surpassing the previous high of $92 billion set in 2021 [2]

Group 1: Funding and Investment
- The largest private U.S. companies raised a record $150 billion in 2025, with significant allocations to major AI firms like OpenAI and Anthropic [2]
- OpenAI raised $40 billion, the largest private funding round in history, while Anthropic secured $13 billion and Elon Musk's xAI raised $10 billion [3]
- Several other AI companies, including Jeff Bezos' Project Prometheus and Databricks, exceeded the $2 billion funding threshold during the year [5]

Group 2: Market Dynamics
- The concentration of capital in a few large AI companies raises concerns about long-term systemic risk in the venture capital market, as PitchBook analyst Kyle Stanford notes [4]
- The top four funding deals accounted for over 30% of total deal value, indicating a trend toward larger investments in fewer companies [3]

Group 3: Future Projections
- Big Tech companies are projected to invest more than $500 billion in 2026 to build AI infrastructure, including networks and data centers [8]
- The promise of AI in 2026 hinges on broader adoption of "AI agents" that can perform tasks autonomously, which is expected to significantly impact the economy [7]

Group 4: Public Market Impact
- AI enthusiasm has shaped the public market as well, with nine of the ten most valuable companies being tech firms benefiting from AI advancements, collectively valued at over $3 trillion [6]
Suzhou: Building the Most Innovative New Year's Eve
Sou Hu Cai Jing· 2026-01-01 10:39
In the AI era, OPC (One Person Company) refers not only to a "one-person company" in the literal sense, but to an entrepreneurial model in which an individual hands standardized tasks such as code generation and content creation to AI tools and focuses on strategic decision-making and creative design. Cities across the Yangtze River Delta are now building OPC communities to provide fertile ground for AI entrepreneurs.

On New Year's Eve 2025, Suzhou, the "strongest prefecture-level city," put its innovative character on full display. That evening, more than 1,500 entrepreneurs, registered OPC operators, university students, researchers, and investors gathered at the Suzhou International Conference Center for the "OPC Suzhou Night" event. Pulsing with the rhythm of technology and innovation, the evening featured founders telling their startup stories, sharing entrepreneurial experience, and pitching projects live, greeting the new year and the future with youthful ingenuity.

"Suzhou's support will be your most solid backing." "Suzhou, a city you can always trust." "Suzhou enterprises advancing together, building businesses together, creating the future together."

The event not only gave aspiring young people an open platform to showcase their dreams, connect with resources, and join the local ecosystem, but also offered those standing at the crossroads of employment and entrepreneurship ...
Just In: DeepSeek's New Year's Day Paper, with Liang Wenfeng's Name Attached, Aims to Open a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 10:34
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) aimed at addressing the instability of traditional hyper-connections in large-scale model training while maintaining their significant performance gains [1][27][28].

Group 1: Architecture and Methodology
- The mHC architecture expands the traditional single residual flow of Transformers into a multi-flow parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][28].
- The core objective of mHC is to retain the performance gains from widening the residual flow while resolving training instability and excessive memory consumption [4][34].
- The research team implemented infrastructure optimizations such as kernel fusion, selective recomputation, and an extended DualPipe communication strategy to offset the overhead of the wider channels [31][34].

Group 2: Performance and Stability
- Empirically, mHC not only resolves the stability issues but also scales exceptionally well: on a 27-billion-parameter model it increased training time overhead by only 6.7% while achieving significant performance improvements [34][49].
- Evaluated against a baseline model, mHC reduced final loss by 0.021 and maintained a stable gradient norm profile, indicating superior stability compared to traditional hyper-connections [49][50] (a toy numerical illustration of this stability effect follows this summary).

Group 3: Benchmarking and Results
- Across various downstream benchmarks, mHC consistently outperformed the baseline model and surpassed traditional hyper-connections on most tasks, with gains of 2.1% and 2.3% on specific tasks [51][52].
- Scalability experiments indicate that mHC maintains its performance advantages under higher computational budgets, demonstrating robust effectiveness in large-scale scenarios [52][53].
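The stability result above has a simple linear-algebra intuition: a product of unconstrained mixing matrices can amplify a signal exponentially with depth, whereas doubly stochastic matrices have operator norm at most 1, so the mixed signal stays bounded. The toy NumPy experiment below illustrates the contrast; the stream count, depth, noise scale, and the simple Sinkhorn-style projection are my illustrative choices, not the paper's setup.

```python
import numpy as np

def project_doubly_stochastic(M, n_iters=30):
    """Sinkhorn-style projection: exponentiate for positivity, then
    alternately normalize rows and columns."""
    A = np.exp(M)
    for _ in range(n_iters):
        A = A / A.sum(axis=1, keepdims=True)
        A = A / A.sum(axis=0, keepdims=True)
    return A

rng = np.random.default_rng(0)
n_streams, depth = 4, 64
x = rng.normal(size=n_streams)
x /= np.linalg.norm(x)  # start from a unit-norm signal

x_free, x_ds = x.copy(), x.copy()
for _ in range(depth):
    M = rng.normal(scale=0.5, size=(n_streams, n_streams))
    x_free = (np.eye(n_streams) + M) @ x_free    # unconstrained skip mixing
    x_ds = project_doubly_stochastic(M) @ x_ds   # manifold-constrained mixing

print(f"norm after {depth} unconstrained layers:     {np.linalg.norm(x_free):.3e}")
print(f"norm after {depth} doubly stochastic layers: {np.linalg.norm(x_ds):.3e}")
```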
DeepSeek Reworks Kaiming He's Residual Connection! Liang Wenfeng Personally Signs On, the First Major Upgrade in a Decade
量子位· 2026-01-01 10:32
Core Viewpoint
- The article discusses the evolution of the residual connection, a fundamental deep learning component introduced by Kaiming He in ResNet, and presents Hyper-Connections (HC), an approach that improves performance but raises issues of signal amplification and stability in deep architectures, which DeepSeek's new work addresses [2][7][11].

Group 1: Residual Connections and Their Evolution
- Residual connections have been a cornerstone of deep learning since ResNet's introduction in 2016, allowing signals to pass unchanged from shallow to deep layers [7][9].
- The rise of Transformer architectures made residual connections a standard feature of large language models such as GPT and LLaMA [10].
- Hyper-Connections (HC) widen the residual flow from C dimensions to n×C dimensions, introducing three learnable mapping matrices to manage information flow [11] (a toy sketch of this structure follows this summary).

Group 2: Performance and Stability Challenges
- Experiments by the DeepSeek team indicate that the H_res matrix, responsible for exchanging information between streams in HC, contributes significantly to the performance gains [12].
- However, when HC is stacked across many layers, the composite mapping loses its identity property, leading to sudden loss spikes and gradient fluctuations during training [14].
- The peak signal amplification factor in HC can reach 3000, risking signal distortion during inter-layer propagation [16].

Group 3: Theoretical Framework and Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to the manifold formed by doubly stochastic matrices, which guarantees three key theoretical properties: norm preservation, closure under composition, and a clean geometric interpretation [17][19].
- The Sinkhorn-Knopp algorithm projects an arbitrary matrix onto this manifold, effectively eliminating the signal amplification observed in HC [21].

Group 4: Engineering Optimizations
- The paper details the memory access costs of widening the residual flow, highlighting significant increases in read and write operations for HC compared to standard residual connections [24].
- To mitigate these costs, the team built infrastructure optimizations, including operator fusion via the TileLang framework and specialized kernels for the Sinkhorn-Knopp algorithm [25][26].
- Pipeline parallelism enhancements overlap computation and communication, improving overall efficiency [27].

Group 5: Experimental Validation
- The proposed methods are validated on MoE models of 3B, 9B, and 27B parameters, with the expansion rate n set to 4 [30].
- On the 27B MoE model, mHC showed a stable training curve, reducing loss by 0.021 versus the baseline while maintaining gradient stability [31].
- mHC outperformed both the baseline and HC across various downstream benchmarks [32][35].
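To make the HC structure above concrete, here is a toy NumPy rendering of a hyper-connection-style block. The three learnable maps are named H_in (read the n streams into one layer input), H_res (exchange information across streams), and H_out (write the layer output back to the streams); these names, the vector shapes of H_in and H_out, and the tanh stand-in for the layer are my assumptions, not the paper's definitions. In mHC, H_res would additionally be projected onto the doubly stochastic manifold (e.g., via Sinkhorn-Knopp) before use.

```python
import numpy as np

class ToyHyperConnection:
    """Illustrative hyper-connection-style block: the residual stream is
    widened from C to n x C, and three learnable maps route information."""

    def __init__(self, n_streams, dim, rng):
        self.H_res = np.eye(n_streams) + rng.normal(scale=0.1, size=(n_streams, n_streams))
        self.H_in = rng.normal(scale=0.1, size=n_streams)   # stream -> layer-input weights
        self.H_out = rng.normal(scale=0.1, size=n_streams)  # layer-output -> stream weights
        self.W = rng.normal(scale=dim ** -0.5, size=(dim, dim))  # stand-in for attention/FFN

    def __call__(self, X):
        # X: (n_streams, dim) widened residual state
        h = self.H_in @ X                    # combine streams into one layer input, shape (dim,)
        y = np.tanh(self.W @ h)              # the "layer" itself (toy nonlinearity)
        X = self.H_res @ X                   # inter-stream residual mixing (what mHC constrains)
        return X + np.outer(self.H_out, y)   # broadcast the layer output back into the streams

rng = np.random.default_rng(0)
block = ToyHyperConnection(n_streams=4, dim=8, rng=rng)
X = rng.normal(size=(4, 8))
print(block(X).shape)  # (4, 8): the widened residual state is preserved
```

Stacking many such blocks composes the H_res matrices, which is exactly where the unconstrained version loses the identity property and the 3000x amplification described above can arise.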
Zuckerberg Makes His Move! Over US$2 Billion for a Chinese Pure-AI Application Company, a Shot in the Arm for Entrepreneurs
创业家· 2026-01-01 10:07
Core Viewpoint
- Meta's acquisition of Manus for over $2 billion highlights the significant value of pure AI applications, challenging the notion that innovation must stem from hardware-software integration [3][4][6][8].

Group 1: Acquisition Insights
- Manus, a company founded only three years ago, commanded a remarkable acquisition price, making it a focal point of the domestic AI startup community [4].
- The acquisition demonstrates that global giants like Meta are willing to pay high prices for innovative applications, even from smaller, purely software-based companies [4][5].
- The event is a strong encouragement for Chinese entrepreneurs, indicating that substantial value can be built on pure AI applications without hardware [4][5][6].

Group 2: Industry Perspectives
- Industry experts emphasize focusing on application innovation and targeting global markets, rather than being constrained by traditional models that prioritize hardware [5][7].
- Manus's success illustrates that a Chinese team can create a product with global appeal, underscoring the importance of talent and product strength in the AI sector [5][9].
- The acquisition points future Chinese AI startups toward application innovation and global market strategies by 2026 [9].
Kimi Has RMB 10 Billion in the Bank and Is in No Hurry to IPO
盐财经· 2026-01-01 09:42
The following article is from 投资界 (PEdaily), by Zhou Jiali and Wu Qiong; reprinted from 财经天下weekly.

The last day of 2025 still managed to stun. PEdaily has learned that Moonshot AI (Kimi) has completed a US$500 million Series C round that was heavily oversubscribed, with existing shareholders including Alibaba, Tencent, and Wang Huiwen all adding to their stakes.

After this round, Moonshot AI's post-money valuation jumped to US$4.3 billion (about RMB 30 billion).

Today (December 31), Moonshot AI founder and CEO Yang Zhilin disclosed in an internal letter that the company's cash reserves now exceed RMB 10 billion, a sum nearly equal to that of the soon-to-IPO Zhipu and MiniMax combined.

Kimi sets a record: US$500 million just raised

As best anyone can recall, this is a rare official funding announcement from Moonshot AI. Before it, the venture community had watched Moonshot AI raise at remarkable speed, assembling well-known funds such as HongShan (Sequoia China), ZhenFund, Monolith (砺思资本), Capital Today, and Source Code Capital, alongside majors like Alibaba, Meituan, and Xiaohongshu, while its valuation spiraled upward, long since crossing the US$3 billion mark.

Today, Yang Zhilin confirmed in the internal letter that the company recently closed the US$500 million Series C. PEdaily understands that in this round, existing shareholders including Alibaba, Tencent, and Wang Huiwen all oversubscribed, putting the post-money valuation at US$4.3 billion (about RMB 30 billion).

Looking back over the past year, China's large-model landscape ...
China's Moonshot AI raises US$500 million in latest funding round: report
Yahoo Finance· 2026-01-01 09:30
Chinese artificial intelligence unicorn Moonshot AI has raised US$500 million in its recent Series C funding round, according to a report, as start-up rivals MiniMax Group and Zhipu AI gear up for their initial public offerings (IPOs). Moonshot AI, developer of the highly lauded Kimi AI models, saw IDG Capital lay out US$150 million to lead the latest financing round, with existing stakeholders Alibaba Group Holding and Tencent Holdings also participating, Chinese technology news outlet LatePost reported ...
DeepSeek Publishes a New Paper on New Year's Day, Opening a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 09:28
Gelonghui, January 1 | DeepSeek published a new paper on New Year's Day proposing a new architecture called mHC (Manifold-Constrained Hyper-Connections). The research aims to resolve the instability of traditional hyper-connections in large-scale model training while preserving their significant performance gains. The paper lists three first authors: Zhenda Xie (解振达), Yixuan Wei (韦毅轩), and Huanqi Cao. Notably, DeepSeek founder and CEO Liang Wenfeng is also on the author list. ...