Artificial Intelligence
Just now: DeepSeek drops a blockbuster, with Liang Wenfeng's name on it! Brute-force optimization of AI architecture
程序员的那些事· 2026-01-01 13:15
Core Insights
- DeepSeek introduced a new architecture called "Manifold-Constrained Hyper-Connections" (mHC), which enhances performance with only a 6.7% increase in training time on a 27-billion-parameter model [3][36].
- The mHC architecture optimizes the residual connection space by projecting matrices onto a constrained manifold, ensuring stability and significantly expanding the residual stream width without substantial computational cost [8][25].

Group 1: Performance Improvements
- In system-level benchmark tests, mHC consistently outperformed baseline models and Hyper-Connections (HC) across various tasks, demonstrating its effectiveness in large-scale pre-training [22][51].
- mHC achieved a 2.1% improvement on the BBH benchmark and a 2.3% improvement on the DROP benchmark compared to HC [52][54].

Group 2: Technical Details
- The core idea of mHC is to restore the identity-mapping property under the Hyper-Connections topology, giving it practical value in large-scale training and real-world foundation-model tasks [25].
- mHC employs a doubly stochastic matrix constraint (see the sketch below) to maintain stability while enhancing interaction between residual streams, which is crucial for realizing the potential of multi-stream architectures [26][27].

Group 3: Engineering Optimizations
- The implementation involved several engineering optimizations, including reordering operations for efficiency and using mixed-precision strategies to preserve numerical accuracy without sacrificing computational speed [38][42].
- The DualPipe scheduling strategy was extended to overlap communication and computation, addressing the significant communication delays introduced by the n-stream residual structure [46][48].
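As a concrete illustration of the doubly stochastic constraint mentioned above, here is a minimal NumPy sketch of the classic Sinkhorn-Knopp iteration, which alternately normalizes rows and columns until both sum to one. The function name, iteration count, and epsilon are illustrative assumptions, not DeepSeek's actual kernel.

```python
import numpy as np

def sinkhorn_knopp(mat, n_iters=20, eps=1e-9):
    """Alternate row/column normalization toward the doubly stochastic
    manifold, where every row and every column sums to 1."""
    m = np.abs(mat) + eps                      # the iteration needs positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)   # normalize rows
        m = m / m.sum(axis=0, keepdims=True)   # normalize columns
    return m

H = sinkhorn_knopp(np.random.rand(4, 4))
print(H.sum(axis=1))   # ~[1. 1. 1. 1.]
print(H.sum(axis=0))   # ~[1. 1. 1. 1.]
```

Because every row and column of such a matrix sums to one, mixing residual streams with it redistributes signal mass without amplifying it, which is the stability property the summary describes.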
AI Evolution Express | DeepSeek proposes new mHC architecture
Di Yi Cai Jing· 2026-01-01 13:05
DeepSeek has published a new paper proposing mHC (Manifold-Constrained Hyper-Connections), a new architecture.
① Zhiyuan Robotics releases GenieReasoner, an integrated embodied large-and-small-brain system;
② Moonshot AI plans to launch a new multimodal model early this year;
③ DeepSeek publishes a new paper proposing the new mHC (Manifold-Constrained Hyper-Connections) architecture.
...
Founded less than three years ago, acquired by Meta for over $2 billion! Manus becomes a "new benchmark for Chinese startups in the AI era"
Sou Hu Cai Jing· 2026-01-01 12:27
Meta is acquiring AI startup Manus for billions of dollars, the social media giant's third-largest acquisition ever, behind only WhatsApp and Scale AI. The deal marks a new phase in Meta's aggressive AI investment strategy and sets a new benchmark for Chinese entrepreneurs in the global AI race.

According to a recent Wall Street Journal report, Meta is acquiring Manus for over $2 billion. On Tuesday, per LatePost, Manus's parent company Butterfly Effect had been raising a new funding round at a $2 billion valuation before the acquisition. The entire negotiation was completed in a very short time, just over ten days from start to finish. After the deal closes, Butterfly Effect will continue to operate independently, and founder Xiao Hong will become a vice president at Meta.

Meta's Chief AI Officer Alexandr Wang subsequently posted a tweet welcoming Manus AI.

CEO Xiao Hong ("Red") also spoke out: "This is more than an acquisition. It validates that the future we have been working to build is real, and that it is arriving faster than anyone expected."

The company's first product was Monica, a browser AI extension offering LLM-powered chat, search, reading, writing, and translation. Although criticized at the time as a "wrapper" product, Monica became one of the few profitable products in China's AI industry.

In 2024, serial entrepreneur Ji Yichao (born in the 1990s) and product manager Zhang Tao joined the team and together developed Manus, a product capable of ...
DeepSeek opens the year with a new paper: an all-new mHC architecture, with Liang Wenfeng on the author list
Xin Lang Cai Jing· 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections) aimed at addressing the instability of traditional hyper-connections during large-scale model training while preserving their significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyper-connections onto a specific manifold to restore the identity-mapping property (a toy sketch follows this summary) [6]
- The authors include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The architecture is expected to contribute to a deeper understanding of topological architecture design and to offer promising directions for the evolution of foundation models [6]
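One way to see the identity-mapping property being restored: the identity matrix itself lies on the doubly stochastic manifold, so a multi-stream block whose mixing matrix equals the identity degenerates to the classic residual connection. A toy sketch, where the stream count, width, and stand-in layer are assumptions rather than the paper's exact formulation:

```python
import numpy as np

n, C = 4, 8                        # n parallel streams, each of width C
streams = np.random.randn(n, C)    # the widened residual state

def layer(v):                      # stand-in for an attention/FFN sublayer
    return 0.5 * v

H = np.eye(n)                      # the identity matrix is doubly stochastic
out = H @ streams + layer(H @ streams)

# With H = I the update collapses to the classic residual connection,
# the identity-mapping property that mHC is designed to restore:
assert np.allclose(out, streams + layer(streams))
```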
Just now: Liang Wenfeng signs on as DeepSeek's New Year's Day paper sets out to open a new chapter in architecture
华尔街见闻· 2026-01-01 12:20
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) to address the instability of traditional hyper-connections during large-scale model training while preserving their significant performance gains [1][6][8].

Group 1: mHC Architecture
- mHC extends the single residual stream of traditional Transformers into a multi-stream parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the doubly stochastic matrix manifold [1][8].
- The core objective of mHC is to retain the performance gains from widening the residual stream while resolving training instability and excessive memory consumption [8][9].
- Empirically, mHC not only fixes the stability issues but also scales well: on a 27-billion-parameter model it increased training time by only 6.7% while delivering significant performance improvements [8][32].

Group 2: Challenges with Traditional Hyper-Connections
- Traditional hyper-connections (HC) suffer severe training instability and limited scalability because they fundamentally break the identity-mapping property that is crucial for stable training [5][9].
- The widened information channels in HC also increase memory-access overhead, aggravating the "memory wall" problem [9][5].

Group 3: Implementation and Efficiency
- DeepSeek designed tailored infrastructure for mHC, including kernel fusion, selective recomputation, and an extended DualPipe communication-overlap strategy to minimize memory usage and improve efficiency [23][25][27].
- The Sinkhorn-Knopp algorithm keeps the residual connection matrix stable and doubly stochastic, which helps mitigate gradient explosion (a numerical illustration follows this summary) [16][21].

Group 4: Experimental Validation
- The team validated mHC with language-model pre-training experiments, comparing it against baseline models and traditional HC [28][32].
- Across downstream benchmarks, mHC consistently outperforms the baseline and often surpasses HC, demonstrating its effectiveness in large-scale pre-training [34][33].
- Scalability experiments show that mHC retains its performance advantage at higher computational budgets, with only slight degradation [36][37].
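The gradient-explosion point can be illustrated numerically: composing unconstrained near-identity mixing matrices across many layers inflates the norm of the end-to-end map, while projecting each onto the doubly stochastic manifold keeps it bounded, because products of doubly stochastic matrices remain doubly stochastic. A small experiment under assumed settings (4 streams, 64 layers, illustrative noise scale; not a reproduction of the paper's measurements):

```python
import numpy as np

def sinkhorn(m, iters=20, eps=1e-9):
    m = np.abs(m) + eps
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)
        m /= m.sum(axis=0, keepdims=True)
    return m

rng = np.random.default_rng(0)
n, depth = 4, 64                       # 4 streams, 64 stacked layers
prod_free = np.eye(n)                  # composed unconstrained mixing
prod_ds   = np.eye(n)                  # composed doubly stochastic mixing
for _ in range(depth):
    M = np.eye(n) + 0.5 * rng.standard_normal((n, n))  # near-identity mixing
    prod_free = M @ prod_free
    prod_ds   = sinkhorn(M) @ prod_ds

print(np.linalg.norm(prod_free))   # typically explodes by orders of magnitude
print(np.linalg.norm(prod_ds))     # bounded: products of doubly stochastic
                                   # matrices stay doubly stochastic
```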
The biggest startups raised a record amount in 2025, dominated by AI
Yahoo Finance· 2026-01-01 11:00
Core Insights
- The excitement surrounding artificial intelligence led to a record year of fundraising for AI companies in 2025, with a total of $150 billion raised, surpassing the previous high of $92 billion in 2021 [2]

Group 1: Funding and Investment
- The largest private U.S. companies raised a record $150 billion in 2025, with significant allocations to major AI firms like OpenAI and Anthropic [2]
- OpenAI raised $40 billion, the largest private funding round in history, while Anthropic secured $13 billion and Elon Musk's xAI raised $10 billion [3]
- Several other AI companies, including Jeff Bezos' Project Prometheus and Databricks, exceeded the $2 billion funding threshold during the year [5]

Group 2: Market Dynamics
- The concentration of capital in a few large AI companies raises concerns about long-term systemic risk in the venture capital market, as highlighted by PitchBook analyst Kyle Stanford [4]
- The top four funding deals accounted for over 30% of total deal value, indicating a trend toward larger investments in fewer companies [3]

Group 3: Future Projections
- Big Tech companies are projected to invest more than $500 billion in 2026 to develop AI infrastructure, including networks and data centers [8]
- The promise of AI in 2026 hinges on broader adoption of "AI agents" that can autonomously perform tasks, which is expected to significantly impact the economy [7]

Group 4: Public Market Impact
- The AI hype has influenced the public market, with nine of the top ten most valuable companies being tech firms benefiting from AI advancements, collectively valued at over $3 trillion [6]
Suzhou: Building the Most Innovative New Year's Eve
Sou Hu Cai Jing· 2026-01-01 10:39
Core Insights
- Suzhou is positioning itself as a leading city for One-Person Companies (OPC) and innovation, hosting the "OPC Suzhou Night" event to foster entrepreneurship among young people and AI entrepreneurs [1][3][13]
- The event showcased 36 high-quality projects from top universities, underscoring Suzhou's commitment to building an ecosystem for AI and entrepreneurship [5][12]

Group 1: Event Overview
- "OPC Suzhou Night" gathered over 1,500 participants, including entrepreneurs, students, and investors, to celebrate innovation and entrepreneurship [1][3]
- The event featured project roadshows and discussions on entrepreneurship, giving young talents a platform to connect with resources and opportunities [1][5]

Group 2: OPC Concept and Development
- OPC stands for One-Person Company, a model in which individuals leverage AI tools for tasks like code generation and content creation while focusing on strategic decision-making [3][5]
- Suzhou aims to become the "OPC Entrepreneurial Preferred City," with plans to establish over 50 OPC communities and support the growth of 1,000 OPC enterprises by 2028 [5][12]

Group 3: Project Showcases
- The event included 36 projects from fields such as intelligent robotics and AI applications, 19 of which were presented live [5][6]
- Notable companies, referred to as Suzhou's "Ten Little Tigers," shared their entrepreneurial journeys and contributions to the OPC model [6][10]

Group 4: Talent and Collaboration Initiatives
- The "Hundred Schools, Thousand Enterprises" alliance was formed to deepen collaboration between universities and businesses, focusing on student employment and technology transfer [13][15]
- The alliance includes 119 key universities and over 1,000 quality enterprises, aiming to integrate talent, innovation, and industry needs [15][16]

Group 5: Global Outreach
- Suzhou is actively inviting global youth to participate in its entrepreneurial ecosystem, showcasing its vibrant AI industry and investment opportunities [13][16]
- The city organized visits for over 200 international students to explore its AI industry clusters, facilitating connections with local enterprises and investment institutions [15][16]
Just now: Liang Wenfeng signs on as DeepSeek's New Year's Day paper sets out to open a new chapter in architecture
Xin Lang Cai Jing· 2026-01-01 10:34
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) aimed at resolving the instability of traditional hyper-connections during large-scale model training while preserving their significant performance gains [1][27][28].

Group 1: Architecture and Methodology
- mHC expands the traditional single residual stream of Transformers into a multi-stream parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the doubly stochastic matrix manifold (a schematic sketch follows this summary) [1][28].
- The core objective of mHC is to retain the performance gains from widening the residual stream while resolving training instability and excessive memory consumption [4][34].
- The research team implemented infrastructure optimizations such as kernel fusion, selective recomputation, and an extended DualPipe communication strategy to offset the overhead of the wider channels [31][34].

Group 2: Performance and Stability
- Empirically, mHC resolves the stability issues and scales exceptionally well: on a 27-billion-parameter model it increased training-time overhead by only 6.7% while delivering significant performance improvements [34][49].
- Against the baseline model, mHC reduced final loss by 0.021 and maintained a stable gradient-norm profile, indicating superior stability compared to traditional hyper-connections [49][50].

Group 3: Benchmarking and Results
- Across downstream benchmarks, mHC consistently outperformed the baseline and surpassed traditional hyper-connections on most tasks, with gains of 2.1% and 2.3% on specific tasks [51][52].
- Scalability experiments indicate that mHC retains its performance advantage under higher computational budgets, demonstrating robust effectiveness at large scale [52][53].
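For intuition about the multi-stream structure described above, here is a schematic forward pass for one block: the residual state holds n streams, a Sinkhorn-constrained matrix mixes them, and simple combination vectors route the layer input and output. The names (H_res, h_in, h_out) and the exact routing are simplifying assumptions rather than the paper's precise formulation:

```python
import numpy as np

def sinkhorn(m, iters=10, eps=1e-9):
    m = np.abs(m) + eps
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)
        m /= m.sum(axis=0, keepdims=True)
    return m

rng = np.random.default_rng(0)
n, C = 4, 16                                   # expansion rate n, stream width C
x = rng.standard_normal((n, C))                # widened residual state: n streams

H_res = sinkhorn(rng.standard_normal((n, n)))  # stream-mixing matrix, constrained
h_in  = rng.standard_normal(n)                 # combines n streams into the layer input
h_out = rng.standard_normal(n)                 # redistributes the layer output to streams

def layer(v):                                  # stand-in for an attention/FFN block
    return np.tanh(v)

layer_input = h_in @ x                         # shape (C,): weighted stream combination
x = H_res @ x + np.outer(h_out, layer(layer_input))  # constrained mix + residual add
```

Stacking such blocks is safe in the sense shown earlier: the composed stream-mixing map stays doubly stochastic, so signal mass is neither created nor destroyed across depth.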
DeepSeek reworks Kaiming He's residual connection! Liang Wenfeng personally signs on; the first major upgrade in a decade
量子位· 2026-01-01 10:32
Core Viewpoint
- The article discusses the evolution of the residual connection, a fundamental deep-learning component introduced by Kaiming He in ResNet, reviews Hyper-Connections (HC), which widen the residual stream for better performance, and presents DeepSeek's manifold-constrained variant that addresses HC's signal-amplification and stability issues [2][7][11].

Group 1: Residual Connections and Their Evolution
- Residual connections have been a cornerstone of deep learning since the introduction of ResNet in 2016, allowing signals to pass directly from shallow to deep layers without modification [7][9].
- The rise of Transformer architectures made residual connections a standard feature of large language models like GPT and LLaMA [10].
- Hyper-Connections (HC) expand the residual stream width from C to n×C and introduce three learnable mapping matrices to manage information flow [11].

Group 2: Performance and Stability Challenges
- Experiments by the DeepSeek team indicate that the H_res matrix, responsible for information exchange among streams in HC, contributes significantly to performance [12].
- However, when HC is stacked across many layers, the composed mapping loses the identity property, leading to sudden loss spikes and gradient fluctuations during training [14].
- The peak signal amplification factor in HC can reach 3000, risking severe signal distortion during inter-layer propagation [16].

Group 3: Theoretical Framework and Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to the manifold of doubly stochastic matrices, which guarantees three key properties: norm preservation, closure under composition, and a clean geometric interpretation [17][19].
- The Sinkhorn-Knopp algorithm projects any matrix onto this manifold, effectively eliminating the signal amplification observed in HC [21].

Group 4: Engineering Optimizations
- The paper details the memory-access costs of widening the residual stream: HC requires significantly more reads and writes than standard residual connections (a back-of-envelope sketch follows this summary) [24].
- To mitigate these costs, the team developed infrastructure optimizations, including the TileLang framework for operator fusion and specialized kernels for the Sinkhorn-Knopp algorithm [25][26].
- Pipeline-parallelism enhancements overlap computation and communication to improve overall efficiency [27].

Group 5: Experimental Validation
- The proposed methods are validated on MoE models at 3B, 9B, and 27B scales, with the expansion rate n set to 4 [30].
- On the 27B MoE model, the modified HC (mHC) showed a stable training curve, reducing loss by 0.021 relative to the baseline while keeping gradients stable [31].
- mHC outperformed both the baseline and HC on various downstream benchmarks [32][35].
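The memory-access point in Group 4 reduces to simple arithmetic: widening the residual stream from C to n×C multiplies the bytes the residual read/write path must move by n, before any kernel fusion recovers efficiency. A back-of-envelope sketch with assumed dimensions (the hidden width and dtype are illustrative, not taken from the paper):

```python
# Residual-stream memory traffic per token per layer.
C          = 4096    # hypothetical width of a single residual stream
n          = 4       # expansion rate used in the experiments
bytes_elem = 2       # bf16

std_traffic = 2 * C * bytes_elem          # one read + one write of C values
hc_traffic  = 2 * n * C * bytes_elem      # n x C values moved instead

print(f"standard residual: {std_traffic} bytes")   # 16384
print(f"HC with n=4:       {hc_traffic} bytes")    # 65536, 4x the traffic
```

This is why the paper pairs the architecture with fused kernels and recomputation: the extra traffic, not the extra FLOPs, is the dominant cost of widening the stream.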
Zuckerberg makes his move! Over $2 billion for a Chinese pure-AI application company, a shot in the arm for entrepreneurs
创业家· 2026-01-01 10:07
Core Viewpoint
- Meta's acquisition of Manus for over $2 billion highlights the significant value of pure AI applications, challenging the notion that innovation must stem from hardware-software integration [3][4][6][8].

Group 1: Acquisition Insights
- Manus, a company established only three years ago, achieved a remarkable acquisition price, making it a focal point in the domestic AI startup community [4].
- The acquisition demonstrates that global giants like Meta are willing to pay high prices for innovative applications, even from smaller, purely software-based companies [4][5].
- The event serves as strong encouragement for Chinese entrepreneurs, indicating that substantial value can be derived from pure AI applications without the need for hardware [4][5][6].

Group 2: Industry Perspectives
- Industry experts emphasize focusing on application innovation and targeting global markets, rather than being constrained by traditional models that prioritize hardware [5][7].
- The success of Manus illustrates that a Chinese team can create a product with global appeal, underscoring the importance of talent and product strength in the AI sector [5][9].
- The acquisition sets a clear direction for future Chinese AI startups: concentrate on application innovation and global market strategies by 2026 [9].