Today's Top 10 Financial News | January 1, 2026
Xin Lang Cai Jing· 2026-01-01 12:33
Group 1
- DeepSeek released a new paper on New Year's Day proposing a new architecture called mHC (Manifold-Constrained Hyperconnection), aimed at addressing instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1]
- The paper's first authors are Zhenda Xie, Yixuan Wei, and Huanqi Cao, with DeepSeek founder and CEO Liang Wenfeng also listed as an author [1]

Group 2
- The EU's Carbon Border Adjustment Mechanism (CBAM) officially takes effect on January 1, 2026, with the EU recently releasing legislative proposals and implementation details [2]
- China expressed concerns over the EU's high default carbon emission intensity values for Chinese products, which it deems unfair and discriminatory; the EU plans to raise these default values gradually over the next three years [2]
- The EU plans to expand CBAM's scope to approximately 180 steel- and aluminum-intensive downstream products by 2028, a move China views as unilateral and protectionist [2]

Group 3
- Multiple electric vehicle manufacturers reported delivery data for December 2025 and the full year: Li Auto delivered 44,246 vehicles in December and 1,540,215 vehicles cumulatively since inception [6][16]
- NIO delivered 48,135 vehicles in December, a 54.6% year-on-year increase, and 326,028 vehicles for the year, up 46.9% [6][16]
- Xpeng Motors delivered 37,508 vehicles in December and 429,445 vehicles for the year, reflecting 126% year-on-year growth [6][16]

Group 4
- Warren Buffett officially retired as CEO of Berkshire Hathaway on December 31, 2025 [7][18]
DeepSeek Opens the Year with a New Paper: All-New mHC Architecture Proposed, Liang Wenfeng Appears on the Author List
Xin Lang Cai Jing· 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyperconnections onto a specific manifold to restore the identity mapping property [6]
- The authors of the paper include Zhenda Xie, Yixuan Wei, and Huanqi Cao, as well as Liang Wenfeng, founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to deepen the understanding of topological architecture design and offer promising directions for the evolution of foundation models [6]
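The identity mapping property that mHC restores — the residual path carrying the signal through unchanged — can be illustrated with a toy NumPy sketch. The width, depth, and random linear "layers" below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
C, depth = 64, 48  # illustrative hidden width and layer count

x = rng.standard_normal(C)
residual, plain = x.copy(), x.copy()

for _ in range(depth):
    W = 0.02 * rng.standard_normal((C, C))  # stand-in for one layer's transformation
    residual = residual + W @ residual      # y = x + f(x): the identity path survives
    plain = W @ plain                       # no identity path: the signal decays layer by layer

print(np.linalg.norm(plain) < 1e-12)   # True: the signal has all but vanished
print(np.linalg.norm(residual) > 1.0)  # True: the signal keeps its scale
```

Hyperconnections disturb exactly this pass-through behavior; projecting their connection matrices back onto a suitable manifold is how mHC recovers it.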
A "Stock Market Golden Age"! Global Equities Post Double-Digit Gains for the Third Consecutive Year
华尔街见闻· 2026-01-01 12:20
Core Viewpoint
- The global stock market is on track for double-digit growth for the third consecutive year in 2025, despite uncertainties from Trump's trade policies and concerns over an AI-sector bubble; the MSCI global index has risen over 20% this year, outperforming most analysts' expectations [1]

Group 1: US Market Performance
- After a significant downturn at the beginning of the year, the US stock market rebounded strongly, with the S&P 500 index up nearly 16.5% for the year. DeepSeek's release of a large language model shocked Silicon Valley and dragged down tech stocks, and Trump's announcement of sweeping tariffs in April triggered sell-offs in stocks, bonds, and the dollar, but strong corporate earnings, expectations of Fed rate cuts, and better-than-expected economic growth quickly brought investors back to the market [2]
- Despite the strong US performance, markets in China, Japan, the UK, and Germany outperformed the S&P 500 this year, as did emerging-market stock indices; investors sought more diversified allocations after the early-year volatility in the US market [4]

Group 2: Economic Resilience and Market Support
- The resilience of the US economy, combined with a clear outlook for the Fed's policy shift toward rate cuts, has been the core support for market performance, driving significant capital inflows into equities and reinforcing long-term bets on AI's potential; better-than-expected US economic growth data has also eased market anxiety and boosted risk appetite [8]

Group 3: Valuation Concerns
- Market valuations are significantly above historical averages, and analysts warn that the current rally, driven by tech giants, may not be sustainable; the Shiller cyclically adjusted price-to-earnings ratio for the S&P 500 is nearing 40, its second-highest level since the early-2000s internet bubble [6][10]
- Following such a strong rally, market sentiment has begun to turn cautious, with some investors and analysts questioning the sustainability of current conditions; the rally shows pronounced structural concentration and valuation divergence, driven primarily by a few tech giants and deviating substantially from long-term historical averages [10]

Group 4: Concentration Risk
- The rally, driven by a small number of stocks, is accumulating structural risk: the so-called "seven giants" of US tech now make up about a quarter of the MSCI global developed-market index, binding global index movements tightly to a handful of individual names and increasing overall market fragility [12]
- The rising concentration is prompting deeper scrutiny of the merger frenzy in the AI sector; it has created a complex, interdependent financial network, exemplified by OpenAI, which holds stakes in key infrastructure suppliers while also receiving substantial investments from other industry participants, potentially amplifying systemic risk [14]
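For context, the Shiller CAPE cited here divides the index price by the ten-year average of inflation-adjusted earnings. A toy computation with hypothetical figures — not actual S&P 500 data, chosen only to land near the ~40x level mentioned:

```python
# Shiller CAPE = index price / 10-year average of inflation-adjusted earnings.
# All figures below are hypothetical, for illustration only.
price = 5700.0
real_eps_10y = [120, 125, 130, 135, 140, 145, 150, 155, 160, 165]  # hypothetical real EPS

cape = price / (sum(real_eps_10y) / len(real_eps_10y))
print(round(cape, 1))  # 40.0
```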
Just In: DeepSeek's New Year's Day Paper, Signed by Liang Wenfeng, Aims to Open a New Chapter in Architecture
华尔街见闻· 2026-01-01 12:20
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) to address the instability issues in traditional hyper-connections during large-scale model training while maintaining significant performance gains [1][6][8]

Group 1: mHC Architecture
- The mHC architecture extends the single residual flow of traditional Transformers into a multi-flow parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][8]
- The core objective of mHC is to retain the performance improvements from widening the residual flow while resolving training instability and excessive memory consumption [8][9]
- Empirical evidence shows that mHC not only addresses stability issues but also scales well: on a 27-billion-parameter model it increased training time by only 6.7% while achieving significant performance improvements [8][32]

Group 2: Challenges with Traditional Hyper-Connections
- Traditional hyper-connections (HC) suffer severe training instability and limited scalability because they fundamentally disrupt the inherent identity mapping property that is crucial for stable training [5][9]
- Widening the information channels in HC increases memory-access overhead, contributing to what is known as the "memory wall" problem [9][5]

Group 3: Implementation and Efficiency
- DeepSeek designed tailored infrastructure for mHC, including kernel fusion, selective recomputation, and an extended DualPipe communication-overlap strategy, to minimize memory usage and enhance efficiency [23][25][27]
- The Sinkhorn-Knopp algorithm ensures that the residual connection matrix remains stable and adheres to the properties of a doubly stochastic matrix, which helps mitigate gradient explosion [16][21]
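The Sinkhorn-Knopp step referenced here can be sketched as alternating row and column normalization. A minimal NumPy version — the exponential parameterization and the fixed iteration count are assumptions for illustration, not the paper's exact recipe:

```python
import numpy as np

def sinkhorn_knopp(logits, iters=50):
    """Map an unconstrained matrix onto (approximately) the doubly stochastic
    manifold by exponentiating, then alternately normalizing rows and columns."""
    M = np.exp(logits)  # strict positivity; this parameterization is an assumption
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)  # make each row sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # make each column sum to 1
    return M

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))  # unconstrained connection matrix (n = 4 streams)
P = sinkhorn_knopp(H)

print(np.round(P.sum(axis=1), 6))  # [1. 1. 1. 1.]
print(np.round(P.sum(axis=0), 6))  # [1. 1. 1. 1.]
```

Because every row and column of the result sums to 1, the matrix neither amplifies nor attenuates the aggregate residual signal, which is the stability property the summary describes.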
Group 4: Experimental Validation
- The research team validated mHC through language-model pre-training experiments, comparing it against baseline models and traditional HC [28][32]
- Across downstream benchmarks, mHC consistently outperforms baseline models and often surpasses HC, demonstrating its effectiveness in large-scale pre-training [34][33]
- Scalability experiments show that mHC maintains its performance advantage at higher computational budgets, with only slight degradation [36][37]
DeepSeek Reworks Kaiming He's Residual Connection! Personally Signed by Liang Wenfeng, the First Major Upgrade in a Decade
Xin Lang Cai Jing· 2026-01-01 11:45
Core Insights
- DeepSeek has introduced an upgraded version of the residual connection, a fundamental component of deep learning proposed by Kaiming He in 2016, marking a significant evolution in the field [1][27]

Group 1: Residual Connections and Hyper-Connections
- Residual connections have remained unchanged for a decade, serving as the cornerstone of deep learning architectures by allowing signals to pass directly from shallow to deep layers without modification [5][31]
- Hyper-Connections (HC) expand the residual flow width from C dimensions to n×C dimensions, introducing three learnable mapping matrices to manage information flow [7][32]
- Experiments by the DeepSeek team indicate that the Hres matrix, responsible for internal information exchange within the residual flow, contributes significantly to the performance improvements [7][32]

Group 2: Challenges with Hyper-Connections
- When HC spans multiple layers, the composite mapping no longer retains the identity property, leading to sudden loss spikes and gradient fluctuations during training [9][34]
- The research team calculated that the amplification factor of HC's composite mapping peaked at 3000, meaning signals could be drastically amplified or attenuated during inter-layer propagation [10][35]

Group 3: Doubly Stochastic Matrix Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to a specific manifold formed by doubly stochastic matrices, known as the Birkhoff polytope [11][36]
- This constraint provides three key theoretical properties: norm preservation, closure under composition, and a geometric interpretation that enhances the stability of feature fusion [14][39][40]
- The Sinkhorn-Knopp algorithm projects any matrix onto this manifold, reducing the peak signal gain from roughly 3000 in HC to approximately 1.6 in mHC [16][41]
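The drop in peak gain from roughly 3000 to near 1 reflects a basic fact: a doubly stochastic matrix has operator norm at most 1 (its rows and columns each sum to 1), so products of such matrices cannot blow up. A toy NumPy demonstration — the perturbation scale, depth, and matrix size are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n, layers = 4, 32  # stream count and depth (illustrative)

def compose(mats):
    """Composite inter-layer mapping: the product of all per-layer mixing matrices."""
    out = np.eye(n)
    for M in mats:
        out = M @ out
    return out

def project_doubly_stochastic(M, iters=50):
    """Sinkhorn-style alternating normalization toward the doubly stochastic manifold."""
    M = np.abs(M) + 1e-9  # ensure positivity before normalizing
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)
        M = M / M.sum(axis=0, keepdims=True)
    return M

# Unconstrained mixing matrices (HC-like): composite gain grows with depth
free = [np.eye(n) + 0.3 * rng.standard_normal((n, n)) for _ in range(layers)]
# The same matrices projected onto the doubly stochastic manifold (mHC-like)
constrained = [project_doubly_stochastic(M) for M in free]

gain_free = np.linalg.norm(compose(free), 2)          # spectral norm of the composite map
gain_constrained = np.linalg.norm(compose(constrained), 2)
print(round(gain_constrained, 3))  # 1.0: doubly stochastic products never amplify
```

The closure property matters here: the product of doubly stochastic matrices is itself doubly stochastic, so the bound holds at any depth.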
Group 4: Engineering Optimizations
- Widening the residual flow incurs additional memory-access costs: a detailed analysis shows that standard residual connections read 2C elements and write C elements, while HC requires significantly more [19][44]
- The DeepSeek team developed infrastructure optimizations, including kernel fusion and specialized kernels for the Sinkhorn-Knopp algorithm, to reduce memory access and improve computational efficiency [19][43]
- The paper presents an optimization formula for recomputation strategies, aligning recomputation boundaries with pipeline-stage boundaries for better performance [20][45]

Group 5: Experimental Validation
- The paper validates the proposed methods on MoE models of 3B, 9B, and 27B parameters with the expansion rate n set to 4, demonstrating stable training curves and a loss reduction of 0.021 compared to the baseline [22][47]
- In downstream evaluations, mHC outperformed HC by 2.1% on the BBH reasoning task and 2.3% on the DROP reading-comprehension task, leading on most tasks [22][48]
- Internal large-scale training experiments confirmed these findings, with mHC introducing only 6.7% additional time overhead at n=4 [25][50]
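The memory-access accounting above (2C reads and C writes for a standard residual update) can be turned into a back-of-the-envelope model. The HC-side counts below are an assumed illustration — n streams read and written per update — not figures from the paper:

```python
def residual_traffic(C):
    """y = x + f(x): read x and f(x) (2C elements), write y (C elements)."""
    return 2 * C, C

def hc_traffic(C, n):
    """Assumed model: read n residual streams plus the C-dim layer output,
    write all n streams back."""
    return (n + 1) * C, n * C

C, n = 4096, 4  # hidden width and expansion rate (n = 4, as in the paper)
r_reads, r_writes = residual_traffic(C)
h_reads, h_writes = hc_traffic(C, n)

ratio = (h_reads + h_writes) / (r_reads + r_writes)
print(ratio)  # 3.0: triple the memory traffic under this model
```

Even this rough model shows why the extra traffic feeds the "memory wall" problem and why kernel fusion, which avoids round-trips to memory between the mixing and addition steps, pays off.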
DeepSeek's Latest Release!
Zheng Quan Shi Bao· 2026-01-01 10:56
Group 1
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) to address instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][3]
- The research highlights that while hyperconnections have improved performance by diversifying connection patterns, they have also weakened the inherent identity mapping property of residual connections, leading to training instability and limited scalability [3]
- Empirical results indicate that mHC effectively supports large-scale training with only a 6.7% additional time overhead when the expansion rate is set to 4, demonstrating its efficiency [3][5]

Group 2
- DeepSeek recently launched two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale; V3.2 achieves performance comparable to GPT-5 on reasoning benchmarks and is well suited to everyday tasks [6][7]
- The V3.2-Speciale model enhances long-form reasoning and adds theorem-proving capabilities, performing on par with Gemini-3.0-Pro on mainstream reasoning benchmarks [7]
- DeepSeek has also reduced API costs by over 50%, making the models more accessible to developers [7]

Group 3
- DeepSeek's research paper on the R1 reasoning model was featured on the cover of the journal Nature, a significant achievement for Chinese AI technology in the international scientific community [8]
- The publication is notable as the first mainstream large language model research to undergo complete peer review and be published in a leading journal, filling a gap in the field [8]
DeepSeek's Latest Release!
证券时报· 2026-01-01 10:53
Core Viewpoint
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][3]

Summary by Sections

Introduction of mHC
- DeepSeek's new paper presents mHC, which projects the hyperconnection's residual connection space onto a specific manifold to restore the identity mapping property, with rigorous infrastructure optimization ensuring operational efficiency [3][4]

Performance and Scalability
- Empirical results indicate that mHC effectively supports large-scale training, with an additional time overhead of only 6.7% when the expansion rate is set to 4 [4][6]

Research Directions
- mHC opens up several important research directions, including compatibility with various manifold constraints tailored to specific learning objectives and potential new methods for balancing plasticity and stability through in-depth study of differential-geometric constraints [7]

Recent Developments
- DeepSeek has been active recently, releasing two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, with the former achieving performance comparable to GPT-5 in benchmark tests [8]
- The DeepSeek-V3.2-Speciale model combines enhanced reasoning capabilities with mathematical-proof abilities, performing well on mainstream reasoning benchmarks [8]
- The release of DeepSeek-V3.2-Exp introduces a sparse attention mechanism aimed at improving training and inference efficiency on long texts, alongside a significant reduction in API costs for developers [9]

Recognition in the Scientific Community
- DeepSeek's research paper on the DeepSeek-R1 reasoning model was featured on the cover of the journal Nature, a significant milestone for Chinese AI technology in the international scientific community [9][10]
Just In: DeepSeek's New Year's Day Paper, Signed by Liang Wenfeng, Aims to Open a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 10:34
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) aimed at addressing the instability issues in traditional hyper-connections during large-scale model training while maintaining significant performance gains [1][27][28]

Group 1: Architecture and Methodology
- The mHC architecture expands the traditional single residual flow of Transformers into a multi-flow parallel structure, using the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][28]
- The core objective of mHC is to retain the performance improvements from widening the residual flow while resolving training instability and excessive memory consumption [4][34]
- The research team implemented infrastructure optimizations such as kernel fusion, selective recomputation, and an extended DualPipe communication strategy to offset the overhead caused by wider channels [31][34]

Group 2: Performance and Stability
- Empirical evidence shows that mHC not only resolves stability issues but also scales well: on a 27-billion-parameter model it increased training-time overhead by only 6.7% while achieving significant performance improvements [34][49]
- Evaluated against a baseline model, mHC reduced final loss by 0.021 and maintained a stable gradient-norm profile, indicating superior stability compared to traditional hyper-connections [49][50]

Group 3: Benchmarking and Results
- In various downstream benchmarks, mHC consistently outperformed the baseline model and surpassed traditional hyper-connections on most tasks, achieving gains of 2.1% and 2.3% on specific tasks [51][52]
- The scalability experiments indicated that mHC maintains its performance advantages even under higher computational budgets, demonstrating robust effectiveness in large-scale scenarios [52][53].
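The multi-flow structure described above can be sketched as a single forward step. The uniform mixing matrix and read/write weights below are illustrative placeholders for the architecture's learnable mappings, not the paper's learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n = 8, 4  # hidden width and number of residual streams (n = 4, as in the paper)

def layer(x):
    """Stand-in for an attention/FFN block operating on one C-dim input."""
    return np.tanh(x)

# Widen the single residual flow into n parallel streams
streams = np.tile(rng.standard_normal(C), (n, 1))  # shape (n, C)

H_res = np.full((n, n), 1.0 / n)  # stream-mixing matrix (uniform, hence doubly stochastic)
h_in = np.full(n, 1.0 / n)        # weights aggregating the layer input from the streams
h_out = np.ones(n)                # weights distributing the layer output back

x_in = h_in @ streams  # (C,) aggregate input fed to the layer
streams = H_res @ streams + np.outer(h_out, layer(x_in))  # mix streams, add layer output

print(streams.shape)  # (4, 8)
```

Constraining H_res to be doubly stochastic, as here, is what keeps repeated applications of this update from amplifying or draining the residual signal.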
DeepSeek Reworks Kaiming He's Residual Connection! Personally Signed by Liang Wenfeng, the First Major Upgrade in a Decade
量子位· 2026-01-01 10:32
Core Viewpoint
- The article discusses the evolution of the residual connection, a fundamental deep learning component introduced by Kaiming He in ResNet, and DeepSeek's new approach to constraining Hyper-Connections (HC), which aims to improve performance while addressing issues of signal amplification and stability in deep learning architectures [2][7][11]

Group 1: Residual Connections and Their Evolution
- Residual connections have been a cornerstone of deep learning since the introduction of ResNet in 2016, allowing signals to pass directly from shallow to deep layers without modification [7][9]
- The rise of Transformer architectures has made residual connections a standard feature in large language models such as GPT and LLaMA [10]
- Hyper-Connections (HC) expand the residual flow width from C dimensions to n×C dimensions, introducing three learnable mapping matrices to manage information flow [11]

Group 2: Performance and Stability Challenges
- Experiments by the DeepSeek team indicate that the Hres matrix, responsible for internal information exchange in HC, significantly enhances performance [12]
- However, when HC spans multiple layers, the composite mapping loses its identity property, which can cause sudden loss spikes and gradient fluctuations during training [14]
- The peak amplification factor of signals in HC can reach 3000, posing risks of signal distortion during inter-layer propagation [16]

Group 3: Theoretical Framework and Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to a specific manifold formed by doubly stochastic matrices, which ensures three key theoretical properties: norm preservation, closure under composition, and a geometric interpretation [17][19]
- The Sinkhorn-Knopp algorithm is employed to project any matrix onto this manifold, effectively curbing the signal amplification observed in HC [21]
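The properties attributed to this manifold — the Birkhoff polytope, whose vertices are the permutation matrices — are easy to check numerically. A small NumPy sketch, with arbitrary example matrices:

```python
import numpy as np

# Two permutation matrices of size 4: vertices of the Birkhoff polytope
P1 = np.eye(4)[[1, 0, 3, 2]]
P2 = np.eye(4)[[2, 3, 0, 1]]

# A convex combination of permutation matrices is doubly stochastic
A = 0.3 * P1 + 0.7 * P2
B = 0.5 * P1 + 0.5 * P2

def is_doubly_stochastic(M):
    """Nonnegative entries with every row and every column summing to 1."""
    return bool((M >= 0).all()) and np.allclose(M.sum(0), 1) and np.allclose(M.sum(1), 1)

print(is_doubly_stochastic(A), is_doubly_stochastic(B))  # True True
print(is_doubly_stochastic(A @ B))     # True: closed under composition
print(np.linalg.norm(A @ np.ones(4)))  # 2.0: the uniform signal's norm is preserved
```

Closure under composition is the property that lets the stability guarantee extend across arbitrarily many stacked layers.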
Group 4: Engineering Optimizations
- The paper details the memory-access costs of widening the residual flow, highlighting significant increases in read and write operations for HC compared to standard residual connections [24]
- To mitigate these costs, the team developed infrastructure optimizations, including the TileLang framework for operator fusion and specialized kernels for the Sinkhorn-Knopp algorithm [25][26]
- The paper also discusses pipeline-parallelism enhancements that overlap computation and communication to improve overall efficiency [27]

Group 5: Experimental Validation
- The paper validates the proposed methods on MoE models of 3B, 9B, and 27B parameters, with the expansion rate n set to 4 [30]
- On the 27B MoE model, the modified HC (mHC) showed a stable training curve, achieving a loss reduction of 0.021 compared to the baseline while maintaining gradient stability [31]
- Performance improvements were noted in downstream tasks, with mHC outperforming both the baseline and HC across various benchmarks [32][35]
DeepSeek Publishes New Paper on New Year's Day, Opening a New Chapter in Architecture
Xin Lang Cai Jing· 2026-01-01 09:28
Gelonghui, January 1 | DeepSeek published a new paper on New Year's Day proposing a new architecture called mHC (Manifold-Constrained Hyperconnection). The research aims to resolve the instability of traditional hyperconnections in large-scale model training while preserving their significant performance gains. The paper has three first authors: Zhenda Xie, Yixuan Wei, and Huanqi Cao. Notably, DeepSeek founder and CEO Liang Wenfeng is also on the author list. ...