DeepSeek opens the year with a new paper: proposes the new mHC architecture, with Liang Wenfeng on the author list
Xin Lang Cai Jing· 2026-01-01 12:24
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyperconnection) aimed at addressing the instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][6]

Group 1: Research and Development
- The paper presents mHC as a universal framework that projects the residual connection space of hyperconnections onto a specific manifold to restore the identity mapping property [6]
- The authors of the paper include Zhenda Xie, Yixuan Wei, Huanqi Cao, and Liang Wenfeng, the founder and CEO of DeepSeek [1]

Group 2: Performance and Scalability
- Empirical experiments indicate that mHC is effective for large-scale training, providing tangible performance improvements and excellent scalability [6]
- The proposed architecture is expected to contribute to a deeper understanding of topological architecture design and offer promising directions for the evolution of foundational models [6]
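The "identity mapping property" mentioned above can be made concrete with a short worked equation. This is only a sketch: the first line is the standard residual recursion, while the second line uses an assumed notation (a per-layer mixing matrix H_l on a widened residual state) that is not taken from the summary and is meant only to illustrate why a product of mixing matrices can drift away from the identity.

```latex
% Standard residual connection: the skip path is the identity map,
% so the shallow signal x_l reaches layer L unchanged.
x_{l+1} = x_l + F(x_l; W_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} F(x_i; W_i)

% Assumed hyper-connection form (H_l is illustrative notation):
% the skip path now carries a product of learnable matrices.
x_{l+1} = H_l\, x_l + (\text{layer output})
\quad\Longrightarrow\quad
x_L = \big( H_{L-1} H_{L-2} \cdots H_l \big)\, x_l + \dots
```

If every H_l is the identity, the second recursion reduces to the first. Constraining each H_l to a set that is closed under multiplication and has bounded norm is one way to keep the product well behaved.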
DeepSeek reworks Kaiming He's residual connection! Liang Wenfeng signs on as an author; the first major upgrade in ten years
Xin Lang Cai Jing· 2026-01-01 11:45
Core Insights
- DeepSeek has introduced an upgraded version of the residual connection, a fundamental component of deep learning proposed by Kaiming He in 2016, marking a significant evolution in the field [1][27].

Group 1: Residual Connections and Hyper-Connections
- Residual connections have remained unchanged for a decade, serving as the cornerstone of deep learning architectures, allowing signals to pass directly from shallow to deep layers without modification [5][31].
- Hyper-Connections (HC) expand the residual stream width from C dimensions to n×C dimensions and introduce three learnable mapping matrices to manage information flow [7][32].
- Experiments by the DeepSeek team indicate that the Hres matrix, responsible for internal information exchange within the residual stream, contributes significantly to performance improvements [7][32].

Group 2: Challenges with Hyper-Connections
- When HC is stacked across multiple layers, the composite mapping no longer retains the identity property, leading to sudden loss spikes and gradient fluctuations during training [9][34].
- The research team calculated that the amplification factor of the composite mapping in HC peaked at 3000, indicating that signals could be amplified or attenuated drastically during inter-layer propagation [10][35].

Group 3: Doubly Stochastic Matrix Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to a specific manifold formed by doubly stochastic matrices, known as the Birkhoff polytope [11][36].
- This constraint provides three key theoretical properties: norm preservation, closure under composition, and a geometric interpretation that enhances feature fusion stability [14][39][40].
- The Sinkhorn-Knopp algorithm is employed to project any matrix onto this manifold, resulting in a significant reduction in signal gain from 3000 in HC to approximately 1.6 in mHC [16][41].
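The projection and the gain comparison described in Group 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `sinkhorn_knopp` and `composite_gain`, the exponential used to make entries positive, the iteration count, the 4×4 size, the depth of 60, and the random test matrices are all hypothetical choices made for the example.

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=200):
    """Map a square matrix to an approximately doubly stochastic one.

    Entries are exponentiated so they are strictly positive (an assumption
    of this sketch), then rows and columns are normalized alternately.
    """
    m = np.exp(logits - logits.max())  # subtract the max for numerical stability
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # make every row sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # make every column sum to 1
    return m

def composite_gain(mats):
    """Spectral norm (largest singular value) of the product of all matrices."""
    prod = np.eye(mats[0].shape[0])
    for m in mats:
        prod = m @ prod
    return np.linalg.norm(prod, 2)

rng = np.random.default_rng(0)
n, depth = 4, 60

# Unconstrained per-layer mixing matrices: identity plus random noise.
raw = [np.eye(n) + 0.3 * rng.standard_normal((n, n)) for _ in range(depth)]
# The same matrices after projection onto (approximately) doubly stochastic form.
constrained = [sinkhorn_knopp(r) for r in raw]

unconstrained_gain = composite_gain(raw)
constrained_gain = composite_gain(constrained)
print(f"unconstrained composite gain: {unconstrained_gain:.3g}")
print(f"constrained composite gain:   {constrained_gain:.6f}")
```

Because a product of doubly stochastic matrices is again doubly stochastic, the constrained composite gain stays at 1 up to numerical error no matter how many layers are stacked, whereas the unconstrained product is free to grow or shrink with depth. The figure of roughly 1.6 quoted above comes from the paper's own setting and is not reproduced by this toy example.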
Group 4: Engineering Optimizations
- Expanding the residual stream width incurs additional memory access costs, with detailed analysis showing that standard residual connections require reading 2C elements and writing C elements, while HC requires significantly more [19][44].
- The DeepSeek team developed infrastructure optimizations, including kernel fusion and specialized kernels for the Sinkhorn-Knopp algorithm, to reduce memory access and improve computational efficiency [19][43].
- The paper presents an optimization formula for recomputation strategies, aligning recomputation boundaries with pipeline stage boundaries for enhanced performance [20][45].

Group 5: Experimental Validation
- The paper validates the proposed methods on MoE models of sizes 3B, 9B, and 27B, with the expansion rate n set to 4, demonstrating stable training curves and a loss reduction of 0.021 compared to the baseline [22][47].
- In downstream task evaluations, mHC outperformed HC by 2.1% on the BBH reasoning task and 2.3% on the DROP reading comprehension task, showing superior performance across most tasks [22][48].
- Internal large-scale training experiments confirmed these findings, with mHC introducing only a 6.7% additional time overhead when n=4 [25][50].
DeepSeek: latest release!
Zheng Quan Shi Bao· 2026-01-01 10:56
Group 1
- DeepSeek has introduced a new architecture called mHC (manifold-constrained hyperconnection) to address instability issues in traditional hyperconnections during large-scale model training while maintaining significant performance gains [1][3]
- The research highlights that while hyperconnections have improved performance by diversifying connection patterns, they have also weakened the inherent identity mapping property of residual connections, leading to training instability and limited scalability [3]
- Empirical results indicate that mHC effectively supports large-scale training with only a 6.7% additional time overhead when the expansion rate is set to 4, demonstrating its efficiency [3][5]

Group 2
- DeepSeek recently launched two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, with V3.2 achieving performance comparable to GPT-5 on reasoning benchmarks and being suitable for everyday tasks [6][7]
- The V3.2-Speciale model enhances long-form reasoning and adds theorem-proving abilities, performing similarly to Gemini-3.0-Pro on mainstream reasoning benchmarks [7]
- DeepSeek has also reduced API costs by over 50%, making it more accessible for developers [7]

Group 3
- DeepSeek's research paper on the R1 reasoning model was featured on the cover of the journal Nature, marking a significant achievement for Chinese AI technology in the international scientific community [8]
- The publication is notable as the first mainstream large language model research to undergo complete peer review and be published in a leading journal, filling a gap in the field [8]
Just in: with Liang Wenfeng as a co-author, DeepSeek's New Year's Day paper aims to open a new chapter in architecture
Xin Lang Cai Jing· 2026-01-01 10:34
Core Insights
- DeepSeek has introduced a new architecture called Manifold-Constrained Hyper-Connections (mHC) aimed at addressing the instability issues in traditional hyper-connections during large-scale model training while maintaining significant performance gains [1][27][28].

Group 1: Architecture and Methodology
- The mHC architecture expands the traditional single residual stream of Transformers into a multi-stream parallel structure, utilizing the Sinkhorn-Knopp algorithm to constrain the connection matrix to the manifold of doubly stochastic matrices [1][28].
- The core objective of mHC is to retain the performance improvements from widening the residual stream while resolving issues related to training instability and excessive memory consumption [4][34].
- The research team has implemented infrastructure optimizations such as kernel fusion, selective recomputation, and an extended DualPipe communication strategy to offset the overhead caused by wider channels [31][34].

Group 2: Performance and Stability
- Empirical evidence shows that mHC not only resolves stability issues but also scales well in large-scale training: on a 27-billion-parameter model it increased training time by only 6.7% while achieving significant performance improvements [34][49].
- The training stability of mHC was evaluated against a baseline model, showing a reduction in final loss of 0.021 and a stable gradient norm profile, indicating superior stability compared to traditional hyper-connections [49][50].

Group 3: Benchmarking and Results
- In various downstream benchmark tests, mHC consistently outperformed the baseline model and surpassed traditional hyper-connections in most tasks, achieving performance gains of 2.1% and 2.3% in specific tasks [51][52].
- The scalability experiments indicated that mHC maintains its performance advantages even under higher computational budgets, demonstrating robust effectiveness in large-scale scenarios [52][53].
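The multi-stream structure described in Group 1 above can be illustrated with a small NumPy sketch. Everything here is an assumption made for the example rather than the paper's formulation: the function `multi_stream_step`, the read and write weight vectors `h_pre` and `h_post`, the hand-built mixing matrix `h_res`, and the sizes n=4 and C=8 are all hypothetical.

```python
import numpy as np

def multi_stream_step(streams, h_res, h_pre, h_post, layer_fn):
    """One layer update on n parallel residual streams of width C.

    streams : (n, C) array, the widened residual state
    h_res   : (n, n) matrix mixing information across streams
    h_pre   : (n,) weights that read one C-dim layer input from the streams
    h_post  : (n,) weights that write the layer output back to each stream
    layer_fn: the wrapped layer (attention, MLP, ...), mapping (C,) to (C,)
    """
    layer_in = h_pre @ streams            # (C,) weighted read across streams
    layer_out = layer_fn(layer_in)        # (C,) output of the wrapped layer
    return h_res @ streams + np.outer(h_post, layer_out)

n, C = 4, 8
rng = np.random.default_rng(0)
streams = rng.standard_normal((n, C))

# A doubly stochastic mixing matrix built as a convex combination of two
# permutation matrices (every point of the Birkhoff polytope has this form).
identity = np.eye(n)
h_res = 0.7 * identity + 0.3 * np.roll(identity, 1, axis=0)
h_pre = np.full(n, 1.0 / n)
h_post = np.ones(n)

# With the layer output zeroed, only the skip path acts on the streams.
out = multi_stream_step(streams, h_res, h_pre, h_post, lambda x: np.zeros_like(x))
print(np.allclose(out.mean(axis=0), streams.mean(axis=0)))
```

With the wrapped layer zeroed out, the skip path redistributes features among the streams but leaves their average unchanged, because every column of `h_res` sums to 1. That is one concrete sense in which a doubly stochastic mixing matrix behaves like a softened identity mapping.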
DeepSeek releases a new paper on New Year's Day, opening a new chapter in architecture
Xin Lang Cai Jing· 2026-01-01 09:28
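Gelonghui, January 1 | DeepSeek released a new paper on New Year's Day proposing a new architecture called mHC (Manifold-Constrained Hyper-Connections). The research aims to solve the instability of traditional hyper-connections in large-scale model training while preserving their significant performance gains. The paper has three co-first authors: Zhenda Xie (解振达), Yixuan Wei (韦毅轩), and Huanqi Cao. Notably, DeepSeek founder and CEO Liang Wenfeng is also on the author list. ...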
2025 in review: DeepSeek leads AI evolution, national subsidies spur consumer spending, industry reshaping opens up more possibilities
Xin Lang Cai Jing· 2025-12-31 16:07
Core Insights
- The year 2025 has been pivotal for the digital 3C industry, marked by significant advancements in AI technology, policy support, and market dynamics, setting the stage for future developments in 2026 [1][15]

Group 1: AI Developments
- The launch of DeepSeek-R1 on January 20, 2025, showcased its competitive capabilities against top closed-source models with a training cost of approximately $6 million, challenging Silicon Valley's computational dominance [1][16]
- DeepSeek's V3.2-Exp, released in September, introduced a sparse attention mechanism that halved API prices, while the December V3.2 version integrated logical reasoning with agent tool usage, achieving gold medal performances in international competitions [2][16]
- DeepSeek's contributions to the 3C industry include promoting "open-source equity," enabling low-cost smart experiences on budget devices through cloud APIs, and leading a global shift towards efficiency in AI [2][16]

Group 2: Policy Impact on Market
- 2025 is defined as the "Year of National Subsidies" for the 3C market, with the introduction of a policy on January 8 that included subsidies of up to 500 yuan for mobile phones, tablets, and smartwatches, significantly boosting daily active users on e-commerce platforms [3][18]
- The subsidy policy expanded in the second half of the year, with 14 provinces increasing the maximum subsidy to 700 yuan, resulting in a total retail sales increase of over 120 billion yuan [3][18]
- The continuation of the subsidy policy into 2026 is expected to further include emerging categories like smart glasses, enhancing consumer access to mid-to-high-end products and shifting competition from parameter-based pricing to value-for-money battles [5][18]

Group 3: Industry Challenges
- The "Romoss incident" in June 2025 involved the recall of nearly 500,000 defective power banks due to safety concerns, leading to significant regulatory responses and the introduction of stricter safety standards in the power bank industry [19][21]
- Following the incident, new regulations mandated that all power banks must carry a 3C certification, marking a shift away from low-cost models and ensuring consumer safety [21][22]

Group 4: Growth of AI Glasses
- 2025 marked a breakthrough year for the AI glasses industry, driven by policy support and market demand, with global shipments expected to reach 12.05 million units and the Chinese market alone surpassing 2.75 million units, reflecting a 107% year-on-year increase [8][22]
- The emergence of numerous brands, including major players like Huawei and Xiaomi, indicates a competitive landscape with nearly 70 companies entering the market [10][24]

Group 5: AI Assistant Developments
- The launch of the "Doubao Phone" by ByteDance and ZTE on December 1, 2025, introduced an AI assistant capable of executing complex tasks across applications, marking a significant advancement in mobile technology [10][24]
- The introduction of the AI assistant sparked a debate over app permissions and user data security, highlighting the tension between innovation and established app ecosystems [12][27]
Science Roundtable · Talking 2025 | Pharmacologist: This year, China's homegrown innovative drugs are having their "DeepSeek moment"
Xin Hua She· 2025-12-31 05:04
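This year, after years of effort, our team proposed a new approach to chimeric antigen receptor T-cell immunotherapy (CAR-T) that targets renal fibrosis, and it drew strong attention from the field. But I know well that this is only one small wave in the fast-rising tide of drug development and cell therapy in China. One day, by the time our team had finished discussing the clinical study plan for this new therapy with a domestic biopharmaceutical company, it was already midnight. Walking out of the lab, my long-tense nerves finally relaxed, and only then did I notice how beautiful the campus was on a winter night: the wintersweet planted years ago was already in bud. Is that not a portrait of innovative drugs starting from zero and struggling to "bloom"? As someone who works in medicine, standing at the end of 2025, I feel one thing especially strongly: counting from 2015, which the industry calls the "first year of Chinese innovative drugs," and after ten years spent honing a single sword, China's homegrown innovative drugs are having their "DeepSeek moment": long-accumulated innovative effort is now yielding major product breakthroughs. In the pharmaceutical world, innovative drugs are governed by two laws. One is the "double-ten law": it takes ten years and one billion US dollars to bring a new drug from the laboratory to patients. This "high wall" has broken countless innovation dreams. The other is the "nine deaths, one life" law: about 90% of innovative drug projects fail in the preclinical or clinical stage, and only a few are ultimately approved for market. Take my own research field as an example: chronic kidney disease (CKD) is becoming a new challenge for global public health. The 78th World Health Assembly (WHA), held in May this year, listed kidney disease among the globally prioritized major non-communicable ...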
Pharmacologist: This year, China's homegrown innovative drugs are having their "DeepSeek moment"
Xin Hua She· 2025-12-31 05:02
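As someone who works in medicine, standing at the end of 2025, I feel one thing especially strongly: counting from 2015, which the industry calls the "first year of Chinese innovative drugs," and after ten years spent honing a single sword, China's homegrown innovative drugs are having their "DeepSeek moment": long-accumulated innovative effort is now yielding major product breakthroughs. This year, after years of effort, our team proposed a new approach to chimeric antigen receptor T-cell immunotherapy (CAR-T) that targets renal fibrosis, and it drew strong attention from the field. But I know well that this is only one small wave in the fast-rising tide of drug development and cell therapy in China. One day, by the time our team had finished discussing the clinical study plan for this new therapy with a domestic biopharmaceutical company, it was already midnight. Walking out of the lab, my long-tense nerves finally relaxed, and only then did I notice how beautiful the campus was on a winter night: the wintersweet planted years ago was already in bud. Is that not a portrait of innovative drugs starting from zero and struggling to "bloom"? In the pharmaceutical world, innovative drugs are governed by two laws. One is the "double-ten law": it takes ten years and one billion US dollars to bring a new drug from the laboratory to patients. This "high wall" has broken countless innovation dreams. The other is the "nine deaths, one life" law: about 90% of innovative drug projects fail in the preclinical or clinical stage, and only a few are ultimately approved for market. Take my own research field as an example: chronic kidney disease (CKD) is becoming a new challenge for global public health. The 78th World Health Assembly (WHA), held in May this year, listed kidney disease among the globally prioritized major non-communicable ...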
PriceSeek alert: Minmetals copper mine expansion to increase supply
Xin Lang Cai Jing· 2025-12-30 11:09
Core Viewpoint
- China Minmetals Resources (MMG) announced an investment of approximately $900 million to expand the Khoemacau copper mine in Botswana, which is expected to increase annual copper concentrate production to 130,000 tons and add over 4 million ounces of silver production, enhancing long-term profitability to meet the demand from the electric vehicle and semiconductor industries [1][4].

Group 1: Copper Production
- The expansion is projected to increase copper concentrate production to 130,000 tons annually, with potential to reach 200,000 tons in the future [1][4].
- Production costs are expected to decrease to below $1.60 per pound, which may stimulate further production expansion [1][5].
- The increase in supply is likely to exert downward pressure on copper spot prices, with short-term oversupply risks intensifying despite some support from growing demand in electric vehicles and semiconductors [5].

Group 2: Silver Production
- The expansion project will add over 4 million ounces of silver production annually, which is a byproduct of copper mining [2][5].
- The increase in silver supply is anticipated to lead to a loosening of supply and demand in the silver spot market, putting downward pressure on prices [2][5].
- There are no significant demand-side factors to offset this supply increase, suggesting that the bearish effects on silver prices may persist in the short term, although the impact may be less severe than that on copper [2][5].
Year-end review | DeepSeek ignites the AI boom... A one-article guide to the hottest A-share themes of 2025
Xin Lang Cai Jing· 2025-12-29 13:45
Core Viewpoint
- The A-share market in 2025 experienced a significant upward trend after a rapid decline in early April, culminating in a clear focus on "new productive forces" driven by policy, events, and industry developments, leading to a fast-paced and concentrated trading environment [1][23].

AI Hardware and Chip Sector
- The launch of the DeepSeek-R1 model in January 2025, with a training cost of approximately $294,000, disrupted the belief that top models required tens of millions of dollars, leading to a surge in domestic AI hardware stocks [3][25].
- The performance of computing chip stocks was notable, with Tianpu Co. achieving a maximum annual increase of over 1300%, while Dongxin Co. and Xinyuan Co. both exceeded 300% [3][27].

Storage Chip Sector
- The demand for storage chips surged, with prices for DRAM and NAND Flash increasing by over 300% since September 2025, driven by major companies focusing on HBM and DDR5 [8][29].
- The top performers in the storage chip sector included Xiangnan Chip, which saw an annual increase of over 600%, and several others exceeding 200% [8][31].

Precious Metals and Commodities
- The precious metals market experienced a historic bull run, with gold prices rising over 70% and silver over 170% due to global liquidity and demand from emerging industries [10][31].
- The industrial metal sector also thrived, with copper prices increasing over 40%, and several companies in the sector achieving annual increases exceeding 200% [10][31].

Commercial Aerospace Sector
- The commercial aerospace sector saw significant growth in Q4 2025, with multiple successful rocket launches and a notable IPO plan from SpaceX, valued at approximately $1.5 trillion [12][34].
- Key stocks in this sector, such as Shunhao Co. and Feiwo Technology, recorded annual increases exceeding 450% [12][34].

Energy Storage and Lithium Battery Sector
- The global energy storage demand is projected to grow significantly, with a target of 180 million kilowatts of new storage capacity by the end of 2027, leading to a resurgence in the lithium battery industry [15][36].
- Major players in the lithium battery sector, such as CATL, saw their market value exceed 1.8 trillion yuan, with several companies achieving annual increases over 560% [15][36].

Regional Policy Impact
- The Fujian and Hainan regions experienced significant market activity due to new policies, with Fujian's stock performance showing increases over 500% for some companies [19][40].
- Hainan's free trade port officially launched, leading to strong growth in related stocks, with some companies achieving annual increases over 180% [19][42].