Large Model Training
First Blockbuster of the New Year! DeepSeek Proposes the mHC Architecture to Crack a Large-Model Training Problem
Sou Hu Cai Jing· 2026-01-07 09:13
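On New Year's Day, DeepSeek quietly released a paper. There was no launch event and no publicity, yet it sparked considerable discussion in AI tech circles. The paper proposes a new architecture called mHC, whose core goal is to solve stability problems in large-scale model training while preserving the performance gains. Outsiders may not follow the jargon, but once you understand the core pain point of large-model training, the paper's value becomes clear. A large model is like a complex information-processing factory, and residual connections are the factory's conveyor belts.
Text | Wuyan
Early conveyor belts were single-channel. Thanks to their "identity mapping" design, they guaranteed that information was passed along intact, and training was stable. But as models grew ever larger, a single-channel belt was no longer enough, and information became badly congested.
The Dilemma of Large-Model Training
To address this, a ByteDance team previously proposed the Hyper-Connections scheme. It effectively turns the single-channel belt into a multi-channel one: transmission efficiency did rise, and performance improved with it. But new problems soon appeared. With no unified scheduling rule across the channels, information could be amplified or suppressed in transit, like a seesaw out of control. This led directly to gradient explosions during training, and models crashed midway through. I had meant to touch on the severity of this problem only briefly, but it turns out an example is unavoidable. One leading AI company tried training a 100-billion-parameter-scale model with a scheme similar to Hyper-Connections; once training passed a little over 10,000 steps, it was interrupted frequently, and the loss value suddenly ...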
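To make the "seesaw" concrete, here is a minimal toy sketch, not DeepSeek's method. It drops each layer's learned transformation and only pushes a 4-stream residual state through one random mixing matrix per layer. Unconstrained nonnegative mixing amplifies the signal geometrically with depth, while normalizing each matrix toward a doubly stochastic one (every row and column sums to 1, via Sinkhorn normalization) keeps the magnitude bounded. Treating a doubly stochastic constraint as the stand-in for the paper's "manifold constraint" is an assumption here, since the articles give no math; the helper names `sinkhorn` and `propagate` are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply square matrix m (list of rows) by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def sinkhorn(m, iters=50):
    """Alternately normalize the rows and columns of a positive matrix.

    For a strictly positive matrix this converges toward a doubly
    stochastic matrix (every row and every column sums to 1).
    """
    m = [row[:] for row in m]
    n = len(m)
    for _ in range(iters):
        for i in range(n):
            s = sum(m[i])
            m[i] = [x / s for x in m[i]]
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def propagate(num_layers, n_streams, constrain, seed=0):
    """Push an n-stream residual state through num_layers random mixing steps.

    Hypothetical toy: the per-layer function F(x) is omitted so that only the
    residual mixing matrix acts on the signal. Returns max |x| at the end.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.5, 1.5) for _ in range(n_streams)]
    for _ in range(num_layers):
        h = [[rng.uniform(0.5, 1.5) for _ in range(n_streams)]
             for _ in range(n_streams)]
        if constrain:
            h = sinkhorn(h)
        x = matvec(h, x)
    return max(abs(v) for v in x)


unconstrained = propagate(num_layers=64, n_streams=4, constrain=False)
constrained = propagate(num_layers=64, n_streams=4, constrain=True)
print(f"unconstrained: {unconstrained:.3e}")  # every row sums to >= 2, so x at least doubles per layer
print(f"constrained:   {constrained:.3f}")    # weighted averages keep x within [0.5, 1.5]
```

The design point: with a doubly stochastic mixing matrix each new stream value is a weighted average of the old ones, so the state can neither blow up nor collapse, which echoes the stability the article credits to the single-channel "identity mapping".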
iFlytek: iFlytek Spark's Training Efficiency, Benchmarked Against the A100, Reaches Over 85%-95% After Optimization
Xin Lang Cai Jing· 2026-01-06 14:31
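iFlytek stated on an investor interaction platform that over the past few years, working with constrained and limited computing resources, it has invested heavily in continuously optimizing the training and inference cost efficiency of the Spark large model. Unlike the various engineering optimizations carried out directly on NVIDIA cards, iFlytek chose the harder route of fully domestic computing power. Since May 2023, iFlytek and Huawei have jointly solved a series of problems, including high-speed interconnect networking for 10,000-card clusters, hiding communication behind computation, tight interaction between training and inference, high-throughput inference optimization, and optimization of domestic operators. As a result, the training efficiency of general-purpose large models and o1-style deep-reasoning models, benchmarked against the A100, rose from an initial 30%-50% to over 85%-95%. Since 2025, iFlytek has overcome two more major hurdles in training on domestic computing power: first, training efficiency for long chain-of-thought reinforcement learning, raising deep-reasoning training efficiency from 30% to over 84% of the A800 benchmark; second, end-to-end training efficiency for MoE models, raising it from 30% of the A800 benchmark in March this year to 93%, a major 0-to-1 breakthrough in this area on domestic computing platforms. As the underlying capabilities of domestic computing power improve further, there is still considerable room for Spark's training costs to fall. ...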
Shanghai Gives Birth to a Super IPO!
Sou Hu Cai Jing· 2026-01-04 07:06
Core Viewpoint
- Shanghai Birun Technology Co., Ltd. has officially listed on the Hong Kong Stock Exchange, becoming the first GPU stock in the Hong Kong market, with a market capitalization exceeding 100 billion HKD at opening [1][3].
Company Overview
- Birun Technology was founded in 2019 in Shanghai and focuses on the research and development of General-Purpose Graphics Processing Unit (GPGPU) chips and intelligent computing solutions. It is now among the top tier of domestic GPU companies, alongside Moore Threads, Muxi Technology, and Suiruan Technology, collectively known as the "Four Little Dragons of Domestic GPUs" [3].
- The company raised a total of 5.583 billion HKD through its IPO, netting 5.375 billion HKD, marking the largest fundraising project since the implementation of Chapter 18C of the Hong Kong Stock Exchange listing rules [3].
Leadership and Team
- The founder, Zhang Wen, has a unique background with a PhD in law from Harvard and experience as a lawyer and investor. He previously held leadership roles at companies like SenseTime and was involved in the establishment of Birun Technology [3][4].
- The CTO, Hong Zhou, has nearly 30 years of GPU design experience, having worked at major companies like NVIDIA and Huawei. The COO, Zhang Linglan, has over 23 years of experience in the semiconductor industry, previously working at AMD and Samsung [4].
Financial Performance and Funding
- Birun Technology has completed 10 rounds of financing, raising over 9 billion RMB, with a valuation reaching 20.9 billion RMB as of August 2025. Notable investors include state-owned funds and top venture capital firms [5][6].
- The company has successfully attracted 23 top-tier investment institutions for its IPO, with a total subscription amount of 2.899 billion HKD [6].
Product Development and Market Position
- Birun Technology focuses on cloud-based intelligent computing, with core business in GPGPU chip development. It has launched two chips, BR106 and BR110, with significant sales growth projected [7][8].
- The company has invested over 3.302 billion RMB in R&D from 2022 to mid-2025, with R&D expenses consistently accounting for over 75% of total operating expenses [8].
Revenue Growth and Market Potential
- The company's revenue has shown exponential growth, increasing from 499,000 RMB in 2022 to 62.03 million RMB in 2023, and projected to reach 337 million RMB in 2024, reflecting a compound annual growth rate of 2500% [10].
- As of December 15, 2025, Birun Technology has secured sales agreements valued at approximately 1.241 billion RMB, providing solutions to several Fortune China 500 companies [11].
Industry Outlook
- The Chinese GPU market is projected to reach 142.5 billion RMB in 2024, with Birun Technology's market share estimated at 0.24%, indicating significant growth potential [12].
- The GPU industry is experiencing unprecedented demand driven by AI applications, with multiple companies preparing for IPOs, highlighting the competitive landscape and the need for sustained funding [12].
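The two percentages in this summary can be cross-checked from its own revenue figures. This short sketch assumes the 0.24% market share is computed from the 337 million RMB 2024 revenue; the summary does not say which figure was used.

```python
# Revenue figures (RMB) as reported in the summary above.
revenue_2022 = 499_000
revenue_2024 = 337_000_000  # projected 2024 figure per the summary

years = 2024 - 2022
# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (revenue_2024 / revenue_2022) ** (1 / years) - 1
print(f"CAGR 2022-2024: {cagr:.0%}")

# Implied share of the projected 142.5 billion RMB Chinese GPU market
# (assumption: share = 2024 revenue / 2024 market size).
market_2024 = 142_500_000_000
share = revenue_2024 / market_2024
print(f"Implied market share: {share:.2%}")
```

This prints a CAGR of 2499% and a share of 0.24%, consistent with the rounded "2500%" and "0.24%" quoted above.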
DeepSeek Releases Latest Paper, Cracking the "Congestion" Problem in Large-Model Training
Bei Ke Cai Jing· 2026-01-02 12:44
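On January 1, 2026 (Beijing time), the DeepSeek team simultaneously published its latest paper on arXiv (a preprint server) and Hugging Face, titled "mHC: Manifold-Constrained Hyper-Connections". The paper's core contribution is a framework called "mHC" (literally "manifold-constrained hyper-connections"), which improves on a paradigm previously used in large-model training called "HC" (Hyper-Connections) and delivers tangible performance improvements for large-scale model training.
A Bei Ke Cai Jing reporter noticed that DeepSeek founder Liang Wenfeng is listed as the last author of the paper. Although DeepSeek became a global sensation during the 2025 Spring Festival with the open-source release of its R1 model, under Liang Wenfeng's leadership the company has remained extremely low-key: the team has stayed focused on research, made few commercialization attempts, and devoted itself to foundational model R&D. Liang Wenfeng was also recently named one of Nature's ten people who shaped science in 2025.
[Figure: Liang Wenfeng's name appears last among the paper's authors. Screenshot from the Hugging Face website ...]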
H200 Returns to China Before the Spring Festival: What Are Jensen Huang's Odds?
Tai Mei Ti APP· 2025-12-23 02:35
Core Viewpoint
- Nvidia aims to export H200 chips to China before the Spring Festival (February 17, 2026), with an expected initial shipment of 40,000 to 80,000 units, primarily drawn from inventory [2][3]
Group 1: Export Plans and Market Dynamics
- Nvidia plans to increase production of H200 chips to supply the Chinese market in Q2 2026 [2]
- The export of H200 chips to China is subject to significant uncertainty, as there is currently no approval from Chinese authorities for any related procurement [3]
- Following Trump's announcement allowing Nvidia to export H200 chips to China, the company must pay 25% of sales proceeds to the U.S. government [3][4]
Group 2: Regulatory Environment and Challenges
- The U.S. government has initiated a review process for the export of H200 chips, which may take up to 30 days, with Trump holding the final decision-making power [4]
- There is opposition within the U.S. Congress regarding the export, with calls for more transparency on whether the chips could be used for military purposes [6]
- Concerns about "backdoor" security risks have been raised, with previous incidents involving Nvidia's H20 chip [6][9]
Group 3: Market Demand and Competition
- Major Chinese tech companies like Alibaba, ByteDance, and Tencent are expected to be the first buyers of H200 chips, indicating strong demand in the AI infrastructure sector [7]
- Despite Nvidia's potential return to the Chinese market, domestic chip manufacturers are rapidly improving their capabilities, posing a competitive threat [9]
- AMD and Intel are also targeting the Chinese market, with AMD having already secured export licenses for its AI chips [10][11]
Group 4: Financial Implications
- The estimated sales revenue from the initial shipment of H200 chips could range from $1 billion to $4 billion, considering the market price and the required tax [8]
- Nvidia's previous quarterly revenue from the Chinese market was significantly lower, indicating challenges in regaining market share [8]
Altman Confronts the $1.4 Trillion Doubts: As Long as Compute Is Scarce, OpenAI Must Keep Burning Cash
Hua Er Jie Jian Wen· 2025-12-20 06:10
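OpenAI CEO Sam Altman stressed that as long as compute remains scarce, OpenAI must keep burning cash. On December 19, in an interview on the Big Technology Podcast, Altman explained that the company's current losses stem from aggressively scaling up model training; as revenue grows and inference takes up an ever larger share of its compute clusters, inference will eventually cover the cost of training. Confronted with the huge gap between $1.4 trillion in spending commitments and $20 billion in revenue, Altman acknowledged that training costs are still growing faster than revenue. But he emphasized that OpenAI has always been in a state of "compute deficit", which itself proves that demand is strong. Altman said outside concerns would only be justified if the company ended up with large amounts of idle compute that it could not monetize profitably. For now, the compute shortage severely limits the company's revenue growth potential, and this is the core rationale for continued large-scale investment. Altman's path to profitability rests on a simple bet: that OpenAI can find buyers as fast as it develops new capacity. In the end, the bet either keeps winning or exhausts all resources.
Training Costs Weigh on Current Profits
Data show that OpenAI may reportedly face losses of about $120 billion before turning profitable in 2028 or 2029. In response, Altman confirmed the focus of the company's strategy: using revenue growth to support compute expansion rather than, because of short-term ...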
Moore Threads' Wang Hua: In 10,000-GPU Training, the Most Dangerous Failures Are Often the Ones That "Don't Report Errors" | GAIR 2025
Leiphone· 2025-12-18 00:45
Core Insights
- The article discusses the challenges and solutions related to large-scale training practices in AI, particularly focusing on the necessity of massive GPU clusters for training large models [4][6][7].
Group 1: Importance of Large-Scale Training
- Large-scale training, specifically with tens of thousands of GPUs, has become a necessary condition for developing large models, as the computational demands have reached unprecedented levels [6][7].
- The computational requirements for mainstream models like DeepSeek and domestic trillion-parameter models are around 10^24 FLOPs, while larger models like Grok 4 and GPT-5 may require up to 10^26 FLOPs [7][8][9].
Group 2: Challenges in Large-Scale Training
- The transition to large-scale training introduces new challenges such as node failures, performance fluctuations, and communication/storage bottlenecks, which were manageable at smaller scales but become critical at larger scales [4][12].
- Stability and controllability are significant challenges, with issues like silent data errors and system hangs posing risks to training processes [18][20][23].
Group 3: Solutions and Innovations
- The company has developed a comprehensive software stack to enhance training efficiency, including a scheduling system, the MUSA platform for compatibility, and various training tools optimized for popular frameworks [10][12].
- Innovations such as asynchronous checkpointing and automated pre-training checks have been implemented to minimize downtime and improve overall training efficiency [17][15].
- A monitoring system has been established to detect slow nodes and silent data errors, ensuring that training processes remain stable and efficient [19][20][26].
Group 4: Future Directions
- The article emphasizes the importance of continuous improvement and adaptation in training practices, suggesting that the experiences and solutions developed can serve as a reference for other companies and institutions aiming to engage in large-scale training [28].
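Why 10^24 to 10^26 FLOPs pushes training onto 10,000-GPU clusters can be seen with a back-of-envelope estimate: divide the total compute by the cluster's sustained throughput. The per-GPU peak of 1e15 FLOP/s and the 40% model FLOPs utilization (MFU) below are illustrative assumptions, not figures from the talk, and `training_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def training_days(total_flops, num_gpus, peak_flops_per_gpu=1e15, mfu=0.4):
    """Wall-clock days needed to spend total_flops on num_gpus accelerators.

    peak_flops_per_gpu and mfu (model FLOPs utilization) are assumed values.
    """
    sustained = num_gpus * peak_flops_per_gpu * mfu  # cluster FLOP/s actually achieved
    return total_flops / sustained / SECONDS_PER_DAY


for total in (1e24, 1e26):
    for gpus in (1_000, 10_000):
        print(f"{total:.0e} FLOPs on {gpus:>6,} GPUs: {training_days(total, gpus):8.1f} days")
```

Under these assumptions, 10^24 FLOPs takes about 29 days on 1,000 GPUs but under 3 days on 10,000, while 10^26 FLOPs still needs roughly 290 days even at 10,000 GPUs. The longer and wider the run, the more node failures, slow nodes, and silent errors it is exposed to, which is the stability problem the talk focuses on.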
Yingtang Intelligent Control (300131) - Investor Relations Activity Record, December 11, 2025
2025-12-11 13:34
Group 1: Company Overview and Business Strategy
- The company focuses on electronic component distribution and has built a global multi-regional network covering various categories including main chips, storage, RF, display drivers, power/analog devices, MEMS sensors, and passive components [2][3]
- The company is increasing its investment in chip design and manufacturing, aiming to enhance its capabilities and performance in the semiconductor field [3][4]
- Recent acquisitions of Guilin Guanglong Integrated and Shanghai Aojian Microelectronics are intended to strengthen the company's layout in optical communication chips and analog integrated circuits [2][4]
Group 2: Research and Development
- R&D expenses increased by 90.06% year-on-year, driven by investments in self-developed chips and the recruitment of top technical talent [5][6]
- The company has successfully introduced its automotive display chip business to several leading screen manufacturers, with the first automotive-grade TDDI/DDIC entering mass production [5][6]
- The MEMS micro-mirror product has entered the market, with a focus on automotive LiDAR and laser projection applications [5][9]
Group 3: Market Potential and Product Development
- The MEMS LBS (Laser Beam Steering) technology is not yet essential for basic vehicle operation but shows potential for enhancing user experience in high-end models [10]
- The global annual production of new cars is approximately 90 million, indicating a broad market potential for MEMS LBS products [10]
- The number of MEMS micro-mirrors is critical for determining the number of OCS (Optical Circuit Switching) channels, with higher channel counts requiring more mirrors [11]
Group 4: Risks and Regulatory Considerations
- The acquisition process involves regulatory approvals from the Shenzhen Stock Exchange and the China Securities Regulatory Commission, which may impact the transaction timeline [18]
- There are risks associated with the transaction being suspended, interrupted, or canceled, necessitating careful investor decision-making [18]
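The link between channel count and mirror count can be illustrated with the two textbook MEMS switch layouts. This is a sketch under the assumption of a classic 2D crossbar (one on/off mirror at every input/output crosspoint) versus a 3D beam-steering design (one two-axis mirror per input port and one per output port); it does not describe Yingtang's or any specific vendor's product, and the function names are hypothetical.

```python
def mirrors_2d_crossbar(ports):
    """2D MEMS crossbar: one on/off mirror per crosspoint -> N * N mirrors."""
    return ports * ports


def mirrors_3d_beam_steering(ports):
    """3D MEMS beam steering: one two-axis mirror per input and per output -> 2 * N."""
    return 2 * ports


for n in (8, 32, 128):
    print(f"{n:>3}x{n:<3} 2D crossbar: {mirrors_2d_crossbar(n):>6,} mirrors, "
          f"3D beam steering: {mirrors_3d_beam_steering(n):>4,} mirrors")
```

In both layouts the mirror count rises with the channel count, matching the point in the summary; the quadratic growth of the 2D layout is the commonly cited reason high-port-count OCS designs use 3D beam steering.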
Yingtang Intelligent Control (300131) - Investor Relations Activity Record, December 4, 2025
2025-12-04 13:48
Group 1: Company Overview
- Shenzhen Yingtang Intelligent Control Co., Ltd. focuses on electronic component distribution, covering a wide range of products including main chips, storage, RF, display drivers, power/analog devices, MEMS sensors, and passive components [2].
- The company has developed self-researched chips, particularly in MEMS micro-mirrors and automotive display chips, with significant R&D investment and talent acquisition [2].
- Yingtang has successfully introduced automotive display chip business to several leading screen manufacturers, with the first automotive-grade TDDI/DDIC entering mass production [2].
Group 2: Strategic Acquisitions
- Yingtang is preparing to acquire Guilin Guanglong Integrated Technology and Shanghai Aojian Microelectronics to enhance its capabilities in optical communication chips and analog integrated circuits [2].
- The acquisitions aim to create synergies with existing distribution and self-research businesses, leveraging the growth of generative AI, large model training, and cloud computing [2].
Group 3: Technology and Production Capabilities
- Guilin Guanglong Integrated focuses on optical switch technology, with expertise in various control methods including mechanical, MEMS, magneto-optical, electro-optical, and waveguide types [3].
- The company has achieved high-precision automated assembly and testing for optical switch systems, enabling mass production and cost control [3].
Group 4: Market Applications and Demand
- Guanglong Integrated's products serve various applications, including collaboration between computing power and networks, intelligent management for telecom operators, and testing for optical modules [4].
- The demand for optical switch technology is expected to grow due to the construction and upgrade of high-speed network infrastructure, such as 5G and data centers [6].
Group 5: Risks and Regulatory Considerations
- The acquisition process involves regulatory approvals from the Shenzhen Stock Exchange and the China Securities Regulatory Commission, which may impact the transaction timeline [6].
- There are risks associated with the transaction being suspended, interrupted, or canceled, and the company will fulfill its information disclosure obligations accordingly [6].
Yingtang Intelligent Control (300131) - 300131 Yingtang Intelligent Control Investor Relations Management Information 20251201
2025-12-01 13:38
Company Overview
- Shenzhen Yingtang Intelligent Control Co., Ltd. focuses on electronic component distribution and has built a global multi-regional network covering various product categories including main chips, storage, RF, display drivers, power/analog devices, MEMS sensors, and passive components [2].
- The company has successfully introduced its self-developed automotive display chips into several leading screen manufacturers, with the first automotive-grade TDDI/DDIC entering mass production [2][3].
Financial Performance
- R&D expenses increased by 90.06% year-on-year in the first three quarters, primarily due to investments in display chip development [3].
- The self-developed MEMS micro-mirror products have entered the market, with a 4mm specification now available [3].
Market Position and Competitive Advantage
- The company holds a local advantage in the automotive display chip market, which is predominantly occupied by Taiwanese and Korean manufacturers [4].
- The automotive display chip segment has achieved mass production, with improved versions in the trial production phase [4].
Strategic Initiatives
- The company is preparing to acquire Guilin Guanglong Integration and Shanghai Aojian Microelectronics to strengthen its position in optical communication chips and analog integrated circuits [2][6].
- The acquisition aims to create synergies with existing distribution and self-developed businesses, leveraging advancements in generative AI and cloud computing [2].
Product Development and Innovation
- The company is actively developing a local supply chain to enhance its product competitiveness and increase market share [4].
- The OCS (Optical Circuit Switching) technology is primarily based on MEMS solutions, which dominate over 50% of the market, offering rapid switching speeds and low signal transmission losses [9].
Risks and Challenges
- The ongoing asset acquisition is subject to regulatory approvals, which may impact the transaction timeline [8].
- There are risks associated with the potential suspension or cancellation of the transaction, necessitating careful investor decision-making [8].