DeepSeek
Clearing the Backlog: DeepSeek Suddenly Completes Its R1 Technical Report, Disclosing the Training Path in Detail for the First Time
36Kr · 2026-01-09 03:12
Core Insights
- DeepSeek has released an updated version of its research paper on the R1 model, adding 64 pages of technical detail to the original [4][25]
- The new version emphasizes the implementation details of the R1 model and lays out its training process systematically [4][6]

Summary by Sections

Paper Update
- The updated paper has expanded from 22 pages to 86 pages, providing a comprehensive view of the R1 model's training and operational details [4][25]
- The new version breaks the training process into four main steps: cold start, reasoning-oriented reinforcement learning (RL), rejection sampling and fine-tuning, and alignment-oriented RL [6][9]

Training Process
- The cold start phase uses thousands of chain-of-thought (CoT) examples for supervised fine-tuning (SFT) [6]
- The reasoning-oriented RL phase strengthens the model's capabilities while introducing a language consistency reward to address language mixing [6]
- The rejection sampling and fine-tuning phase incorporates both reasoning and general data to improve the model's writing and reasoning abilities [6]
- The alignment-oriented RL phase refines the model's helpfulness and safety to align it more closely with human preferences [6]

Safety Measures
- DeepSeek has implemented a risk control system to enhance the safety of the R1 model, which includes a dataset of 106,000 prompts used to evaluate model responses against predefined safety criteria [9][10]
- The safety reward model is trained point-wise to distinguish safe from unsafe responses, with training hyperparameters aligned with those of the helpfulness reward model [9]
- The risk control system operates through two main processes: filtering of potentially risky dialogues and model-based risk review [9][10]

Performance Metrics
- The introduction of the risk control system has led to a significant improvement in the model's safety performance, with R1 achieving benchmark scores comparable to leading models [14]
- DeepSeek has developed an internal safety evaluation dataset organized into four main categories and 28 subcategories, totaling 1,120 questions [19]

Team Stability
- The core contributors to the DeepSeek team have largely remained in place, with only five of more than 100 authors having left, indicating strong retention in a competitive AI industry [21][24]
- Notably, one author who had previously departed has returned to the team, a contrast with the team dynamics at other companies in the sector [24]
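The summary says only that the safety reward model is trained "point-wise" on safe and unsafe responses. As a rough illustration of what point-wise training usually means (each response is scored and labeled on its own, rather than compared against a second response), the Python sketch below uses a binary cross-entropy loss on a single score. The choice of binary cross-entropy and the names `softplus`, `pointwise_safety_loss`, and `batch_loss` are assumptions for illustration; the summary does not state the loss DeepSeek actually uses.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_safety_loss(score, is_safe):
    """Binary cross-entropy on one (response, label) pair.

    `score` is a hypothetical reward model's raw output (a logit);
    `is_safe` is the label: True for a safe response, False for unsafe.
    """
    # Safe label:   -log(sigmoid(score))     = softplus(-score)
    # Unsafe label: -log(1 - sigmoid(score)) = softplus(score)
    return softplus(-score) if is_safe else softplus(score)


def batch_loss(scores, labels):
    """Average the point-wise loss over independently labeled responses."""
    total = sum(pointwise_safety_loss(s, y) for s, y in zip(scores, labels))
    return total / len(scores)
```

Under this assumed loss, a high score on a response labeled safe costs little, while the same score on a response labeled unsafe is penalized heavily, which pushes the model to separate the two classes.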
The Table Has Been Flipped: Chinese Models Find a Different Way to Win
36Kr · 2026-01-08 13:43
Core Insights
- Chinese AI companies, particularly in the large-model sector, have made significant progress and gained recognition, highlighted by the IPOs of Zhipu and MiniMax, which mark a pivotal moment in the global AI landscape [1][4].

Group 1: IPO Significance
- The IPOs of Zhipu and MiniMax send an optimistic signal to innovators, indicating that they will not be easily discarded by the times [4].
- The IPO is expected to raise approximately HKD 4.3 billion for Zhipu, significantly enhancing its market valuation and international influence [27][28].

Group 2: Competitive Landscape
- The emergence of DeepSeek has forced several companies within the "Six Little Tigers" to rapidly adjust their business strategies and teams to survive in a highly competitive environment [3][5].
- Despite initial setbacks, the "Six Little Tigers" have shown remarkable resilience and innovation, leading to significant advances in model performance and market presence [6][8].

Group 3: Market Dynamics
- The competitive landscape has shifted, with companies like Zhipu and MiniMax gaining traction in international markets, as evidenced by MiniMax's 73.1% overseas revenue share [14][15].
- The B-end (enterprise) market has matured, with companies recognizing the importance of tailored services and industry knowledge, leading to a more robust commercial ecosystem [12][13].

Group 4: Financial Performance
- Zhipu's annual recurring revenue (ARR) surged from RMB 20 million to over RMB 500 million, a 25-fold increase within ten months [11].
- Financial reports indicate that Zhipu and MiniMax have incurred nearly RMB 11 billion in losses over the past three years, primarily due to substantial investment in model research and development [21][24].

Group 5: Long-term Vision
- The industry consensus emphasizes the need for sustained innovation and investment, as the AI sector remains in its early stages, with significant long-term potential [23][24].
- IPOs in the AI sector are seen as a reward for long-term commitment and innovation, giving companies a platform to further their technological advances [29].
The Table Has Been Flipped: Chinese Models Find a Different Way to Win
36Kr · 2026-01-08 13:35
Core Viewpoint
- The IPOs of AI companies like Zhipu and MiniMax send a positive signal for innovation in the AI sector, indicating that innovators will not be easily discarded by the times [10][40][45]

Group 1: IPO Significance
- Zhipu officially listed on the Hong Kong Stock Exchange on January 8, 2026, becoming the "first stock of global large models" [3]
- The IPO is seen as a badge of honor for companies in the AI sector, representing a milestone in their journey [10][45]
- The expected fundraising scale for Zhipu is approximately HKD 4.3 billion, which is significantly more efficient than financing through primary markets [43]

Group 2: Industry Dynamics
- The AI industry has experienced rapid technological change over the past three years, with companies facing intense scrutiny and competition [4][6]
- The emergence of DeepSeek has forced several companies, including the "Six Little Tigers," to quickly adjust their business strategies and teams [6][12]
- Despite initial setbacks, the "Six Little Tigers" have shown remarkable resilience and innovation, leading to a resurgence in their market presence [14][19]

Group 3: Financial Performance
- Zhipu and MiniMax have incurred nearly RMB 11 billion in losses over the past three years, with around 70% of expenditures allocated to model research and development [36]
- Zhipu's annual recurring revenue (ARR) from its MaaS platform surged from RMB 20 million to over RMB 500 million, a 25-fold increase in just 10 months [19]
- Revenue from localized deployments accounted for 84.8% of Zhipu's income in the first half of 2025, highlighting the importance of tailored services for enterprise clients [22]

Group 4: Global Recognition
- Chinese models are gaining international recognition, with MiniMax reporting that 73.1% of its revenue came from overseas as of September 30, 2025 [27]
- The competitive pricing of Chinese models, such as Zhipu's GLM-4.5, offers significant cost advantages over international counterparts [29][31]
- The emergence of independent model developers is crucial for providing diverse model options and establishing a healthy commercial ecosystem [32]

Group 5: Long-term Commitment
- Long-termism in the AI sector emphasizes the need for continuous innovation and investment, with companies like Zhipu and MiniMax embodying this spirit [39]
- The IPO serves as a reward for those committed to climbing the AGI peak, reinforcing the notion that the journey of innovation is fraught with challenges but ultimately rewarding [45]
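Clearing the Backlog! DeepSeek Suddenly Completes Its R1 Technical Report, Disclosing the Training Path in Detail for the First Time
QbitAI · 2026-01-08 12:08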
Core Insights
- DeepSeek has released an updated version of its R1 paper, adding 64 pages of technical detail to the original [2][5][56]
- The new version emphasizes the implementation details and training processes of the R1 model, laying out its development systematically [10][11][17]

Summary by Sections

Paper Updates
- The updated paper has expanded from 22 pages to 86 pages, with so much new information that it reads like a textbook [3][6]
- The revisions include a comprehensive breakdown of the R1 training process, which is divided into four main steps: cold start, reasoning-oriented reinforcement learning, rejection sampling and fine-tuning, and alignment-oriented reinforcement learning [13][14][15][16]

Model Performance and Safety
- The R1 model has shown a significant increase in reasoning capabilities, with a reported 5- to 7-fold increase in the occurrence of reflective vocabulary as training progresses [21][22]
- DeepSeek has implemented a safety control system that includes a dataset of 106,000 prompts to evaluate and enhance the model's safety, using a point-wise training method for the safety reward model [26][29]
- The introduction of the risk control system has led to a notable improvement in the model's safety performance, with R1 achieving benchmark scores comparable to leading models [32][33]

Team Stability and Industry Context
- The core team behind the R1 paper has remained stable, with 18 key contributors still part of DeepSeek, indicating low turnover relative to industry trends [41][47]
- The article contrasts DeepSeek's team retention with the challenges faced by other companies in the AI sector, highlighting a more cohesive internal culture [48][49]
"Shortage Will Eventually Lead to Glut"! a16z's Andreessen on 2026: AI Chips Face a Capacity Boom and a Price Collapse
Hard AI · 2026-01-08 04:24
Core Insights
- AI represents a technological revolution larger than the internet, comparable to electricity and microprocessors, and is still in its early stages [2][3][11]
- The cost of AI is decreasing at a rate faster than Moore's Law, leading to explosive demand growth [4][41]
- Historical patterns suggest that shortages in GPU and data center capacity will eventually lead to oversupply, further driving down AI costs [5][12][41]

Group 1: AI Market Dynamics
- The future AI market structure will resemble the computer industry, with a few "god-level models" at the top and numerous low-cost "small models" proliferating at the edges [6][19]
- The competition between the US and China is intensifying, with Chinese companies like DeepSeek and Kimi making significant strides in open-source strategies and chip development [6][15][59]
- AI applications are shifting from "pay-per-token" models to "value-based pricing," allowing startups to integrate and build their own models rather than merely acting as wrappers [7][17]

Group 2: Public Perception and Regulatory Landscape
- Public sentiment towards AI is mixed, with fears of job displacement coexisting with rapid adoption of AI technologies [8]
- The EU's regulatory approach, focusing on leading in regulation rather than innovation, is hindering local AI development [8][60]
- The US regulatory environment is shifting towards supporting innovation, with less interest in imposing strict regulations that could hinder competitiveness against China [14][64]

Group 3: Economic Implications
- The rapid decline in AI input costs is expected to create significant demand elasticity, leading to unprecedented growth in AI applications [41][42]
- The economic landscape for AI companies is promising, with many experiencing unprecedented revenue growth as they effectively monetize their offerings [32][39]
- The ongoing construction of data centers and GPU production is projected to lead to a significant reduction in AI operational costs over the next decade [41][50]
Zhipu AI CEO Zhang Peng: DeepSeek Has Had a Fairly Big Impact on Us
Sina Finance · 2026-01-08 03:33
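Sina statement: All meeting transcripts are compiled from on-site shorthand notes and have not been reviewed by the speakers. Sina publishes this article to convey more information, which does not imply endorsement of its views or confirmation of its descriptions.
Editor: Li Siyang
Feature: 未竟之约 (The Unfinished Appointment): Zhang Xiaojun Interviews

Recently, on the program 未竟之约, Zhipu AI CEO Zhang Peng, in conversation with Zhang Xiaojun, said of DeepSeek that it has had "a fairly big impact on us."

He said DeepSeek's impact was considerable at the research level, at the engineering level, and even at the market level. As soon as the team returned from the Spring Festival holiday, they discussed the matter intensively; it offered many insights and reminders, and they learned a great deal.

He noted that the conclusion of those discussions was that the company should be more open, broaden its horizons, and take an open view of large-model research and the market. These factors are often entangled and hard to sort out or separate cleanly, so coordination among all parties is still needed, along with a more open attitude toward these matters; its own research ...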
After xAI's $20 Billion: The Large-Model Race Starts Competing on Delivery
TMTPost APP · 2026-01-08 01:43
Core Insights
- The article emphasizes a shift in the AI industry from model-centric competition to delivery-centric competition: models determine the upper limit of capability, but infrastructure and delivery mechanisms are crucial for scaling and monetizing that capability [1][10][13]

Group 1: Shift in Focus from Models to Delivery
- The transition from model competition to delivery competition is driven by three constraints: rising costs of training and inference, accelerated capability diffusion, and the need for a closed commercial loop [2][8]
- The marginal cost of achieving cutting-edge capabilities is increasing, so leading models need lower inference costs and stable delivery quality to realize their advantages in scalable scenarios [2][9]

Group 2: xAI's $20 Billion Significance
- xAI's $20 billion investment is aimed at strengthening its second and third layers of competitive capability, focusing on infrastructure and delivery systems rather than just model development [3][10]
- The investment emphasizes the expansion of computational infrastructure and the establishment of a visible asset base of over one million H100-equivalent GPUs, thereby enhancing supply certainty [3][6]

Group 3: Competitive Landscape and Capability Layers
- The competitive landscape is structured into three layers: model and training methods (first layer), infrastructure and supply chain (second layer), and distribution and entry points (third layer) [3][4]
- Major players like Google excel across all three layers, while others like OpenAI and Meta have strengths in specific areas, indicating a need for companies to enhance their infrastructure and delivery capabilities to remain competitive [6][10]

Group 4: Future Competition Dynamics
- Future competition is expected to resemble a platform war rather than a model elimination race, with a focus on scaling delivery capabilities and ensuring compliance and stability [10][11]
- The probability of a single company dominating the global market is low because user preferences and regulatory environments are decentralized, leading to a scenario in which platforms compete on delivery and compliance [11][13]

Group 5: Key Indicators for Future Success
- Companies should track three leading indicators to assess competitive positioning in the evolving landscape: unit inference cost curves, entry-point penetration rates, and delivery capabilities [9][13]
- The ability to convert model capabilities into scalable cash flows will depend on performance in these three areas, marking a significant shift in how success is measured in the AI industry [9][10]
First Bombshell of the New Year! DeepSeek Proposes the mHC Architecture to Crack Large-Model Training Problems
Sohu Finance · 2026-01-07 09:13
Core Insights
- DeepSeek has introduced a new architecture called mHC aimed at addressing stability issues in large-scale model training while maintaining performance improvements [1][11].

Group 1: Problem Identification
- Large models face a dilemma in training stability, where traditional single-channel connections lead to information congestion as model size increases [3][5].
- Previous solutions, like the hyper-connection approach, improved efficiency but introduced new issues such as uncontrolled information amplification or suppression, leading to gradient explosion and training failures [5][7][9].

Group 2: mHC Architecture
- The mHC architecture incorporates an intelligent scheduling system for multi-channel connections, utilizing the Sinkhorn-Knopp algorithm to maintain energy conservation during information transmission [11][13].
- Additional design features include non-negative constraints on input-output mappings to prevent useful signal loss due to coefficient cancellation [15].

Group 3: Infrastructure Optimization
- DeepSeek has optimized its infrastructure by merging multiple computation steps into a single operator, reducing memory read/write cycles and employing recomputation strategies to lower memory usage [16][18].
- These optimizations have resulted in significant stability improvements with minimal increases in training time, even at an expansion factor of 4 [18].

Group 4: Performance Validation
- Testing on various model sizes, particularly a 27-billion-parameter model, demonstrated that mHC effectively resolved training instability issues, achieving lower loss values than traditional baseline models [21][22].
- The performance advantages of mHC were consistent across different model sizes, indicating its practical value for both small and large models [24].

Group 5: Industry Implications
- The introduction of mHC suggests a shift in the industry towards refined architectural designs rather than merely increasing parameters and computational power, potentially lowering entry barriers for smaller companies in the large-scale model domain [26][29].
- This pragmatic technological innovation is expected to facilitate the deployment of AI technologies, making it easier for more enterprises to engage in large-scale model development [29].
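The summary above mentions that mHC relies on the Sinkhorn-Knopp algorithm. As a rough illustration only, the Python sketch below shows the generic algorithm: alternately normalizing the rows and columns of a strictly positive square matrix until every row and every column sums to 1 (a doubly stochastic matrix). The function name `sinkhorn_knopp`, the iteration count, and the 4x4 example values are assumptions chosen for illustration; the summary does not describe how DeepSeek applies the algorithm inside mHC, and this is not their implementation.

```python
def sinkhorn_knopp(matrix, iterations=200):
    """Scale a square, strictly positive matrix toward a doubly stochastic one.

    Generic textbook procedure; `iterations` is an illustrative default.
    """
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        # Row step: divide each row by its sum so every row sums to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Column step: divide each column by its sum so every column sums to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


# Hypothetical 4x4 mixing weights (any strictly positive values work).
weights = [
    [1.0, 2.0, 0.5, 0.1],
    [0.3, 1.5, 2.0, 0.7],
    [2.0, 0.2, 1.0, 1.0],
    [0.5, 0.5, 0.5, 3.0],
]
balanced = sinkhorn_knopp(weights)
row_sums = [sum(row) for row in balanced]
col_sums = [sum(balanced[i][j] for i in range(4)) for j in range(4)]
```

Because every row and every column of the result sums to 1, mixing channels with such a matrix redistributes signal without changing its total, which may be what the summary means by "energy conservation".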
Annual Feature | 2025 Influential Women: They Invented Their Own Battlefields
Sina Finance · 2026-01-07 08:26
Core Insights
- The narrative around women's influence has fundamentally changed over the past year, showcasing women as powerful figures in various fields rather than seeking empowerment from others [1][38].

Part 1: The World Modeler
- Fei-Fei Li, founder of World Labs, has focused on "Spatial Intelligence," launching the Marble product, which creates high-fidelity 3D worlds from images, videos, or text prompts [1][2][3].

Part 2: The Pain Translator
- Han Kang, a Nobel laureate, sparked global discussions on "female bodily sovereignty" and "historical trauma" with her works, including "The Vegetarian" and "The White Book," which became bestsellers [5][7].

Part 3: The Gold Standard
- Caitlin Clark, a WNBA star, doubled viewership and sponsorship fees, proving that female athletes can generate significant commercial value when given equal exposure [11][13].
- Qinwen Zheng, a tennis champion, became a global brand ambassador for Dior and earned $22.6 million in 2025, with 93% from endorsements, redefining the public image of East Asian female athletes [13][17].

Part 4: The Heritage Hacker
- Zong Fuli, president of Hongsheng Beverage Group, undertook digital reforms and brand rejuvenation, applying for a new trademark "Wawa Xiaozong" to establish her own identity separate from her father's legacy [14][16][17].

Part 5: The AI Ethicist
- Mira Murati, former CTO of OpenAI, founded Thinking Machines Lab with a $12 billion valuation, focusing on creating safer and more reliable AI systems, addressing the gap in public understanding of AI [18][20][21].

Part 6: The Invisible Heroine
- Female data annotators in rural China are crucial in training AI models, providing stable income and connecting with modern technology, thus becoming visible contributors to the AI evolution [22][24].

Part 7: The Strategy Sovereign
- Meng Wanzhou, rotating chairwoman of Huawei, shifted the company's focus from survival to leadership in AI, achieving significant milestones in various sectors, including the Harmony ecosystem and AI computing [25][27][28].

Part 8: The Grassroots Healer
- Dr. Lu Shengmei, a pediatrician, has dedicated her life to serving the community, significantly reducing infant mortality rates and becoming a symbol of enduring value in a rapidly changing world [30][31].

Part 9: The Supply Chain Queen
- Wang Laichun, chairwoman of Luxshare Precision, transformed the company from a traditional manufacturer to a technology platform, focusing on high-precision manufacturing and expanding into new markets [32][33][34].

Part 10: The Wilderness Chronicler
- Li Juan, author of "My Altay," received the 2025 China Copyright Golden Award, solidifying her status as a literary figure who connects individual souls with nature, providing a counter-narrative to modern anxieties [35][37].
Redefining Entrepreneurial Spirit in a Period of Transition Through Innovation
Yicai · 2026-01-07 02:34
Core Viewpoint
- Innovation will be the main theme for the "14th Five-Year Plan," as emphasized by Chinese Premier Li Qiang during his visit to Guangdong, highlighting the importance of innovation for economic and social development [2].

Group 1: Innovation as a Core Element
- Companies are the main entities for innovation, and market application is essential for nurturing technological advancements. Successful companies are those that embrace and practice innovation [2].
- The current economic transition and technological revolution require a redefinition of entrepreneurial spirit, shifting from exploiting scarcity to creating and innovating scarcity [3].

Group 2: Market Orientation and Consumer Respect
- Entrepreneurs must focus on the market and respect consumers to drive innovation. Companies lacking this respect will struggle to identify market opportunities and sustain growth [4].
- There is a need for a cognitive transformation among Chinese entrepreneurs to prioritize market orientation and consumer needs, moving away from reliance on past successes and superficial marketing tactics [4].

Group 3: Regulatory and Consumer Rights
- Improving legislative quality and enforcement is crucial for protecting consumer rights and ensuring fair market competition. Regulatory bodies must take a firm stance against misleading advertising and practices that infringe on consumer rights [5].
- There is a call to enhance consumer rights through collective litigation and dispute resolution mechanisms, as the currently low cost to businesses of misleading consumers hinders innovation and economic growth [5].

Group 4: Sustainable Development through Innovation
- Companies must understand that quality products and services are achieved through genuine innovation rather than mere marketing gimmicks. Respecting consumers and focusing on market needs is fundamental for sustainable development and innovation [5].