International Journal Publishes DeepSeek's Large-Scale Reasoning Model Training Method, Revealing the Science Behind AI
Zhong Guo Xin Wen Wang· 2025-09-18 02:55
Core Insights
- DeepSeek, a Chinese company focused on large language models (LLMs) and artificial general intelligence (AGI), has gained attention for its open-source AI model DeepSeek-R1, which employs a large-scale inference model training method [1]
- The training method was published in the prestigious journal Nature, revealing that the reasoning capabilities of LLMs can be enhanced through pure reinforcement learning, thereby reducing the human input required for performance gains [1]
- The model outperformed traditional LLMs on tasks involving mathematics, programming competitions, and graduate-level STEM problems [1]

Group 1
- DeepSeek-R1 includes a supervised in-depth training phase to optimize the reasoning process, using reinforcement learning instead of human examples to develop reasoning steps, which reduces training costs and complexity [2]
- The models achieved scores of 77.9% and 79.8% in mathematical benchmark tests for DeepSeek-R1-Zero and DeepSeek-R1, respectively, and also excelled in programming competitions and graduate-level biology, physics, and chemistry problems [2]
- A concurrent article in Nature highlighted some limitations of the current version of DeepSeek-R1, such as language mixing and sensitivity to prompt engineering, indicating areas for improvement in future versions [2]

Group 2
- The DeepSeek-AI team concluded that future research should focus on optimizing the reward process to ensure reliable reasoning and task outcomes [3]
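Scores like the 77.9% and 79.8% above are pass@1-style accuracies: the probability that a single sampled answer is correct, estimated by sampling several answers per problem and averaging. A minimal sketch of that estimator (the function name and data layout are ours, not from the paper):

```python
def pass_at_1(samples_per_problem):
    """Estimate pass@1 accuracy.

    `samples_per_problem` holds one inner list per benchmark problem,
    one boolean per sampled answer (True = answer judged correct).
    pass@1 is the per-problem fraction of correct samples, averaged
    over all problems.
    """
    per_problem = [sum(s) / len(s) for s in samples_per_problem]
    return sum(per_problem) / len(per_problem)

# Two problems, four samples each: 3/4 and 2/4 correct -> 0.625
score = pass_at_1([[True, True, True, False],
                   [True, False, True, False]])
```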
DeepSeek Paper Lands on the Cover of Nature; R1 Becomes the First Large Model to Undergo Rigorous Academic Review
Xin Lang Cai Jing· 2025-09-18 02:23
Core Insights
- DeepSeek's R1 model has been recognized as the first major language model to be peer-reviewed and published in the prestigious journal Nature, marking a significant milestone in AI research [1][2]
- The R1 model has surpassed 10.9 million downloads on Hugging Face, making it the most popular open-source inference model globally [2]
- DeepSeek's innovative approach uses pure reinforcement learning to enhance reasoning capabilities, diverging from traditional human-imitation methods [2][3]

Company Developments
- DeepSeek's R1 model was developed with a training cost of only $294,000, significantly lower than the costs associated with training AI models at OpenAI and Google, which can reach millions [2]
- The company released an upgraded version, DeepSeek-V3.1, which features a mixed reasoning architecture, improved thinking efficiency, and enhanced agent capabilities [3]
- DeepSeek was founded in 2023 in Hangzhou, backed by the quantitative firm High-Flyer (Huanfang), with a team composed of experts from top universities and international institutions [3]

Industry Context
- The publication of DeepSeek's research is seen as a critical step in addressing the rampant speculation and unverified claims within the AI industry, emphasizing the importance of independent peer review [3]
- The recognition of DeepSeek's work by Nature highlights China's advances in foundational research on large models, contributing to the global AI landscape [2]
DeepSeek-R1 Paper on the Cover of Nature; AI Artificial Intelligence ETF (512930) Rises Over 0.6%, Heading for a Third Straight Gain
Xin Lang Cai Jing· 2025-09-18 02:04
Group 1
- The DeepSeek-R1 reasoning model research paper, led by Liang Wenfeng, has been published in the prestigious journal Nature, making it the first mainstream large language model to undergo peer review [1]
- The latest paper provides more details on model training and addresses the initial concerns regarding model distillation, highlighting the significance of independent peer review in the AI field [1]
- The AI industry is experiencing a positive cycle driven by performance and capital expenditure, with the domestic AI ecosystem developing rapidly across segments including large models, computing power, and applications [1]

Group 2
- As of September 18, 2025, the CSI Artificial Intelligence Theme Index (930713) rose by 0.65%, with notable gains from stocks such as Jingsheng Electronics (up 9.99%) and Rockchip (up 5.82%) [2]
- The AI Artificial Intelligence ETF (512930) increased by 0.66%, its third consecutive daily gain, with a latest price of 2.13 yuan and a weekly gain of 8.08% [2]
- The ETF's management fee is 0.15% and its custody fee 0.05%, the lowest among comparable funds, while its tracking error of 0.008% over the past three months is also the tightest among comparable funds [2]

Group 3
- As of August 29, 2025, the top ten weighted stocks in the CSI Artificial Intelligence Theme Index accounted for 60.82% of the index, with companies like Xinyi Technology and Zhongji Xuchuang leading the list [3]
- The top ten stocks include Xinyi Technology (300502), Zhongji Xuchuang (300308), and Cambricon (688256), among others, indicating a concentration of investment in these key players within the AI sector [3][5]
DeepSeek-R1 on the Cover of Nature: A Welcome Step Toward AI Transparency
36Ke· 2025-09-18 02:02
Core Insights
- The value of open-source artificial intelligence (AI) is gaining broader recognition, highlighted by the publication of the DeepSeek-R1 paper in the prestigious journal Nature, with founder Liang Wenfeng as the corresponding author [1][5]

Research Findings
- The research team hypothesized that human-defined reasoning patterns might limit model exploration, and that unrestricted reinforcement learning (RL) training could better stimulate the emergence of new reasoning capabilities in large language models (LLMs) [3][8]
- Experiments demonstrated that the reasoning ability of LLMs can be enhanced through pure RL, reducing the need for human input and outperforming traditionally trained LLMs on tasks such as mathematics, programming competitions, and graduate-level STEM problems [3][9]

Model Evaluation
- Following its launch, DeepSeek-R1 received widespread acclaim from global developers, achieving 91.1k stars on GitHub [4]
- Nature's editorial recognized DeepSeek-R1 as the first mainstream LLM published after peer review, marking a significant step toward transparency in AI [5][17]
- The editorial emphasized the importance of peer-reviewed publication in clarifying how LLMs operate and in assessing their authenticity [6][17]

Methodology
- The research introduced a new paradigm within the RL framework, minimizing reliance on human-annotated reasoning processes and exploring the potential for LLMs to develop reasoning capabilities through self-evolution [9][10]
- The team proposed an RL algorithm called "Group Relative Policy Optimization" (GRPO) and trained various models, including DeepSeek-R1-Zero and DeepSeek-R1, on top of the foundation model DeepSeek-V3 Base [10][12]

Training Phases
- The training process involved multiple stages, with each subsequent model improving upon the previous one in reasoning and instruction-following capabilities [14]
- DeepSeek-R1 demonstrated strong reasoning abilities aligned with human preferences, achieving superior performance across 21 mainstream benchmarks and validating the effectiveness of the RL framework [15][16]

Industry Implications
- The editorial raised concerns about the lack of independent peer review for many widely used LLMs, highlighting the need for transparency and accountability in the AI industry [17][18]
- Nature called for more AI companies to submit their models for publication review, emphasizing that peer review can enhance trust and credibility in AI research [18][19]
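The "group relative" idea in GRPO can be illustrated in a few lines: for each prompt a group of responses is sampled, and each response's advantage is its reward standardized against the group's mean and standard deviation, so no separate value network is required. This sketch covers the advantage computation only, with function and variable names of our own choosing:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group.

    Responses that score better than their group mates receive a
    positive advantage, worse ones a negative advantage; the group
    itself serves as the baseline, replacing a learned critic.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt: two correct (reward 1.0),
# two incorrect (reward 0.0).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Under this normalization a group's advantages sum to zero, so the policy update only shifts probability mass from worse responses toward better ones for the same prompt.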
DeepSeek on the Cover of Nature: Liang Wenfeng's Team Answers the Doubters, and R1 Really Did Cost $294,000 to Train
36Ke· 2025-09-18 01:32
Core Insights
- The paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" has gained significant recognition, being featured on the cover of the leading global journal Nature [2][4]
- DeepSeek-R1 is noted as the first mainstream large language model (LLM) to undergo a peer review process, setting a precedent for transparency in AI development [7]

Model Performance and Popularity
- After its open-source release, DeepSeek-R1 became the most downloaded model on Hugging Face, surpassing 10.9 million downloads [4]
- The model demonstrated a remarkable improvement in reasoning capabilities, achieving an average problem-solving accuracy (pass@1) of 77.9%, rising to 86.7% with "self-consistent decoding" [10]

Training Costs and Efficiency
- The training cost for DeepSeek-R1 was reported at $294,000, significantly lower than the costs incurred by companies like OpenAI and Google [5][6]
- The training process took 147,000 GPU hours, with costs broken down by training phase [6]

Innovative Training Approach
- DeepSeek-R1-Zero was developed by discarding human reasoning patterns entirely, using a simplified reinforcement learning framework [8][10]
- The model was trained around two main components: task format and reward signals based on the correctness of final answers [10]

Self-Evolution and Advanced Reasoning
- During training, the model exhibited self-evolution behaviors, increasing the length of text generated inside the "think" tag and developing advanced reasoning strategies [12][15]
- A notable "Aha Moment" was observed when the model began using the word "wait" more frequently, indicating a shift in its reasoning process [16][18]

Multi-Stage Training Process
- The training process consists of multiple stages: cold start, reinforcement learning, large-scale supervised fine-tuning, and a second round of reinforcement learning [19][20]
- Each stage is designed to enhance different aspects of the model's capabilities, from initial fine-tuning to improving language consistency and general knowledge [20][35]

Reward System Design
- DeepSeek implemented a dual-track reward system, combining rule-based rewards for reasoning tasks with model-based rewards for general tasks [27][30]
- The rule-based rewards focus on accuracy and format compliance, while the model-based rewards assess the usefulness and safety of outputs [28][31]

Challenges and Future Directions
- Despite its advanced reasoning capabilities, DeepSeek-R1 has limitations in structured outputs and tool usage, and it is sensitive to prompt variations [43]
- The reliance on reliable reward signals poses challenges, particularly for subjective tasks, which may lead to reward hacking [44]
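The rule-based half of the dual-track reward described above can be sketched as a plain format-plus-accuracy check. This is an illustrative toy, not DeepSeek's actual reward code; the tag layout, the partial-credit value, and exact string matching are all our assumptions:

```python
import re

# Assumed layout: reasoning inside a think tag, then the final answer.
THINK_RE = re.compile(r"<think>(.+?)</think>\s*(.+)", re.DOTALL)

def rule_based_reward(output: str, reference_answer: str) -> float:
    """Toy rule-based reward combining format compliance and accuracy.

    0.0 -> output violates the required format;
    0.1 -> format is respected but the final answer is wrong;
    1.0 -> format is respected and the final answer matches.
    """
    m = THINK_RE.fullmatch(output.strip())
    if m is None:
        return 0.0
    final_answer = m.group(2).strip()
    return 1.0 if final_answer == reference_answer.strip() else 0.1

good = rule_based_reward("<think>2 + 2 = 4</think> 4", "4")
bad = rule_based_reward("the answer is 4", "4")
```

Real graders normalize answers (strip LaTeX, compare numerically) rather than compare raw strings, but the shape of the signal is the same: a verifiable rule rather than a learned judge.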
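"Self-consistent decoding", credited above with lifting accuracy from 77.9% to 86.7%, is commonly implemented as majority voting over several sampled answers; we assume that reading here. A minimal sketch:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among several samples.

    Sampling many reasoning paths and keeping the consensus answer
    filters out occasional faulty chains of thought.
    """
    return Counter(answers).most_common(1)[0][0]

# Five sampled answers to one problem; the consensus wins even
# though individual samples disagree.
best = majority_vote(["42", "41", "42", "42", "17"])
```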
DeepSeek-R1 Paper on the Cover of Nature, with Liang Wenfeng as Corresponding Author
36Ke· 2025-09-18 00:45
What a surprise, and yet richly deserved! The cover of the latest issue of Nature features the DeepSeek-R1 research, the paper DeepSeek posted on arXiv in January this year: "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning". The corresponding author of the Nature paper is Liang Wenfeng.

Paper link: https://www.nature.com/articles/s41586-025-09422-z

In its cover introduction, Nature writes: large models that can plan the steps needed to solve a problem tend to solve it better. This kind of "reasoning" resembles the way humans work through more complex problems, but it poses a major challenge for artificial intelligence, requiring human intervention to add labels and annotations. In this week's issue, DeepSeek researchers reveal how they were able to train a model to reason with minimal human input.

The DeepSeek-R1 model is trained with reinforcement learning, in which the model earns a high reward for correctly solving a math problem and is penalized for wrong answers. As a result, it learned to reason, solving problems step by step and revealing those steps, and became more likely to arrive at the correct ...
Federal Reserve Cuts Rates by 25 Basis Points; EU Sanctions Israel and the Israeli Foreign Minister Responds; DeepSeek-R1 Makes History as Liang Wenfeng's Paper Lands on the Cover of Nature | Morning Report
Di Yi Cai Jing· 2025-09-18 00:20
Group 1
- The Federal Reserve announced a 25 basis point interest rate cut, bringing the target range for the federal funds rate to 4.00%-4.25%, marking the first rate cut since December 2024 [2]
- The European Union has implemented sanctions against Israel, including the partial suspension of trade-related aspects of the EU-Israel Association Agreement, in response to the situation in Gaza [3]
- The Ministry of Commerce plans to conduct pilot projects for new consumption formats and models in approximately 50 cities to enhance quality consumption supply and stimulate economic growth [6]

Group 2
- In August, China's fiscal revenue reached 148.198 billion yuan, with a year-on-year growth of 0.3%, marking the first positive growth in tax revenue this year [5]
- The China Association of Automobile Manufacturers reported that domestic sales of new energy vehicles reached 1.171 million units in August, a year-on-year increase of 18.3%, with total sales for the first eight months reaching 8.088 million units, up 30.1% [10]
- Vanke has undergone its largest organizational restructuring in recent years, adjusting its management structure to include 16 regional companies directly managed by headquarters [22]

Group 3
- Dongfeng Group announced the establishment of a new joint venture company with a registered capital of 8.47 billion yuan, focusing on the Hummer brand and incorporating various intangible assets [23]
- The Ministry of Industry and Information Technology is soliciting public opinions on mandatory national standards for safety requirements of intelligent connected vehicles [9]
- The World Trade Organization predicts that AI could increase cross-border goods and services flows by nearly 40% by 2040, provided that policies are implemented to bridge the digital divide [21]
8点1氪 | Xibei Responds to the "Feeding a Dog with Serving Chopsticks" Incident; Federal Reserve Cuts Rates by 25 Basis Points; DeepSeek's Liang Wenfeng Paper Lands on the Cover of Nature
36Ke· 2025-09-18 00:06
Group 1
- An incident in which a customer fed a dog with serving chopsticks at a Xibei restaurant raised concerns about dining safety, leading to the disposal of all involved utensils and a thorough sanitation of the premises [1]
- The Federal Reserve announced a 25 basis point cut in the federal funds rate, marking its first rate decrease since December 2024 and bringing the target range to 4.00%-4.25% [1]

Group 2
- DeepSeek's research paper on the DeepSeek-R1 reasoning model was featured on the cover of the prestigious journal Nature, highlighting its significance as the first mainstream large language model to undergo peer review [2][3]
- The U.S. government extended the grace period for TikTok's ban for the fourth time, now set to expire on December 16, 2025 [4]
- The Chinese restaurant chain Green Tea was reported to have removed its "no pre-made dishes" signage, raising questions about its food preparation practices [6]

Group 3
- The Chinese government announced that personal medical insurance accounts can now transfer funds to family members' accounts for medical expenses, enhancing the utility of these accounts [5]
- A man discovered two salary payments linked to his identity, suggesting a company may have misused his personal information to evade taxes [5]
- Japan Airlines faced significant delays due to a pilot's pre-flight drinking, resulting in salary cuts for 37 executives, including the company president [9]

Group 4
- Google Pay will be fully launched in Saudi Arabia, with Alipay expected to be integrated by 2026, enhancing digital payment options in the region [6]
- The Chinese government plans to distribute over 330 million yuan in consumer subsidies during the upcoming National Day holiday, aiming to boost tourism and cultural consumption [6]
- Peak Sports reportedly implemented salary cuts across the board, with reductions of up to 50% for certain employees [7]

Group 5
- China's bicycle and electric bicycle ownership has reached approximately 580 million, with significant reductions in carbon emissions attributed to two-wheeled transportation [12]
- The U.S. stock market showed mixed results, with Baidu's shares rising over 11%, indicating positive market sentiment for certain Chinese tech stocks [13]

Group 6
- NIO Inc. successfully raised $1.16 billion through a public stock offering, aimed at advancing its electric vehicle technology and infrastructure [14]
- AI chip startup Groq completed a $750 million funding round, achieving a post-money valuation of $6.9 billion, reflecting strong investor interest in AI technology [14]
- "Qingyun New Materials" announced the completion of a multi-billion C round financing to expand its production capacity and enhance its position in the high-end fiber materials market [14]
DeepSeek-R1 Makes History: Liang Wenfeng's Paper Lands on the Cover of Nature
Di Yi Cai Jing· 2025-09-17 23:09
The DeepSeek-R1 reasoning model research paper, completed jointly by the DeepSeek team with Liang Wenfeng as corresponding author, has appeared on the cover of the authoritative international journal Nature.

Compared with the initial DeepSeek-R1 paper released in January this year, this paper discloses more details of model training and directly responds to the distillation doubts raised when the model was first released.

DeepSeek-R1 is also the world's first mainstream large language model to undergo peer review. Nature commented that almost none of today's mainstream large models have undergone independent peer review, a gap that has "finally been broken by DeepSeek". ...
DeepSeek-R1 Makes History: Liang Wenfeng's Paper Lands on the Cover of Nature
Di Yi Cai Jing· 2025-09-17 23:07
Core Insights
- The DeepSeek-R1 inference model research paper, led by Liang Wenfeng, has been published on the cover of the prestigious journal Nature [1]
- This paper provides more detailed information on model training than the initial version released in January and directly addresses the distillation concerns raised at the model's release [1]
- DeepSeek-R1 is recognized as the first mainstream large language model to undergo peer review, filling a significant gap in the field, as noted by Nature [1]