Liang Wenfeng Publishes Nature Cover Paper Revealing the Science Behind DeepSeek-R1: Reinforcement Learning Incentivizes Reasoning in Large Models

Core Viewpoint
- The article discusses the development and capabilities of DeepSeek-R1, a reasoning model trained with pure reinforcement learning that strengthens the reasoning abilities of large language models (LLMs) at a much lower computational cost [1][2].

Group 1: Model Development and Training
- DeepSeek-R1 was launched by a startup in Hangzhou, China, on January 20, 2025, and has gained global attention for its strong reasoning capabilities and low computational requirements [1].
- The training cost for DeepSeek-R1 was only $294,000, far below the tens of millions of dollars that comparable models often cost [2].
- The model employs a pure reinforcement learning approach that minimizes reliance on human-annotated reasoning paths, allowing it to explore reasoning strategies more autonomously [6][10].

Group 2: Performance and Capabilities
- DeepSeek-R1-Zero, a precursor to DeepSeek-R1, improved markedly on reasoning tasks: its average pass@1 score on the American Invitational Mathematics Examination (AIME) 2024 rose from 15.6% to 77.9% [17].
- The model also excelled in programming competitions and on graduate-level problems in biology, physics, and chemistry, showcasing its versatility [19].
- The research indicates that advanced reasoning behaviors, such as self-validation and reflection, emerged organically during the reinforcement learning process [29].

Group 3: Challenges and Limitations
- Despite its strengths, DeepSeek-R1-Zero suffers from poor readability and language mixing, with English and Chinese sometimes combined in its responses [21].
- The model's performance in broader domains such as writing and open-domain Q&A remains limited because its training focused on reasoning tasks [22].
- The article highlights potential ethical risks associated with enhanced reasoning capabilities, including vulnerability to jailbreak attacks and the generation of dangerous content [27][28].
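
For readers unfamiliar with the "average pass@1" figure quoted for AIME 2024, the following is a minimal sketch of how such a score is commonly computed: several answers are sampled per problem, the fraction of correct answers is taken for each problem, and those fractions are averaged over all problems. The function name `average_pass_at_1` and the toy grading data are illustrative assumptions, not taken from the article or from the DeepSeek evaluation code.

```python
from statistics import mean


def average_pass_at_1(samples_per_problem):
    """Mean, over problems, of the fraction of sampled answers that are correct."""
    per_problem = []
    for samples in samples_per_problem:
        if not samples:
            raise ValueError("each problem needs at least one sampled answer")
        # True counts as 1 and False as 0, so sum() gives the number of correct samples.
        per_problem.append(sum(samples) / len(samples))
    return mean(per_problem)


# Hypothetical grading results: one inner list per problem,
# one boolean per sampled answer (True = graded correct).
toy_results = [
    [True, False, True, True],
    [False, False, False, False],
    [True, True, True, True],
]

print(f"average pass@1 = {average_pass_at_1(toy_results):.1%}")
```

Averaging over several samples per problem, rather than grading a single answer, reduces the variance of the estimate on a small benchmark such as AIME.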