Without warning, DeepSeek R1 drops an 86-page paper update: this is what "Open" really means
36Kr · 2026-01-09 03:12

Core Insights

- DeepSeek has significantly expanded its R1 paper from 22 pages to 86, demonstrating that open-source models can compete with closed-source ones and even teach them new methodologies [1][2][4]
- The updated paper serves as a fully reproducible technical report for the open-source community, showcasing the advances in AI reasoning achieved through reinforcement learning [2][4]

Summary by Sections

Paper Update and Content

- The R1 paper now gives precise data specifications, detailing a dataset of 26,000 math problems and 17,000 code samples, along with how it was created [4]
- Infrastructure details are provided, including a diagram of the vLLM/DualPipe setup [4]
- The training cost is broken down, totaling approximately $294,000, with R1-Zero consuming 198 hours of H800 GPU time [4][24]
- A retrospective on failed attempts explains why the Process Reward Model (PRM) approach did not succeed [4]
- A ten-page safety report lays out safety assessments and risk analyses [4]

Performance Comparison

- DeepSeek R1 performs comparably to OpenAI's o1, and even surpasses o1-mini, GPT-4o, and Claude 3.5 on several metrics [5][10]
- On educational benchmarks such as MMLU and GPQA Diamond, R1 outperforms previous models, excelling in particular on STEM questions thanks to reinforcement learning [10][12]
- R1 is notably strong on long-context question-answering tasks, indicating excellent document understanding and analysis capabilities [10]

Reinforcement Learning and Distillation

- The paper shows that reasoning capabilities can be distilled from larger models into smaller ones, confirming that learned reasoning transfers without re-exploring the reward space [20][22]
- The reinforcement learning training data comprises 26,000 math problems, 17,000 code samples, and 66,000 general-knowledge tasks [19]

Safety and Risk Assessment

- R1's safety evaluation includes a risk-control system that filters potentially risky dialogues and checks model responses against predefined keywords [31][32]
- On safety benchmarks the model is comparable to other advanced models, though it shows weaknesses in handling intellectual-property issues [35][37]
- A multilingual safety testing dataset covering 50 languages demonstrates R1's safety performance across languages [42]

Conclusion

- The advances in DeepSeek R1 mark a significant milestone for open-source AI, delivering competitive performance against proprietary models at lower operational cost [17][18]
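The reported RL data mix (26,000 math problems, 17,000 code samples, 66,000 general-knowledge tasks) can be read as per-domain sampling proportions. A minimal sketch, assuming a simple count-normalization scheme; the `mix_probabilities` helper is invented for illustration and is not DeepSeek's actual data loader:

```python
# Sketch: turn the reported RL data mix into sampling probabilities.
# The counts come from the article; the function itself is illustrative.

def mix_probabilities(counts: dict[str, int]) -> dict[str, float]:
    """Normalize raw per-domain example counts into sampling probabilities."""
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}

# Counts as reported for R1's RL training data [19].
rl_mix = {"math": 26_000, "code": 17_000, "general": 66_000}
probs = mix_probabilities(rl_mix)
print(probs)
```

Under this reading, general-knowledge tasks make up roughly 60% of the mix, math about 24%, and code about 16%.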
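The safety section describes a risk-control system that screens dialogues and checks model responses against predefined keywords. A minimal keyword-matching sketch of that idea; the `RISK_KEYWORDS` list and `flag_response` function are invented placeholders, not DeepSeek's actual system:

```python
# Sketch of a keyword-based response screen, as described in the article.
# RISK_KEYWORDS and flag_response are illustrative, not DeepSeek's implementation.

RISK_KEYWORDS = {"exploit payload", "synthesize the toxin"}  # placeholder terms

def flag_response(text: str) -> bool:
    """Return True if the response matches any predefined risk keyword."""
    lowered = text.lower()
    return any(kw in lowered for kw in RISK_KEYWORDS)

print(flag_response("Here is how to Synthesize the Toxin"))  # prints: True
print(flag_response("What is the weather today?"))           # prints: False
```

A production system would pair such keyword checks with a classifier over full dialogues, since substring matching alone is easy to evade.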
