Clearing out the backlog! DeepSeek abruptly completes its R1 technical report, with the training path publicly detailed for the first time
量子位·2026-01-08 12:08

Core Insights
- DeepSeek has released an updated version of its R1 paper, adding 64 pages of technical detail that significantly expand the original content [2][5][56]
- The new version emphasizes the R1 model's implementation details and training process, showcasing a systematic approach to its development [10][11][17]

Summary by Sections

Paper Updates
- The updated paper has grown from 22 pages to 86 pages, supplying a wealth of new information that reads like a textbook [3][6]
- The revisions include a comprehensive breakdown of the R1 training process, which is divided into four main steps: cold start, reasoning-oriented reinforcement learning, rejection sampling and fine-tuning, and alignment-oriented reinforcement learning [13][14][15][16]

Model Performance and Safety
- The R1 model shows a significant increase in reasoning capability, with reflective vocabulary reportedly occurring 5 to 7 times more often as training progresses [21][22]
- DeepSeek has implemented a safety control system built around a dataset of 106,000 prompts used to evaluate and enhance the model's safety, training the safety reward model with a point-wise method [26][29]
- The introduction of the risk control system has led to a notable improvement in safety performance, with R1 achieving benchmark scores comparable to leading models [32][33]

Team Stability and Industry Context
- The core team behind the R1 paper has remained stable, with 18 key contributors still at DeepSeek, indicating low turnover in contrast to industry trends [41][47]
- The article contrasts DeepSeek's team retention with the churn faced by other companies in the AI sector, highlighting a more cohesive internal culture [48][49]
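The four-step training process described in the paper can be sketched, at a very high level, as a sequential pipeline. Every function name below is a hypothetical placeholder standing in for an entire training phase; this illustrates the flow of stages only, not DeepSeek's actual code.

```python
# Hypothetical sketch of the four-stage R1 training recipe: each function
# is a placeholder for a full training phase, returning an updated "model".

def cold_start(model, curated_cot_data):
    """Stage 1: supervised fine-tuning on a small curated chain-of-thought set."""
    model["stages"].append("cold_start")
    return model

def reasoning_rl(model):
    """Stage 2: reinforcement learning focused on reasoning tasks."""
    model["stages"].append("reasoning_rl")
    return model

def rejection_sampling_sft(model):
    """Stage 3: sample many candidate answers, keep the good ones, fine-tune."""
    model["stages"].append("rejection_sampling_sft")
    return model

def alignment_rl(model):
    """Stage 4: a final RL pass oriented toward alignment/preferences."""
    model["stages"].append("alignment_rl")
    return model

def train_r1(base_model, cot_data):
    model = cold_start(base_model, cot_data)
    model = reasoning_rl(model)
    model = rejection_sampling_sft(model)
    model = alignment_rl(model)
    return model

final = train_r1({"name": "base", "stages": []}, ["example chain-of-thought"])
print(final["stages"])
# → ['cold_start', 'reasoning_rl', 'rejection_sampling_sft', 'alignment_rl']
```

The ordering matters: the cold-start data gives the RL stage a stable starting policy, and rejection sampling harvests the RL model's best outputs back into supervised data before the final alignment pass.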
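The reported growth in "reflective vocabulary" can be approximated by simple keyword counting over model transcripts. The marker list and normalization below are illustrative assumptions for measurement, not DeepSeek's actual lexicon or methodology.

```python
import re

# Illustrative reflection markers (an assumption; the paper's lexicon is not given here).
REFLECTIVE_MARKERS = ["wait", "let me verify", "on second thought", "actually"]

def reflection_rate(transcript: str) -> float:
    """Occurrences of reflective markers per 100 words of a transcript."""
    text = transcript.lower()
    hits = sum(len(re.findall(re.escape(m), text)) for m in REFLECTIVE_MARKERS)
    words = max(len(text.split()), 1)
    return 100.0 * hits / words

# Toy transcripts standing in for early- vs late-training checkpoints.
early = "The answer is 42."
late = "The answer is 42. Wait, let me verify that step. Actually, on second thought it holds."
print(reflection_rate(early) < reflection_rate(late))  # → True
```

Tracking this rate across checkpoints is one simple way to quantify the kind of 5-to-7-times increase the paper reports.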
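"Point-wise" training of a safety reward model means each (prompt, response) pair is scored independently against a safe/unsafe label, typically with a binary cross-entropy loss, rather than ranking response pairs against each other. The sketch below illustrates that loss under those general assumptions; it is not DeepSeek's implementation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pointwise_bce_loss(scores, labels):
    """Binary cross-entropy: each response is scored independently as safe (1) or unsafe (0)."""
    eps = 1e-12  # numerical guard for log(0)
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)  # predicted probability that this single response is safe
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(scores)

# A reward model that agrees with the labels (high score = safe) has low loss...
good = pointwise_bce_loss([3.0, -3.0], [1, 0])
# ...while one that contradicts them is penalized heavily.
bad = pointwise_bce_loss([-3.0, 3.0], [1, 0])
print(good < bad)  # → True
```

The contrast with the common pair-wise (Bradley-Terry) setup is that no second response is needed per prompt, which suits a large labeled safety dataset like the 106,000-prompt set described above.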