DeepSeek-R1 / Kimi 1.5 and the Development of Strong-Reasoning Models: An Interpretation
Peking University· 2025-03-05 10:54
Investment Rating
- The report does not explicitly state an investment rating for the industry or company.

Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showing significant advances in reasoning capability and long-text processing [4][7]
- The model performs exceptionally on complex tasks, marking a milestone in the open-source community's competition with closed-source models such as OpenAI's o1 series [7]
- The report highlights the potential of RL-driven models to strengthen reasoning without relying on human-annotated supervised fine-tuning [21][56]

Summary by Sections

Technical Comparison
- The report compares STaR-based methods with RL-based methods, emphasizing the advantages of RL on reasoning tasks [3]
- It details innovative RL algorithms such as GRPO, which improve training efficiency and reduce computational cost [49][50]

DeepSeek-R1 Analysis
- DeepSeek-R1 Zero is built entirely on RL without supervised fine-tuning, showing that reasoning capability can develop autonomously [13][21]
- Benchmark results are strong: the model achieved 79.8% on AIME 2024 and 97.3% on MATH-500, comparable to OpenAI's models [7][15]

Insights and Takeaways
- The report stresses the importance of a robust base model: DeepSeek-V3 has 671 billion parameters and was trained on 14.8 trillion high-quality tokens, enabling significant reasoning capability [45][56]
- Rule-based rewards during training help avoid reward hacking, since reasoning tasks can be verified and annotated automatically [17][22]

Future Directions
- The report suggests that future training will focus increasingly on RL while still incorporating some supervised fine-tuning [56]
- Models need to maintain high reasoning performance while remaining safe and usable across diverse applications [59]

Economic and Social Benefits
- The pursuit of low-cost, high-quality language models is expected to reshape industry dynamics, driving competition and innovation [59]
- The report views the capital market's volatility as a short-term phenomenon driven by rapid advances in AI technology, which will lead to a long-term arms race in computational resources [59]
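The GRPO algorithm mentioned above cuts cost mainly by dropping PPO's learned value (critic) network: for each prompt it samples a group of responses, scores them, and normalizes rewards within the group to obtain per-response advantages. A minimal sketch of that group-relative normalization, with illustrative names (not DeepSeek's actual code):

```python
# Sketch of GRPO-style group-relative advantages (illustrative, not the
# official implementation). For one prompt, `rewards` holds the scalar
# reward of each of the G sampled responses.

def group_relative_advantages(rewards):
    g = len(rewards)
    mean = sum(rewards) / g
    var = sum((r - mean) ** 2 for r in rewards) / g
    std = var ** 0.5
    if std == 0:
        # All responses scored equally: no relative signal for this group.
        return [0.0] * g
    # Each response's advantage is its reward standardized within the group;
    # this replaces the value estimate a PPO critic would have produced.
    return [(r - mean) / std for r in rewards]

advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Each advantage then scales the policy-gradient term for that response's tokens.
```

Because the baseline comes from the group statistics rather than a second trained network, memory and compute per update drop substantially, which is consistent with the efficiency claims cited above.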
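The rule-based rewards cited above work because math and coding answers can be checked mechanically, so no learned reward model exists for the policy to game. A hedged sketch of what such a verifier might look like (the tag and answer formats here are illustrative assumptions, not the report's exact scheme):

```python
import re

# Hypothetical rule-based reward for one model completion (illustrative only).
# A small format reward encourages the expected reasoning structure, and an
# accuracy reward fires only on an exact match with the reference answer.

def rule_based_reward(completion, gold_answer):
    reward = 0.0
    # Format check: reasoning should appear inside <think>...</think> tags.
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        reward += 0.1
    # Accuracy check: the final answer inside \boxed{} must match exactly.
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m and m.group(1).strip() == gold_answer:
        reward += 1.0
    return reward

r = rule_based_reward("<think>2 + 2 = 4</think> The answer is \\boxed{4}", "4")
```

Since the reward is a fixed rule rather than a trainable network, the policy cannot drift toward outputs that merely fool a reward model, which is the reward-hacking failure mode the report says this design avoids.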
DeepSeek Reshapes the Global AI Landscape; a $50 Model-Distillation Recipe; US Companies Announce $800 Billion in Compute Investment | AI Monthly Report
晚点LatePost· 2025-02-10 09:50
By He Qianming (贺乾明) | Edited by Cheng Manqi (程曼祺)

After DeepSeek released the R1 model on January 20, its high performance (on par with OpenAI's o1), low usage cost (API pricing at roughly 1/30 of o1's), and open model weights quickly let it take over the large-model narrative previously dominated by OpenAI and other companies.

Before this, OpenAI's demonstration of the highly capable o3 model had many researchers at OpenAI and across Silicon Valley discussing the imminent arrival of AGI (artificial general intelligence). After R1's release, the industry's focus shifted to DeepSeek; some media described the impact as "DeepShock."

In this AI monthly report for January 2025, you will see:

- How DeepSeek reshaped the global large-model landscape
- How a team including Fei-Fei Li "distilled," at low cost, a domain-specific model that chases o1
- By the end of last year, OpenAI's annualized revenue exceeded $6 billion
- OpenAI's Stargate plan: $500 billion invested in compute
- 26 AI companies each raised more than $50 million in funding; 2 of them are Chinese
- Large-model companies' web crawlers met "data-poisoning" resistance
- Nvidia and TSMC, whose market caps had plunged, have begun to rebound
- A timeline of major global AI events in January 2025

This is our third AI monthly report; readers are welcome to add important developments we missed in the comments.

Landscape | D ...