Distillation
The "Distillation" Controversy That Rattled Anthropic; a Leading US AI Researcher Pours Cold Water: Chinese AI's Success Is Not Built on Shortcuts
Xin Lang Cai Jing· 2026-02-26 02:15
Yesterday, Anthropic named three Chinese AI labs, DeepSeek, Moonshot AI, and MiniMax, accusing them of "distilling" its Claude models, and the whole internet erupted. Commenting on the episode, Nathan Lambert, one of the best-known researchers in RLHF (reinforcement learning from human feedback) and author of the book RLHF, argues that the affair is neither as serious as people imagine nor as simple. In his view, Chinese AI companies have excellent infrastructure, have produced many innovations, and are cracking hard technical problems; their results do not rest on "shortcuts". Before getting into the distillation question itself, it is worth asking why Lambert's view carries weight. Lambert is a scientist at the Allen Institute for AI, with a PhD from UC Berkeley under the well-known robotics researcher Pieter Abbeel. He did not invent RLHF, but his open-source book RLHF is now one of the standard references AI practitioners use to understand how large models are trained. Unlike the ubiquitous AI influencers, he has actually trained large models himself. On the day Anthropic's blog post went out, Lambert published a detailed analysis, "How important is distillation for Chinese LLMs, really?". His core argument runs in a direction quite different from the mainstream media's reading, ...
Anthropic Claims It Was Distilled by DeepSeek! Why Did Musk Fire Back?
Xin Lang Cai Jing· 2026-02-24 07:57
Core Viewpoint
- Anthropic has accused three Chinese AI companies (DeepSeek, Moonshot AI, and MiniMax) of large-scale "distillation" of its model Claude, claiming that these companies used over 24,000 fake accounts to interact with Claude approximately 16 million times to extract model capabilities for their own models [1][3][16]
Group 1: Distillation Process
- Distillation is a common AI training method in which a stronger "teacher model" generates output data to train a "student model," replicating some of its capabilities at a lower cost and parameter scale [2][14]
- The controversy centers on the scale and method of distillation, with Anthropic alleging that the three companies systematically extracted Claude's capabilities through shared payment methods, proxy services, and bulk request structures [3][16]
Group 2: Specific Interactions
- DeepSeek is accused of over 150,000 interactions focused on reasoning and chain-of-thought data; Moonshot AI is reported to have had around 3.4 million interactions targeting agent capabilities and tool invocation; MiniMax had the highest count, approximately 13 million interactions, concentrated on agent orchestration and tool usage [3][16]
Group 3: Industry Reactions
- Elon Musk criticized Anthropic on social media, suggesting that the company has itself faced controversies over training data and implying hypocrisy in its accusations [3][19]
- Opinions within the industry differ on where the controversy lies, with some arguing that the issue is not the distillation technique itself but the specific implementation methods, which may violate terms of service or regional restrictions [21][22]
Group 4: Legal and Ethical Considerations
- The lack of clear legal standards on the ownership of model outputs raises the question of whether the accused companies' actions constitute normal competition or unfair extraction [23][24]
- The ongoing debate highlights the need for clearer definitions of what constitutes reasonable use versus systematic capability extraction in the context of AI model training [24]
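To make the "teacher model / student model" mechanism concrete, here is a minimal numpy sketch of the classic logit-matching form of distillation (temperature-softened KL loss, following Hinton et al.). Note this is an illustrative sketch only: the API-based "distillation" alleged above would instead fine-tune a student on the teacher's text outputs, since Claude's logits are not exposed.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis (T > 1 softens the distribution).
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 as in the standard distillation formulation."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

# A student that matches the teacher incurs (near-)zero loss;
# a mismatched student pays a positive penalty.
teacher = np.array([[2.0, 0.5, -1.0]])
assert distillation_loss(teacher, teacher) < 1e-6
assert distillation_loss(np.array([[-1.0, 0.5, 2.0]]), teacher) > 0.1
```

In practice the student minimizes a weighted sum of this soft loss and the ordinary hard-label cross-entropy; the soft targets carry the teacher's "dark knowledge" about relative class similarities.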
Distillation, GEO, Vibe Coding: How Many of 2025's "Top Ten AI Buzzwords" Can You Decode?
36Ke· 2025-12-26 09:16
Core Insights
- The article discusses the rapid development of AI in 2025, highlighting ten key terms that reflect how AI is reshaping industries and society
Group 1: AI Concepts
- Vibe Coding redefines programming by allowing developers to express goals in natural language, with AI generating the necessary code [2]
- Reasoning models have emerged as a core focus in AI discussions, enabling complex problem-solving through multi-step reasoning [3]
- World models aim to enhance AI's understanding of real-world causality and physical laws, moving beyond mere language processing [4]
Group 2: Infrastructure and Investment
- The demand for AI has led to the construction of super data centers, exemplified by OpenAI's $500 billion "Stargate" project, raising concerns about energy consumption and local impacts [5]
- The AI sector is experiencing a capital influx, with companies like OpenAI and Anthropic seeing rising valuations, though many are still in a high-investment phase without stable profit models [6]
Group 3: AI Challenges and Trends
- The term "intelligent agents" is popular in AI marketing, but there is no consensus on what constitutes truly intelligent behavior [7]
- Distillation technology allows smaller models to learn from larger ones, achieving high performance at lower cost [8]
- The notion of "AI garbage" reflects public concern over the quality and authenticity of AI-generated content [9]
Group 4: AI in Real-World Applications
- Physical intelligence remains a significant challenge for AI, as robots still require human intervention for complex tasks [10]
- The shift from traditional SEO to Generative Engine Optimization (GEO) indicates a change in how brands and content creators engage with AI-driven information retrieval [11]
What Was the AI World Talking About in 2025? The Year's Top Ten AI Buzzwords Announced
36Ke· 2025-12-26 07:33
Core Insights
- The development of AI in 2025 is marked by emerging concepts that are reshaping the industry landscape, as highlighted by MIT Technology Review, which identifies the top ten AI buzzwords of the year [1]
Group 1: Emerging Concepts in AI
- Vibe Coding redefines programming by allowing developers to express goals and logic in natural language, with AI generating the corresponding code [2]
- Reasoning models have gained prominence, enabling AI to tackle complex problems through multi-step reasoning, with major advancements from OpenAI and DeepSeek [3]
- World models aim to enhance AI's understanding of real-world causal relationships and physical laws, moving beyond mere language processing [4]
Group 2: Infrastructure and Economic Implications
- The demand for AI has led to the construction of super data centers, exemplified by OpenAI's $500 billion "Stargate" project, raising concerns about energy consumption and local community impacts [5]
- The AI sector is experiencing a capital influx, with companies like OpenAI and Anthropic seeing rising valuations, although many are still in a high-investment phase without stable profit models [6]
Group 3: Quality and Standards in AI
- The term "intelligent agents" is widely used in AI marketing, but there is no consensus on what constitutes truly intelligent behavior, highlighting a lack of industry standards [7]
- Distillation technology allows smaller models to learn from larger ones, achieving high performance at lower cost and indicating that effective algorithms can drive AI advancements [8]
Group 4: Content Quality and User Interaction
- "AI garbage" refers to low-quality AI-generated content, reflecting public concerns about the authenticity and quality of information in the AI era [9]
- Physical intelligence remains a challenge for AI, as robots still require human intervention for complex tasks, indicating a long road ahead before AI fully understands and adapts to the physical world [10]
- The shift from traditional SEO to Generative Engine Optimization (GEO) signifies a change in how brands and content creators engage with AI, emphasizing the importance of being referenced in AI responses [11]
6666! A Perfect-Score NeurIPS Paper Has Arrived
量子位· 2025-11-11 11:11
Core Insights
- The article discusses a groundbreaking paper that challenges the prevailing belief that reinforcement learning (RL) is essential for enhancing reasoning capabilities in large language models (LLMs), suggesting instead that model distillation may be more effective [1][5][12]
Group 1: Research Findings
- The paper, "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?", received a perfect score at NeurIPS, indicating its significant impact [5][6]
- The research team from Tsinghua University and Shanghai Jiao Tong University found that RL primarily reinforces existing reasoning paths rather than discovering new ones, contradicting the common assumption that RL can expand a model's reasoning capabilities [10][12]
- The study used the pass@k metric to evaluate model performance, revealing that RL models perform better at lower sampling rates but are outperformed by base models at higher sampling rates, indicating that the base model's reasoning abilities may be underestimated [14][20]
Group 2: Methodology
- The research involved testing various models across three key application areas (mathematical reasoning, code generation, and visual reasoning) using authoritative benchmark datasets [17][19]
- The models compared included mainstream LLMs such as Qwen2.5 and LLaMA-3.1, with RL models trained using algorithms such as PPO, GRPO, and Reinforce++ [18][19]
- The analysis focused on the differences in pass@k performance between RL and base models, as well as trends in performance as sampling increased [21][22]
Group 3: Implications for the Industry
- The findings suggest that the substantial investments and explorations surrounding RLVR may need to be reevaluated, as the actual benefit of RL for enhancing reasoning capabilities could be overestimated [4][12]
- The research highlights the potential of model distillation as a more promising approach for expanding reasoning capabilities in LLMs, which could shift industry focus and funding [10][12]
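The pass@k comparison above can be made concrete. The standard unbiased pass@k estimator (from the Codex evaluation methodology, which this kind of study typically follows; the paper's exact evaluation code is not shown here) draws n samples per problem, counts c correct ones, and computes the probability that at least one of k randomly chosen samples is correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled solutions of which c are
    correct, the probability that a random size-k subset contains at least
    one correct solution, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 2 correct: pass@1 = 0.2, and pass@k grows with k,
# which is why a base model can overtake an RL model at large k.
assert abs(pass_at_k(10, 2, 1) - 0.2) < 1e-9
assert pass_at_k(10, 2, 5) > pass_at_k(10, 2, 1)
```

The paper's headline finding is visible in this framing: RL sharpens the distribution so pass@1 rises, but if RL only reweights paths the base model could already sample, the base model's pass@k catches up and passes it as k grows.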
An Interpretation of the Development of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Models
Peking University· 2025-03-05 10:54
Investment Rating
- The report does not explicitly state an investment rating for the industry or company
Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance on complex tasks, marking a milestone in the open-source community's competition with closed-source models such as OpenAI's o1 series [7]
- The report highlights the potential of RL-driven models to enhance reasoning abilities without relying on human-annotated supervised fine-tuning [21][56]
Summary by Sections
Technical Comparison
- The report compares STaR-based methods with RL-based methods, emphasizing the advantages of RL in reasoning tasks [3]
- It details the innovative RL algorithms used, such as GRPO, which optimize training efficiency and reduce computational costs [49][50]
DeepSeek-R1 Analysis
- DeepSeek-R1 Zero is built entirely on RL without supervised fine-tuning, showcasing its ability to develop reasoning capabilities autonomously [13][21]
- The model's performance metrics show strong results on various benchmarks, including AIME 2024 and MATH-500, where it achieved 79.8% and 97.3% respectively, comparable to OpenAI's models [7][15]
Insights and Takeaways
- The report emphasizes the importance of a robust base model: DeepSeek-V3, which has 671 billion parameters and was trained on 14.8 trillion high-quality tokens, provides the foundation for these reasoning capabilities [45][56]
- The use of rule-based rewards in training helps avoid reward-hacking issues, allowing automated verification and annotation of reasoning tasks [17][22]
Future Directions
- The report discusses the potential for further advancements in RL-driven models, suggesting that future training will increasingly focus on RL while still incorporating some supervised fine-tuning [56]
- It highlights the need for models to maintain high reasoning performance while ensuring safety and usability in diverse applications [59]
Economic and Social Benefits
- The exploration of low-cost, high-quality language models is expected to reshape industry dynamics, leading to increased competition and innovation [59]
- The report notes that the capital market's volatility is a short-term phenomenon driven by rapid advancements in AI technology, which will lead to a long-term arms race in computational resources [59]
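GRPO's key efficiency trick, per the report's framing, is replacing the learned value (critic) network with group-relative baselines: several responses are sampled per prompt, and each one's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation (the full algorithm also includes the PPO-style clipped objective and a KL penalty, omitted here):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO: each sampled response to a
    prompt is scored against the mean/std of its own sample group, so no
    separate value network needs to be trained or stored in memory."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled answers to one prompt, with rule-based 0/1 correctness rewards.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
assert abs(adv.sum()) < 1e-6      # advantages are zero-mean within the group
assert adv[0] > 0 and adv[1] < 0  # correct answers are pushed up, wrong ones down
```

Dropping the critic roughly halves the model memory needed during RL training relative to PPO, which is the "reduce computational costs" point made above.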
2025 Report: An Interpretation of the Development of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Models
Peking University· 2025-03-04 01:35
Investment Rating
- The report does not explicitly provide an investment rating for the industry or company discussed
Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance on complex tasks, marking a milestone in the open-source community's competition with closed-source models such as OpenAI's o1 series [7]
- The report emphasizes the importance of RL in enhancing model capabilities, particularly in mathematical reasoning and coding tasks, with DeepSeek-R1 achieving notable scores on various benchmarks [7][59]
Summary by Sections
Technical Comparison
- The report discusses the technical advancements of DeepSeek-R1, including its architecture and the innovative RL algorithms employed, such as GRPO [3][4]
- A comparison of performance metrics against other models highlights DeepSeek-R1's superior capabilities across various reasoning tasks [6]
Insights and Takeaways
- The model's ability to self-iterate and enhance its reasoning capabilities through RL is emphasized, showcasing its potential for autonomous learning without reliance on supervised fine-tuning [21][56]
- The report outlines the significance of rule-based rewards in the training process, which help avoid the reward-hacking issues commonly faced in traditional RL setups [16][23]
Future Directions
- The report suggests future exploration into enhancing model safety and usability, particularly in generating coherent and clear reasoning outputs [30][59]
- It highlights the potential for further advances in multi-modal reasoning and the integration of synthetic data to overcome data-reproduction challenges [30][59]
Economic and Social Benefits
- The exploration of low-cost, high-quality language models is discussed, emphasizing a shift from model size toward computational resources and synthetic data in expanding capabilities [59]
- The report notes the potential for increased market activity and innovation driven by accessible AI technologies, which could lead to a more diverse application landscape [59]
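The "rule-based rewards avoid reward hacking" point works because the reward is a fixed verification rule rather than a learned reward model, so there is no neural scorer for the policy to exploit. A toy sketch of such a verifier for math-style tasks (the `\boxed{...}` answer format is an illustrative assumption; the reports summarized here do not publish the exact extraction rules):

```python
import re

def rule_based_reward(model_output: str, reference_answer: str) -> float:
    """Toy rule-based reward: extract the content of a final \\boxed{...}
    answer and compare it to the reference. Because the check is a fixed
    rule rather than a learned reward model, the policy cannot game a
    scorer's blind spots, only produce genuinely matching answers."""
    m = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if m is None:
        return 0.0  # malformed or missing answer gets no reward
    return 1.0 if m.group(1).strip() == reference_answer.strip() else 0.0

assert rule_based_reward(r"... so the answer is \boxed{42}", "42") == 1.0
assert rule_based_reward(r"\boxed{41}", "42") == 0.0
assert rule_based_reward("no boxed answer here", "42") == 0.0
```

Rewards like this also double as free automated annotation: any sampled solution that passes the rule can be kept as verified training data, which is the "automated verification and annotation" benefit noted in the companion report above.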
DeepSeek Reshapes the Global AI Landscape; a $50 Model-Distillation Technique; US Companies Announce $800 Billion in Compute Investment | AI Monthly Report
晚点LatePost· 2025-02-10 09:50
By He Qianming | Edited by Cheng Manqi

A chronicle of major global AI events in January 2025.

After DeepSeek launched its R1 model on January 20, its high performance (on par with OpenAI's o1), low usage cost (API prices roughly 1/30 of o1's), and open model weights quickly let it take over the large-model narrative previously dominated by OpenAI and its peers. Before that, with OpenAI showing off its highly capable o3 model, many researchers at OpenAI and across Silicon Valley had been discussing the imminent arrival of AGI (artificial general intelligence). After R1's release, the industry's focus shifted to DeepSeek, with some media describing the shock as "DeepShock". Nvidia and TSMC, whose market caps plunged, have already begun to rebound.

In this January 2025 AI monthly report, you will see:
- How DeepSeek reshaped the global large-model landscape
- How teams, including Fei-Fei Li's, "distilled" models at low cost that catch up to o1 in specific domains
- By the end of last year, OpenAI's annualized revenue exceeded $6 billion
- OpenAI's Stargate plan: $500 billion invested in compute
- 26 AI companies raised more than $50 million each, 2 of them from China
- Large-model companies' crawlers met "poisoned" resistance

This is our third AI monthly report; feel free to add important developments we missed in the comments.

Landscape | D ...