Distillation
The "Distillation" Controversy That Rattled Anthropic; a Leading US AI Researcher Pours Cold Water: Chinese AI's Success Is Not Built on Shortcuts
Xin Lang Cai Jing· 2026-02-26 02:15
Yesterday, Anthropic named three Chinese AI labs, DeepSeek, Moonshot AI, and MiniMax, accusing them of "distilling" its Claude models, and the whole internet erupted. Commenting on the episode, Nathan Lambert, one of the best-known researchers in RLHF (reinforcement learning from human feedback) and author of the book RLHF, argues that the affair is neither as serious as people imagine nor as simple. In his view, Chinese AI companies have excellent infrastructure, have produced many innovations, and are cracking hard technical problems; their results do not rest on "shortcuts". Before getting into the distillation question itself, it is worth asking why Lambert's view carries weight. Lambert is a scientist at the Allen Institute for AI, with a PhD from UC Berkeley under the well-known robotics researcher Pieter Abbeel. He did not invent RLHF, but his open-source book RLHF is now one of the standard references AI practitioners use to understand how large models are trained. Unlike the ubiquitous AI influencers, he has actually trained large models himself. On the day Anthropic's blog post went out, Lambert published a detailed analysis, "How important is distillation for Chinese LLMs, really?". His core argument runs in a direction quite different from the mainstream media's reading, ...
Anthropic Claims It Was Distilled by DeepSeek! Why Did Musk Fire Back?
Xin Lang Cai Jing· 2026-02-24 07:57
Core Viewpoint
- Anthropic has accused three Chinese AI companies (DeepSeek, Moonshot AI, and MiniMax) of large-scale "distillation" of its model Claude, claiming that these companies used over 24,000 fake accounts to interact with Claude approximately 16 million times to extract model capabilities for their own models [1][3][16]
Group 1: Distillation Process
- Distillation is a common AI training method in which a stronger "teacher model" generates output data to train a "student model," replicating some of its capabilities at a lower cost and parameter scale [2][14]
- The controversy centers on the scale and method of distillation, with Anthropic alleging that the three companies systematically extracted Claude's capabilities through shared payment methods, proxy services, and bulk request structures [3][16]
Group 2: Specific Interactions
- DeepSeek is accused of over 150,000 interactions focused on reasoning and chain-of-thought data; Moonshot AI is reported to have had around 3.4 million interactions targeting agent capabilities and tool invocation; MiniMax had the highest count, approximately 13 million interactions, concentrated on agent orchestration and tool usage [3][16]
Group 3: Industry Reactions
- Elon Musk criticized Anthropic on social media, suggesting that the company has itself faced controversies over training data and implying hypocrisy in its accusations [3][19]
- Opinions within the industry differ on where the controversy lies, with some arguing that the issue is not the distillation technique itself but the specific implementation methods, which may violate terms of service or regional restrictions [21][22]
Group 4: Legal and Ethical Considerations
- The lack of clear legal standards on the ownership of model outputs raises the question of whether the accused companies' actions constitute normal competition or unfair extraction [23][24]
- The ongoing debate highlights the need for clearer definitions of what constitutes reasonable use versus systematic capability extraction in the context of AI model training [24]
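To make the "teacher model / student model" mechanism concrete, here is a minimal numpy sketch of the classic logit-matching form of distillation (temperature-softened KL loss, following Hinton et al.). Note this is an illustrative sketch only: the API-based "distillation" alleged above would instead fine-tune a student on the teacher's text outputs, since Claude's logits are not exposed.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis (T > 1 softens the distribution).
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 as in the standard distillation formulation."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

# A student that matches the teacher incurs (near-)zero loss;
# a mismatched student pays a positive penalty.
teacher = np.array([[2.0, 0.5, -1.0]])
assert distillation_loss(teacher, teacher) < 1e-6
assert distillation_loss(np.array([[-1.0, 0.5, 2.0]]), teacher) > 0.1
```

In practice the student minimizes a weighted sum of this soft loss and the ordinary hard-label cross-entropy; the soft targets carry the teacher's "dark knowledge" about relative class similarities.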
Distillation, GEO, Vibe Coding: How Many of 2025's "Top Ten AI Buzzwords" Can You Decode?
36Ke· 2025-12-26 09:16
Core Insights
- The article discusses the rapid development of AI in 2025, highlighting ten key terms that reflect how AI is reshaping industries and society
Group 1: AI Concepts
- Vibe Coding redefines programming by allowing developers to express goals in natural language, with AI generating the necessary code [2]
- Reasoning models have emerged as a core focus in AI discussions, enabling complex problem-solving through multi-step reasoning [3]
- World models aim to enhance AI's understanding of real-world causality and physical laws, moving beyond mere language processing [4]
Group 2: Infrastructure and Investment
- The demand for AI has led to the construction of super data centers, exemplified by OpenAI's $500 billion "Stargate" project, raising concerns about energy consumption and local impacts [5]
- The AI sector is experiencing a capital influx, with companies like OpenAI and Anthropic seeing rising valuations, though many are still in a high-investment phase without stable profit models [6]
Group 3: AI Challenges and Trends
- The term "intelligent agents" is popular in AI marketing, but there is no consensus on what constitutes truly intelligent behavior [7]
- Distillation technology allows smaller models to learn from larger ones, achieving high performance at lower cost [8]
- The notion of "AI garbage" reflects public concern over the quality and authenticity of AI-generated content [9]
Group 4: AI in Real-World Applications
- Physical intelligence remains a significant challenge for AI, as robots still require human intervention for complex tasks [10]
- The shift from traditional SEO to Generative Engine Optimization (GEO) indicates a change in how brands and content creators engage with AI-driven information retrieval [11]
What Was the AI World Talking About in 2025? The Year's Top Ten AI Buzzwords Announced
36Ke· 2025-12-26 07:33
Core Insights
- The development of AI in 2025 is marked by emerging concepts that are reshaping the industry landscape, as highlighted by MIT Technology Review, which identifies the top ten AI buzzwords of the year [1]
Group 1: Emerging Concepts in AI
- Vibe Coding redefines programming by allowing developers to express goals and logic in natural language, with AI generating the corresponding code [2]
- Reasoning models have gained prominence, enabling AI to tackle complex problems through multi-step reasoning, with major advancements from OpenAI and DeepSeek [3]
- World models aim to enhance AI's understanding of real-world causal relationships and physical laws, moving beyond mere language processing [4]
Group 2: Infrastructure and Economic Implications
- The demand for AI has led to the construction of super data centers, exemplified by OpenAI's $500 billion "Stargate" project, raising concerns about energy consumption and local community impacts [5]
- The AI sector is experiencing a capital influx, with companies like OpenAI and Anthropic seeing rising valuations, although many are still in a high-investment phase without stable profit models [6]
Group 3: Quality and Standards in AI
- The term "intelligent agents" is widely used in AI marketing, but there is no consensus on what constitutes truly intelligent behavior, highlighting a lack of industry standards [7]
- Distillation technology allows smaller models to learn from larger ones, achieving high performance at lower cost and indicating that effective algorithms can drive AI advancements [8]
Group 4: Content Quality and User Interaction
- "AI garbage" refers to low-quality AI-generated content, reflecting public concerns about the authenticity and quality of information in the AI era [9]
- Physical intelligence remains a challenge for AI, as robots still require human intervention for complex tasks, indicating a long road ahead before AI fully understands and adapts to the physical world [10]
- The shift from traditional SEO to Generative Engine Optimization (GEO) signifies a change in how brands and content creators engage with AI, emphasizing the importance of being referenced in AI responses [11]
6666! A Perfect-Score NeurIPS Paper Has Arrived
量子位· 2025-11-11 11:11
Core Insights
- The article discusses a groundbreaking paper that challenges the prevailing belief that reinforcement learning (RL) is essential for enhancing reasoning capabilities in large language models (LLMs), suggesting instead that model distillation may be more effective [1][5][12]
Group 1: Research Findings
- The paper, "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?", received a perfect score at NeurIPS, indicating its significant impact [5][6]
- The research team from Tsinghua University and Shanghai Jiao Tong University found that RL primarily reinforces existing reasoning paths rather than discovering new ones, contradicting the common assumption that RL can expand a model's reasoning capabilities [10][12]
- The study used the pass@k metric to evaluate model performance, revealing that RL models perform better at lower sampling rates but are outperformed by base models at higher sampling rates, indicating that the base model's reasoning abilities may be underestimated [14][20]
Group 2: Methodology
- The research involved testing various models across three key application areas (mathematical reasoning, code generation, and visual reasoning) using authoritative benchmark datasets [17][19]
- The models compared included mainstream LLMs such as Qwen2.5 and LLaMA-3.1, with RL models trained using algorithms such as PPO, GRPO, and Reinforce++ [18][19]
- The analysis focused on the differences in pass@k performance between RL and base models, as well as trends in performance as sampling increased [21][22]
Group 3: Implications for the Industry
- The findings suggest that the substantial investments and explorations surrounding RLVR may need to be reevaluated, as the actual benefit of RL for enhancing reasoning capabilities could be overestimated [4][12]
- The research highlights the potential of model distillation as a more promising approach for expanding reasoning capabilities in LLMs, which could shift industry focus and funding [10][12]
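The pass@k comparison above can be made concrete. The standard unbiased pass@k estimator (from the Codex evaluation methodology, which this kind of study typically follows; the paper's exact evaluation code is not shown here) draws n samples per problem, counts c correct ones, and computes the probability that at least one of k randomly chosen samples is correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled solutions of which c are
    correct, the probability that a random size-k subset contains at least
    one correct solution, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 2 correct: pass@1 = 0.2, and pass@k grows with k,
# which is why a base model can overtake an RL model at large k.
assert abs(pass_at_k(10, 2, 1) - 0.2) < 1e-9
assert pass_at_k(10, 2, 5) > pass_at_k(10, 2, 1)
```

The paper's headline finding is visible in this framing: RL sharpens the distribution so pass@1 rises, but if RL only reweights paths the base model could already sample, the base model's pass@k catches up and passes it as k grows.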
An Interpretation of the Development of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Models
Peking University· 2025-03-05 10:54
Investment Rating
- The report does not explicitly state an investment rating for the industry or company
Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance on complex tasks, marking a milestone in the open-source community's competition with closed-source models such as OpenAI's o1 series [7]
- The report highlights the potential of RL-driven models to enhance reasoning abilities without relying on human-annotated supervised fine-tuning [21][56]
Summary by Sections
Technical Comparison
- The report compares STaR-based methods with RL-based methods, emphasizing the advantages of RL in reasoning tasks [3]
- It details the innovative RL algorithms used, such as GRPO, which optimize training efficiency and reduce computational costs [49][50]
DeepSeek-R1 Analysis
- DeepSeek-R1 Zero is built entirely on RL without supervised fine-tuning, showcasing its ability to develop reasoning capabilities autonomously [13][21]
- The model's performance metrics show strong results on various benchmarks, including AIME 2024 and MATH-500, where it achieved 79.8% and 97.3% respectively, comparable to OpenAI's models [7][15]
Insights and Takeaways
- The report emphasizes the importance of a robust base model: DeepSeek-V3, which has 671 billion parameters and was trained on 14.8 trillion high-quality tokens, provides the foundation for these reasoning capabilities [45][56]
- The use of rule-based rewards in training helps avoid reward-hacking issues, allowing automated verification and annotation of reasoning tasks [17][22]
Future Directions
- The report discusses the potential for further advancements in RL-driven models, suggesting that future training will increasingly focus on RL while still incorporating some supervised fine-tuning [56]
- It highlights the need for models to maintain high reasoning performance while ensuring safety and usability in diverse applications [59]
Economic and Social Benefits
- The exploration of low-cost, high-quality language models is expected to reshape industry dynamics, leading to increased competition and innovation [59]
- The report notes that the capital market's volatility is a short-term phenomenon driven by rapid advancements in AI technology, which will lead to a long-term arms race in computational resources [59]
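GRPO's key efficiency trick, per the report's framing, is replacing the learned value (critic) network with group-relative baselines: several responses are sampled per prompt, and each one's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation (the full algorithm also includes the PPO-style clipped objective and a KL penalty, omitted here):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO: each sampled response to a
    prompt is scored against the mean/std of its own sample group, so no
    separate value network needs to be trained or stored in memory."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled answers to one prompt, with rule-based 0/1 correctness rewards.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
assert abs(adv.sum()) < 1e-6      # advantages are zero-mean within the group
assert adv[0] > 0 and adv[1] < 0  # correct answers are pushed up, wrong ones down
```

Dropping the critic roughly halves the model memory needed during RL training relative to PPO, which is the "reduce computational costs" point made above.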
2025 Report: An Interpretation of the Development of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Models
Peking University· 2025-03-04 01:35
Investment Rating
- The report does not explicitly provide an investment rating for the industry or company discussed
Core Insights
- DeepSeek-R1 introduces a new paradigm of strong reasoning under reinforcement learning (RL), showcasing significant advancements in reasoning capabilities and long-text processing [4][7]
- The model demonstrates exceptional performance on complex tasks, marking a milestone in the open-source community's competition with closed-source models such as OpenAI's o1 series [7]
- The report emphasizes the importance of RL in enhancing model capabilities, particularly in mathematical reasoning and coding tasks, with DeepSeek-R1 achieving notable scores on various benchmarks [7][59]
Summary by Sections
Technical Comparison
- The report discusses the technical advancements of DeepSeek-R1, including its architecture and the innovative RL algorithms employed, such as GRPO [3][4]
- A comparison of performance metrics against other models highlights DeepSeek-R1's superior capabilities across various reasoning tasks [6]
Insights and Takeaways
- The model's ability to self-iterate and enhance its reasoning capabilities through RL is emphasized, showcasing its potential for autonomous learning without reliance on supervised fine-tuning [21][56]
- The report outlines the significance of rule-based rewards in the training process, which help avoid the reward-hacking issues commonly faced in traditional RL setups [16][23]
Future Directions
- The report suggests future exploration into enhancing model safety and usability, particularly in generating coherent and clear reasoning outputs [30][59]
- It highlights the potential for further advances in multi-modal reasoning and the integration of synthetic data to overcome data-reproduction challenges [30][59]
Economic and Social Benefits
- The exploration of low-cost, high-quality language models is discussed, emphasizing a shift from model size toward computational resources and synthetic data in expanding capabilities [59]
- The report notes the potential for increased market activity and innovation driven by accessible AI technologies, which could lead to a more diverse application landscape [59]
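The "rule-based rewards avoid reward hacking" point works because the reward is a fixed verification rule rather than a learned reward model, so there is no neural scorer for the policy to exploit. A toy sketch of such a verifier for math-style tasks (the `\boxed{...}` answer format is an illustrative assumption; the reports summarized here do not publish the exact extraction rules):

```python
import re

def rule_based_reward(model_output: str, reference_answer: str) -> float:
    """Toy rule-based reward: extract the content of a final \\boxed{...}
    answer and compare it to the reference. Because the check is a fixed
    rule rather than a learned reward model, the policy cannot game a
    scorer's blind spots, only produce genuinely matching answers."""
    m = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if m is None:
        return 0.0  # malformed or missing answer gets no reward
    return 1.0 if m.group(1).strip() == reference_answer.strip() else 0.0

assert rule_based_reward(r"... so the answer is \boxed{42}", "42") == 1.0
assert rule_based_reward(r"\boxed{41}", "42") == 0.0
assert rule_based_reward("no boxed answer here", "42") == 0.0
```

Rewards like this also double as free automated annotation: any sampled solution that passes the rule can be kept as verified training data, which is the "automated verification and annotation" benefit noted in the companion report above.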
DeepSeek Reshapes the Global AI Landscape; a $50 Model-Distillation Technique; US Companies Announce $800 Billion in Compute Investment | AI Monthly Report
晚点LatePost· 2025-02-10 09:50
By He Qianming | Edited by Cheng Manqi

A chronicle of major global AI events in January 2025.

After DeepSeek launched its R1 model on January 20, its high performance (on par with OpenAI's o1), low usage cost (API prices roughly 1/30 of o1's), and open model weights quickly let it take over the large-model narrative previously dominated by OpenAI and its peers. Before that, with OpenAI showing off its highly capable o3 model, many researchers at OpenAI and across Silicon Valley had been discussing the imminent arrival of AGI (artificial general intelligence). After R1's release, the industry's focus shifted to DeepSeek, with some media describing the shock as "DeepShock". Nvidia and TSMC, whose market caps plunged, have already begun to rebound.

In this January 2025 AI monthly report, you will see:
- How DeepSeek reshaped the global large-model landscape
- How teams, including Fei-Fei Li's, "distilled" models at low cost that catch up to o1 in specific domains
- By the end of last year, OpenAI's annualized revenue exceeded $6 billion
- OpenAI's Stargate plan: $500 billion invested in compute
- 26 AI companies raised more than $50 million each, 2 of them from China
- Large-model companies' crawlers met "poisoned" resistance

This is our third AI monthly report; feel free to add important developments we missed in the comments.

Landscape | D ...