Model Distillation
DeepSeek, Moonshot AI, and MiniMax Called Out for "Illegal Extraction": Did They Do Anything Wrong? | 电厂
Xin Lang Cai Jing· 2026-02-25 10:47
Core Viewpoint - Anthropic has accused three Chinese AI companies—DeepSeek, Moonshot, and MiniMax—of illicitly extracting data from its model Claude, marking the second controversy involving domestic models within three months [1][9].

Group 1: Allegations and Responses
- Anthropic claims that the three Chinese companies used approximately 24,000 fraudulent accounts to interact with Claude over 16 million times, using these interactions to enhance their own models [1][4].
- The accused companies have remained silent regarding the allegations, with no public response from DeepSeek, MiniMax, or Moonshot [1].
- Anthropic's statement highlighted that the interaction patterns with Claude were abnormal, indicating intentional extraction of Claude's unique capabilities [7].

Group 2: Technical Aspects of Distillation
- The technique attributed to the accused companies is known as "distillation," in which a model learns from a "teacher model" such as Claude by interacting with it (a generic sketch follows this summary) [4][6].
- Distillation is a common method for rapidly improving models, enabling smaller models to approximate the performance of larger ones with less data [6].
- Major AI companies, including OpenAI and Google, have included clauses in their usage agreements prohibiting distillation, reflecting growing concern over intellectual property [9].

Group 3: Legal and Ethical Considerations
- The ongoing debate over model distillation raises open legal questions spanning contract law, copyright law, and unfair competition [10].
- Both Chinese and American companies train on vast amounts of internet data, prompting discussion about authorization and the ethical use of such data [10].
- The narrative of "Chinese companies distilling American models" has become a one-sided discourse, with the potential for a prolonged public relations battle [10].

Group 4: Open Source vs. Closed Source Models
- Many leading Chinese models are released under open-source licenses that permit distillation, in contrast with closed-source models that prohibit the practice [10][13].
- For instance, DeepSeek's models are released under the MIT license, allowing academic and commercial use, while other models such as MiniMax and Qwen3 follow the Apache 2.0 license [10].
- The distillation controversy also highlights the ongoing debate between open-source and closed-source development paths in the AI industry [13].
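In practice, "distillation" through an API in this black-box sense usually means collecting a teacher model's responses to a set of prompts and fine-tuning a smaller student model on those responses as ordinary supervised data. The sketch below illustrates that general recipe with small placeholder models (gpt2-large as teacher, gpt2 as student) and invented prompts; it is a minimal illustration of the commonly described technique, not a reconstruction of any pipeline at issue in the article.

```python
# Minimal sketch of black-box (sequence-level) distillation: collect a teacher
# model's completions, then fine-tune a student on them with next-token
# cross-entropy. Model names and prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2-large")
teacher = AutoModelForCausalLM.from_pretrained("gpt2-large").to(device).eval()
student = AutoModelForCausalLM.from_pretrained("gpt2").to(device).train()

prompts = [
    "Explain what model distillation is in one sentence.",
    "Write a Python function that reverses a string.",
]

# Step 1: query the teacher and keep its completions as synthetic training targets.
pairs = []
with torch.no_grad():
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(device)
        out = teacher.generate(**ids, max_new_tokens=64, do_sample=False)
        completion = tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
        pairs.append(p + completion)

# Step 2: fine-tune the student with ordinary next-token cross-entropy on those pairs.
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
for text in pairs:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=256).to(device)
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"student loss: {loss.item():.3f}")
```

The same recipe scales up by swapping in a large task-specific prompt corpus and the teacher and student of interest, which is why usage agreements increasingly single the practice out.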
Anthropic Accuses AI Companies of Distillation Plagiarism; Musk Fires Back: "The Thief Crying 'Stop Thief'"
Sou Hu Cai Jing· 2026-02-25 10:13
Core Viewpoint - Anthropic accuses three leading Chinese AI companies, DeepSeek, Moonshot, and MiniMax, of infringing on its Claude model capabilities through fraudulent accounts and proxy services, utilizing a technique known as "model distillation" to enhance their own models [3][4].

Group 1: Allegations of Model Theft
- Anthropic claims that the Chinese AI companies used fraudulent accounts to access Claude, generating over 16 million interactions, which it argues violates service terms and access restrictions [3][4].
- The three companies are accused of employing similar methods to access Claude's capabilities, particularly focusing on agentic reasoning, tool usage, and coding abilities [4].

Group 2: Specific Interactions and Patterns
- DeepSeek engaged in over 150,000 interactions, focusing on extracting Claude's reasoning capabilities across diverse tasks, indicating coordinated efforts to avoid detection [5].
- Moonshot AI recorded over 3.4 million interactions, targeting agentic reasoning, tool usage, and data analysis, aiming to reconstruct Claude's reasoning pathways [5].
- MiniMax had the largest scale with over 13 million interactions, specifically targeting agentic coding and tool usage, demonstrating adaptability by redirecting traffic to capture new features [5].

Group 3: Legal and Ethical Implications
- The allegations raise questions about the legality of model distillation and the ethical considerations surrounding AI training, as many large language models are trained on publicly available internet data without explicit consent from original authors [7][8].
- There is an ongoing debate regarding the ownership of synthetic data and compliance issues related to training, particularly for open-source models [8].

Group 4: National Security and Export Controls
- Anthropic's accusations highlight concerns over national security, suggesting that illegal distillation could undermine U.S. control over advanced AI technology exports [9].
- Current U.S. export controls primarily focus on hardware rather than large language model API access, indicating a gap in regulatory measures [9].

Group 5: Developer Responsibilities and Compliance
- Developers using large language models must ensure their training processes are secure and compliant, maintaining clear records of training data sources and adhering to service terms [10][11].
- Anthropic is investing in defensive technologies to detect "distillation attack" patterns (an illustrative heuristic follows this summary) and is implementing protective measures to reduce the effectiveness of illegal distillation while maintaining the legitimate user experience [11].
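The "distillation attack" detection mentioned above is described only at a high level, and Anthropic has not published its method. The following is a purely illustrative heuristic under assumed inputs: flag API accounts whose request volume and topical breadth both sit far above the population median, one simple way abnormal interaction patterns could be screened before human review.

```python
# Purely illustrative usage-pattern screening: flag accounts whose request
# volume AND topical breadth are both more than twice the population median.
# The log format, categories, and thresholds are assumptions for this sketch,
# not Anthropic's actual detection logic.
from collections import defaultdict
from statistics import median

# (account_id, task_category) tuples standing in for API request logs
requests = [
    ("acct_a", "coding"), ("acct_a", "coding"),
    ("acct_b", "coding"), ("acct_b", "tool_use"), ("acct_b", "reasoning"),
    ("acct_b", "data_analysis"), ("acct_b", "coding"), ("acct_b", "tool_use"),
    ("acct_c", "coding"),
]

volume = defaultdict(int)       # requests per account
topics = defaultdict(set)       # distinct task categories per account
for acct, category in requests:
    volume[acct] += 1
    topics[acct].add(category)

vol_median = median(volume.values())
breadth_median = median(len(t) for t in topics.values())

flagged = [
    acct for acct in volume
    if volume[acct] > 2 * vol_median and len(topics[acct]) > 2 * breadth_median
]
print("accounts flagged for review:", flagged)   # -> ['acct_b']
```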
Blockbuster New Shanghai Housing Policy Sharply Loosens Rules for Non-Shanghai Hukou Holders; Feitian Moutai Ex-Factory Price Rumored to Rise by 130 Yuan; BMW's 270,000-Yuan Price Cut Trends on Hot Search; U.S. Company Accuses Chinese Firms of "Stealing" Models, Musk Mocks || Big Stories
Sou Hu Cai Jing· 2026-02-25 08:51
wumiancaijing.com / On February 25, reports circulated online that the ex-factory price of Feitian Moutai had been raised from 1,169 yuan per bottle to 1,299 yuan per bottle, drawing market attention. In response, according to 21st Century Business Herald, a Kweichow Moutai hotline staffer said: "If there were a price adjustment for Feitian, it would be a material matter requiring disclosure, and the company would disclose the information promptly. These kinds of write-ups circulate online all the time; we suggest following the company's official announcements and not believing such rumors."

Feitian Moutai ex-factory price rumored to rise by 130 yuan; Moutai denies it, and analysts attribute the rumor to retail investors

▲ Moutai issued an announcement when it adjusted prices in 2023.

Kweichow Moutai last adjusted its ex-factory prices on November 1, 2023, when it raised the ex-factory price of its 53% vol Kweichow Moutai liquor (Feitian and Five-Star) by an average of about 20%. The suggested retail price, however, remained 1,499 yuan per bottle. According to China Fund News, one Moutai distributor said he had "not heard" of any price change. Xiao Zhuqing, an independent commentator on the Chinese liquor industry, said: "Moutai distributors backed by Guizhou state capital are the biggest obstacle to a Moutai price increase; every Guizhou state-owned distributor ...
U.S. AI Company Accuses China of Stealing Technology? Musk: Why Didn't You Say Anything When You Paid 1.5 Billion for Stealing Data?
Sou Hu Cai Jing· 2026-02-25 03:55
Group 1
- The core issue revolves around accusations of "model distillation" theft by Anthropic against three Chinese companies, highlighting a double standard in the AI industry regarding technology ethics and commercial interests [3][5].
- Anthropic claims that these Chinese companies used 24,000 fake accounts to interact with its Claude model 16 million times, alleging that this constitutes theft of core capabilities [3].
- Elon Musk's response points out Anthropic's own history of data theft, having paid $1.5 billion in a copyright settlement, thus questioning the legitimacy of its accusations against others [3][5].

Group 2
- Model distillation is described as a neutral technical method in which a "teacher model" teaches a "student model" through interactions, raising questions about the legitimacy of labeling such practices as theft [5].
- The controversy reflects a broader issue in the AI industry, where the lack of clear technical boundaries and ethical guidelines leads to accusations and counter-accusations among companies [9].
- The timing of Anthropic's accusations suggests a political motive, as it aligns with U.S. efforts to impose export controls on AI chips, indicating a potential use of technology disputes as justification for political actions [7].

Group 3
- The AI industry is characterized by a significant lag in rules and ethics, with companies often operating in a "gray area" regarding data usage and technology imitation [9].
- The need for clearer regulations is emphasized, suggesting that the industry should focus on establishing boundaries for data use and model distillation rather than engaging in mutual accusations [11].
- The future of AI should prioritize collaborative rule-making over competitive blame, as the technology itself is neutral and its ethical implications depend on its users [11].
A Boon for the GPU-Poor: MIT Study Says You Don't Need to Stack GPUs, Just Copy the Top Models' Homework
36Kr · 2026-01-09 13:20
Core Insights
- The study from MIT reveals that despite the diverse architectures of AI models, their understanding of matter converges as they become more powerful, suggesting a shared cognitive alignment toward physical truths [1][2][3].

Group 1: Model Performance and Understanding
- The research indicates that as AI models improve in predicting molecular energy, their cognitive approaches become increasingly similar, demonstrating a phenomenon known as representation alignment (a generic measurement sketch follows this summary) [3][5].
- High-performance models, regardless of their structural differences, compress their feature space to capture essential physical information, indicating a convergence in understanding [5][6].

Group 2: Cross-Architecture Alignment
- The study highlights that models trained on different modalities, such as text and images, also show a tendency to align in their understanding of concepts, exemplified by the representation of "cats" [9][14].
- This alignment suggests that powerful models, regardless of their input type, gravitate toward a unified internal representation of reality [14].

Group 3: Implications for AI Development
- The findings challenge the necessity of expensive computational resources for training large models, advocating model distillation in which smaller models mimic the cognitive processes of larger, high-performance models [18][20].
- The research emphasizes that the future of scientific AI will focus on achieving convergence in understanding rather than merely increasing model complexity, leading to more efficient and innovative AI solutions [22][24][25].
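Representation alignment of the kind described above is typically quantified by comparing two models' activations on the same inputs; linear CKA (centered kernel alignment) is one standard metric. The sketch below is a generic illustration using random matrices in place of real model activations; it is not the MIT paper's code, and the "models" are synthetic stand-ins that share an underlying signal.

```python
# Generic sketch of measuring representation alignment with linear CKA.
# Two synthetic "models" see the same underlying signal through different
# random projections; a third, unrelated one does not. Aligned models should
# score higher. Random data stands in for real activations.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2)."""
    X = X - X.mean(axis=0)                       # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2   # cross-covariance strength
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=(n, 64))                # shared "physical" signal

model_a = signal @ rng.normal(size=(64, 128)) + 0.1 * rng.normal(size=(n, 128))
model_b = signal @ rng.normal(size=(64, 96)) + 0.1 * rng.normal(size=(n, 96))
unrelated = rng.normal(size=(n, 96))             # no shared signal

print(f"CKA(aligned pair):   {linear_cka(model_a, model_b):.3f}")
print(f"CKA(unrelated pair): {linear_cka(model_a, unrelated):.3f}")
```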
1.58-Bit Matches FP16! Microsoft Releases a New Model Distillation Framework, With All-Chinese Authors
量子位· 2025-10-20 03:46
Core Insights
- Microsoft has introduced a new distillation framework called BitNet Distillation (BitDistill), which achieves model quantization with minimal performance loss while reducing memory consumption to 1/10 of FP16 [1][6][22].

Group 1: Framework Overview
- BitDistill has been validated on models with 4 billion parameters and below, such as Qwen and Gemma, and is theoretically applicable to other Transformer models [2].
- The framework consists of three interconnected stages: Model Refinement, Continue Pre-training, and Distillation-based Fine-tuning [8].

Group 2: Model Structure Optimization
- The primary goal of model structure optimization is to support the training of 1.58-bit models and to address the optimization instability common in low-precision training [9].
- BitDistill introduces a normalization module called SubLN in each Transformer layer to enhance training stability by controlling the variance of activations [10][12].

Group 3: Continue Pre-training
- A lightweight continue pre-training phase is designed to help the model gradually adapt its weights from full precision to a distribution suitable for 1.58-bit representation [14][15].
- This phase allows the model to "learn how to be quantized," preventing information loss during the fine-tuning stage [16].

Group 4: Distillation-based Fine-tuning
- BitDistill employs a dual distillation mechanism, combining logits distillation and multi-head attention distillation, to recover the performance of the quantized model (a generic sketch of these losses follows this summary) [18].
- Logits distillation uses the probability distribution from the full-precision model as "soft labels" to guide the quantized model [19].

Group 5: Performance Evaluation
- BitDistill demonstrates performance nearly equivalent to full-precision models across various downstream tasks while significantly reducing memory usage and improving inference speed [22].
- In text classification tasks, the 1.58-bit model achieved accuracy levels comparable to full-precision fine-tuned models, outperforming directly quantized models [23][24].
- In text summarization tasks, BitDistill's generated text quality was nearly identical to that of full-precision models, with slight improvements in BLEU scores [25][27].

Group 6: Generalizability and Compatibility
- BitDistill has been successfully applied to other pre-trained models like Gemma and Qwen2.5, showing high fidelity in performance recovery [28].
- The framework is compatible with various quantization strategies, proving its utility as an independent distillation solution applicable to multiple post-quantization optimization scenarios [28].
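In its generic form, the logits-distillation step described above is a temperature-scaled KL divergence between the full-precision teacher's output distribution (the "soft labels") and the quantized student's, and the attention term matches the two models' attention maps. The sketch below shows that standard formulation in PyTorch with random toy tensors; the temperature, weighting, and shapes are placeholder assumptions, and this is not the BitDistill source code.

```python
# Generic PyTorch sketch of a dual distillation objective:
# (1) logits distillation: temperature-scaled KL between teacher and student
#     output distributions, and
# (2) attention distillation: MSE between per-layer attention maps.
# Temperature T, weight alpha, and tensor shapes are placeholder values.
import torch
import torch.nn.functional as F

def logits_distill_loss(student_logits, teacher_logits, T: float = 2.0):
    s = F.log_softmax(student_logits / T, dim=-1)     # student log-probs
    t = F.softmax(teacher_logits / T, dim=-1)         # teacher "soft labels"
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

def attention_distill_loss(student_attn, teacher_attn):
    # One attention-map tensor per layer; match the student's maps to the teacher's.
    return sum(F.mse_loss(sa, ta) for sa, ta in zip(student_attn, teacher_attn))

# Toy shapes: batch=4, seq_len=16, vocab=32000, heads=8, two attention layers.
student_logits = torch.randn(4, 16, 32000, requires_grad=True)
teacher_logits = torch.randn(4, 16, 32000)
student_attn = [torch.rand(4, 8, 16, 16, requires_grad=True) for _ in range(2)]
teacher_attn = [torch.rand(4, 8, 16, 16) for _ in range(2)]

alpha = 0.5   # placeholder weighting between the two terms
loss = logits_distill_loss(student_logits, teacher_logits) \
       + alpha * attention_distill_loss(student_attn, teacher_attn)
loss.backward()
print(f"combined distillation loss: {loss.item():.3f}")
```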
Real AI Competitiveness Hides in the "Post-Training" Step of Large Models
量子位· 2025-10-13 08:47
Core Insights
- The article emphasizes the importance of Post-Training as a transformative approach in AI, moving beyond simple model optimization to creating specialized intelligent engines tailored to specific business needs [1][4].
- The evolution of Post-Training technology is highlighted, showcasing a shift from Supervised Fine-Tuning (SFT) to Reinforcement Learning (RL) methodologies, which better align with complex business requirements [2][4].

Summary by Sections

Post-Training Evolution
- The initial approach in the industry was SFT, which allowed models to learn specific domain knowledge and dialogue styles [2].
- However, SFT was insufficient for teaching models complex value judgments and strategic choices, which are critical in real business scenarios [3].
- The focus has shifted to RL, evolving from human-dependent methods (RLHF) to automated systems (RLVR) and the innovative use of Natural Language Rewards [4][5].

Implementation Pathway
- The article outlines a four-step pathway for enterprises to implement Post-Training effectively, addressing challenges such as data quality, high labeling costs, and defining reward signals [5][8].
- Successful case studies from companies like Zhihu, AutoHome, and Weibo illustrate practical applications of these steps, showcasing improvements in data quality and model performance [7][8].

Step 1: Data Preparation
- High-quality data is identified as the cornerstone of successful Post-Training, with companies spending 60-70% of their time on data preparation [10].
- Zhihu and AutoHome have developed methods to enhance data quality through pre-labeling and structured data utilization, respectively [11][13].

Step 2: Model Selection
- Choosing the right base model is crucial, with many companies opting for the Tongyi Qianwen series due to its performance and support for Post-Training [14][16].
- The model's architecture and open-source ecosystem facilitate easier implementation of Post-Training techniques [15][18].

Step 3: Reward Mechanism Design
- The design of a reward mechanism is essential for aligning model outputs with business objectives, transitioning from human feedback to automated verification systems (an illustrative verifiable-reward sketch follows this summary) [24][25].
- Companies like Yingmi Fund are exploring ways to integrate expert decision-making frameworks into their models to enhance performance [26].

Step 4: Evaluation System
- A robust evaluation system is necessary to measure the effectiveness of Post-Training, with Yingmi Fund developing benchmarks to assess model performance in real-world scenarios [27][28].
- Successful implementations have led to significant improvements in model accuracy and business outcomes, as seen in the cases of Baifeng Cloud and Quark [30][32].

Conclusion
- The article concludes that the true competitive advantage in AI lies in how companies leverage their unique data and business insights through Post-Training to create proprietary intelligent engines [32].
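The move from RLHF to RLVR described under Step 3 amounts to replacing a learned human-preference reward with programmatic checks on the model's output. A minimal illustration of such a verifiable reward, assuming a toy task format (a numeric answer wrapped in <answer> tags) and invented scoring weights rather than any of the named companies' actual reward designs:

```python
# Minimal illustration of a "verifiable reward": the score is computed by
# programmatic checks on the model's output instead of a learned preference
# model. The task format and weights (0.2 format + 0.8 correctness) are toy
# assumptions, not any company's actual design.
import re

def verifiable_reward(completion: str, expected_answer: float) -> float:
    reward = 0.0
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        reward += 0.2                       # format reward: used the answer tags
        try:
            value = float(match.group(1).strip())
            if abs(value - expected_answer) < 1e-6:
                reward += 0.8               # correctness reward: right answer
        except ValueError:
            pass                            # tag present but not a number
    return reward

# Toy rollouts for the prompt "What is 17 * 24?" (expected answer: 408)
rollouts = [
    "17 * 24 = 408, so <answer>408</answer>",   # correct and well formatted -> 1.0
    "<answer>398</answer>",                      # well formatted but wrong   -> 0.2
    "The answer is 408.",                        # correct but unscorable     -> 0.0
]
for r in rollouts:
    print(f"{verifiable_reward(r, 408.0):.1f}  <-  {r!r}")
```

A reward like this lets the RL loop run without per-sample human labels; the hard part, as the case studies suggest, is designing checks that actually encode the business objective.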
Former Google CEO Schmidt: AI Is Like Electricity and Fire; These 10 Years Will Decide the Next 100
36Kr · 2025-09-24 01:27
Group 1
- The core insight is that AI is transitioning from a tool for efficiency to a fundamental infrastructure that redefines business operations, akin to the invention of electricity and fire [3][5][9].
- Eric Schmidt emphasizes that the next decade will determine the future landscape of AI, focusing on how organizations must adapt to an AI-native operational model [8][47].
- The discussion highlights that the real competition lies in building a comprehensive system to support AI rather than just improving model performance [2][6].

Group 2
- A significant limitation on AI development is not technological parameters but the supply of electricity, with a projected need for an additional 92 GW of power in the U.S. by 2030 to support data centers [11][12][18].
- The cost of AI training is primarily driven by electricity consumption and operational time, making energy supply a critical bottleneck for AI deployment [16][17].
- The future battleground for AI will shift from laboratories to power generation facilities, as insufficient energy supply will hinder the application of advanced models [19][18].

Group 3
- The ability to effectively integrate and utilize advanced chips is crucial, as simply acquiring GPUs is not enough; operational efficiency and collaboration among components are key [20][21][22].
- The construction of AI systems requires a multifaceted approach, including hardware, software, cooling, and engineering capabilities, to ensure sustainable operation [22][24][25].
- Companies like Nvidia are evolving from chip suppliers to comprehensive solution providers, indicating a trend toward integrated AI infrastructure [26].

Group 4
- The trend of model distillation allows for the replication of AI capabilities at a lower cost, raising concerns about the control and regulation of powerful models [29][34][35].
- As AI capabilities become more accessible, the focus shifts from merely creating advanced models to ensuring their stable and effective operation [31][39].
- The competitive landscape is evolving, with success hinging on the ability to create platforms that improve with use, rather than just delivering one-time products [40][46].

Group 5
- The future of AI companies will depend on their ability to build platforms that continuously learn and adapt, creating a cycle of improvement and user dependency [40][44][46].
- Eric Schmidt warns that the next decade will be crucial in determining who can effectively transition AI from experimental phases to practical applications [47][49].
- The race to establish a closed-loop system for AI deployment is already underway, with the potential to shape the future of the industry [50].
Core Model Allegedly Distilled from DeepSeek? An Ex-Girlfriend's Written Accusation Exposes the Truth Behind the Collapse of "Europe's OpenAI"
36Kr · 2025-08-18 12:12
Core Viewpoint
- Mistral AI, once hailed as "Europe's OpenAI," is embroiled in a scandal involving allegations of plagiarism, specifically that its core technology is derived from DeepSeek and was misleadingly presented as an original RL achievement [1][3][21].

Group 1: Allegations and Scandal
- A former female employee of Mistral revealed in a personal letter that the company distilled DeepSeek's technology and misrepresented it as its own, using OpenAI's data while distorting benchmark results [3][4][21].
- The scandal gained traction online, with notable figures in the AI community, such as DeepMind researcher Susan Zhang, publicly condemning Mistral's practices [4][21].
- The former employee expressed her frustration at being sidelined and ignored when she raised concerns about the company's practices, which led to her eventual dismissal [6][7].

Group 2: Technical Comparisons
- Industry insider Sam Paech had previously noted similarities between Mistral's Small 3.2 model and DeepSeek, suggesting that Mistral's outputs closely mirrored those of DeepSeek [9][10].
- Further analysis revealed that Mistral-small-3.2 and DeepSeek-v3 exhibit strikingly similar output characteristics, indicating a lack of originality in Mistral's model [12][21].

Group 3: Historical Context and Achievements
- Mistral AI was once celebrated for its rapid rise, achieving a valuation of $6.2 billion within just over a year of its founding and positioning itself as a significant player in the European AI landscape [24][34].
- The company had previously launched successful products, including the Le Chat application, which topped the charts in France, and was backed by French President Macron as a key player in the national AI strategy [26][28][34].
Accused of Distilling DeepSeek and Faking Results! "Europe's OpenAI" Has Fallen From Grace
猿大侠· 2025-08-15 04:11
Core Viewpoint
- Mistral, a prominent player in the open-source AI sector, is accused of distilling its latest model from DeepSeek and misleading the public about the model's performance and testing results [3][22][24].

Group 1: Allegations and Evidence
- A former employee of Mistral revealed through a mass email that the company's latest model may have been distilled directly from DeepSeek and misrepresented as a successful reinforcement learning case [2][3].
- Analysis by Twitter user Sam Paech indicated a surprising similarity between Mistral-small-3.2 and DeepSeek-v3, suggesting that the resemblance is likely a result of distillation rather than coincidence [7][14].
- The analysis involved identifying overused words and n-grams in the models' outputs and building a similarity map from them, on which Mistral-small-3.2 and DeepSeek-v3 sit closely together, indicating highly similar outputs (an illustrative sketch of this kind of analysis follows this summary) [16][18].

Group 2: Company Background and Market Position
- Mistral, founded in 2023 and based in Paris, is often referred to as the European version of OpenAI and was co-founded by former Google DeepMind and Meta employees [24].
- The company has drawn significant attention, with a valuation reaching $10 billion and plans for a new $1 billion funding round, following a previous round that raised €600 million (approximately $645 million) [25].
- Mistral has maintained an open-source approach, releasing models such as Mistral Small and Mistral Code, and has developed a chatbot named Le Chat to compete with ChatGPT [27][28].
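The similarity analysis described above can be reproduced in outline: build a frequency profile of each model's characteristic words and n-grams over a shared prompt set, then compare profiles pairwise and map the results. The sketch below is a simplified illustration with invented output strings and cosine similarity over bigram counts; it is not Sam Paech's actual tooling, and its scores carry no evidential weight about the models named in the article.

```python
# Simplified sketch of an output-similarity ("stylistic fingerprint") analysis:
# count word bigrams in each model's generations and compare the profiles with
# cosine similarity. The output strings are invented stand-ins, and this is not
# the actual analysis referenced in the article.
from collections import Counter
from math import sqrt

def ngram_profile(texts, n=2):
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented generations for the same prompts, standing in for real model outputs.
model_x = ["the answer hinges on a delicate tapestry of trade-offs",
           "ultimately it is a delicate tapestry of competing factors"]
model_y = ["the result hinges on a delicate tapestry of trade-offs",
           "in short it is a delicate tapestry of competing factors"]
model_z = ["you just add the numbers together and report the sum",
           "sum the values and return the total"]

profiles = {name: ngram_profile(texts)
            for name, texts in {"x": model_x, "y": model_y, "z": model_z}.items()}

print("similarity(x, y):", round(cosine_similarity(profiles["x"], profiles["y"]), 3))
print("similarity(x, z):", round(cosine_similarity(profiles["x"], profiles["z"]), 3))
```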