Distillation Technology
Behind Yao Shunyu's Departure: Domestic Large Models Have Taken a Seat at the Table
Huxiu APP· 2025-10-09 23:56
The following article originally appeared on 凤凰网科技 (Phoenix Tech), the official account of Phoenix's technology channel. This piece is from the WeChat public account 凤凰网科技; author: Zhao Zikun; editor: Dong Yuqing; original title: "A Chinese AI Star Walks Out, and One Blog Post Lays Bare the China-US Large-Model Shadow War"; header image: AI-generated.

Recently, Yao Shunyu (姚顺宇), the celebrated special-prize winner from Tsinghua's physics department, left Anthropic to join Google DeepMind. He joined in October 2024 and left in September 2025, barely a year on the job. Why did he leave? In his personal blog he wrote that 40% of the reason was his objection to Anthropic's recent statements calling China a "hostile nation," and the other 60% came from judgments based on internal information he could not disclose.

Among prominent overseas Chinese researchers, there are several well-known people named "Yao Shunyu."

For one thing, Dario's disparagement stems from defending his own technical route: DeepSeek's innovations in reasoning models challenge the scaling-law-driven, pretraining-dominated path that Anthropic has committed to.

The Yao Shunyu referred to above is the one with a physics background: after graduating in 2024 he spent a few months as a postdoc at UC Berkeley, then joined Anthropic in October of that year, formally pivoting from quantum computing research to artificial intelligence. During his time at Anthropic ...
Behind Yao Shunyu's Departure: Domestic Large Models Have Taken a Seat at the Table
Hu Xiu· 2025-10-09 13:19
Core Viewpoint
- Yao Shunyu has left Anthropic to join Google DeepMind, citing opposition to Anthropic's characterization of China as a "hostile nation" and undisclosed internal information as reasons for his departure [2][5].

Group 1: Departure Reasons
- Yao Shunyu's departure from Anthropic is attributed 40% to opposition to the company's recent statements labeling China a "hostile nation" and 60% to internal information he cannot disclose [2].
- Anthropic has increasingly adopted an anti-China stance, which Yao explicitly mentioned in his blog [5].

Group 2: Anthropic's Business Strategy
- Since 2025, Anthropic has been expanding its business while explicitly excluding Chinese capital and markets in its official policies [6].
- On September 5, Anthropic announced a halt to services for companies with majority Chinese ownership, directly impacting subsidiaries registered in places such as Singapore and Hong Kong [7][8].
- Anthropic completed a $13 billion Series F funding round, tripling its valuation to $183 billion in just six months [9].

Group 3: Competitive Landscape
- In response to Anthropic's service restrictions, several Chinese AI companies are seizing the opportunity to offer alternatives, intensifying a competitive "technology cold war" [20].
- Major Chinese players, including Alibaba and DeepSeek, are rapidly enhancing their models and services to attract former Claude users [21][23].
- AWS has begun offering competing models from Alibaba and DeepSeek, indicating a shift in the competitive dynamics of the AI market [28][29].
DeepSeek Discloses for the First Time That Training the R1 Model Cost Only $294,000; "American Peers Are Starting to Question Their Own Strategy"
Xin Lang Cai Jing· 2025-09-19 13:25
Core Insights
- DeepSeek has achieved a significant breakthrough in AI model training costs: the DeepSeek-R1 model cost only $294,000 to train, substantially lower than the costs reported by American competitors [1][2][4].
- The training run used 512 NVIDIA H800 chips over 80 hours, and R1 is the first mainstream large language model whose training report has passed peer review [2][4].
- The cost efficiency of DeepSeek's model has sparked discussion about China's position in the global AI landscape, challenging the notion that only countries with the most advanced chips can dominate the AI race [1][2].

Cost Efficiency
- The training cost of DeepSeek-R1 is reported at $294,000, while OpenAI's CEO has indicated that training their foundational models costs well over $100 million [2].
- DeepSeek's approach emphasizes pre-training on large amounts of freely available data and fine-tuning on self-generated data, a strategy recognized as highly cost-effective (a minimal sketch of this generate-filter-fine-tune pattern follows this summary) [5][6].

Response to Criticism
- DeepSeek addressed accusations from U.S. officials regarding the alleged illegal acquisition of advanced chips, clarifying that R1 was trained on legally procured H800 chips and acknowledging prior use of A100 chips for smaller-model experiments [4][5].
- The company defended its use of "distillation," a common practice in AI, asserting that it enhances model performance while reducing costs [5][6].

Competitive Landscape
- The success of DeepSeek-R1 demonstrates that AI competition is shifting from merely having the most GPUs to achieving more with fewer resources, altering the competitive dynamics of the industry [6][7].
- Other AI models, such as OpenAI's GPT-4 and Google's Gemini, still hold advantages in certain areas, but DeepSeek's model has set a new standard for cost-effective, high-performance AI [6][7].
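The "fine-tuning with self-generated data" idea can be illustrated with a small, hedged sketch. This is not DeepSeek's actual pipeline: a tiny public GPT-2 checkpoint stands in for the real model, and a simple string-match check stands in for a real answer verifier. It only shows the general loop in which a model samples its own answers, keeps the ones a verifier accepts, and fine-tunes on them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-ins only: a tiny public checkpoint plays the role of the model,
# and a string match plays the role of a real answer verifier.
name = "distilgpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Prompts paired with known-correct answers the verifier can check.
tasks = [
    ("Q: What is 12 * 7?\nA:", "84"),
    ("Q: What is 9 + 15?\nA:", "24"),
]

# Step 1: the model samples several candidate answers per prompt.
kept = []
for prompt, expected in tasks:
    inputs = tok(prompt, return_tensors="pt")
    outs = model.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=4,
        max_new_tokens=16,
        pad_token_id=tok.eos_token_id,
    )
    for seq in outs:
        text = tok.decode(seq, skip_special_tokens=True)
        # Step 2: keep only the self-generated answers the verifier accepts.
        if expected in text[len(prompt):]:
            kept.append(text)

# Step 3: fine-tune the same model on its own accepted outputs using the
# ordinary next-token prediction loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text in kept:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice the verifier would be a grader for math or code correctness or a reward model, and sampling would run at far larger scale, but the data flow is the same: the model's own filtered outputs become its supervised training set.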
"That's All the Training Cost? American Peers Fall into Self-Doubt"
Guan Cha Zhe Wang· 2025-09-19 11:28
Core Insights
- DeepSeek has achieved a significant breakthrough in AI model training costs: the DeepSeek-R1 model's training cost is reported at only $294,000, substantially lower than the figures disclosed by American competitors [1][2][4].
- The model was trained on 512 NVIDIA H800 chips, and its training report is the first for a mainstream large language model to pass peer review, a notable advancement for the field [2][4].
- The cost efficiency of DeepSeek's model challenges the notion that only countries with the most advanced chips can dominate the AI race, as highlighted by various media outlets [1][2][6].

Cost and Performance
- The training cost of DeepSeek-R1 is significantly lower than that of OpenAI's models, which have been reported to exceed $100 million [2][4].
- DeepSeek's approach emphasizes the use of open-source data and efficient training methods, delivering high performance at a fraction of the cost of traditional models [5][6].

Industry Impact
- The success of DeepSeek-R1 is seen as a potential game-changer in the AI landscape, suggesting that AI competition is shifting from resource quantity to resource efficiency [6][7].
- The model's development has sparked discussion about China's position in the global AI sector, particularly in light of U.S. export restrictions on advanced chips [1][4].

Technical Details
- The latest research paper provides more detailed insight into the training process and acknowledges the use of A100 chips in earlier experiments, although the final model was trained exclusively on H800 chips [4][5].
- DeepSeek has defended its use of "distillation" techniques, which are common in the industry, as a way to enhance model performance while reducing costs [5][6].
"Godfather of AI" Geoffrey Hinton Gives His First Public Talk in China: AI Is Like a Tiger Cub, and Are Humans Themselves Large Language Models?
AI前线· 2025-07-27 04:30
Core Viewpoint
- Geoffrey Hinton emphasizes the potential of AI to surpass human intelligence and the necessity of global cooperation to ensure AI remains beneficial to humanity [3][14][17].

Group 1: AI and Human Intelligence
- Hinton compares human cognition to large language models, suggesting that both can produce "hallucinations," but AI can transmit knowledge far more efficiently by sharing parameters [3][9].
- He likens the relationship between humans and AI to raising a tiger cub: the challenge lies in ensuring AI does not become a threat as it matures [14][17].
- Hinton argues that AI can significantly enhance efficiency across industries, making its elimination impractical [3][14].

Group 2: AI Development Paradigms
- Hinton discusses two paradigms of AI, logical reasoning and biological learning, highlighting how the understanding of intelligence evolved through the study of neural connections [4][5].
- He reviews the historical development of AI models, from the simple models of the 1980s to today's complex architectures such as Transformers [5][7].

Group 3: Knowledge Transfer and Efficiency
- Knowledge transfer between humans is limited, with a maximum of roughly 100 bits per second, while AI systems can share knowledge at vastly higher rates, potentially billions of bits [12][13].
- Hinton introduces knowledge distillation, in which a larger neural network transfers what it has learned to a smaller one, akin to a teacher-student relationship (a minimal sketch of this teacher-student loss follows this summary) [11][12].

Group 4: Global Cooperation on AI Safety
- Hinton calls for the establishment of an international community focused on AI safety, in which countries collaborate on training AI to be beneficial rather than harmful [15][17].
- He suggests that despite differing national interests, countries share the goal of preventing AI from dominating humanity, which could lead to cooperative efforts similar to those seen during the Cold War [15][17].
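To make the teacher-student idea concrete, here is a minimal, hypothetical PyTorch sketch of classic logit distillation in the spirit of Hinton et al. (2015). The model shapes and hyperparameters are illustrative assumptions, not anything from the talk: a frozen large "teacher" supplies softened output probabilities, and a small "student" is trained to match them alongside the ordinary hard-label loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher/student: a large and a small MLP classifier.
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-label KL (teacher -> student) with hard-label cross-entropy."""
    # Soften both distributions with the temperature, as in Hinton et al. (2015).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data.
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))
with torch.no_grad():
    teacher_logits = teacher(x)          # the teacher is frozen
student_logits = student(x)
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
optimizer.step()
```

The temperature flattens both distributions so the student learns from the teacher's relative preferences over wrong answers, not just the top-1 label, which is what makes the knowledge transfer more efficient than training on hard labels alone.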
How Do Two Large-Model Practitioners in the Group Chat Rate Xiaomi's MiMo Model?
理想TOP2· 2025-04-30 13:04
Core Viewpoint
- The article discusses the performance of various AI models, particularly their capabilities in mathematics and coding, highlighting the strengths and weaknesses of models such as Qwen, MiMo, and MindGPT.

Group 1: Model Performance
- Qwen-7B outperforms MiMo on elementary mathematics tasks, which is unusual given that Qwen-7B is nominally a lower-tier model than MiMo [2].
- Results on the AIME (American Invitational Mathematics Examination, a U.S. high-school math competition) show a significant disparity: MiMo scores high there while struggling in other areas [2][5].
- The results suggest that MiMo's pre-training is heavily weighted toward mathematics and coding, potentially at the expense of other capabilities [1].

Group 2: Model Comparison
- MindGPT has a much larger parameter count than MiMo, making direct comparisons difficult [3].
- Using small-parameter models to top specific benchmarks is seen as a way to showcase capability, although it may not reflect overall performance [3].
- There is speculation that MiMo may have used distillation techniques during training, which could explain its performance discrepancies [4].

Group 3: Community Insights
- Discussions within the community suggest that the strategies employed by various teams, including the use of distillation, are common across the industry [7].
- The community expresses a desire for genuine performance and capability rather than marketing hype [3].
Flash | Pruna AI Open-Sources Its Model-Compression "Toolbox" After Closing a $6.5 Million Seed Round
Z Potentials· 2025-03-21 03:22
Core Viewpoint
- Pruna AI is building an AI model optimization framework, now being open-sourced, that aims to make a wide range of AI models more efficient through compression techniques [2][3].

Group 1: Company Overview
- Pruna AI recently completed a $6.5 million seed round, with investment from EQT Ventures, Daphni, Motier Ventures, and Kima Ventures [2].
- The company's framework applies multiple efficiency methods to AI models, including caching and distillation, while standardizing how compressed models are saved and loaded [2][3].

Group 2: Technology and Features
- The framework can evaluate whether compression causes significant quality loss and how much performance is gained (a generic compress-then-evaluate sketch follows this summary) [3].
- Pruna AI's approach is compared to Hugging Face's standardization of transformers: it aggregates many efficiency methods behind one interface rather than offering a single-method solution [3].
- The company supports various model types, including large language models, diffusion models, speech-to-text models, and computer vision models, with a current emphasis on image and video generation models [4].

Group 3: Market Position and User Base
- Existing users of Pruna AI include Scenario and PhotoRoom, indicating growing interest in its optimization capabilities [4].
- The company plans to release a compression agent feature that lets developers specify desired speed and accuracy targets and automates the optimization process [5].

Group 4: Business Model
- Pruna AI charges for its professional version by the hour, similar to GPU rental services in cloud computing [5].
- The optimization framework has demonstrated significant cost-saving potential; for example, it shrank a Llama model eightfold with minimal quality loss [5].
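The compress-then-evaluate idea can be illustrated without Pruna's own API, which the article does not show. The sketch below is a generic stand-in: it applies plain PyTorch dynamic quantization to a toy network and then compares serialized size, latency, and prediction agreement between the original and compressed models. The architecture and metrics are illustrative assumptions, not Pruna's implementation.

```python
import io
import time
import torch
import torch.nn as nn

# A stand-in model; a real toolchain would wrap many methods (quantization,
# caching, distillation) behind one interface. This sketch only shows the
# "compress, then measure quality loss and speedup" loop.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(),
                      nn.Linear(2048, 2048), nn.ReLU(),
                      nn.Linear(2048, 10)).eval()

compressed = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m):
    # Serialize the state dict to memory to estimate on-disk size.
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

def latency_ms(m, x, runs=50):
    with torch.no_grad():
        m(x)                          # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) / runs * 1e3

x = torch.randn(64, 512)
with torch.no_grad():
    agreement = (model(x).argmax(-1) == compressed(x).argmax(-1)).float().mean()

print(f"size: {size_mb(model):.1f} MB -> {size_mb(compressed):.1f} MB")
print(f"latency: {latency_ms(model, x):.2f} ms -> {latency_ms(compressed, x):.2f} ms")
print(f"prediction agreement after compression: {agreement:.1%}")
```

A production toolchain would swap in task-level quality metrics (perplexity or accuracy on a held-out set, image quality scores for generative models) and report the trade-off for each compression method it applies.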
Flash | Global AI Giants Are Rushing to Copy DeepSeek's Homework; Distillation-Driven Cost Cuts May Upend America's Technical First-Mover Advantage
Z Finance· 2025-03-03 01:41
Core Viewpoint
- The article discusses the rising significance of "distillation" technology in the AI sector: companies like OpenAI, Microsoft, and Meta are leveraging it to reduce costs and broaden access to advanced AI capabilities, while startups like DeepSeek pose a growing competitive threat [1][2].

Group 1: Distillation Technology
- Distillation allows a large language model (the "teacher model") to generate predictive data that is then used to train a smaller, more efficient "student model," enabling rapid knowledge transfer (see the sketch after this summary) [2].
- The technique has recently gained traction, with industry experts believing it will serve as a cost-reduction and efficiency tool for AI startups, allowing them to build efficient AI applications without relying on extensive computational resources [2][5].
- The operational costs of training and maintaining large models like GPT-4 and Google's Gemini are estimated in the hundreds of millions of dollars, making distillation a valuable way for developers and businesses to access core capabilities at lower cost [2][3].

Group 2: Industry Impact and Competition
- Microsoft has implemented this strategy by distilling GPT-4 into a smaller language model, Phi, to facilitate commercialization [3].
- OpenAI is concerned that DeepSeek may be extracting information from its models to train competitive products, which could violate its service terms, although DeepSeek has not responded to these allegations [3][7].
- The rise of distillation challenges the business models of AI giants: distilled models cost less to run and generate less revenue, prompting companies like OpenAI to charge lower fees for their use [6].

Group 3: Performance Trade-offs
- While distillation significantly reduces operational costs, it may also reduce a model's generalization ability: distilled models might excel at specific tasks but perform poorly on others [5].
- Experts suggest that for many businesses, distilled models are sufficient for everyday applications like customer-service chatbots, which can run efficiently on smaller devices [5][6].

Group 4: Open Source and Competitive Landscape
- The widespread application of distillation is seen as a victory for open-source AI, allowing developers to innovate freely on top of open systems [7].
- However, the competitive landscape is becoming more complex: because followers can catch up quickly using distillation, the sustainability of first-mover advantages in the rapidly evolving AI market is in question [8].
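The teacher-to-student flow described above can be sketched in a few lines. This is a hedged illustration, not any company's actual pipeline: small public GPT-2 checkpoints stand in for the real teacher and student (in practice the teacher is a frontier model, often behind an API, and the student is a much smaller open model), and both stand-ins share one tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoints; "gpt2-large" and "distilgpt2" share the GPT-2 tokenizer.
tok = AutoTokenizer.from_pretrained("gpt2")
teacher = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
student = AutoModelForCausalLM.from_pretrained("distilgpt2")

prompts = [
    "Explain in one sentence what model distillation is.",
    "Summarize the trade-off between model size and serving cost.",
]

# Step 1: the teacher generates answers, forming a synthetic training set.
synthetic_texts = []
for p in prompts:
    inputs = tok(p, return_tensors="pt")
    with torch.no_grad():
        out = teacher.generate(
            **inputs, max_new_tokens=64, pad_token_id=tok.eos_token_id
        )
    synthetic_texts.append(tok.decode(out[0], skip_special_tokens=True))

# Step 2: the student is fine-tuned on the teacher's outputs with the
# ordinary next-token prediction loss (data-level distillation).
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
student.train()
for text in synthetic_texts:
    batch = tok(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

This is why the article frames distillation as a cost lever: the expensive step (running the large teacher) happens once to produce training data, after which the cheap student handles inference.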