DeepSeek的GRPO会导致模型崩溃?看下Qwen3新范式GSPO
机器之心· 2025-08-07 09:42
Core Viewpoint
- The article discusses the evolution of reinforcement learning techniques in the post-training phase of large language models (LLMs), highlighting the introduction of Group Sequence Policy Optimization (GSPO) as a solution to the instability issues associated with Group Relative Policy Optimization (GRPO) [2][10][31].

Group 1: Training Phases and Techniques
- The training of large language models typically consists of two phases: pre-training and post-training, where the latter focuses on improving the model's understanding and execution of human instructions [1].
- The post-training phase employs reinforcement learning, with initial methods like Reinforcement Learning from Human Feedback (RLHF) being time-consuming and costly due to their reliance on human annotators [2][3].

Group 2: Innovations and Comparisons
- DeepSeek introduced an automated alternative to manual RLHF, significantly reducing costs and improving efficiency by letting the model learn from reward signals rather than human evaluations [2].
- The DeepSeek team proposed the Group Relative Policy Optimization (GRPO) algorithm, which they believe is more effective than the Proximal Policy Optimization (PPO) used by OpenAI for ChatGPT [3][5].

Group 3: Issues with GRPO
- The Qwen team identified serious stability issues with GRPO, particularly due to its reliance on token-level importance sampling, which can lead to high variance and training instability [10][11][12].
- The instability arises from applying importance-sampling weights at the token level, where high variance accumulates over long sequences and exacerbates the training challenges [15][16][17].

Group 4: Introduction of GSPO
- To address the issues with GRPO, the Qwen team proposed Group Sequence Policy Optimization (GSPO), which uses sequence-level importance sampling to enhance training stability (a numerical sketch of the two ratio types follows this summary) [10][22][31].
- GSPO's design mitigates the variance accumulation seen in token-level sampling, leading to improved training efficiency and stability [23][24].

Group 5: Experimental Evidence and Advantages
- Experimental results demonstrated that GSPO outperformed GRPO across various tasks, showing better scalability and efficiency in training [20][30].
- The Qwen team highlighted that GSPO simplifies the training of Mixture-of-Experts (MoE) models by eliminating the need for auxiliary strategies like Routing Replay, which GRPO required to achieve stable convergence [25][27][30].
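To make the contrast concrete, here is a minimal numerical sketch of the two importance-ratio styles. GRPO weights each token by its own ratio, while GSPO applies one length-normalized sequence-level ratio (the geometric mean of the token ratios, following the sequence-likelihood formulation described for GSPO). The log-probabilities below are toy random values, not outputs of a real policy:

```python
# Minimal sketch: token-level (GRPO-style) vs. sequence-level
# (GSPO-style) importance ratios. Toy numbers throughout.
import numpy as np

rng = np.random.default_rng(0)
T = 512  # response length in tokens (assumed)

# Per-token log-prob differences: log pi_new(y_t) - log pi_old(y_t).
delta = rng.normal(loc=0.0, scale=0.05, size=T)

# GRPO: one importance ratio per token; each token's gradient is
# scaled by its own noisy ratio, and that noise compounds over
# long sequences.
token_ratios = np.exp(delta)

# GSPO: a single length-normalized sequence ratio,
# s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
# i.e. the geometric mean of the token ratios, applied uniformly
# to the whole response before clipping.
seq_ratio = np.exp(delta.mean())

print(f"token ratios: min={token_ratios.min():.3f}, "
      f"max={token_ratios.max():.3f}, std={token_ratios.std():.3f}")
print(f"sequence ratio: {seq_ratio:.3f}")
```

Because the sequence-level ratio averages per-token noise before any clipping is applied, its variance shrinks as responses grow longer, which is the stability argument summarized above.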
三重激励+全周期扶持,即梦升级这个计划,让AI创作者的成长有迹可循
机器之心· 2025-08-07 09:42
Core Viewpoint
- The article discusses the comprehensive upgrade of the "AI Creator Growth Program" by Jimeng AI, emphasizing the need for a supportive ecosystem for creators in an AI content production landscape that has been transformed by AI technology [9][10].

Summary by Sections

AI Content Creation Revolution
- The past year has witnessed a revolution in content creation driven by AI technology, breaking down traditional barriers and allowing individual creators to produce high-quality content with minimal resources [9].
- The efficiency of creation has been redefined, leading to fundamental changes in content forms, styles, and cost structures [9].

Challenges for Creators
- Despite these advancements, creators face challenges such as intense competition, a lack of sustainable growth paths, limited monetization opportunities, and insufficient support within the creative ecosystem [9][10].

Jimeng AI Creator Growth Program
- Launched in February, the program aims to provide tangible support to creators through incentives, collaboration opportunities, and traffic distribution, having already supported 3,802 creators and distributed over 28 million points [11].
- The program has been upgraded to include a three-tier support system for potential stars, advanced creators, and super creators, offering resources such as point rewards, platform traffic, and project access [11][12].

Tiered Support Mechanism
- Potential stars can earn points by publishing content, with rewards for popular ideas and for meeting content standards [13].
- Advanced creators gain additional benefits, including cash rewards for top-performing content and various growth resources [14][15].
- Super creators receive the most comprehensive support, including significant point rewards, priority access to projects, and funding for their own initiatives [16][17].

Community Building and Ecosystem
- Jimeng AI aims to build a sustainable, growth-oriented creative ecosystem, integrating various AI capabilities for a seamless creation experience [20].
- The platform encourages a diverse and decentralized community of creators, fostering collaboration and quality content production [23].
- Regular activities such as online workshops and creative challenges are organized to stimulate community engagement and give creators visibility [24].
硬核拆解大模型,从 DeepSeek-V3 到 Kimi K2 ,一文看懂 LLM 主流架构
机器之心· 2025-08-07 09:42
Core Viewpoint
- The article reviews the evolution of large language models (LLMs) over the past seven years, noting that while model capabilities have improved, the overall architecture has remained remarkably consistent. It asks whether there have been any disruptive innovations or whether advances have been incremental refinements within the existing framework [2][5].

Group 1: Architectural Innovations
- The article details eight mainstream LLMs, including DeepSeek and Kimi, analyzing their architectural designs and innovative approaches [5].
- DeepSeek V3, released in December 2024, introduced key architectural technologies that enhanced computational efficiency, distinguishing it among other LLMs [10][9].
- Multi-head latent attention (MLA) is introduced as a memory-saving strategy that compresses key and value tensors into a lower-dimensional latent space, significantly reducing memory usage during inference [18][22].

Group 2: Mixture-of-Experts (MoE)
- The MoE layer in the DeepSeek architecture replaces a single feed-forward block with multiple parallel feed-forward submodules, significantly increasing the model's parameter capacity while reducing inference cost through sparse activation (a minimal routing sketch follows this summary) [23][30].
- DeepSeek V3 features 256 experts in each MoE module, with a total parameter count of 671 billion, but activates only 9 experts per token during inference [30].

Group 3: OLMo 2 and Its Design Choices
- OLMo 2 is noted for its high transparency in training data and architecture, which makes it a useful reference for LLM development [32][34].
- OLMo 2 adopts a distinctive normalization strategy, using RMSNorm and QK-norm to enhance training stability [38][46].

Group 4: Gemma 3 and Sliding Window Attention
- Gemma 3 employs a sliding window attention mechanism to reduce the memory required for key-value (KV) caching, representing a shift toward local attention mechanisms [53][60].
- Gemma 3 also features a dual normalization strategy, combining Pre-Norm and Post-Norm placements [62][68].

Group 5: Mistral Small 3.1 and Performance
- Mistral Small 3.1, released in March 2025, outperforms Gemma 3 in several benchmarks, which the article attributes to its custom tokenizer and reduced KV cache size [73][75].
- Mistral Small 3.1 adopts a standard architecture, dropping the sliding window attention mechanism used in Gemma 3 [76].

Group 6: Llama 4 and MoE Adoption
- Llama 4 incorporates an MoE architecture similar to DeepSeek V3's, but with notable differences in how experts are activated and in the overall design [80][84].
- MoE architectures have seen significant development and adoption in 2025, indicating a trend toward more complex and capable models [85].

Group 7: Kimi K2 and Its Innovations
- Kimi K2, with a parameter count of roughly 1 trillion, is recognized as one of the largest LLMs and uses the Muon optimizer variant for improved training performance [112][115].
- Kimi K2's architecture is based on DeepSeek V3 but expands upon its design, showcasing the ongoing evolution of LLM architectures [115].
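As a companion to the MoE discussion above, here is a minimal sketch of the sparse top-k routing pattern these MoE layers share. The sizes, the single-matrix "experts", and the softmax-over-selected gating are simplifying assumptions for illustration, not DeepSeek V3's exact design (which routes each token to a small subset of its 256 experts):

```python
# Minimal numpy sketch of sparse top-k MoE routing: each token is
# sent to only k of the n experts, so inference cost stays far
# below the model's total parameter count. Toy sizes throughout.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2  # toy sizes, not DeepSeek V3's

# One tiny "expert" per slot (a single linear map each, for brevity).
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02
router_w = rng.normal(size=(d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector x through its top-k experts only."""
    logits = x @ router_w              # router scores, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]  # indices of the k highest-scoring experts
    gates = np.exp(logits[top])
    gates /= gates.sum()               # normalize gates over the selected experts
    # Only the selected experts execute: this is the sparse activation.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (64,)
```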
人大高瓴-华为诺亚:大语言模型智能体记忆机制的系列研究
机器之心· 2025-08-07 02:41
The first author of this series of works is Zhang Zeyu, a PhD student at Renmin University of China whose research focuses on memory mechanisms and personalization for LLM-based agents; Tan Haoran is a master's student at Renmin University of China researching LLM-based agents; Chen Xu is a tenure-track associate professor at Renmin University of China whose research interests include large language models and information retrieval.

Recently, agents based on large language models (LLM-based agents) have drawn wide attention in both academia and industry. For an agent, memory is a core capability: it records past information and external knowledge and is crucial for capabilities such as personalization. The Gaoling School of Artificial Intelligence at Renmin University of China and Huawei's Noah's Ark Lab have focused on the memory capability of LLM-based agents and, early in this area's development, built a complete research system spanning a survey paper, datasets, and a toolkit, aiming to advance the field.

An early survey of agent memory mechanisms (TOIS'25)

In April 2024, the team completed an early survey of agent memory mechanisms. The survey discusses agent memory comprehensively from multiple angles: it examines "what agent memory is" and "why agents need memory," reviews "how to implement agent memory" and "how to evaluate agent memory capability," organizes "memory-augmented agent applications," and identifies the limitations of current work and future directions. Through this survey, the team hopes to ...
您猜怎么着?Grok 4进决赛,大模型对抗赛Gemini全军覆没,马斯克「装」起来了
机器之心· 2025-08-07 02:41
Core Viewpoint
- In the AI chess competition organized by Google, Grok 4 defeated Gemini 2.5 Pro to reach the finals, showcasing the evolving capabilities of AI models in strategic games like chess [2][6][46].

Group 1: Competition Overview
- The Kaggle AI Chess competition featured models including Grok 4, Gemini 2.5 Pro, o3, and o4-mini, with Grok 4 defeating Gemini 2.5 Pro in a surprising semi-final match [2][6].
- In the semi-finals, Grok 4 and o3 won their matches against Gemini 2.5 Pro and o4-mini, respectively, with Grok's victory being particularly hard-fought, ending in a tiebreaker after a 2:2 draw in regular play [6][24].
- The final is set to be between Grok 4 and OpenAI's o3, with the competition generating significant interest in AI's strategic capabilities [7][46].

Group 2: Performance Analysis
- Grok 4's match against Gemini 2.5 Pro was a chaotic affair: Grok initially lost a piece but ultimately won in a tiebreaker after a series of mistakes from both sides [25][38].
- o3 demonstrated exceptional stability and reasoning ability, achieving a perfect accuracy score in one of its matches, while o4-mini's lightweight design led to its predictable defeat [10][15].
- The competition aims to analyze how AI models think and strategize, with specific games providing insight into their decision-making processes [12][46].

Group 3: Expert Commentary
- Chess Grandmaster Peter Heine Nielsen commented on Grok's strategic understanding, noting its positional awareness but also its lack of tactical precision in critical moments [40].
- The matches illustrated the ongoing challenge AI faces in maintaining performance under pressure, particularly when deviating from established opening theory [26][36].
Token成本下降,订阅费却飞涨,AI公司怎么了?
机器之心· 2025-08-06 04:31
Core Viewpoint
- The article discusses the challenges AI companies face in balancing subscription pricing against operational costs, describing a "prisoner's dilemma" in which companies are torn between unlimited subscriptions and usage-based pricing, leaving many with unsustainable business models [3][45][46].

Group 1
- DeepSeek's emergence in the AI space was marked by its strikingly low reported training cost of just over $5 million, which contributed to its popularity [1].
- Training costs for AI models have fallen sharply, with Deep Cogito reportedly building a competitive model for under $3.5 million [2].
- Despite falling training costs, operational costs, particularly for inference, are rising sharply, creating a dilemma for AI companies [3][15].

Group 2
- Companies adopted low-cost subscription models, such as $20 per month, to attract users, betting on future reductions in model costs [7][12].
- The expectation that model costs will fall tenfold does not relieve the pressure on subscription services, because operational costs continue to rise [5][13].
- In reality, even with cheaper models, profit margins are shrinking, as the experiences of Windsurf and Claude Code show [14][15].

Group 3
- Users increasingly demand the latest and most powerful models, so demand shifts rapidly to each new release regardless of how cheap older models have become [17][21].
- The pricing history of leading models shows that while the cost of older models drops, demand for the newest technology keeps effective prices stable [20][22].
- Token consumption has grown dramatically, with the number of tokens used per task doubling every six months, driving unexpected cost increases (a back-of-the-envelope sketch follows this summary) [28][29].

Group 4
- Companies such as Anthropic have tried to relieve cost pressure by raising subscription prices and routing requests to cheaper models based on load [38][40].
- Despite these efforts, token consumption keeps growing exponentially, making sustainable flat pricing difficult to maintain [41][44].
- The article argues that a fixed subscription model is no longer viable in the current landscape, as companies face a fundamental shift in pricing dynamics [44][60].

Group 5
- The article outlines three strategies for coping with cost pressure: adopting usage-based pricing from the start, targeting high-margin enterprise clients, and vertically integrating to capture value across the tech stack [51][52][57].
- Companies that cling to fixed-rate subscription models are likely to face serious trouble and potential failure [60][62].
- The expectation that future model costs will fall significantly may not keep pace with users' rising expectations for performance and capability [61][64].
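The margin squeeze described above can be made concrete with back-of-the-envelope arithmetic under two assumptions drawn from the article's narrative: frontier prices stay roughly flat because demand keeps shifting to the newest model, while tokens consumed per task double every six months. The starting numbers below are illustrative, not figures from the article:

```python
# Back-of-the-envelope sketch of a flat subscription meeting
# exponentially growing token consumption. All inputs are assumptions.
frontier_price = 15.0     # $ per million tokens, roughly flat (assumed)
tokens_per_task = 50_000  # tokens per task today (assumed)
tasks_per_month = 30      # tasks a heavy subscriber runs (assumed)
subscription = 20.0       # $ per month, flat-rate plan

for half_year in range(5):
    serving = frontier_price * tokens_per_task * tasks_per_month / 1e6
    margin = subscription - serving
    print(f"month {6 * half_year:>2}: {tokens_per_task:>9,} tok/task -> "
          f"serving cost ${serving:>8.2f}, margin ${margin:>9.2f}")
    tokens_per_task *= 2  # doubling every six months

# Even if per-token prices on older models collapse, a subscriber who
# insists on the frontier model turns a profitable plan negative fast.
```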
ICCV 2025 | SeaS: 工业异常生成+正常合成+精准掩码大一统框架,指标全面碾压SOTA
机器之心· 2025-08-06 04:31
Core Viewpoint
- The article discusses the SeaS model, a unified few-shot industrial anomaly generation method that addresses the challenges of generating diverse anomaly samples and precise mask annotations in industrial quality inspection, significantly improving the performance of downstream anomaly detection tasks [3][45].

Group 1: Model Overview
- SeaS uses a unified framework that requires only 1-3 training samples to simultaneously achieve diverse anomaly generation, consistent normal-product synthesis, and pixel-level mask annotation, setting a new benchmark for the field [9][45].
- The model leverages a separation-and-sharing fine-tuning mechanism to model the different variation patterns of normal products and anomalies, enhancing generation precision while preserving anomaly diversity and normal-product consistency [10][45].

Group 2: Technical Innovations
- SeaS introduces three major innovations: a unified few-shot generation framework, the separation-and-sharing fine-tuning mechanism, and a refined mask prediction branch that fuses U-Net discriminative features with high-resolution VAE features for pixel-accurate anomaly labeling [8][10][45].
- The model employs an unbalanced anomaly text prompt structure to represent the inherent differences between normal and abnormal products, ensuring precise control over changes in the anomaly regions [15][45].

Group 3: Performance Metrics
- SeaS outperforms existing few-shot industrial anomaly generation methods on key metrics across mainstream industrial datasets such as MVTec AD and VisA, with an average improvement of 12.79% in anomaly segmentation IoU (the metric is sketched after this summary) [7][32][41].
- Data generated by SeaS significantly improves supervised segmentation models, with notable gains in metrics such as AUROC and pixel-level accuracy across datasets [38][41][43].

Group 4: Practical Applications
- SeaS-generated anomaly samples transfer effectively to synthetic-data-based detection methods, markedly improving detection performance and reducing false negatives across multiple datasets [37][45].
- The model's high-quality normal images also augment training sets for unsupervised detection methods, reducing false positives and optimizing performance metrics [37][41].
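For reference, the anomaly segmentation IoU cited above is the standard intersection-over-union between a predicted binary anomaly mask and the ground truth. A minimal sketch of the metric (generic, not code from the SeaS paper):

```python
# Standard binary-mask IoU, the metric behind the 12.79% average
# improvement quoted above. Generic sketch, not SeaS code.
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two same-shaped binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0  # both empty: treat as perfect

# Toy 4x4 masks: predicted anomaly region vs. ground truth.
pred = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
gt   = np.array([[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]])
print(f"IoU = {mask_iou(pred, gt):.3f}")  # 4 / 6 = 0.667
```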
闹玩呢!首届大模型对抗赛,DeepSeek、Kimi第一轮被淘汰了
机器之心· 2025-08-06 04:31
Core Viewpoint
- The article reports the results of the first large-model chess competition organized by Google, highlighting the performance of various AI models, with Grok 4 emerging as a strong contender on a perfect record [2][30].

Group 1: Competition Overview
- The chess competition ran for three days, and Gemini 2.5 Pro, o4-mini, Grok 4, and o3 all won their first-round matches 4-0 [4].
- The competition was held on the Kaggle Game Arena platform, which aims to evaluate the performance of large language models (LLMs) in dynamic, competitive environments [6].

Group 2: Match Results
- Kimi k2 lost to o3 0-4, failing to produce legal moves in all four games [7][8].
- o4-mini beat DeepSeek R1 4-0, though its game quality declined after a few strong opening moves [18][21].
- Gemini 2.5 Pro beat Claude 4 Opus 4-0, although its true strength remains uncertain given Claude's mistakes [23][24].
- Grok 4 swept Gemini 2.5 Flash 4-0, demonstrating superior chess skill and the ability to capitalize on unprotected pieces [30][33].

Group 3: Key Observations
- The first round exposed three main weaknesses of current AI models: insufficient global board visualization, limited understanding of piece interactions, and trouble executing legal moves [36].
- Grok 4's performance suggests it may have overcome these limitations, raising the question of whether the advantage will hold in later rounds [36].

Group 4: Audience Engagement
- A pre-competition poll found that 37% of participants picked Gemini 2.5 Pro as the likely winner, while Grok 4 received 7.04% of the votes [37][38].
就是阻击OpenAI,Claude抢先数十分钟发布Claude Opus 4.1
机器之心· 2025-08-06 01:49
Core Viewpoint
- The article discusses the competitive landscape in AI model development, highlighting Anthropic's release of Claude Opus 4.1 shortly before OpenAI's anticipated announcement, a strategic move to capture market attention [1][2].

Summary by Sections

Model Release and Features
- Anthropic has launched Claude Opus 4.1, built on the Claude Opus 4 model released in May. The new model shows significant improvements in agentic tasks, real-world programming, and reasoning, with a context window of approximately 200K tokens [7].
- Claude Opus 4.1 is available across user tiers, including Claude Pro, Max, Team, and Enterprise [8].

Pricing and Cost Efficiency
- API pricing for Claude Opus 4.1 is $15 per million input tokens and $75 per million output tokens. Users can save up to 90% with prompt caching and up to 50% with batch processing (a cost sketch follows this summary) [10][11].

Performance Improvements
- According to GitHub's evaluation, Claude Opus 4.1 outperforms its predecessor in most capabilities, particularly multi-file code refactoring. Users at Rakuten Group noted its precision in handling large codebases without introducing new bugs [14].
- The performance leap of Claude Opus 4.1 is compared to the upgrade from Sonnet 3.7 to Sonnet 4, indicating a substantial advance [15].

Benchmark Comparisons
- Across benchmarks, Claude Opus 4.1 performs strongly against other models, achieving 74.5% on the agentic-coding benchmark SWE-bench and 80.9% on the graduate-level reasoning benchmark GPQA Diamond [16].

Use Cases
- Claude Opus 4.1 supports hybrid reasoning modes for both instant responses and detailed step-by-step reasoning. It is particularly effective at advanced programming tasks and agentic search and research, capable of conducting extensive autonomous research across diverse data sources [17][18].

Additional Information
- Anthropic also released a system card alongside the new model, providing further insight into its capabilities [19].
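Using the list prices and maximum discounts quoted above, here is a quick cost sketch. How the caching and batch discounts combine in practice is an assumption here, and the workload sizes are illustrative:

```python
# Quick cost estimate from the quoted Claude Opus 4.1 list prices:
# $15 / M input tokens, $75 / M output tokens, up to 90% off cached
# input, up to 50% off batch jobs. Discount stacking is assumed.
IN_PRICE, OUT_PRICE = 15.0, 75.0  # $ per million tokens

def cost(in_mtok, out_mtok, cached_frac=0.0, batch=False):
    """Estimated $ cost; cached_frac = share of input tokens served
    from the prompt cache at the full 90% discount."""
    in_cost = IN_PRICE * in_mtok * (1 - 0.9 * cached_frac)
    total = in_cost + OUT_PRICE * out_mtok
    return total * (0.5 if batch else 1.0)

print(f"plain:          ${cost(10, 2):.2f}")                   # $300.00
print(f"80% cache hits: ${cost(10, 2, cached_frac=0.8):.2f}")  # $192.00
print(f"cached + batch: ${cost(10, 2, cached_frac=0.8, batch=True):.2f}")  # $96.00
```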
Discrete Tokenization:多模态大模型的关键基石,首个系统化综述发布
机器之心· 2025-08-05 18:56
Core Insights
- The article discusses advances in discrete tokenization for multimodal large language models (LLMs), emphasizing its role in transforming inputs from diverse modalities into discrete representations that LLMs can process effectively [2][39].
- A comprehensive survey has been released detailing the technical landscape, open challenges, and future research directions for discrete tokenization in multimodal LLMs [2][39].

Multimodal LLMs and Discrete Tokenization
- Recent breakthroughs in LLMs have led to their application across a wide range of text tasks, prompting interest in extending their capabilities to non-text modalities such as images, audio, and video [2].
- Discrete tokenization has emerged as a key solution, using techniques such as vector quantization (VQ) to compress high-dimensional continuous inputs into compact discrete tokens, enhancing cross-modal understanding and generation (a minimal VQ sketch follows this summary) [2][39].

Systematic Review and Methodologies
- The article presents the first systematic review of discrete tokenization for multimodal LLMs, organized by input modality and modality combinations, from early single-modal tokenizers to multimodal tokenization methods [2][39].
- Eight core families of vector quantization methods are identified: VQ, RVQ, PQ, AQ, FSQ, LFQ, BSQ, and graph anchor-relation tokenization, each with characteristics suited to different modalities and tasks [8][9][14].

Challenges and Future Directions
- Key challenges in discrete tokenization include codebook collapse, information loss during quantization, difficulty propagating gradients through the discrete bottleneck, and issues with granularity and semantic alignment [12][36].
- Future research directions include adaptive quantization, unified frameworks, biologically inspired codebooks, cross-modal generalization, and improved interpretability [37][36].

Applications in Single and Multimodal Tasks
- Discrete tokenization is widely applied in single-modal tasks such as image retrieval, audio encoding, and video representation, allowing LLMs to process non-text modalities effectively [20][22].
- In multimodal tasks, it serves as a semantic bridge, enabling models to handle complex inputs across modalities and facilitating tasks such as cross-modal retrieval and generation [27][30].
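As a concrete anchor for the survey's starting point, here is a minimal sketch of plain vector quantization (VQ), the first of the eight codebook families listed above: each continuous feature vector is replaced by the index of its nearest codebook entry. Codebook size and dimensionality are toy assumptions:

```python
# Minimal numpy sketch of vector quantization: continuous feature
# vectors -> discrete token ids via nearest-codebook lookup.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # 512 codes x 64 dims (assumed sizes)

def quantize(x):
    """Map vectors x of shape (n, 64) to token ids and their embeddings."""
    # Squared L2 distance from every input vector to every codebook entry.
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d2.argmin(axis=1)      # discrete tokens an LLM can consume
    return ids, codebook[ids]    # ids plus their quantized reconstructions

features = rng.normal(size=(4, 64))  # e.g. patch embeddings from a vision encoder
ids, recon = quantize(features)
print(ids)          # four discrete token ids
print(recon.shape)  # (4, 64)
```

Families such as RVQ, PQ, and FSQ refine this basic lookup (residual stages, sub-vector codebooks, fixed scalar grids) to combat the codebook-collapse and information-loss issues noted above.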