Meituan officially releases and open-sources LongCat-Flash-Chat today
Mei Ri Jing Ji Xin Wen· 2025-09-01 02:53
Core Insights
- Meituan has officially released and open-sourced LongCat-Flash-Chat, which uses an innovative Mixture-of-Experts (MoE) architecture with 560 billion total parameters [2]
- The model activates between 18.6 billion and 31.3 billion parameters per token, about 27 billion on average, balancing computational efficiency with performance; a toy illustration of this kind of sparse activation is sketched below [2]
- LongCat-Flash-Chat delivers performance comparable to leading mainstream models while activating only a small fraction of its parameters, and is particularly strong on agent tasks [2]
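The scale-versus-activation trade-off described above is the defining property of sparse MoE models. The following is a minimal, illustrative PyTorch sketch of top-k expert routing; it is not LongCat-Flash's actual architecture, and the sizes (d_model, num_experts, top_k) are made-up assumptions, but it shows why only a small fraction of a layer's total parameters is touched per token.

```python
# Illustrative sketch only: a toy top-k sparse MoE layer (not LongCat-Flash's design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Each token is routed to only top_k of num_experts feed-forward experts."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                # x: (num_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                      # only the selected experts compute
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512]); roughly 2/64 of expert params used per token
```

With 64 experts and top_k=2 in this toy setup, each token exercises only about 3% of the expert parameters, which is the same mechanism that lets a 560B-parameter model run with tens of billions of active parameters per token.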
Does DeepSeek's GRPO cause model collapse? A look at Qwen3's new paradigm GSPO
Ji Qi Zhi Xin· 2025-08-07 09:42
Core Viewpoint
- The article discusses the evolution of reinforcement learning techniques in the post-training phase of large language models (LLMs), highlighting Group Sequence Policy Optimization (GSPO) as a solution to the instability issues associated with Group Relative Policy Optimization (GRPO) [2][10][31]

Group 1: Training Phases and Techniques
- Training of large language models typically consists of two phases, pre-training and post-training, with the latter focused on improving the model's understanding and execution of human instructions [1]
- The post-training phase relies on reinforcement learning; early methods such as Reinforcement Learning from Human Feedback (RLHF) were time-consuming and costly because they depended on human annotators [2][3]

Group 2: Innovations and Comparisons
- DeepSeek introduced an automated alternative to RLHF, significantly reducing costs and improving efficiency by letting the model learn from reward signals rather than manual evaluations [2]
- The DeepSeek team proposed the Group Relative Policy Optimization (GRPO) algorithm, which they argue is more effective than the Proximal Policy Optimization (PPO) used by OpenAI in ChatGPT [3][5]

Group 3: Issues with GRPO
- The Qwen team identified serious stability issues with GRPO, which they attribute to its reliance on token-level importance sampling, a source of high variance and training instability [10][11][12]
- Applying importance-sampling weights at the token level lets high variance accumulate over long sequences, compounding the training difficulties [15][16][17]

Group 4: Introduction of GSPO
- To address these issues, the Qwen team proposed Group Sequence Policy Optimization (GSPO), which uses sequence-level importance sampling to improve training stability; a toy comparison of the two weighting schemes is sketched after this summary [10][22][31]
- GSPO's design avoids the variance accumulation seen with token-level sampling, leading to better training efficiency and stability [23][24]

Group 5: Experimental Evidence and Advantages
- Experiments showed GSPO outperforming GRPO across a range of tasks, with better scalability and training efficiency [20][30]
- The Qwen team also highlights that GSPO simplifies training of Mixture-of-Experts (MoE) models by removing the need for auxiliary strategies such as Routing Replay, which GRPO required for stable convergence [25][27][30]
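As a rough illustration of the distinction the Qwen team draws, the sketch below contrasts per-token importance ratios (GRPO-style) with a single length-normalized sequence-level ratio (GSPO-style). The toy log-probabilities and function names are assumptions chosen for demonstration; this is not the Qwen implementation.

```python
# Hedged sketch: token-level vs. sequence-level importance ratios on made-up data.
import torch

def token_level_ratios(logp_new, logp_old):
    # One importance weight per token: pi_new(y_t) / pi_old(y_t).
    # Over long sequences these per-token weights can vary wildly, which is the
    # high-variance behaviour the article attributes to GRPO.
    return torch.exp(logp_new - logp_old)                  # shape: (seq_len,)

def sequence_level_ratio(logp_new, logp_old):
    # GSPO-style: one weight for the whole sequence, the length-normalized
    # likelihood ratio (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|).
    return torch.exp((logp_new.sum() - logp_old.sum()) / logp_new.numel())

torch.manual_seed(0)
logp_old = -torch.rand(256) * 3.0                           # toy per-token log-probs
logp_new = logp_old + 0.05 * torch.randn(256)               # slightly shifted policy

tok = token_level_ratios(logp_new, logp_old)
print(tok.min().item(), tok.max().item())                   # wide per-token spread
print(sequence_level_ratio(logp_new, logp_old).item())      # single, much tamer ratio
```

The point of the normalization is that a single noisy token cannot dominate the update: the sequence-level weight averages the per-token log-ratios before exponentiating, so variance does not compound with sequence length.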
First open-source release in six years: OpenAI puts out two o4-mini-level reasoning models
Jin Shi Shu Ju· 2025-08-06 03:47
Core Insights
- OpenAI has launched two open-source AI reasoning models, GPT-oss-120b and GPT-oss-20b, which are comparable in capability to its existing models [1][2]
- The release marks OpenAI's return to the open-source language model space after six years, aiming to attract both developers and policymakers [2][3]

Model Performance
- In the Codeforces programming competition, GPT-oss-120b and GPT-oss-20b scored 2622 and 2516 respectively, outperforming DeepSeek's R1 model but falling slightly below OpenAI's own o3 and o4-mini models [2]
- On Humanity's Last Exam (HLE), the models scored 19% and 17.3%, surpassing DeepSeek and Qwen but still trailing o3 [3]
- The hallucination rates of the GPT-oss models, 49% and 53%, are significantly higher than the 16% and 36% measured for OpenAI's o1 and o4-mini [3]

Model Training Methodology
- The GPT-oss models use a Mixture-of-Experts architecture, activating only a portion of their parameters for efficiency [5]
- Although GPT-oss-120b has 117 billion total parameters, it activates only about 5.1 billion per token; both models underwent high-compute reinforcement learning [5]
- The models currently support only text input and output, with no multi-modal processing capabilities [5]

Licensing and Data Transparency
- GPT-oss-120b and GPT-oss-20b are released under the Apache 2.0 license, allowing commercial use without prior authorization; a minimal usage sketch follows this summary [5]
- OpenAI has chosen not to disclose the training data sources, a decision influenced by ongoing copyright litigation in the AI sector [6]

Competitive Landscape
- OpenAI faces increasing competition from Chinese AI labs such as DeepSeek and Alibaba's Tongyi (Qwen), which have released leading open-source models [2]
- Industry attention is shifting toward upcoming models from DeepSeek and Meta's Superintelligence Lab, pointing to a rapidly evolving competitive environment [6]
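Because the weights are published under Apache 2.0, they can be downloaded and run locally with standard open-source tooling. The sketch below assumes the Hugging Face transformers API and a repo id of "openai/gpt-oss-20b"; the repo id and generation settings are assumptions, so substitute whatever identifiers the actual release uses.

```python
# Hedged sketch: loading an Apache-2.0-licensed open-weight model with transformers.
# The repo id below is an assumption, not confirmed by the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"                      # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # spread weights across available devices
)

prompt = "Explain mixture-of-experts routing in two sentences."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)   # text in, text out only (no multimodal input)
print(tok.decode(out[0], skip_special_tokens=True))
```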