Group Sequence Policy Optimization (GSPO) algorithm
Alibaba's Qwen proposes GSPO, a new reinforcement learning algorithm
News flash · 2025-07-27 15:20
Core Insights
- The article reports that Tongyi Qwen has introduced the Group Sequence Policy Optimization (GSPO) algorithm to strengthen reinforcement learning (RL) [1]

Group 1
- GSPO defines importance ratios at the sequence level, which distinguishes it from earlier RL algorithms [1]
- The algorithm performs clipping, reward assignment, and optimization at the sequence level [1]
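
To make the sequence-level idea concrete, here is a minimal Python sketch of what a sequence-level importance ratio and sequence-level clipping could look like. The article gives no formulas, so several details here are assumptions rather than the authors' stated method: the length normalization of the ratio (a geometric mean over tokens), the group-normalized advantage computed from per-sequence rewards, the clipping range `clip_eps=0.2`, and all function names, which are hypothetical.

```python
import math
from statistics import mean, pstdev


def sequence_importance_ratio(new_logprobs, old_logprobs):
    """One ratio per sequence: exp of the mean per-token log-probability difference.

    Assumption: the ratio is length-normalized so that long and short
    responses fall in a comparable range. The article does not state this.
    """
    if not new_logprobs or len(new_logprobs) != len(old_logprobs):
        raise ValueError("log-prob lists must be non-empty and of equal length")
    log_ratio = sum(n - o for n, o in zip(new_logprobs, old_logprobs))
    return math.exp(log_ratio / len(new_logprobs))


def group_advantages(rewards, eps=1e-8):
    """Assumption: each sequence-level reward is normalized within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_level_objective(new_group, old_group, rewards, clip_eps=0.2):
    """Clipped surrogate objective with one ratio and one advantage per sequence.

    new_group / old_group: lists of per-token log-prob lists, one per response.
    clip_eps is an illustrative value, not one taken from the article.
    """
    advantages = group_advantages(rewards)
    terms = []
    for new_lp, old_lp, adv in zip(new_group, old_group, advantages):
        ratio = sequence_importance_ratio(new_lp, old_lp)
        # Clipping acts on the whole sequence, not on individual tokens.
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms)


if __name__ == "__main__":
    # Two sampled responses to the same prompt (made-up log-probabilities).
    new_group = [[-1.0, -1.0], [-2.0]]
    old_group = [[-1.5, -1.5], [-2.0]]
    rewards = [1.0, 0.0]
    print(sequence_level_objective(new_group, old_group, rewards))
```

In this sketch every token of a response shares the same ratio and the same advantage, so a response is either clipped as a whole or not at all. A token-level method would instead compute and clip one ratio per token.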