Kimi1.5
SimKO: Mitigating Probability Over-Concentration in RLVR Training to Optimize pass@K Performance
机器之心 · 2025-11-08 04:02
Core Insights
- The article examines a limitation of existing Reinforcement Learning with Verifiable Rewards (RLVR) methods for large language models: they improve pass@1, yet pass@K falls below that of the base model [2][3][12].
Group 1: Problem Analysis
- The loss of exploration capability under RLVR is attributed to the model concentrating probability on a single reasoning path, which sacrifices its ability to reach other correct solutions [3][12].
- Current RLVR algorithms such as GRPO and DAPO raise the probability of correct answers and penalize incorrect ones. This concentrates probability on the rank-1 candidate and suppresses exploration of other potentially correct paths [8][23].
- Entropy is a limited diversity metric: it does not capture the shape of the probability distribution, so it can give a misleading picture of the model's exploration capability [9][12].
Group 2: Proposed Solution
- The research team introduces SimKO (Simple Pass@K Optimization), an algorithm designed to improve pass@K performance by addressing probability concentration [4][17].
- SimKO uses an asymmetric gradient adjustment strategy: label smoothing on correct paths and precise penalties on incorrect paths, balancing exploration and exploitation [17][23].
- The algorithm identifies key high-entropy tokens in a reasoning path and applies its updates only at these critical nodes to strengthen exploration [18][20].
Group 3: Experimental Results
- SimKO was evaluated on multiple mathematical reasoning benchmarks, where it significantly improved pass@K while maintaining or slightly improving pass@1 accuracy [21][27].
- Compared with GRPO, SimKO showed a 31.6% increase in pass@1 and a 26.3% increase in pass@128 on in-distribution tasks, and it also performed well on out-of-distribution tasks [27][26].
- The results indicate that SimKO effectively mitigates probability concentration, which improves the model's exploration ability and its overall performance metrics [26][27].
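The two quantities this summary relies on, pass@K and per-token entropy, can be sketched in a few lines. The summary does not say how pass@K was estimated, so the block below uses the standard unbiased estimator from the code-generation literature, not necessarily the one behind the reported numbers. The helper names and the entropy `threshold` are hypothetical, and `high_entropy_positions` only illustrates the idea of picking high-entropy tokens; it is not SimKO's actual selection rule.

```python
from math import comb, log


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Probability that at least one of k samples drawn without replacement
    from the n generated samples is correct: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * log(p) for p in probs if p > 0.0)


def high_entropy_positions(dists: list[list[float]], threshold: float) -> list[int]:
    """Indices of token positions whose entropy exceeds `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    return [i for i, d in enumerate(dists) if token_entropy(d) > threshold]


if __name__ == "__main__":
    # Hypothetical problem: 128 samples drawn, 4 of them correct.
    for k in (1, 8, 128):
        print(f"pass@{k} = {pass_at_k(128, 4, k):.4f}")

    # Three hypothetical next-token distributions along one reasoning path.
    path = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
    print(high_entropy_positions(path, threshold=0.5))
```

With a fixed number of correct samples, the estimate grows with k. That is why a model that still places some probability on alternative correct paths can score well on pass@K even when its pass@1 is modest.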
I Had 10 Large Models Take the Full Math Gaokao Again, and First Place Turned Out to Be...
数字生命卡兹克 · 2025-06-09 21:20
Core Viewpoint
- The article reports how various AI models performed on a full mathematics gaokao paper, highlighting unexpected results and the rapid progress of AI in understanding and solving mathematical problems [1][21].
Group 1: Testing Methodology
- This round added models missing from the earlier test, such as Zhipu Z1, Kimi 1.5, and Wenxin X1, to give a more complete assessment of the models' mathematical abilities [3][8].
- Specific scoring rules were set: only the correctness of the final result counted, not the step-by-step solution, and each question was run through each model three times to determine accuracy [5][6].
- Multimodal questions required models to interpret images, which proved difficult for many of them; only OpenAI's model performed adequately [10][12].
Group 2: Results and Rankings
- The results were surprising: Xunfei Xinghuo and Doubao both reached 145 points, dropping points on only one specific question [15][16].
- Qwen3 scored 143.3 points, doing well on free-response questions but losing points in the fill-in-the-blank section [16].
- Gemini 2.5 Pro ranked fourth with 139.7 points, while Hunyuan T1 and Wenxin X1 tied for fifth place with slightly lower scores [17][18].
Group 3: Observations and Implications
- The article notes how quickly AI has advanced: within two years, AI models have reached a level comparable to that of excellent students in high school mathematics [21].
- The author expresses excitement and surprise at the results and is optimistic about the future capabilities of AI in education [22].
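The summary does not say how the three runs per question were turned into a score. Fractional totals such as 143.3 and 139.7 are at least consistent with proportional credit, so the sketch below assumes each question earns its point value times the fraction of runs with a correct final result. That rule, the function names, and the example point values are all assumptions for illustration, not the author's documented method.

```python
def question_score(points: float, run_results: list[bool]) -> float:
    """Credit for one question under the ASSUMED proportional rule:
    full points times the fraction of runs whose final answer was correct."""
    if not run_results:
        raise ValueError("need at least one run per question")
    return points * sum(run_results) / len(run_results)


def exam_score(questions: list[tuple[float, list[bool]]]) -> float:
    """Total score over (point value, per-run correctness) pairs."""
    return sum(question_score(points, runs) for points, runs in questions)


if __name__ == "__main__":
    # Hypothetical mini-paper: two 5-point questions, three runs each.
    paper = [
        (5.0, [True, True, True]),    # correct in all three runs
        (5.0, [True, False, False]),  # correct in one of three runs
    ]
    print(f"{exam_score(paper):.1f}")
```

Scoring only the final result keeps grading mechanical. Averaging over repeated runs reduces the effect of sampling luck on any single question.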
No One Talks About the "AI Six Dragons" Anymore
凤凰网财经 · 2025-06-02 13:49
Core Viewpoint
- The article discusses the decline of the so-called "AI Six Dragons" under competition from major tech companies and new players such as DeepSeek, which has led to the group being recast as the "AI Four Strong" [1][2][12].
Group 1: Player Dropout
- The shift from "AI Six Dragons" to "AI Four Strong" reflects the fact that some players have fallen behind in the large-model race [2][8].
- Early excitement around the "AI Six Dragons" was driven by their strong technical teams and significant early funding, but many have since lost their competitive edge [2][4].
- Companies such as 01.AI and Baichuan Intelligence have shifted their focus away from large models, part of a broader retreat from ambitious goals [1][5].
Group 2: Commercial Challenges
- The decline of the "AI Six Dragons" is attributed mainly to difficulties with commercialization: many of the companies cannot sustain operations given high costs and competition from larger firms [9][11].
- Major tech companies such as Alibaba and ByteDance have moved aggressively into AI, eroding the early advantages of the "Six Dragons" [11][12].
- The "Six Dragons" disclose little about their revenue and business performance, in sharp contrast to OpenAI, which has seen significant growth in paid users and revenue [10][11].
Group 3: Talent Exodus
- A notable number of key executives have left the "AI Six Dragons," weakening the companies' ability to innovate and compete [19][20].
- The departure of top talent to larger firms or new ventures shows that the "Six Dragons" have become a less attractive career choice [18][19].
- The loss of core team members is expected to slow model iteration and reduce the companies' leverage with investors [19][20].
Group 4: Future Outlook
- The article suggests that the remaining "Four Strong" face a difficult future: they are struggling to keep pace with advances in AI technology and are short of funding [8][24].
- The market's shift in attention toward embodied intelligence and other AI applications further complicates the landscape for these companies [23][24].
- The article draws a historical parallel with the earlier "AI Four Dragons," suggesting that without significant changes the current "Four Strong" may meet a similar fate [25].