Generative Reinforcement Learning
ICLR 2026 Oral | Farewell to Multi-Step Denoising! Tsinghua Team Introduces MVP for Ultra-Fast Single-Step Robot Action Generation
机器之心· 2026-03-16 10:23
Core Insights
- The article covers the Mean Velocity Policy (MVP), a method that improves both the efficiency and the quality of generative reinforcement learning by generating actions in a single step while retaining high expressiveness [4][9][26].

Background
- Generative reinforcement learning faces efficiency and quality bottlenecks, particularly in real-time control, where optimal actions often follow multimodal distributions. Traditional methods incur high inference latency because they rely on iterative denoising [5][6].

Key Contributions
- MVP combines the expressiveness of generative policies with the time efficiency of one-step action generation, addressing the limitations of traditional methods [9][26].

Technical Innovations
- An Instantaneous Velocity Constraint (IVC) anchors the mean flow policy, providing a unique boundary condition that improves the precision and stability of policy fitting [12][14].
- A Generate-and-Select mechanism generates candidate actions and selects among them efficiently, so that the policy keeps improving throughout reinforcement learning [16][18].

Experimental Results
- MVP achieved state-of-the-art (SOTA) performance on tasks in the Robomimic and OGBench benchmarks, with faster online convergence and better final performance, particularly on complex tasks [20][21].
- MVP is also more computationally efficient: online training throughput improves by over 50% compared with traditional methods that require multiple denoising steps [27].

Summary and Outlook
- The research targets the slow sampling and high inference latency of generative reinforcement learning. The proposed MVP framework generates actions in a single step without requiring distillation, which points to a new paradigm for embodied intelligent systems that need extreme responsiveness [26].
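The summary does not spell out how Generate-and-Select works internally, so the following is only a minimal sketch of the general idea: draw several candidate actions from a one-step policy, score them, and keep the best one. The names `one_step_policy`, `critic`, and `generate_and_select` are hypothetical, the toy policy and scoring rule are placeholders, and selecting by a critic score is an assumption rather than the paper's confirmed procedure.

```python
import math
import random


def one_step_policy(state, noise):
    # Hypothetical stand-in for a one-step generative policy:
    # a single forward pass maps (state, noise) to an action in (-1, 1).
    return [math.tanh(s + z) for s, z in zip(state, noise)]


def critic(state, action):
    # Hypothetical critic (assumption): higher score means a better action.
    # This toy version simply prefers actions close to 0.5 in every dimension.
    return -sum((a - 0.5) ** 2 for a in action)


def generate_and_select(state, num_candidates, rng):
    # Generate: one noise sample per candidate, one policy call each.
    candidates = []
    for _ in range(num_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in state]
        candidates.append(one_step_policy(state, noise))
    # Select: keep the candidate with the highest critic score.
    scores = [critic(state, a) for a in candidates]
    best_index = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best_index], scores


rng = random.Random(0)
state = [0.1, -0.2]
action, scores = generate_and_select(state, num_candidates=8, rng=rng)
```

Because each candidate costs only one policy call, sampling several candidates stays cheap compared with running a multi-step denoising chain per action, which is presumably why the two ideas pair well.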
Technical Practice of Generative Reinforcement Learning in Automated Ad Bidding
AI前线· 2025-09-28 05:48
Core Insights
- The article discusses the evolution and challenges of bidding algorithms in real-time bidding (RTB) advertising systems, emphasizing the transition from traditional methods to advanced techniques like generative reinforcement learning [2][3][7].

Group 1: Evolution of Bidding Algorithms
- The bidding algorithm has evolved through three generations: PID, MPC, and reinforcement learning (RL), each improving upon the previous in terms of adaptability and effectiveness in complex bidding environments [5][6][7].
- The introduction of generative reinforcement learning aims to enhance decision-making by utilizing historical bidding sequences for more accurate predictions [8][10].

Group 2: Challenges in Bidding
- Key challenges faced by bidding algorithms include the need to manage daily budgets while minimizing conversion costs, the unpredictability of traffic and competitor behavior, and the complexity of sequential decision-making [5][6].
- The reliance on high-quality datasets poses a challenge, as simple exploration can lead to out-of-distribution (OOD) issues, necessitating efficient offline exploration mechanisms [12][14].

Group 3: GAVE Algorithm
- The GAVE algorithm integrates score-based return-to-go (RTG) and value function-based action exploration to enhance model learning and address the challenges of data quality and exploration [18][19].
- Experimental results show that GAVE outperforms baseline algorithms in various budget settings, demonstrating its effectiveness in maximizing conversion value [22][25].

Group 4: CBD Algorithm
- The CBD algorithm introduces Completer and Aligner modules to improve the alignment of generated sequences with optimization goals, addressing issues of sequence legality and preference alignment [29][31].
- Offline experiments indicate that CBD significantly outperforms other methods in total conversion value, validating its effectiveness in real-world applications [34][36].

Group 5: Future Directions
- Future advancements in bidding technology are expected to focus on developing foundational models that leverage multi-scenario data and enhancing interpretability and decision-making capabilities through the integration of large language models [41].
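Kuaishou Demystifies Its "AI Money Printer": First to Propose Generative Reinforcement Learning Bidding Technology, Delivering an Ad Revenue Lift of Over 3% for the Platform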
机器之心· 2025-09-23 04:08
Group 1
- Alphabet, Google's parent company, recently surpassed a market capitalization of $3 trillion, becoming the fourth company to reach this milestone [1]
- Despite initial concerns about its advertising revenue due to the rise of ChatGPT, Google managed to stabilize its ad revenue and enhance user intent understanding through generative AI integration [1]
- In China, Kuaishou reported a 12.8% year-on-year increase in online marketing service revenue, reaching 19.8 billion yuan in Q2, driven by advancements in generative AI for ad bidding and recommendations [2]

Group 2
- Kuaishou's new bidding algorithm, termed "Generative Reinforcement Learning," allows for multi-dimensional thinking in bid modeling, leading to over a 3% increase in ad revenue while maintaining cost targets [3][4]
- The evolution of Kuaishou's bidding technology has progressed through several generations, culminating in the current fourth generation of "Generative Reinforcement Learning" [12]

Group 3
- The GAVE algorithm, introduced by Kuaishou, addresses challenges in aligning bidding strategies with overall optimization goals, enhancing the effectiveness of ad bidding [22][24]
- GAVE has shown significant improvements in performance metrics compared to previous models, achieving optimal results across various budget settings [31]

Group 4
- The CBD algorithm, another innovation from Kuaishou, aims to resolve issues related to state sequence consistency and preference alignment in bidding strategies [35][37]
- CBD has demonstrated superior performance in offline experiments, significantly outperforming baseline algorithms in total conversion value [41]

Group 5
- Kuaishou's commercial algorithm team has achieved notable recognition in the industry, winning multiple awards and competitions, which translates into substantial business growth [44][47]
- The advancements in generative reinforcement learning bidding technology are expected to continue evolving, with Kuaishou outlining future directions for further development [50]