Generative Reinforcement Learning Bidding Technology
Beyond developing Kling, how is Kuaishou applying large models to its core businesses?
Sina Finance (Xin Lang Cai Jing) · 2025-11-11 06:35
Core Insights
- Kuaishou has developed multiple large models this year, including OneRec for recommendation systems, OneSearch for e-commerce search, and a generative reinforcement learning bidding technology for commercialization [1][10]
- The company aims to enhance user experience and improve merchant efficiency through the application of these large models in core business scenarios [1][10]

Recommendation System
- Kuaishou's self-developed OneRec model innovates in multi-modal representation alignment, addressing the inadequacies of open-source models in extracting relationships from private recommendation data [2][4]
- The OneRec model has undergone three iterations, significantly improving user engagement metrics: OneRec-V1 increased average user stay time by 0.5% and 1.17% for Kuaishou App and Kuaishou Lite respectively, while reducing the proportion of marketing accounts in recommended content [4][5]
- Subsequent versions, OneRec-V2 and OneRec-Think, further enhanced user engagement, with OneRec-V2 increasing stay time by an additional 0.46% and 0.74% [4][5]

E-commerce Search
- Kuaishou's OneSearch model replaces traditional e-commerce search architectures, addressing issues like semantic confusion and incomplete understanding of user intent [5][9]
- The implementation of OneSearch has led to a 2.3% increase in click-through rates on search pages, a reduction in decision-making time to one-third of traditional methods, and over a 40% increase in exposure for quality long-tail products [9]

Commercialization
- Kuaishou has introduced generative reinforcement learning bidding technology, which analyzes a series of bids and feedback to optimize decision-making based on ROI and customer acquisition costs [9][10]
- The company emphasizes the importance of integrating AI technology with business scenarios to drive core business benefits [10][11]
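
To make the two commercialization metrics concrete, here is a minimal sketch of how ROI and customer acquisition cost could be computed from a sequence of bids and their feedback. The `BidStep` record, its field names, and the sample numbers are hypothetical and are not taken from Kuaishou's system; ROI is assumed here to mean conversion value divided by spend.

```python
from dataclasses import dataclass


@dataclass
class BidStep:
    """One step of a bidding sequence (hypothetical record layout)."""
    bid: float         # bid submitted at this step
    spend: float       # amount actually charged
    conversions: int   # conversions attributed to this step
    revenue: float     # conversion value attributed to this step


def roi(steps):
    """Assumed definition: total conversion value divided by total spend."""
    spend = sum(s.spend for s in steps)
    return sum(s.revenue for s in steps) / spend if spend > 0 else 0.0


def acquisition_cost(steps):
    """Total spend divided by total conversions (infinite if none)."""
    conversions = sum(s.conversions for s in steps)
    spend = sum(s.spend for s in steps)
    return spend / conversions if conversions > 0 else float("inf")


# Illustrative numbers only.
log = [
    BidStep(bid=1.2, spend=10.0, conversions=2, revenue=30.0),
    BidStep(bid=1.0, spend=5.0, conversions=0, revenue=0.0),
    BidStep(bid=1.5, spend=15.0, conversions=3, revenue=45.0),
]
print(roi(log), acquisition_cost(log))
```

A bidding policy of the kind described would treat such a log as its feedback signal; how the production system weighs the two metrics is not stated in the source.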
Technical Practice of Generative Reinforcement Learning in Advertising Auto-Bidding
AI Frontline (AI前线) · 2025-09-28 05:48
Core Insights
- The article discusses the evolution and challenges of bidding algorithms in real-time bidding (RTB) advertising systems, emphasizing the transition from traditional methods to advanced techniques like generative reinforcement learning [2][3][7].

Group 1: Evolution of Bidding Algorithms
- The bidding algorithm has evolved through three generations: PID, MPC, and reinforcement learning (RL), each improving upon the previous in terms of adaptability and effectiveness in complex bidding environments [5][6][7].
- The introduction of generative reinforcement learning aims to enhance decision-making by utilizing historical bidding sequences for more accurate predictions [8][10].

Group 2: Challenges in Bidding
- Key challenges faced by bidding algorithms include the need to manage daily budgets while minimizing conversion costs, the unpredictability of traffic and competitor behavior, and the complexity of sequential decision-making [5][6].
- The reliance on high-quality datasets poses a challenge, as simple exploration can lead to out-of-distribution (OOD) issues, necessitating efficient offline exploration mechanisms [12][14].

Group 3: GAVE Algorithm
- The GAVE algorithm integrates score-based return-to-go (RTG) and value function-based action exploration to enhance model learning and address the challenges of data quality and exploration [18][19].
- Experimental results show that GAVE outperforms baseline algorithms in various budget settings, demonstrating its effectiveness in maximizing conversion value [22][25].

Group 4: CBD Algorithm
- The CBD algorithm introduces Completer and Aligner modules to improve the alignment of generated sequences with optimization goals, addressing issues of sequence legality and preference alignment [29][31].
- Offline experiments indicate that CBD significantly outperforms other methods in total conversion value, validating its effectiveness in real-world applications [34][36].
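
Two ideas from Groups 1 and 3 above can be sketched in a few lines: a first-generation PID controller that paces spend toward a budget target, and the plain return-to-go quantity that sequence-model bidders condition on. This is a textbook illustration under stated assumptions, not the implementation described in the article: the class name `PIDPacer`, the gains, and the normalized-error form are hypothetical, and `returns_to_go` computes the ordinary undiscounted sum of future rewards, since the summary does not spell out how GAVE's score-based variant is defined.

```python
def returns_to_go(rewards):
    """RTG_t = sum of rewards from step t through the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry accumulates everything after it.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


class PIDPacer:
    """Textbook PID controller that scales a base bid to track a spend target.

    Assumes target_spend > 0. Gains kp, ki, kd are illustrative.
    """

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def multiplier(self, target_spend, actual_spend):
        # Positive error means under-spending, so the bid should rise.
        error = (target_spend - actual_spend) / target_spend
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        adjust = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Never return a negative bid multiplier.
        return max(0.0, 1.0 + adjust)


pacer = PIDPacer(kp=0.5, ki=0.0, kd=0.0)
print(pacer.multiplier(target_spend=100.0, actual_spend=80.0))  # above 1: bid up
print(returns_to_go([1.0, 0.0, 2.0]))
```

The contrast is the point of the sketch: the PID controller reacts only to the current pacing error, whereas a return-to-go signal summarizes the whole remaining episode, which is what lets a generative model plan over the bidding sequence.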

Group 5: Future Directions
- Future advancements in bidding technology are expected to focus on developing foundational models that leverage multi-scenario data and enhancing interpretability and decision-making capabilities through the integration of large language models [41].