Reinforcement Learning (RL)
After Reading Nearly 50 VLA+RL Papers...
具身智能之心· 2025-12-13 16:02
Core Insights
- The article discusses advancements in Vision-Language-Action (VLA) models and their integration with reinforcement learning (RL) techniques, highlighting various research papers and projects that contribute to this field [2][4][5].
Group 1: Offline RL-VLA
- NORA-1.5 is introduced as a vision-language-action model trained using world model- and action-based preference rewards, showcasing its potential in offline reinforcement learning [2][4].
- The paper "Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models" emphasizes the importance of balancing signal and variance in offline RL applications [7].
- CO-RFT presents an efficient fine-tuning method for VLA models through chunked offline reinforcement learning, indicating a trend towards optimizing model performance post-training [9].
Group 2: Online RL-VLA
- The concept of reinforcing action policies by prophesying is explored, suggesting a novel approach to enhance online reinforcement learning for VLA models [22].
- WMPO focuses on world model-based policy optimization for VLA models, indicating a shift towards utilizing world models for better policy learning [24].
- RobustVLA emphasizes robustness-aware reinforcement post-training, highlighting the need for models to maintain performance under varying conditions [27].
Group 3: Hybrid Approaches
- GR-RL aims to improve dexterity and precision in long-horizon robotic manipulation by combining offline and online reinforcement learning strategies [100].
- The paper "Discover, Learn, and Reinforce" discusses scaling VLA pretraining with diverse RL-generated trajectories, indicating a comprehensive approach to model training [104].
- SRPO introduces self-referential policy optimization for VLA models, showcasing innovative methods to enhance model adaptability and performance [106].
A Conversation with Pokee CEO Zhu Zheqing (朱哲清): What Should an RL-Native Agent System Look Like? | Best Minds
海外独角兽· 2025-08-01 12:04
Core Insights
- The rise of AI Agents marks a shift towards general intelligence capable of planning, execution, and self-optimization, moving beyond just larger models to multi-step decision-making and goal-oriented capabilities [3][4][8].
- Pokee is pioneering a new approach by focusing on reinforcement learning (RL) as the core of its architecture, emphasizing goal evaluation, self-training, and memory retrieval, which significantly reduces inference costs and enhances generalization [3][4][8].
Group 1: Training Paradigms and Capabilities
- The multi-step agent training paradigm is transforming the landscape, with coding agents already demonstrating capabilities for multi-step reasoning and execution [8][9].
- Other areas, particularly workflow automation, lag behind, with traditional tools like Zapier being less efficient compared to Pokee's offerings [9][11].
- Creative workflows are emerging but face challenges in integrating generated content into existing design tools, indicating a bottleneck in the creative agent experience [11][12].
Group 2: Reinforcement Learning and Exploration
- RL is deemed essential for achieving true reasoning capabilities in agents, as pre-training alone does not suffice for complex decision-making [14][21].
- The exploration process is critical for agents to understand goals and improve generalization, allowing them to navigate open-world environments effectively [38][39][43].
- Current systems lack robust memory structures, which are vital for lifelong learning and personalization, highlighting a significant gap in existing technologies [45][47].
Group 3: Memory and Personalization
- Memory is crucial for agents to understand user preferences and historical interactions, enabling them to provide personalized responses and actions [45][48].
- The challenge lies in managing non-linear memory structures and ensuring agents can adapt to changing user needs over time [49][50].
- A focus on continuous learning systems is necessary to address the limitations of current models in retaining and updating knowledge [48][50].
Group 4: Market Position and Future Directions
- Pokee's strategy involves not just enhancing agent capabilities but also establishing a unique market position by integrating deeply with user workflows and data [51][52].
- The company aims to provide both consumer-facing products and backend services for other agents, indicating a dual revenue model [54].
- Future applications of agents are expected to flourish in sales, RPA, and coding, with potential in creative applications as well [58][59].