VLA+RL
If any direction counts as this year's most popular, it has to be VLA+RL
具身智能之心· 2026-01-19 00:49
A purely imitation-learned VLA is, at its core, copying the data distribution. As soon as the environment, objects, or initial state change, it tends to break down, and many failures come from error accumulation over consecutive actions. What RL provides is closed-loop optimization: environment feedback corrects actions, and value/reward signals constrain long-horizon behavior.

The research trend is accordingly shifting from "just training VLA models" toward "using the VLA as the policy representation and fine-tuning or strengthening it with RL," including offline RL for sample efficiency, hierarchical RL to constrain long-horizon behavior, and self-supervised reward modeling from vision and language.

Methodologically, current VLA+RL work falls into three families: online RL, offline RL, and test-time approaches. Papers are piling up, and more and more people want in......

A reader recently wrote in: their advisor is unfamiliar with this area, so they have been stumbling through everything alone, from hardware to data to training, never getting results and never finding a good idea~

If any direction counts as this year's most popular, it has to be VLA+RL. VLA models bring a new interaction paradigm to embodied intelligence: instead of relying on precisely defined states and rules, a robot perceives the environment visually, understands language instructions, and directly generates action sequences. This capability dramatically lowers the barrier to task specification and system design, letting robots handle far more open and complex scenes. In real robot systems, however, VLA still often suffers from unstable execution and sensitivity to the initial state ...
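To make the closed-loop idea concrete, here is a minimal sketch of RL fine-tuning for a VLA-style policy: a Gaussian action head is sampled in a loop, the environment returns rewards, and a REINFORCE update weights log-probabilities by return-to-go. The `PolicyHead`, `toy_env_step`, and reward below are toy assumptions standing in for a real VLA backbone and robot environment, not any listed paper's method.

```python
import torch
import torch.nn as nn

class PolicyHead(nn.Module):
    """Gaussian action head; stands in for the head on a (frozen) VLA backbone."""
    def __init__(self, feat_dim=32, act_dim=7):
        super().__init__()
        self.mu = nn.Linear(feat_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, feat):
        return torch.distributions.Normal(self.mu(feat), self.log_std.exp())

def rollout(policy, env_step, feat, horizon=20):
    """Closed loop: sample an action, apply it, record the reward feedback."""
    logps, rewards = [], []
    for _ in range(horizon):
        d = policy.dist(feat)
        act = d.sample()
        logps.append(d.log_prob(act).sum())
        feat, r = env_step(feat, act)  # environment feedback, not data replay
        rewards.append(r)
    return torch.stack(logps), rewards

def reinforce_update(policy, opt, logps, rewards, gamma=0.99):
    """REINFORCE: weight log-probs by normalized discounted return-to-go."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(logps * returns).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def toy_env_step(feat, act):
    """Toy environment: fixed observation, reward for small, smooth actions."""
    return feat, float(-act.abs().mean())

policy = PolicyHead()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
feat = torch.randn(32)
for it in range(5):
    logps, rewards = rollout(policy, toy_env_step, feat)
    print(it, reinforce_update(policy, opt, logps, rewards))
```

In practice the feature would come from the VLA backbone, a PPO-style clipped objective would replace vanilla REINFORCE, and rollouts would run in simulation or on hardware, but the sample-act-reward-update loop is the same.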
The VLA+RL technical exchange group is here~
具身智能之心· 2026-01-08 04:23
Group 1 - The article introduces a new technical exchange group focused on VLA technology, inviting participants interested in VLA models, VLA+RL, and lightweight deployment [1]
This year's VLA+RL papers are queuing up, waiting for acceptance......
具身智能之心· 2025-12-24 00:25
Core Insights
- The article emphasizes the importance of Reinforcement Learning (RL) in enhancing the generalization capabilities of Vision-Language-Action (VLA) models, with some experiments showing performance improvements of up to 42.6% on out-of-distribution tasks [2].

Group 1: VLA and RL Integration
- VLA models currently rely on RL to overcome limitations in real-world out-of-distribution scenarios, where imitation learning alone proves insufficient [2].
- Recent advances in VLA+RL frameworks have produced significant breakthroughs, with several notable papers published this year [2].
- Tools supporting VLA+RL frameworks are evolving, with recommendations for resources like Rlinf, which offers a growing number of supported methods [2].

Group 2: Notable Research Papers
- A summary of representative VLA+RL research papers from the past two years is provided, highlighting their contributions to the field; a hedged sketch of the offline post-training pattern follows this list [5].
- Key papers include "NORA-1.5," a VLA model trained with world-model- and action-based preference rewards, and "Balancing Signal and Variance," on adaptive offline RL post-training for VLA flow models [5][10].
- Other significant works include "ReinboT," which enhances robot visual-language manipulation through RL, and "WMPO," which performs world-model-based policy optimization for VLA [8][10].

Group 3: Future Research Directions
- The article suggests that future research should track advances in VLA and RL, and encourages collaboration and consultation for those interested in exploring these areas [3].
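Several of the offline entries above share a common pattern: fit a value baseline to logged returns, then re-weight behavior cloning by exponentiated advantage so the policy imitates good logged actions more strongly than bad ones. Below is a minimal sketch of that advantage-weighted pattern under toy assumptions (random features, linear heads, a synthetic dataset); it is not the actual recipe of NORA-1.5, "Balancing Signal and Variance," or any other listed paper.

```python
import torch
import torch.nn as nn

# Toy stand-ins for a logged dataset: VLA features, demonstrator actions, returns.
feat_dim, act_dim, n = 32, 7, 256
feats = torch.randn(n, feat_dim)
acts = torch.randn(n, act_dim)
returns = torch.randn(n)

policy = nn.Linear(feat_dim, act_dim)  # deterministic action head
value = nn.Linear(feat_dim, 1)         # state-value baseline

opt = torch.optim.Adam(list(policy.parameters()) + list(value.parameters()), lr=1e-3)

for step in range(200):
    v = value(feats).squeeze(-1)
    adv = (returns - v).detach()                      # advantage against the baseline
    weights = torch.exp(adv).clamp(max=20.0)          # exponential weighting, clipped
    bc_err = ((policy(feats) - acts) ** 2).mean(-1)   # per-sample cloning error
    loss = (weights * bc_err).mean() + ((returns - v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The appeal of this family is that it never queries the environment during training, which is why the listed papers pitch it as a sample-efficient post-training step.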
This year has most likely produced n VLA+RL papers by now?!
自动驾驶之心· 2025-12-23 03:43
Core Insights
- The article emphasizes the importance of Reinforcement Learning (RL) in enhancing the generalization capabilities of Vision-Language-Action (VLA) models, with some experiments showing performance improvements of up to 42.6% on out-of-distribution tasks [2].

Group 1: VLA and RL Integration
- VLA models currently rely on RL to overcome limitations in real-world out-of-distribution scenarios, where imitation learning alone proves insufficient [2].
- Recent advances in VLA+RL frameworks have produced significant breakthroughs, with several notable papers published this year [2].
- Tools supporting VLA+RL frameworks, such as Rlinf, are becoming increasingly comprehensive, offering a variety of methods for researchers [2].

Group 2: Notable Research Papers
- A summary of representative VLA+RL research papers from the past two years indicates a growing body of work in this area [5].
- Papers mentioned include "NORA-1.5," "Balancing Signal and Variance," and "CO-RFT," each focusing on a different aspect of VLA and RL integration [5][10].
- The article encourages further research in these areas and offers assistance for those looking to explore VLA, real2sim2real, and RL [3].
This year has most likely produced n VLA+RL papers by now?!
具身智能之心· 2025-12-22 10:23
Core Insights
- The article emphasizes integrating Reinforcement Learning (RL) with Vision-Language-Action (VLA) models to enhance their generalization capabilities, particularly in out-of-distribution (OOD) scenarios, where performance improvements can reach up to 42.6% [2].

Group 1: Research Directions
- Future research should focus on the combination of VLA and RL; the article encourages collaboration with research assistants for guidance on starting projects in these areas [3].
- Several notable recent works in VLA+RL are highlighted, showcasing significant advances in the field [5][10].

Group 2: Notable Papers and Projects
- A list of representative papers from the last two years is provided, including titles such as "NORA-1.5" and "Balancing Signal and Variance," which cover various aspects of VLA and RL integration [5][10].
- Links to project homepages and paper PDFs are shared for further exploration of these works [6][9][12].

Group 3: Tools and Frameworks
- Tools like Rlinf support a growing number of methods for VLA+RL frameworks, indicating a trend toward more robust and versatile research tooling [2][11].
After reading through nearly 50 VLA+RL papers......
具身智能之心· 2025-12-13 16:02
Core Insights
- The article discusses advancements in Vision-Language-Action (VLA) models and their integration with reinforcement learning (RL) techniques, surveying research papers and projects that contribute to this field [2][4][5].

Group 1: Offline RL-VLA
- NORA-1.5 is introduced as a vision-language-action model trained with world-model- and action-based preference rewards, showcasing its potential in offline reinforcement learning [2][4].
- "Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models" emphasizes the importance of balancing signal and variance in offline RL applications [7].
- CO-RFT presents an efficient fine-tuning method for VLA models through chunked offline reinforcement learning, indicating a trend toward optimizing model performance post-training [9].

Group 2: Online RL-VLA
- Reinforcing action policies by prophesying is explored as a novel approach to enhance online reinforcement learning for VLA models [22].
- WMPO focuses on world-model-based policy optimization for VLA models, a shift toward using world models for better policy learning; a hedged sketch of this pattern follows this list [24].
- RobustVLA emphasizes robustness-aware reinforcement post-training, highlighting the need for models to maintain performance under varying conditions [27].

Group 3: Hybrid Approaches
- GR-RL aims to improve dexterity and precision in long-horizon robotic manipulation by combining offline and online reinforcement learning strategies [100].
- "Discover, Learn, and Reinforce" discusses scaling VLA pretraining with diverse RL-generated trajectories, a comprehensive approach to model training [104].
- SRPO introduces self-referential policy optimization for VLA models, an innovative route to better adaptability and performance [106].
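World-model-based approaches such as WMPO evaluate candidate actions inside a learned dynamics model rather than on the robot. The sketch below illustrates that general pattern for a chunked policy: roll an action chunk through a one-step model and maximize the predicted return. The linear dynamics, reward head, and chunk size are toy assumptions and do not reproduce WMPO or CO-RFT.

```python
import torch
import torch.nn as nn

state_dim, act_dim, chunk = 16, 7, 8

dynamics = nn.Linear(state_dim + act_dim, state_dim)  # learned one-step model (assumed pre-trained)
reward_head = nn.Linear(state_dim, 1)                 # learned reward predictor
policy = nn.Linear(state_dim, act_dim * chunk)        # emits a whole action chunk

def imagined_return(state):
    """Roll an action chunk through the world model, summing predicted reward."""
    actions = policy(state).view(chunk, act_dim)
    total = torch.zeros(())
    for a in actions:  # no real robot in the loop
        state = dynamics(torch.cat([state, a]))
        total = total + reward_head(state).squeeze()
    return total

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)  # optimize the policy only
state = torch.randn(state_dim)
for step in range(100):
    loss = -imagined_return(state)  # maximize imagined return
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The chunked output is what makes this compatible with action-chunking VLAs: the world model scores an entire multi-step chunk before any of it is executed.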
具身智能之心 is recruiting collaborators in the VLA+RL direction~
具身智能之心· 2025-11-17 10:01
Group 1
- The article discusses the recruitment of a lecturer for an online course focused on VLA (Vision-Language-Action) models and RL (Reinforcement Learning) [1][2]
- The ideal candidate should have a PhD or higher in the academic field, or practical industry experience, particularly with real-robot debugging [2]
- The community, known as "Embodied Intelligence Heart" (具身智能之心), is the first full-stack technology exchange platform in China, gathering many individuals interested in VLA and RL [3]

Group 2
- The company offers compensation above the industry average along with abundant industry resources for the recruited lecturer [4]
- For more detailed information, interested individuals are encouraged to add a specified WeChat contact for consultation [5]
Recruiting partners in the VLA+RL direction!
具身智能之心· 2025-11-11 03:48
Core Viewpoint
- The company is seeking to recruit a lecturer for an online course focused on VLA (Vision-Language-Action) models and RL (Reinforcement Learning) to deepen understanding in these areas [1].

Group 1
- The company aims to develop an online course in the VLA and RL domain, responding to community interest [1].
- The ideal candidate should have a PhD or be a doctoral student in the VLA and RL research area, with experience at top conferences [2].
- The company is recognized as the first full-stack technology exchange community in China focused on embodied intelligence, and has gathered many individuals interested in VLA and RL [3].

Group 2
- The company offers compensation above the industry average and provides access to extensive industry resources for the lecturer position [4].
- For more detailed information, interested individuals are encouraged to make contact via WeChat [5].
VLA+RL keeps pushing up the ceiling of embodied manipulation!
具身智能之心· 2025-11-11 00:02
Core Insights
- The article discusses the integration of Reinforcement Learning (RL) with Vision-Language-Action (VLA) models, highlighting how RL extends VLA capabilities by bridging the gap between pre-training and real-world tasks [1][4].

Group 1: Technical Developments
- RL training directly optimizes the "complete the task" objective, allowing models to handle unexpected situations absent from the training data and thus improving robustness [1].
- The reward mechanism enables a VLA to learn smoother trajectories and align more closely with the physical world; a hedged reward-shaping sketch follows at the end of this section [1].
- A recommended open-source repository for VLA+RL methods is provided, facilitating entry-level research [2].

Group 2: Evaluation Results
- Evaluation on various LIBERO task groups shows significant performance, with the π0.5 model achieving an average accuracy of 96.9% across tasks [5].
- The Flow-SDE π0 model demonstrated a 38.5% improvement in average accuracy when combined with RL [5].

Group 3: Community and Resources
- The community offers continuous live sharing sessions, including roundtable forums and discussions on various topics within the embodied intelligence industry [7].
- A comprehensive technical roadmap is available for beginners, outlining essential technologies and learning paths [9].
- The community has established job-referral mechanisms with several companies in the embodied intelligence sector, providing valuable networking opportunities [13].

Group 4: Educational Materials
- The community has compiled over 40 open-source projects and nearly 60 datasets related to embodied intelligence, along with mainstream simulation platforms and various technical learning routes [15].
- Specific learning routes for different aspects of embodied intelligence, such as reinforcement learning and multi-modal large models, are detailed to assist learners at various levels [16][42].

Group 5: Industry Insights
- The community includes members from renowned universities and leading companies in the field, fostering a rich environment for academic and industrial exchange [14].
- Regular updates on academic progress and industrial applications in embodied intelligence keep members informed about the latest developments [21][23].
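As a concrete illustration of the reward point in Group 1, one simple way a reward signal can favor smoother trajectories is to pair a task-success bonus with a penalty on step-to-step action change. The sketch below is a minimal example under assumed weights and trajectory format, not the reward used in any of the cited evaluations.

```python
import numpy as np

def shaped_reward(actions, goal_reached, smooth_weight=0.1):
    """Success bonus minus a penalty on step-to-step action change."""
    actions = np.asarray(actions, dtype=float)  # (T, act_dim) trajectory
    deltas = np.diff(actions, axis=0)           # consecutive-action differences
    smooth_penalty = float((deltas ** 2).sum())
    return (1.0 if goal_reached else 0.0) - smooth_weight * smooth_penalty

# Usage: a jagged trajectory scores lower than a smooth one that also succeeds.
rng = np.random.default_rng(0)
smooth = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
jagged = smooth + 0.3 * rng.standard_normal((10, 1))
print(shaped_reward(smooth, True), shaped_reward(jagged, True))
```

Under a shaped reward like this, RL fine-tuning trades a little task reward for much lower jerk, which is one mechanism behind the smoother motions the article attributes to reward signals.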