VLA+RL
Recruiting a partner for the VLA+RL direction!
具身智能之心· 2025-11-11 03:48
We offer above-industry-average compensation and rich industry resources. We have recently received many questions from community members about VLA- and RL-related content, and we hope 具身智能之心 can cover this direction in more depth. We are therefore recruiting, from our followers across all platforms, one course lecturer in the VLA+RL direction to develop an online course with us. For details, add WeChat: oooops-life.

Requirements: your research direction must be VLA+RL. From academia, we expect a PhD or above (including current PhD students) with top-conference publications in the area; from industry, we expect hands-on experience including real-robot debugging.

Compensation: 具身智能之心 is China's first full-stack embodied-intelligence technical exchange community and gathers a large number of members working on VLA and RL. ...
VLA+RL keeps pushing up the ceiling of embodied manipulation!
具身智能之心· 2025-11-11 00:02
Through standardized interfaces, RLinf supports mainstream VLA models and both CPU- and GPU-based simulators, and is the first framework to implement reinforcement-learning fine-tuning of the π0 and π0.5 model series. Stars and follows are welcome.

Evaluation results on the four LIBERO task groups:

| Model | Spatial | Object | Goal | Long | Avg. |
| --- | --- | --- | --- | --- | --- |
| *Full Dataset SFT* | | | | | |
| Octo | 78.9% | 85.7% | 84.6% | 51.1% | 75.1% |
| OpenVLA | 84.7% | 88.4% | 79.2% | 53.7% | 76.5% |
| π0-FAST | 96.4% | 96.8% | 88.6% | 60.2% | 85.5% |
| OpenVLA-OFT | ... | | | | |
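The "Avg." column in the table above is simply the mean of the four task-group success rates; a minimal sketch to check the reported numbers (values copied from the table):

```python
# Verify the LIBERO "Avg." column: the mean of the four task-group
# success rates (Spatial, Object, Goal, Long) for each model.
results = {
    "Octo":    [78.9, 85.7, 84.6, 51.1],
    "OpenVLA": [84.7, 88.4, 79.2, 53.7],
}

for model, scores in results.items():
    avg = sum(scores) / len(scores)
    print(f"{model}: {avg:.1f}%")  # Octo: 75.1%, OpenVLA: 76.5%
```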
Recruiting a partner for the VLA+RL direction!
具身智能之心· 2025-10-31 04:00
Core Insights
- The article discusses the recruitment of a lecturer for an online course focused on VLA (Vision-Language-Action) models and RL (Reinforcement Learning) [1][2]
- The community aims to enhance understanding and knowledge sharing in the field of embodied intelligence, specifically in VLA and RL [3]

Recruitment Requirements
- Candidates should have a research background in VLA and RL, preferably holding a PhD or being a doctoral student, with publications in top conferences [2]
- Practical experience in industry, including hands-on debugging with real machines, is also desired [2]

Community Overview
- The company, "Embodied Intelligence Heart" (具身智能之心), is identified as the first comprehensive embodied-intelligence technical exchange community in China, focusing on VLA and RL [3]
- The community has attracted a significant number of people interested in these research areas [3]

Compensation and Resources
- The company offers compensation above the industry average, along with access to extensive industry resources [4]
VLA can endow reinforcement learning with smarter scene applications...
具身智能之心· 2025-10-17 04:01
Core Insights
- The article discusses the importance of reinforcement learning (RL) in the development of embodied intelligent robots, highlighting its applications in complex tasks such as stair climbing, running, and dancing [3][9]
- It emphasizes the challenges newcomers face in RL, particularly in producing quality research papers, given the complexity and breadth of the subject [6][10]
- To address these challenges, a specialized 1v6 mentoring course in reinforcement learning has been introduced, aimed at helping students produce publishable research papers [7][10]

Group 1: Reinforcement Learning Applications
- Reinforcement learning is crucial for gait control in humanoid and quadruped robots, enabling them to perform tasks in challenging environments [3][9]
- The VLA+RL approach for robotic arms is gaining popularity in academia, enhancing the efficiency and smoothness of robotic operations [4][9]

Group 2: Course Structure and Objectives
- The 1v6 mentoring course is designed for graduate students and others needing guidance on research papers, featuring weekly live sessions and dedicated teaching assistants [8][10]
- The course spans 14 weeks of intensive online training followed by 8 weeks of maintenance support, covering idea confirmation, project implementation, and writing refinement [10][18]

Group 3: Course Content and Deliverables
- The curriculum covers reinforcement-learning fundamentals, simulation environments, and writing guidance, with the goal of producing a research paper suitable for top conferences and journals [10][19]
- Students receive structured templates and support for the writing and submission process, helping them meet the standards of leading academic publications [10][29]

Group 4: Instructor and Support
- The course is led by experienced instructors with backgrounds in embodied intelligence and robotics, providing both theoretical knowledge and practical insights [27]
- Continuous support is offered through a dedicated WeChat group for real-time Q&A, enhancing the learning experience [18][27]
Top conferences strongly favor work that combines RL with these directions!
具身智能之心· 2025-10-14 10:00
Core Insights
- Reinforcement Learning (RL) remains a significant field with ongoing developments and applications across domains, including robotics and product optimization [1][2][3]
- The importance of gait control in embodied intelligent robots is highlighted, with RL being the primary method for achieving complex movements [2][8]
- The complexity of RL poses challenges for newcomers, necessitating structured guidance to facilitate entry into the field and successful paper publication [5][9]

Group 1: Importance of Reinforcement Learning
- RL is not an outdated discipline; it continues to be relevant, with numerous applications in robotics such as humanoid and quadruped robots [1][2]
- Companies like Unitree and Zhiyuan use RL to train robots to perform challenging tasks, including climbing stairs and running [2][8]
- The integration of RL with Vision-Language-Action (VLA) models for robotic arms is gaining traction in academic research, enhancing the efficiency of robotic operations [3][8]

Group 2: Challenges in Learning and Research
- The extensive and complex nature of RL makes it difficult for beginners to navigate, often leading to frustration and abandoned studies [5][9]
- The lack of a comprehensive learning framework can result in repeated mistakes and missed research opportunities [6][9]
- A specialized 1v6 mentoring course aims to address these challenges by providing structured support for students in the RL field [6][9]

Group 3: Course Structure and Offerings
- The course spans 14 weeks of intensive online guidance followed by 8 weeks of maintenance support, focused on producing a publishable paper [10][11]
- Weekly live sessions cover topics including RL fundamentals, simulation environments, and writing guidance, with an emphasis on practical applications [17][21]
- Participants can work on specific ideas in quadruped, humanoid, and robotic-arm research, with a structured approach to project development and writing [18][25]
RLinf-VLA: a unified and efficient VLA+RL training platform!
具身智能之心· 2025-10-13 00:02
Core Insights
- The article discusses the launch of RLinf, a large-scale reinforcement learning framework aimed at embodied intelligence, highlighting the flexibility and efficiency of its system design [2][3]

Group 1: System Design
- RLinf-VLA provides a unified and efficient platform for VLA+RL research, achieving a 2.27x throughput improvement over baseline platforms [2][5]
- It supports multiple simulators (LIBERO and ManiSkill), allowing integrated training across different environments [5]
- The system allows easy switching between VLA models and RL algorithms, reducing the workload of model adaptation [5]

Group 2: Performance Overview
- A single unified model achieved a 98.11% success rate across 130 tasks in LIBERO and 97.66% across 25 pick-and-place tasks in ManiSkill [6]
- When deployed on real robotic systems, RLinf-VLA demonstrates stronger zero-shot generalization than policies trained with SFT [6][45]

Group 3: Algorithm Design
- The framework introduces several design optimizations, including lightweight critics and trajectory-length normalization, which significantly improve training efficiency [9][21][25]
- It supports three levels of output granularity (token-level, action-level, chunk-level) for both advantage and log-probability calculations, allowing flexible training strategies [12][14][22]

Group 4: Experimental Results
- In multi-task experiments, the OpenVLA model showed performance improvements of 45% to 70% over baseline models on ManiSkill tasks [31]
- RLinf-VLA trains efficiently, with significant reductions in training time compared to baseline methods [43][44]

Group 5: Real-World Application
- RLinf-VLA was successfully deployed on a Franka Panda robotic arm, demonstrating generalization from simulation to real-world tasks [45]
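The three advantage granularities described above can be illustrated with a toy sketch. This is not RLinf-VLA's actual code; the trajectory shape, the per-chunk broadcast, and the normalization form below are assumptions for illustration only:

```python
import numpy as np

# Toy illustration of token-/action-/chunk-level advantage assignment.
# Assumed setup (hypothetical, not RLinf-VLA's real API): a trajectory of
# T action chunks, each chunk holding K actions, each action tokenized
# into M tokens.
T, K, M = 4, 8, 7
rng = np.random.default_rng(0)

# Chunk-level: one scalar advantage per chunk, e.g. from a lightweight critic.
chunk_adv = rng.normal(size=T)          # shape (T,)

# Action-level: broadcast each chunk's advantage to its K actions.
action_adv = np.repeat(chunk_adv, K)    # shape (T*K,)

# Token-level: broadcast further to every token of every action, so each
# token's log-probability is weighted individually.
token_adv = np.repeat(action_adv, M)    # shape (T*K*M,)

# Trajectory-length normalization (assumed form): divide by the number of
# weighted terms so long and short episodes contribute comparably.
normalized = token_adv / token_adv.size

print(token_adv.shape)  # (224,)
```

The coarser the granularity, the fewer distinct credit-assignment targets the policy gradient sees; the token level gives the finest control at the cost of noisier per-token signals.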
Unitree's Wang Xingxing: attention to robot data is somewhat overblown; the biggest problem lies in the model
Group 1
- The core viewpoint is that the most important development for the robotics industry over the next 2 to 5 years is end-to-end embodied-intelligence AI models [1][24]
- The current bottleneck in robotics is not hardware performance, which is deemed sufficient, but the inadequacy of embodied-intelligence AI models [1][18]
- There is a misconception that data is the primary concern; the real problem lies in the model architecture, which is not yet good or unified enough [1][21]

Group 2
- The VLA (Vision-Language-Action) model combined with Reinforcement Learning (RL) is seen as insufficient and in need of further upgrades and optimization [2][21]
- The company has developed various quadruped and humanoid robots, with the quadruped GO2 being the most-shipped model globally in recent years [3][4]
- The humanoid robot G1 has become a representative model in the humanoid sector, achieving significant sales and market presence [5][6]

Group 3
- The company emphasizes making robots capable of performing useful tasks rather than serving only entertainment or display purposes [9][14]
- Recent advances in AI have improved robot movement, including navigation over complex terrain [11][12]
- The company develops its own core components, including motors and sensors, to improve the performance and cost-effectiveness of its robots [10][24]

Group 4
- The robotics industry is growing rapidly, with many companies reporting 50% to 100% business growth driven by rising demand and supportive policies [16][17]
- Global interest in humanoid robots is increasing, with major companies like Tesla planning to mass-produce humanoid robots [17][18]
- The future of robotics will likely involve distributed computing to manage robots' computational demands effectively [25][26]