Vision-Language-Action (VLA) Models
Farewell to Expert Dependence: Letting Robots Learn Self-Reference, with Performance Soaring to 99.2% in Just 200 Steps
具身智能之心· 2025-12-11 02:01
Core Insights
- The article discusses the Self-Referential Policy Optimization (SRPO) framework, which addresses the limitations of existing Vision-Language-Action (VLA) models in robotic tasks by enabling robots to learn from their own experience without relying on external expert data [3][10][56].

Motivation and Contribution
- SRPO tackles the sparse-reward problem of reinforcement learning in the VLA domain by using self-generated successful trajectories to provide progressive rewards for failed attempts [6][10].
- The framework eliminates the need for costly expert demonstrations and task-specific reward engineering, improving the efficiency of the learning process [10][12].

Technical Approach
- SRPO collects the trajectories generated during policy inference and categorizes them into successful and failed attempts, using a latent world representation to model behavioral similarity [16][17].
- Progressive rewards are computed from the distance between a failed trajectory's representation and the representations of successful trajectories, giving a more nuanced measure of task progress (a hedged sketch of this idea follows the summary) [22][24].

Experimental Results
- SRPO reached a 99.2% success rate on the LIBERO benchmark after only 200 reinforcement-learning steps, significantly outperforming traditional methods that rely on sparse rewards [29][30].
- In the LIBERO-Plus generalization tests, SRPO improved performance by 167% without any additional training data, demonstrating robust generalization [31][32].

Efficiency and Real-World Application
- SRPO raised success rates on long-horizon tasks from 17.3% to 98.6% with minimal training steps, outperforming other models in training efficiency [36][39].
- The framework has been tested in real-world scenarios, showing significant success-rate improvements over supervised fine-tuning baselines [41][39].

Conclusion
- SRPO represents a significant advance in robotic learning: by letting robots learn from their own successes and failures, it enables autonomous exploration and opens a new approach to VLA reinforcement learning [56].
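The Technical Approach above scores a failed attempt by its distance to successful-trajectory representations, but the article gives no formula. Below is a minimal sketch of that idea, assuming one latent embedding per trajectory and an exponential distance-to-reward mapping; the encoder, the metric, and the mapping are assumptions for illustration, not SRPO's actual implementation.

```python
import numpy as np

def progressive_reward(failed_emb: np.ndarray,
                       success_embs: np.ndarray,
                       temperature: float = 1.0) -> float:
    """Score a failed rollout by its proximity to known successes.

    failed_emb:   (d,) latent of a failed trajectory (assumed to come
                  from some world-model encoder, as the summary hints).
    success_embs: (n, d) latents of self-generated successful trajectories
                  collected during policy inference.
    Returns a dense reward in (0, 1]: near-misses score high,
    unrelated failures score near 0.
    """
    # Euclidean distance to the nearest successful trajectory in latent space.
    d_min = np.linalg.norm(success_embs - failed_emb, axis=1).min()
    # Exponential mapping from distance to a bounded reward (an assumption).
    return float(np.exp(-d_min / temperature))

# Toy usage: a near-miss outscores an unrelated failure.
rng = np.random.default_rng(0)
successes = rng.standard_normal((8, 32))
near_miss = successes[0] + 0.1 * rng.standard_normal(32)
far_off = rng.standard_normal(32)
assert progressive_reward(near_miss, successes) > progressive_reward(far_off, successes)
```

This is one way to densify the sparse success/failure signal the summary describes: every failed rollout receives a graded score instead of a flat zero.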
Farewell to Expert Dependence: Letting Robots Learn Self-Reference, with Performance Soaring to 99.2% in Just 200 Steps
机器之心· 2025-12-10 05:10
Core Insights
- The article presents the Self-Referential Policy Optimization (SRPO) framework, which improves the performance of Vision-Language-Action (VLA) models on robotic tasks by addressing sparse rewards and dependence on expert demonstrations [3][11].

Motivation and Contribution
- Recent research shows that reinforcement learning (RL) can significantly improve VLA models both within and outside their training distribution; however, sparse reward signals remain a challenge, and in VLA tasks high computational costs and poor use of failure-trajectory information hinder training efficiency [6][11].
- SRPO removes the dependence on expert demonstrations and task-specific reward engineering by using self-generated successful trajectories to provide progressive rewards for failed attempts [11][12].

Technical Approach
- SRPO adopts a "learn from success" paradigm: trajectories generated during policy inference are collected and split into successful and failed attempts, and a latent world representation is used to model behavioral similarity and compute progressive rewards [14][16].
- The framework formalizes robotic decision-making as a partially observable Markov decision process (POMDP) and introduces a world-model-driven reward modeling mechanism that supplies progressive reward signals for failed trajectories (a hedged formalization follows this summary) [18][19].

Experimental Results
- SRPO achieved a 99.2% success rate with only 200 reinforcement-learning steps, significantly outperforming baselines that rely on sparse rewards or require hand-designed rewards [27].
- In the LIBERO-Plus generalization tests, SRPO improved performance by 167%, despite never training on generalized-scenario data [30].

Efficiency and Real-World Application
- SRPO raised success rates on long-horizon tasks from 17.3% to 98.6% with minimal training steps, showing far better information utilization than traditional methods [34].
- SRPO's reward modeling has been validated in real-world environments, with significant success-rate gains across a range of tasks [37].

Conclusion
- SRPO marks a significant advance in VLA reinforcement learning, moving robots from imitation to autonomous exploration without expensive data labeling or complex reward design [51].
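To make the POMDP framing above concrete, one standard way to turn "distance to successful trajectories" into a dense reward is potential-based shaping. The potential below, built from a world-model latent and the set of successful rollouts, is an assumption for illustration; the article does not give SRPO's exact formula.

```latex
% POMDP as in the summary: observations o_t, actions a_t, sparse reward r_t.
% Assume a world-model encoder \phi maps the history \tau_{1:t} to a latent z_t,
% and let \mathcal{Z}^{+} be the latents of self-generated successful rollouts.
% One hedged choice of potential: negative distance to the nearest success.
\[
  \Phi(z_t) \;=\; -\min_{z^{+} \in \mathcal{Z}^{+}} \bigl\lVert z_t - z^{+} \bigr\rVert_2
\]
% Potential-based shaping then densifies the sparse reward while preserving
% optimal policies (Ng et al., 1999):
\[
  r'_t \;=\; r_t \;+\; \gamma\,\Phi(z_{t+1}) \;-\; \Phi(z_t)
\]
```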
Judging from 300+ Papers: Is VLA the Inevitable Path to General Embodied Intelligence?
具身智能之心· 2025-10-17 16:02
Core Insights
- The emergence of Vision-Language-Action (VLA) models signals a shift from traditional policy-based control toward a general-purpose robotics paradigm, turning vision-language models (VLMs) from passive sequence generators into active agents capable of manipulation and decision-making in complex, dynamic environments [2]

Group 1: VLA Overview
- The article discusses a comprehensive survey of advanced VLA methods, providing a clear taxonomy and systematic review of existing research [2]
- VLA methods are grouped into several main paradigms: autoregressive, diffusion-based, reinforcement-based, hybrid, and specialized methods, with detailed examination of their motivations, core strategies, and implementations [2]
- The survey integrates insights from more than 300 recent studies, outlining the opportunities and challenges that will shape the development of scalable, general VLA methods [2]

Group 2: Future Directions and Challenges
- The review addresses key challenges and future directions for advancing VLA models and generalizable robotic technologies [2]
- The live discussion will explore the origins of VLA, its research subfields, and current hot topics and future trends [5]

Group 3: Event Details
- The live event is scheduled for October 18, 19:30-20:30, focusing on VLA as a prominent research direction in artificial intelligence [5]
- Key highlights include the classification of VLA research fields, the integration of VLA with reinforcement learning, and the Sim2Real concept [6]
Google Nudges Figure: "Get Up and Compete"
Hu Xiu· 2025-06-28 06:50
Core Insights
- Google's Gemini Robotics On-Device model shows robots adapting quickly to new tasks and environments without continuous internet connectivity, a significant advance in offline AI robotics [3][5][16]
- The model uses a "vision-language-action" framework to make robots faster and more efficient at tasks, maintaining robust performance even under intermittent connectivity [3][5][19]

Group 1: Model Features and Performance
- Launched on June 24, the Gemini Robotics On-Device model is the first of its kind to operate independently of data networks, which benefits latency-sensitive applications [3][5]
- It addresses three main challenges: dexterous manipulation, fine-tuning for new tasks, and low-latency reasoning from fully local operation [5][12]
- In demonstrations, the model completed tasks such as placing blocks and opening drawers from natural-language commands, indicating strong visual, semantic, and behavioral generalization [8][10]

Group 2: Comparison with Other Technologies
- Although slightly below the flagship Gemini Robotics model in performance, the on-device model significantly outperforms previous best offline models [8][10]
- Developers can fine-tune the model with as few as 50 to 100 demonstrations, enhancing its adaptability to new tasks (a hypothetical sketch of such few-shot adaptation follows this summary) [12][14]
- The model has been tested on various robotic platforms, including the dual-arm Franka and Apptronik's Apollo humanoid robot, demonstrating its versatility on previously unseen objects and tasks [14][17]

Group 3: Industry Context and Implications
- The advances in Gemini Robotics highlight the competitive landscape in robotics and embodied intelligence, where companies are pursuing diverse technical routes to let AI understand and interact with the physical world [19]
- Some observers see Google's offline AI robots as game-changers, suggesting a potential shift in the robotics industry [16][19]
- Discussion of how the technology differs from competitors such as Tesla and Meta points to a vibrant, competitive AI-robotics environment [18][19]
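The article says developers can adapt the on-device model with as few as 50 to 100 demonstrations but shows no API, and the actual Gemini Robotics SDK interface is not given. The sketch below is therefore a generic, hypothetical few-shot behavior-cloning loop on a toy linear policy; `LinearPolicy`, `finetune`, and all shapes are invented for illustration.

```python
import numpy as np

class LinearPolicy:
    """Toy stand-in for a pretrained VLA action head. A linear map is an
    assumption for illustration, not Gemini's architecture or SDK."""
    def __init__(self, obs_dim: int, act_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((obs_dim, act_dim))

    def predict(self, obs: np.ndarray) -> np.ndarray:
        return obs @ self.W

def finetune(policy: LinearPolicy, demos, epochs: int = 50, lr: float = 1e-2):
    """Few-shot behavior cloning: regress predicted actions onto
    demonstrated actions by gradient descent on mean-squared error."""
    for _ in range(epochs):
        for obs, act in demos:               # obs: (T, obs_dim), act: (T, act_dim)
            err = policy.predict(obs) - act  # residual per timestep
            grad = obs.T @ err / len(obs)    # d(MSE)/dW up to a constant factor
            policy.W -= lr * grad
    return policy

# Roughly 60 demonstrations, inside the 50-100 range the article cites.
rng = np.random.default_rng(1)
true_W = rng.standard_normal((8, 3))         # pretend "expert" mapping
demos = []
for _ in range(60):
    obs = rng.standard_normal((20, 8))
    demos.append((obs, obs @ true_W))
policy = finetune(LinearPolicy(8, 3), demos)
print(np.abs(policy.W - true_W).max())       # near zero: the demos were fit
```

The point the toy preserves is the data regime: tens of demonstrations, not thousands, can suffice when adaptation starts from a strong pretrained policy.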
Two Funding Rounds Worth Hundreds of Millions in Three Months: A Leading Embodied-Intelligence Robotics Startup Scores Dual Breakthroughs in Technology and Commercialization
Robot猎场备忘录· 2025-04-21 02:38
On February 20, 2025, the well-known overseas humanoid-robot unicorn Figure AI released Helix, its self-developed general-purpose Vision-Language-Action (VLA) model. Helix pioneered the dual-system VLA architecture, pairing S2, which handles "slow thinking" (high-level semantics and goal planning), with S1, which handles "fast reaction" (real-time execution and adjustment of actions), and it is designed for high-frequency, dexterous control of a humanoid robot's entire upper body (a hypothetical sketch of this two-rate pattern follows below).

On February 26, 2025, Physical Intelligence (PI, or π), the embodied-intelligence foundation-model startup that was among the first abroad to propose the Vision-Language-Action (VLA) model and is regarded as having the strongest founding team in the global embodied-intelligence field, released its Hierarchical Interactive Robot system (Hi Robot), built on the company's end-to-end model π0 (pi-zero). Hi Robot allows VLA models to be integrated, e.g. ...
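The article describes Helix only at the level of its dual-system split: S2 does "slow thinking" (high-level semantics and goal planning) while S1 does "fast reaction" (real-time execution and adjustment). The sketch below is a hypothetical two-rate control loop illustrating that pattern; every class name, dimension, and rate is an assumption, not Figure's implementation.

```python
import numpy as np

LATENT_DIM, OBS_DIM, ACT_DIM = 16, 12, 7   # toy sizes, not Figure's
S2_PERIOD = 25                             # S1 ticks per S2 replan (assumed ratio)

class SlowPlanner:
    """Hypothetical S2: low-rate 'slow thinking' that maps an instruction
    to a latent goal. A real S2 would be a VLM; hashing the string is a
    toy placeholder."""
    def plan(self, instruction: str) -> np.ndarray:
        seed = abs(hash(instruction)) % (2 ** 32)
        return np.random.default_rng(seed).standard_normal(LATENT_DIM)

class FastController:
    """Hypothetical S1: high-rate 'fast reaction' mapping (observation,
    latent goal) to an upper-body action via a fixed linear map."""
    def __init__(self) -> None:
        rng = np.random.default_rng(1)
        self.W = 0.1 * rng.standard_normal((OBS_DIM + LATENT_DIM, ACT_DIM))

    def act(self, obs: np.ndarray, goal: np.ndarray) -> np.ndarray:
        return np.concatenate([obs, goal]) @ self.W

s2, s1 = SlowPlanner(), FastController()
goal = s2.plan("pick up the apple")
for t in range(100):                       # the fast control loop
    obs = np.zeros(OBS_DIM)                # placeholder sensor reading
    if t % S2_PERIOD == 0:                 # S2 refreshes the goal occasionally
        goal = s2.plan("pick up the apple")
    action = s1.act(obs, goal)             # S1 acts on every tick
```

The design point the toy keeps is that S1 never blocks on S2: the fast loop always has the latest latent goal to act on, which is what permits high-frequency control while the slow system deliberates.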