π0
RLinf releases πRL: online reinforcement learning fine-tuning for π0 and π0.5
机器之心· 2025-11-06 08:58
Core Insights
- The article covers advances in robotics, focusing on the VLA models π0 and π0.5 from Physical Intelligence, which use flow matching to generate smooth, high-dimensional continuous action sequences and show significant advantages in complex manipulation tasks [2][3].

Group 1: VLA Models and Challenges
- VLA models rely heavily on large-scale, high-quality human demonstration data, which is costly and time-consuming to collect and annotate [2].
- Reinforcement learning (RL) lets agents explore and improve iteratively through real interaction with the environment, reducing the dependence on extensive demonstration data and raising the performance ceiling set by supervised fine-tuning (SFT) [2].

Group 2: πRL Framework
- A collaboration among institutions including Tsinghua University, Peking University, and CMU produced the πRL framework for online reinforcement learning fine-tuning of flow matching VLA models [3].
- πRL achieved an average success rate of 97.6% for π0 and 98.3% for π0.5 on the LIBERO benchmark, validating the fine-tuning approach [3].

Group 3: Technical Innovations
- πRL introduces two technical routes, Flow-Noise and Flow-SDE, both addressing the difficulty of directly computing the log-likelihood of output actions in a flow matching VLA [8][10].
- Flow-Noise models the denoising process as a discrete Markov process, which makes the joint probability density of the denoised sequence directly computable [10].
- Flow-SDE couples the denoising process with environment interaction, constructing a two-layer Markov Decision Process (MDP) [20].

Group 4: Performance Improvements
- πRL raised the success rate by over 40% across 4,352 pick-and-place task combinations, reaching final success rates above 80% [3][24].
- On the LIBERO benchmark, πRL improved the average success rate of π0 from 57.6% to 97.6% and of π0.5 from 77.1% to 98.3%, surpassing flow matching VLAs trained on the full dataset [19].

Group 5: Generalization and Robustness
- The πRL algorithm significantly improves the generalization of both models in new environments, as shown by tests with domain randomization [26].
- The framework also reduces the average number of steps needed to complete a task, indicating better efficiency than supervised fine-tuning [28].

Group 6: Future Directions
- Future work on πRL will include more benchmarks, deeper analysis of out-of-distribution (OOD) generalization, and further exploration of critic design for improved stability [35][36].
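The Flow-Noise idea summarized under Group 3 can be illustrated with a minimal sketch. It assumes each denoising step is a Gaussian transition around a deterministic Euler update, so the joint log-density of the whole chain is a sum of per-step Gaussian log-densities. The velocity field `toy_velocity`, the fixed noise scale `sigma`, and all function names are hypothetical stand-ins chosen for illustration; they are not taken from the πRL code, where the velocity and noise would come from learned networks.

```python
import math
import random


def gaussian_logpdf(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def toy_velocity(a, t, target):
    """Hypothetical velocity field that pulls the sample toward `target`."""
    return [gi - ai for ai, gi in zip(a, target)]


def sample_chain(target, steps, sigma, rng):
    """Run noisy Euler denoising and return the full chain a_0 .. a_K."""
    dt = 1.0 / steps
    a = [rng.gauss(0.0, 1.0) for _ in target]  # a_0 ~ N(0, I)
    chain = [a]
    for k in range(steps):
        v = toy_velocity(a, k * dt, target)
        # Each transition is Gaussian: mean a_k + v * dt, standard deviation sigma.
        a = [ai + vi * dt + sigma * rng.gauss(0.0, 1.0) for ai, vi in zip(a, v)]
        chain.append(a)
    return chain


def chain_log_prob(chain, target, sigma):
    """Joint log density: log p(a_0) + sum over k of log p(a_{k+1} | a_k)."""
    steps = len(chain) - 1
    dt = 1.0 / steps
    logp = gaussian_logpdf(chain[0], [0.0] * len(target), 1.0)
    for k in range(steps):
        v = toy_velocity(chain[k], k * dt, target)
        mean = [ai + vi * dt for ai, vi in zip(chain[k], v)]
        logp += gaussian_logpdf(chain[k + 1], mean, sigma)
    return logp


if __name__ == "__main__":
    rng = random.Random(0)
    goal = [0.5, -0.2]
    chain = sample_chain(goal, steps=4, sigma=0.1, rng=rng)
    print(chain_log_prob(chain, goal, sigma=0.1))
```

Because every transition has a closed-form density, the chain's log-probability is available for a policy-gradient style update, which is the property the summary attributes to Flow-Noise.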
K-Scale Labs head of product and engineering departs, founds new company Gradient Robotics to tackle key problems in U.S. robotics and physical AI
机器人大讲堂· 2025-10-26 10:03
Core Insights
- Jingxiang Mo has left K-Scale Labs to establish a new company, Gradient Robotics, focusing on critical issues in robotics and physical AI in the U.S. [1][6]
- During his tenure at K-Scale Labs, he led the development of K-Bot and Z-Bot, achieving significant milestones in under eight months with a small team [2][10].

Group 1: K-Scale Labs Achievements
- K-Scale Labs created K-Bot, an open-source humanoid robot platform that was the first of its kind in the U.S. for consumer use and features a reinforcement learning-based motion system [10][11].
- The K-Bot project saw rapid success: 150 units sold out quickly, generating over $2 million in sales and attracting clients such as Google DeepMind and OpenAI [11][13].
- The Z-Bot project aims for mass production at a price point below $1,000 and has already drawn significant interest, with over 20,000 people on the waiting list [13].

Group 2: Industry Context and Future Directions
- Robotics and physical AI are becoming competitive focal points, with companies such as Physical Intelligence (PI) and FieldAI emerging as key players, both valued at over $2 billion [18][29].
- PI focuses on developing general AI models for robots, while FieldAI aims to create AI models that can operate in real-world environments, reflecting the industry's demand for technological breakthroughs [24][27].
- Gradient Robotics is positioned to draw on its experience in open-source humanoid robotics to address the challenges of general robotics technology, drawing parallels to advances in autonomous driving [4][30].
VLA plus reinforcement learning will give rise to more powerful systems
具身智能之心· 2025-07-31 00:04
Core Viewpoint
- The article discusses advances in robotic models, focusing on the development of the RT-2 and RT-X models, which enhance robots' ability to execute tasks through visual language models and diverse datasets [5][10][11].

Group 1: RT-2 and Its Capabilities
- RT-2 is introduced as a foundational robot model that can process visual questions and execute tasks based on language instructions, showcasing the potential of remote-accessible robotic models [5][7].
- By converting robot control tasks into a question-answer format, the model can carry out a variety of basic language instructions effectively [7][8].

Group 2: RT-X Dataset and Its Impact
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, providing a diverse training ground for robotic models [10].
- Models trained on the RT-X dataset outperform specialized models by approximately 50% across various tasks, indicating the advantages of cross-embodiment models [11].

Group 3: Evolution of VLA Models
- The first-generation VLA model, RT-2, is noted for its simplicity, while second-generation models use continuous action distributions for better performance in complex tasks [14][15].
- The second-generation VLA models incorporate specialized mechanisms for generating continuous actions, enhancing their control capabilities [17][18].

Group 4: π0 and π0.5 Models
- The π0 model, based on a large language model with 3 billion parameters, is designed to handle a variety of tasks, including folding clothes, demonstrating its adaptability in different environments [18][23].
- The latest π0.5 model targets long-horizon tasks in new environments, integrating high-level reasoning to manage complex instructions [28][30].

Group 5: Future Directions and Reinforcement Learning
- Future VLA models are expected to integrate reinforcement learning techniques to enhance robustness and performance, moving beyond imitation learning [34][39].
- The combination of VLA and DLA (Deep Learning Architecture) is proposed to create a more effective system, leveraging expert data to improve generalist capabilities [44][46].
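The question-answer formulation of robot control described for RT-2 above can be sketched as action tokenization: each continuous action dimension is quantized into a fixed number of bins, and the bin indices are written out as text that a language model can produce as its answer. The bin count of 256, the action range of [-1, 1], and the function names below are assumptions made for illustration, not details given in the summary.

```python
def encode_action(action, low=-1.0, high=1.0, bins=256):
    """Quantize each continuous action dimension into an integer bin index."""
    tokens = []
    for x in action:
        x = min(max(x, low), high)  # clip to the assumed valid range
        # Scale to [0, bins - 1] and round to the nearest bin.
        tokens.append(int((x - low) / (high - low) * (bins - 1) + 0.5))
    return tokens


def decode_action(tokens, low=-1.0, high=1.0, bins=256):
    """Map bin indices back to continuous values at the bin centers."""
    return [low + t / (bins - 1) * (high - low) for t in tokens]


def to_answer_string(tokens):
    """Render the token sequence as the text 'answer' a language model would emit."""
    return " ".join(str(t) for t in tokens)


if __name__ == "__main__":
    tokens = encode_action([-1.0, 0.0, 1.0])
    print(to_answer_string(tokens))
    print(decode_action(tokens))
```

The round trip loses at most half a bin width per dimension, which is the precision cost of a discrete action vocabulary that the continuous-action second-generation models avoid.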
PI co-founder and robotics luminary explains in detail how VLA plus reinforcement learning gives rise to more powerful systems
具身智能之心· 2025-07-30 06:03
Core Viewpoint
- The article discusses advances in robotic models, focusing on the development of the RT-2 and RT-X models, which enhance robots' ability to execute complex tasks through improved datasets and model architectures [6][12][44].

Group 1: RT-2 and RT-X Models
- RT-2 is introduced as a foundational robot model that uses a visual language model to process image-based commands and execute tasks [8][10].
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, showcasing a diverse range of robotic capabilities [13][26].
- Cross-embodiment models trained on the RT-X dataset outperform specialized models by approximately 50% across various tasks, indicating the advantages of generalization in robotic learning [13][29].

Group 2: Evolution of VLA Models
- First-generation VLA models such as RT-2 are based on simple question-answer structures for robot control, while the second generation incorporates continuous action distributions for better performance [16][19].
- Second-generation VLA models such as π0 pair a large language model with an action expert module to handle complex tasks, generating action sequences over time [22][24].
- The π0.5 model is designed for long-horizon tasks, integrating high-level reasoning to execute complex instructions in new environments [36][40].

Group 3: Integration of Reinforcement Learning
- Future VLA models are expected to incorporate reinforcement learning techniques to enhance robustness and performance, moving beyond imitation learning [44][49].
- Integrating reinforcement learning with VLA aims to create a more effective training process, allowing robots to learn from both expert data and real-world interactions [56][60].
- Current research focuses on developing stable and effective end-to-end training processes that use reinforcement learning to improve VLA capabilities [60].
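The second-generation design described above, where an action expert generates a continuous action sequence, can be sketched as flow matching sampling: start from Gaussian noise and integrate a velocity field with Euler steps until a whole chunk of actions emerges. The velocity field here is a toy closed form that assumes a single known target chunk; in a real model it would be a learned network conditioned on images and language. The step count, chunk shape, and function names are likewise illustrative assumptions.

```python
import random


def toy_velocity(x, t, target):
    """Toy velocity field for a straight noise-to-target path (single-target assumption)."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action_chunk(target_chunk, steps=10, seed=0):
    """Integrate the velocity field from t=0 to t=1 with Euler steps.

    `target_chunk` is a list of H actions, each a list of D floats; it stands in
    for what a learned, observation-conditioned network would encode.
    """
    rng = random.Random(seed)
    flat_target = [v for action in target_chunk for v in action]
    x = [rng.gauss(0.0, 1.0) for _ in flat_target]  # start from pure noise
    for k in range(steps):
        t = k / steps
        v = toy_velocity(x, t, flat_target)
        x = [xi + vi / steps for xi, vi in zip(x, v)]  # Euler update with dt = 1/steps
    dim = len(target_chunk[0])
    # Reshape the flat vector back into H actions of dimension D.
    return [x[i:i + dim] for i in range(0, len(x), dim)]


if __name__ == "__main__":
    chunk = sample_action_chunk([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
    for action in chunk:
        print(action)
```

Generating the whole chunk in one integration pass is what gives this family of models smooth multi-step actions, in contrast to emitting one discretized action token at a time.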
Physical Intelligence founder: humanoid robots are overrated
海外独角兽· 2025-03-28 11:51
Core Insights
- The article emphasizes the importance of Physical Intelligence (PI) in robotics, positioning it as a leading entity akin to OpenAI in AI research, with a focus on developing a foundation model for general-purpose robots [3][4].
- Chelsea Finn, a co-founder of PI, highlights the need for diverse robot data to achieve generalization, stressing that the quantity and variety of real-world data are crucial for training effective models [3][10].

Group 1: Chelsea Finn's Entry into Robotics
- Chelsea Finn was initially drawn to robotics by its potential impact and the intriguing mathematical challenges it presents, which led her to begin research in the field over a decade ago [6][7].
- Her early research focused on training neural networks to control robotic arms, an approach that has since gained recognition and made progress across the robotics field [6][7].

Group 2: PI's Research Progress and Development
- PI aims to create a large neural network model capable of controlling any robot in a variety of scenarios, unlike traditional robotics, which often targets specific applications [10][12].
- The company emphasizes using diverse data from many robot platforms to maximize the value of the data collected [10][12].

Group 3: Achieving AGI in Robotics
- PI focuses on long-term challenges in robotics rather than specific applications, recognizing the need for new methods that allow for human-robot collaboration and error tolerance [21][22].
- The company believes that physical intelligence is central to achieving AGI in robotics and envisions a diverse ecosystem of robot forms emerging in the future [22][37].

Group 4: Hi Robot
- PI's recently launched Hi Robot aims to improve task execution by incorporating reasoning and planning into robotic actions, allowing for more interactive human-robot communication [25][26].
- The system lets robots respond to user prompts and adjust their actions in real time, a significant advance in robotic capabilities [26][28].

Group 5: Sensory Requirements for Robots
- Current robotic sensing relies primarily on visual data, and integrating tactile sensors remains difficult because of durability and cost issues [29][30].
- The focus is on improving data processing and architecture rather than adding new sensors, with priority given to developing memory capabilities in robots [30].

Group 6: Comparison with Autonomous Driving
- The development timelines for robotics and autonomous driving differ, with robotics facing higher-dimensional challenges and requiring greater precision [31][33].
- The article notes that while large companies have capital advantages, startups can move faster to collect diverse data and iterate on robotic technologies [34].

Group 7: Perspectives on Training Data and Hardware
- Human observation data is acknowledged as valuable for training robots, but robots need to learn from their own physical experience to make significant progress [35][36].
- The future of robotics is expected to feature a variety of hardware platforms optimized for specific tasks, leading to a "Cambrian explosion" of robotic forms [36][37].