π0.5
RLinf Adds πRL: Online Reinforcement Learning Fine-Tuning for π0 and π0.5
机器之心· 2025-11-06 08:58
Core Insights
- The article discusses advances in robotics, focusing on the VLA models π0 and π0.5 developed by Physical Intelligence, which use flow matching to generate high-dimensional, smooth, continuous action sequences and show significant advantages in complex manipulation tasks [2][3].

Group 1: VLA Models and Challenges
- VLA models rely heavily on large-scale, high-quality human demonstration data, which is costly and time-consuming to collect and annotate [2].
- Reinforcement learning (RL) lets agents explore and iteratively improve through real interaction with the environment, reducing the dependency on extensive data and raising the performance ceiling of supervised fine-tuning (SFT) [2].

Group 2: πRL Framework
- A collaboration among institutions including Tsinghua University, Peking University, and CMU produced the πRL framework for online reinforcement learning fine-tuning of flow-matching VLA models [3].
- On the LIBERO testing platform, πRL achieved average success rates of 97.6% for π0 and 98.3% for π0.5, validating the effectiveness of the fine-tuning approach [3].

Group 3: Technical Innovations
- πRL introduces two technical routes, Flow-Noise and Flow-SDE, to address the difficulty of directly computing the log-likelihood of output actions in flow-matching VLAs [8][10].
- Flow-Noise models the denoising process as a discrete-time Markov process, enabling direct computation of the joint probability density of the denoising sequence (see the code sketch after this summary) [10].
- Flow-SDE couples the denoising process with environment interaction, constructing a two-layer Markov decision process (MDP) [20].

Group 4: Performance Improvements
- Across 4,352 grasp-and-place task combinations, πRL raised success rates by over 40%, with final success rates exceeding 80% [3][24].
- On LIBERO, πRL improved π0's average success rate from 57.6% to 97.6% and π0.5's from 77.1% to 98.3%, surpassing flow-matching VLAs trained on the full dataset [19].

Group 5: Generalization and Robustness
- πRL significantly improves both models' generalization in new environments, as shown by domain-randomization tests [26].
- πRL also reduces the average number of steps needed to complete tasks, indicating better efficiency than supervised fine-tuning [28].

Group 6: Future Directions
- Planned work on πRL includes more benchmark tests, deeper analysis of out-of-distribution (OOD) generalization, and further exploration of critic design for improved stability [35][36].
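The Flow-Noise route in Group 3 can be made concrete in a few lines of code. The following is a minimal sketch of the idea as summarized above, not the πRL implementation: `velocity_net`, the step count, and the noise scale `sigma` are illustrative assumptions. Injecting Gaussian noise into each Euler denoising step turns the chain into a discrete-time Markov process whose per-step transitions are Gaussians, so the joint log-likelihood that policy-gradient RL needs is just a sum of per-step log-densities.

```python
import math
import torch

def sample_action_with_logprob(velocity_net, obs_emb, action_dim,
                               num_steps=10, sigma=0.1):
    """Sample one action and the log-likelihood of its denoising chain.

    velocity_net(x, t, obs_emb) is a hypothetical learned velocity field,
    standing in for whatever the actual flow-matching policy exposes.
    """
    x = torch.randn(action_dim)          # x_0 ~ N(0, I): the flow's source noise
    dt = 1.0 / num_steps
    log_prob = torch.zeros(())
    for k in range(num_steps):
        t = torch.tensor(k * dt)
        v = velocity_net(x, t, obs_emb)  # predicted velocity v_theta(x_k, t, o)
        mean = x + v * dt                # deterministic Euler update
        std = sigma * math.sqrt(dt)
        x_next = mean + std * torch.randn_like(x)  # stochastic transition x_{k+1} | x_k
        # Gaussian log-density of this transition; summing over steps gives
        # the joint log-likelihood of the whole denoising sequence.
        log_prob += (-0.5 * ((x_next - mean) / std) ** 2
                     - math.log(std * math.sqrt(2 * math.pi))).sum()
        x = x_next
    return x, log_prob  # action + log pi(chain | obs), usable in a PPO-style update
```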
K-Scale Labs' Head of Product and Engineering Departs! New Company Gradient Robotics to Focus on Key Challenges in U.S. Robotics and Physical AI!
机器人大讲堂· 2025-10-26 10:03
Core Insights
- Jingxiang Mo has left K-Scale Labs to establish a new company, Gradient Robotics, focusing on critical issues in robotics and physical AI in the U.S. [1][6]
- During his tenure at K-Scale Labs, he led the development of K-Bot and Z-Bot, achieving significant milestones in under eight months with a small team [2][10].

Group 1: K-Scale Labs Achievements
- K-Scale Labs created K-Bot, an open-source humanoid robot platform that was the first of its kind in the U.S. for consumer use, featuring a reinforcement-learning-based motion system [10][11].
- The K-Bot project saw rapid success, with 150 units sold out quickly, generating over $2 million in sales and attracting clients like Google DeepMind and OpenAI [11][13].
- The Z-Bot project aims for mass production at a price point below $1,000 and has already garnered significant interest, with over 20,000 people on the waiting list [13].

Group 2: Industry Context and Future Directions
- The robotics and physical AI sectors are becoming competitive focal points, with companies like Physical Intelligence (PI) and FieldAI emerging as key players, both valued at over $2 billion [18][29].
- PI focuses on developing general AI models for robots, while FieldAI aims to create AI models that can operate in real-world environments, reflecting the industry's demand for technological breakthroughs [24][27].
- Gradient Robotics is positioned to leverage its experience in open-source humanoid robotics to address the challenges of general robotics technology, drawing parallels to advances in autonomous driving [4][30].
π0.5 Announced as Open Source! Is the Robot Generalization Problem Finally Solved?
机器人大讲堂· 2025-09-14 04:06
Core Viewpoint
- The recent open-source release of the π0.5 model by Physical Intelligence enhances robotic capabilities through heterogeneous-data collaborative training and multi-modal data fusion, enabling robots to understand task semantics and execute complex tasks accurately in real-world scenarios [1].

Technical Highlights of π0.5
- π0.5 employs heterogeneous-data collaborative training, integrating data from sources such as multiple robots, high-level semantic predictions, and web data, which strengthens the model's generalization on real-world robotic tasks [2].
- The model fuses multi-modal training examples, including image observations, language commands, object detection, semantic sub-task predictions, and low-level actions, allowing robots to respond more accurately to instructions (a hypothetical example schema follows this summary) [4].
- Built on a general vision-language model (VLM), π0.5 optimizes its network structure to reduce information loss and improve multi-modal processing efficiency, using efficient convolutional networks for visual input and enhanced structures for understanding long text commands [6].

Addressing Generalization Challenges
- Generalization has long been a major challenge for robots; π0.5's performance improves as the number of training environments increases, approaching that of baseline models trained directly in the test environment after roughly 100 training environments [7].

Practical Applications
- In previously unseen real-world home environments, π0.5 completes tasks such as "organizing items in a drawer," "arranging laundry," and "cleaning dishes in a sink," demonstrating that it can handle complex, time-consuming tasks that require understanding task semantics and interacting with the correct objects [8][9].

Knowledge Transfer and Training Efficiency
- Joint training across modalities improves knowledge transfer from language to policy, creating a richer, more efficient training scheme for robotic learning systems and enabling more flexible generalization [11].

Related Companies
- Three companies closely associated with π0.5:
  1. Guanghe Tong: launched the Fibot platform, which integrates high-performance robotic domain controllers and multi-sensor fusion for real-time data capture [13].
  2. Ark Infinite: provides hardware support for Physical Intelligence and demonstrated π0.5 in unfamiliar environments [16].
  3. Stardust Intelligence: an early partner of Physical Intelligence whose robots contributed to the initial model training [18].
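To make the multi-modal fusion point under "Technical Highlights" concrete, here is a hypothetical sketch of what one heterogeneous co-training example might look like. The field names, shapes, and `source` tags are assumptions for illustration, not Physical Intelligence's actual schema.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class CoTrainingExample:
    """One heterogeneous training example; only some fields exist per source."""
    images: np.ndarray                    # image observations, e.g. (num_cams, H, W, 3)
    language_command: str                 # instruction, e.g. "clean the dishes in the sink"
    detections: Optional[list] = None     # object-detection labels, e.g. [(name, bbox), ...]
    subtask: Optional[str] = None         # semantic sub-task prediction, e.g. "pick up the plate"
    actions: Optional[np.ndarray] = None  # low-level action chunk, e.g. (horizon, dof)
    source: str = "robot"                 # "robot", "web", "cross_embodiment", ...
```

Under a schema like this, web examples might carry only images and detection labels while robot examples carry low-level actions, and all of them flow through the same VLM backbone during collaborative training.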
Before π0.5 Went Open Source, China Also Open-Sourced a Powerful End-to-End Unified Foundation Model with Strong Generalization and Long-Horizon Manipulation
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article discusses the release of π0.5 and WALL-OSS, highlighting their advances in embodied intelligence and the significance of these models for the robotics industry, particularly for task execution in complex environments [1][3][5].

Group 1: Model Capabilities
- π0.5 demonstrates enhanced generalization through heterogeneous-task collaborative training, enabling robots to perform long-horizon, fine-grained operations in new household environments [3][5].
- WALL-OSS achieves embodied perception through large-scale multimodal pre-training, integrating instruction reasoning, sub-goal decomposition, and fine-grained action synthesis within a single differentiable framework [8][18].
- The model achieves high success rates on complex long-horizon manipulation tasks, with robust instruction following and understanding of complex scenarios that surpass existing baseline models [8][18][28].

Group 2: Training and Data
- WALL-OSS training proceeds in discrete, continuous, and joint phases, and requires only RTX 4090-class compute for training and inference deployment (a sketch of a possible joint-phase loss follows this summary) [14][15].
- A multi-source dataset centered on embodied tasks was built to address the lack of large-scale, aligned VLA supervision and the spatial-understanding gaps of current vision-language models [20][22].
- The dataset spans thousands of hours, covering both short-horizon manipulation tasks and long-horizon reasoning tasks to ensure comprehensive training [20][22][24].

Group 3: Experimental Analysis
- Experiments on embodied visual question answering and six robotic manipulation tasks evaluated language-instruction understanding, reasoning, generalization, and the planning and execution of long-horizon, multi-stage tasks [25][31].
- WALL-OSS significantly outperformed its original baseline model on object grounding, scene captioning, and action planning, demonstrating stronger scene understanding [27][28].
- The model follows novel instructions without task-specific fine-tuning, achieving 85% average task progress on instructions involving known objects and 61% on novel objects [29][31].

Group 4: Industry Impact
- The advances in WALL-OSS and π0.5 address existing limitations in vision-language models and embodied understanding, paving the way for more capable and versatile robotic systems [5][8][20].
- The company behind WALL-OSS, established in December 2023, focuses on building a general embodied-intelligence model from real-world data, aiming to create robots with fine manipulation capabilities [39].
- Its recent completion of a nearly 1 billion yuan A+ financing round signals strong investor confidence in this direction [39].
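The "single differentiable framework" claim in Group 1 and the discrete/continuous/joint phases in Group 2 suggest one backbone optimized under a mixed objective. Below is a hypothetical sketch of what a joint-phase loss could look like; the function, argument names, loss terms, and weights are assumptions for illustration, not WALL-OSS's actual objective.

```python
import torch
import torch.nn.functional as F

def joint_loss(text_logits, text_targets, pred_actions, target_actions,
               w_text=1.0, w_action=1.0):
    """Joint phase: one backbone, two heads, one backward pass.

    text_logits: (batch, seq, vocab) from the discrete (language) head
    pred_actions / target_actions: (batch, horizon, dof) from the continuous head
    """
    # Discrete-phase objective: next-token cross-entropy over reasoning/sub-goal text.
    loss_text = F.cross_entropy(text_logits.flatten(0, 1), text_targets.flatten())
    # Continuous-phase objective: regression on fine-grained action chunks
    # (an MSE placeholder; the real model may use a flow or diffusion objective).
    loss_action = F.mse_loss(pred_actions, target_actions)
    return w_text * loss_text + w_action * loss_action
```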
VLA + Reinforcement Learning Will Give Rise to More Powerful Systems!
具身智能之心· 2025-07-31 00:04
Core Viewpoint
- The article discusses advances in robotic models, focusing on RT-2 and RT-X, which enhance robots' ability to execute tasks through vision-language models and diverse datasets [5][10][11].

Group 1: RT-2 and Its Capabilities
- RT-2 is introduced as a foundational robot model that can answer visual questions and execute tasks from language instructions, showcasing the potential of remotely accessible robotic models [5][7].
- By casting robot control as a question-answer problem, the model can carry out a range of basic language instructions effectively (see the tokenization sketch after this summary) [7][8].

Group 2: RT-X Dataset and Its Impact
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, providing a diverse training corpus for robotic models [10].
- Models trained on RT-X outperform specialized models by roughly 50% across a variety of tasks, demonstrating the advantages of cross-embodiment models [11].

Group 3: Evolution of VLA Models
- The first-generation VLA model, RT-2, is notable for its simplicity, while second-generation models use continuous action distributions for better performance on complex tasks [14][15].
- Second-generation VLA models add specialized mechanisms for generating continuous actions, improving their control capabilities [17][18].

Group 4: π0 and π0.5 Models
- The π0 model, built on a 3-billion-parameter language model, handles a range of tasks, including folding clothes, demonstrating adaptability across environments [18][23].
- The newer π0.5 model targets long-horizon tasks in new environments, integrating high-level reasoning to manage complex instructions [28][30].

Group 5: Future Directions and Reinforcement Learning
- Future VLA models are expected to integrate reinforcement learning to improve robustness and performance beyond what imitation learning achieves [34][39].
- The combination of VLA and DLA (Deep Learning Architecture) is proposed to create a more effective system, leveraging expert data to improve generalist capabilities [44][46].
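The "question-answer format" for control in Group 1 refers to the first-generation recipe of emitting actions as text tokens. The sketch below shows one common version of that recipe, discretizing each action dimension into 256 bins; the bin count, ranges, and function names here are illustrative, not RT-2's exact tokenizer.

```python
import numpy as np

NUM_BINS = 256

def encode_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> str:
    """Map a continuous action vector to a space-separated string of bin tokens."""
    normed = (action - low) / (high - low)  # scale each dimension to [0, 1]
    bins = np.clip((normed * (NUM_BINS - 1)).round(), 0, NUM_BINS - 1).astype(int)
    return " ".join(str(b) for b in bins)   # e.g. "132 7 255 90 14 200 1"

def decode_action(tokens: str, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Invert the tokenization: bin indices back to continuous values."""
    bins = np.array([int(t) for t in tokens.split()])
    return low + (bins / (NUM_BINS - 1)) * (high - low)
```

Because decoding maps each predicted bin index back to a continuous value, an off-the-shelf VLM's text head can drive a robot with no architectural changes, which is exactly the simplicity the summary attributes to the first generation.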
PI Co-Founder and Robotics Guru Explains in Detail How VLA + Reinforcement Learning Gives Rise to More Powerful Systems
具身智能之心· 2025-07-30 06:03
Core Viewpoint
- The article discusses advances in robotic models, focusing on the RT-2 and RT-X models, which enhance robots' ability to execute complex tasks through improved datasets and model architectures [6][12][44].

Group 1: RT-2 and RT-X Models
- RT-2 is introduced as a foundational robot model that uses a vision-language model to process image-based commands and execute tasks [8][10].
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, covering a diverse range of robotic capabilities [13][26].
- Cross-embodiment models trained on RT-X outperform specialized models by roughly 50% across a variety of tasks, indicating the advantages of generalization in robotic learning [13][29].

Group 2: Evolution of VLA Models
- First-generation VLA models such as RT-2 rely on simple question-answer structures for robot control, while the second generation adopts continuous action distributions for better performance [16][19].
- Second-generation VLA models such as π0 pair a large language model with an action-expert module that generates action sequences over time to handle complex tasks (see the flow-matching sketch after this summary) [22][24].
- The π0.5 model targets long-horizon tasks, integrating high-level reasoning to execute complex instructions in new environments [36][40].

Group 3: Integration of Reinforcement Learning
- Future VLA models are expected to incorporate reinforcement learning to improve robustness and performance beyond imitation learning [44][49].
- Integrating reinforcement learning with VLA aims to create a more effective training process in which robots learn from both expert data and real-world interaction [56][60].
- Current research focuses on developing stable, effective end-to-end training pipelines that use reinforcement learning to improve VLA capabilities [60].
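The action-expert mechanism in Group 2 can be sketched as flow matching at inference time: starting from Gaussian noise, the expert integrates a learned velocity field, conditioned on the VLM's embedding, to produce a smooth action chunk. This is a minimal sketch under assumed names (`velocity_net`, `horizon`, `dof`, step count), not π0's actual interface.

```python
import torch

@torch.no_grad()
def generate_action_chunk(velocity_net, vlm_embedding,
                          horizon=50, dof=7, num_steps=10):
    """Deterministic flow-matching inference for one action chunk."""
    x = torch.randn(horizon, dof)  # start from noise, shape (timesteps, joints)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = torch.full((1,), k * dt)
        v = velocity_net(x, t, vlm_embedding)  # velocity toward the action manifold
        x = x + v * dt                         # Euler integration step
    return x  # smooth multi-step action sequence, executed before re-planning
```

Note the contrast with the stochastic Flow-Noise sketch earlier in this digest: here the Euler updates are noise-free because inference only needs a sample, not the log-likelihood required for RL fine-tuning.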
After a Year of Factory "Probation," How Many Hurdles Must Humanoid Robots Clear to Become "Full-Time"?
Di Yi Cai Jing· 2025-04-29 11:39
Core Insights
- The development of humanoid robots for industrial applications faces significant challenges, particularly in the concept-validation phase, which tests the engineering capabilities of teams [1][9][10]

Group 1: VLA Model Development
- Lingchu Intelligent recently launched the Psi-R1 model, a Vision-Language-Action (VLA) model intended to enable robots to perform complex tasks in open environments [2][4].
- Since 2025, at least seven companies, including Physical Intelligence and NVIDIA, have released VLA-related models, indicating growing interest in this technology [2][7].
- The VLA model's ability to incorporate action signals as input is crucial for improving a robot's decision-making and operational capabilities [5][8].

Group 2: Concept-Validation Challenges
- The concept-validation phase requires humanoid robots to demonstrate technical success rate, reliability, efficiency, cost, and profitability, all critical for commercial viability [3][10].
- The transition from laboratory testing to real-world application involves multiple stages, including a three-month internal testing phase followed by a three-month validation phase in customer environments [12][13].
- Real-world conditions, such as complex lighting and electromagnetic interference, pose additional challenges that must be addressed during validation [12][13].

Group 3: Market Applications and Limitations
- Current humanoid robots are primarily engaged in tasks such as material handling and inspection in various industrial settings, but their roles are often limited to simple operations [14][15].
- Companies are focusing on scenarios where humanoid robots can perform tasks that are difficult for automated systems, such as quality inspection in 3C manufacturing [15].
- The ultimate goal is for humanoid robots to take on roles that require flexibility and adaptability, which traditional automation cannot achieve [15].