Robot Learning
A Physical Intelligence Insider's Account (from Data Collection to VLA to RL)
自动驾驶之心· 2025-12-25 09:33
This article originally appeared on 具身智能之心. Original post: https://vedder.io/misc/state_of_robot_learning_dec_2025.html

This is a read-through of a blog post written by a PI insider. It surveys the current state of robot learning, and it is grounded in genuine front-line experience; practitioners will recognize much of it. It is candid and high quality, well worth a careful read, and its takes on IL, DAgger, and RL are all first-hand. Enjoy.

Essentially, as of December 2025, virtually all robot learning systems are pure behavior cloning (BC, also called imitation learning) systems. A human provides (near-)optimal task demonstrations, and a machine learning model tries to imitate those actions. Formally, policy training is supervised: given the robot's state (e.g. camera images, robot joint angles, and possibly a task-description string), the policy predicts the demonstrated action a, usually as an action chunk, e.g. about 1 second of actions at roughly 50 Hz. The post sets out to describe what the modern robot-learning stack looks like, where it falls short, and the (incomplete/clumsy) workarounds ...
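The supervised setup described above (state in, demonstrated action chunk out) can be sketched in a few lines. The linear policy, the synthetic expert, the chunk length, and the learning rate below are all illustrative assumptions, not PI's actual stack:

```python
import random

# Toy behavior cloning: supervised regression from a scalar "state"
# to an action chunk. The linear policy, the synthetic expert, and
# the chunk length are illustrative assumptions.
random.seed(0)

CHUNK = 5  # actions per predicted chunk (the post describes ~1 s at ~50 Hz)

def demo_action_chunk(state):
    # Hypothetical expert demonstration: every action is 2 * state.
    return [2.0 * state] * CHUNK

# One weight per chunk position; policy(state)[i] = w[i] * state.
w = [0.0] * CHUNK
lr = 0.1
for _ in range(200):
    s = random.uniform(-1.0, 1.0)
    target = demo_action_chunk(s)
    pred = [wi * s for wi in w]
    for i in range(CHUNK):
        # Gradient step on the squared error at chunk position i.
        w[i] -= lr * 2.0 * (pred[i] - target[i]) * s

print([round(wi, 2) for wi in w])  # each weight approaches the expert's 2.0
```

The point of the sketch is only the training signal: there is no reward, no environment interaction, just regression onto demonstrated actions.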
HuggingFace and Oxford University Release a New Tutorial with an Open-Source SOTA Resource Library!
具身智能之心· 2025-10-27 00:02
Core Viewpoint
- The article emphasizes the significant advancements in robotics, particularly in robot learning, driven by the development of large models and multi-modal AI technologies, which have transformed traditional robotics into a more learning-based paradigm [3][4].

Group 1: Introduction to Robot Learning
- The article introduces a comprehensive tutorial on modern robot learning, covering the foundational principles of reinforcement learning and imitation learning and leading up to general-purpose, language-conditioned models [4][12].
- HuggingFace and Oxford University researchers have created a valuable resource for newcomers to the field, providing an accessible guide to robot learning [3][4].

Group 2: Classic Robotics
- Classic robotics relies on explicit modeling through kinematics and control planning, while learning-based methods use deep reinforcement learning and expert demonstrations for implicit modeling [15].
- Traditional robotic systems follow a modular pipeline of perception, state estimation, planning, and control [16].

Group 3: Learning-Based Robotics
- Learning-based robotics integrates perception and control more closely, adapts across tasks and embodiments, and reduces the need for expert modeling [26].
- The tutorial highlights the safety and efficiency challenges of real-world deployment, particularly during the initial training phases, and discusses techniques such as simulation training and domain randomization to mitigate risks [34][35].

Group 4: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the complexity of integrating many system components and the limitations of traditional physics-based models, which often oversimplify real-world phenomena [30].

Group 5: Imitation Learning
- Imitation learning offers a more direct learning path for robots by replicating expert actions through behavior cloning, avoiding complex reward-function design [41].
- The tutorial addresses challenges such as compounding errors and multi-modal behaviors in expert demonstrations [41][42].

Group 6: Advanced Techniques in Imitation Learning
- The article introduces generative-model-based imitation learning methods, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which effectively model multi-modal data [43][45].
- Diffusion Policy performs strongly across tasks with minimal demonstration data, requiring only 50-150 demonstrations for training [45].

Group 7: General Robot Policies
- The tutorial envisions general robot policies that operate across tasks and devices, inspired by large-scale open robot datasets and powerful vision-language models [52][53].
- Two cutting-edge vision-language-action (VLA) models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise control commands [53][56].

Group 8: Model Efficiency
- SmolVLA represents a trend toward model miniaturization and open-sourcing, achieving high performance with far fewer parameters and much lower memory consumption than π₀ [56][58].
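The action-chunking idea behind ACT can be pictured as a receding-horizon loop: the policy predicts a chunk of future actions, the robot executes the first few, then replans from the new state. A minimal sketch, assuming a scalar state, a hypothetical chunking policy, and toy additive dynamics:

```python
# Receding-horizon execution of predicted action chunks, in the style
# of chunking policies such as ACT: predict CHUNK actions, execute the
# first EXECUTE of them, then replan from the new state. The scalar
# state, toy policy, and toy dynamics are illustrative assumptions.
CHUNK = 4      # actions predicted per inference call
EXECUTE = 2    # actions executed before replanning

def toy_policy(state):
    # Hypothetical chunking policy: push the state toward 0.
    return [-0.5 * state] * CHUNK

def run(state, steps):
    executed = []
    while len(executed) < steps:
        chunk = toy_policy(state)
        for a in chunk[:EXECUTE]:
            state += a              # toy dynamics: state accumulates actions
            executed.append(a)
    return state, executed

final, acts = run(1.0, 6)
print(round(final, 4))  # prints 0.0: the state is driven to the target
```

Executing only part of each chunk before replanning trades inference cost against reactivity; chunking itself is what lets the supervised policy commit to temporally coherent motion.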
A Hands-On Introduction to Robot Learning: HuggingFace and Oxford University Release a New Tutorial with an Open-Source SOTA Resource Library
机器之心· 2025-10-26 07:00
Core Viewpoint
- The article emphasizes the significant advancements in the field of robotics, particularly in robot learning, driven by the development of artificial intelligence technologies such as large models and multi-modal models. This shift has transformed traditional robotics into a learning-based paradigm, opening new potential for autonomous decision-making robots [2].

Group 1: Introduction to Robot Learning
- The article highlights the evolution of robotics from explicit modeling to implicit modeling, a fundamental change in how motion is generated: traditional robotics relied on explicit modeling, while learning-based methods use deep reinforcement learning and learning from expert demonstrations [15].
- A comprehensive tutorial from HuggingFace and Oxford University researchers serves as a valuable resource for newcomers to modern robot learning, covering the foundational principles of reinforcement learning and imitation learning [3][4].

Group 2: Learning-Based Robotics
- Learning-based robotics simplifies the path from perception to action by training a unified high-level controller that directly handles high-dimensional, unstructured perception-motor information without relying on a dynamics model [33].
- The tutorial addresses real-world challenges such as safety and efficiency during early training and the high cost of trial and error in physical environments, introducing techniques such as simulator training and domain randomization to mitigate these risks [34][35].

Group 3: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the "offline-to-online" reinforcement learning framework, which improves sample efficiency and safety by using pre-collected expert data. The HIL-SERL method exemplifies this approach, enabling robots to master complex real-world tasks with near-100% success rates after just 1-2 hours of training [36][39].

Group 4: Imitation Learning
- Imitation learning offers a more direct learning path for robots by replicating expert actions through behavior cloning, avoiding complex reward-function design and keeping training safe [41].
- The tutorial presents generative-model-based imitation learning methods, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data effectively by learning the latent distribution of expert behaviors [42][43].

Group 5: Universal Robot Policies
- The article envisions universal robot policies that operate across tasks and devices, inspired by the emergence of large-scale open robot datasets and powerful vision-language models (VLMs) [52].
- Two cutting-edge VLA models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise robot control commands; SmolVLA is a compact, open-source model that significantly lowers the barrier to application [53][56].
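The offline-to-online recipe can be illustrated on something far smaller than a robot: seed value estimates from pre-collected "expert" data, then keep refining them through online interaction. The two-armed bandit, reward noise, and epsilon-greedy rule below are toy assumptions for illustrating the warm-start idea only, not the HIL-SERL algorithm:

```python
import random

# Offline-to-online value learning on a two-armed bandit: seed the
# estimates from pre-collected "expert" data, then refine them with
# epsilon-greedy online interaction. The bandit, reward noise, and
# update rule are toy assumptions, not the HIL-SERL algorithm.
random.seed(0)

TRUE_MEANS = [0.2, 0.8]          # arm 1 is the better arm
counts = [0, 0]
values = [0.0, 0.0]

def update(arm, reward):
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running mean

# Offline phase: expert data mostly pulls the good arm.
for _ in range(20):
    arm = 1 if random.random() < 0.9 else 0
    update(arm, random.gauss(TRUE_MEANS[arm], 0.1))

# Online phase: epsilon-greedy interaction sharpens the estimates.
for _ in range(200):
    if random.random() < 0.1:
        arm = random.randrange(2)                # explore
    else:
        arm = 0 if values[0] > values[1] else 1  # exploit
    update(arm, random.gauss(TRUE_MEANS[arm], 0.1))

best = 0 if values[0] > values[1] else 1
print(best)  # the better arm (1) wins after warm-started learning
```

The offline phase is what makes the online phase sample-efficient and safe: the agent starts exploitation from a sensible estimate rather than from scratch.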
Boost Performance with No Retraining: HKU Team Proposes the GPC Framework for Robot "Policy Composition"
机器之心· 2025-10-19 09:17
Core Viewpoint
- The article introduces the General Policy Composition (GPC) framework, which provides a novel, training-free solution to enhance the performance of robot control strategies by dynamically combining multiple pre-trained models at test time, overcoming the limitations of traditional training methods [2][5][7].

Summary by Sections

Improving Policy Performance
- GPC presents a paradigm shift in enhancing policy performance: instead of relying on additional training, it composes existing strategies [6][15].

Innovative Theoretical Foundation
- The framework is built on two key theoretical findings:
  1. Functional-level improvement: convex combinations of the decision scores of multiple pre-trained strategies can yield a combined score more accurate than any single strategy's [9].
  2. System-level stability: improvements in single-step error propagate along the entire trajectory, leading to overall performance gains [10].

A General "Policy Composer"
- GPC's core advantage lies in its plug-and-play nature, allowing seamless integration of diverse robot strategies without retraining [14][15].

Heterogeneous Strategy Flexibility
- GPC can flexibly combine strategies across different architectures and modalities, effectively balancing information from various conditions to produce stable and coherent action trajectories [17][19].

Weight Search for the Optimal Composition
- The weight-search mechanism in GPC customizes optimal weight configurations for different tasks, emphasizing how much the weight distribution matters for maximizing the effectiveness of the combined strategy [22][23].

Experimental Validation
- GPC demonstrates superior performance in both simulation and real-world environments, achieving significant success-rate improvements over single baselines: up to 7.55% on simulation tasks and 5-10% on real-world tasks [28][30].

Key Findings from Experiments
- Three core findings highlight GPC's versatility:
  1. GPC can achieve higher accuracy when composing strategies of moderate accuracy [29].
  2. The presence of a weak strategy can hinder overall performance, indicating the need for careful selection of contributing strategies [29].
  3. Performance is maximized when stronger strategies are given greater weight in the combination [29].
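The functional-level finding (a convex combination of score functions can beat either policy's own score) and the weight search can be sketched together. The two linear score fields, the toy score-following loop, and the weight grid below are illustrative assumptions, not the paper's models:

```python
# Test-time policy composition in the spirit of GPC: take a convex
# combination of two policies' score functions, follow the combined
# field, and grid-search the weight. Both score functions, the toy
# ascent loop, and the weight grid are illustrative assumptions.

def score_a(x):
    return 0.9 - x   # policy A's score field: pushes x toward 0.9

def score_b(x):
    return 1.1 - x   # policy B's score field: pushes x toward 1.1

def rollout(w, steps=100, lr=0.1):
    # Follow the convex combination w*score_a + (1-w)*score_b from x=0.
    x = 0.0
    for _ in range(steps):
        x += lr * (w * score_a(x) + (1.0 - w) * score_b(x))
    return x

TARGET = 1.0  # ideal endpoint, unknown to either individual policy
best_w = min((i / 10 for i in range(11)),
             key=lambda w: abs(rollout(w) - TARGET))
print(best_w, round(rollout(best_w), 3))  # 0.5 balances the two biases
```

With opposite biases, neither policy alone reaches the target, but the equally weighted composition does; this mirrors why the paper's weight search matters, since a poor weight inherits one policy's bias.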
通研院 Team Wins the CoRL 2025 Outstanding Paper Award: UniFP Cracks the Force-Position Control Challenge for Legged Robots, the First Chinese Team to Receive the Honor
机器人大讲堂· 2025-10-12 02:08
Core Insights
- The article discusses the significance of the Conference on Robot Learning (CoRL) and highlights the achievement of a Chinese research team winning the Outstanding Paper Award for their work on UniFP, a unified control algorithm for legged robots [1][3].

Group 1: Conference Overview
- CoRL is a leading academic conference in AI and robotics, showcasing cutting-edge research in robot learning [1].
- In 2025, CoRL received nearly 1,000 submissions, with 264 papers accepted after rigorous review [1].

Group 2: UniFP Algorithm
- UniFP (Unified Force and Position Control Policy) is the first algorithm in legged robotics to unify force and position control within a single framework, overcoming traditional limitations [3][4].
- The algorithm is based on biomechanical impedance-control principles, allowing robots to respond to environmental forces much as human muscles sense and react to load [3][4].

Group 3: Control Framework
- The UniFP framework consists of three core modules (observation encoder, state estimator, and actuator), forming a complete control loop of perception, decision-making, and execution [7].
- The observation encoder processes historical state information, while the state estimator infers key states that cannot be measured directly, enabling "sensorless force perception" [7][8].

Group 4: Performance Validation
- The research team validated UniFP in various simulated contact scenarios and then on the Unitree B2-Z1 quadruped robot, demonstrating impressive multi-functional capabilities [8][10].
- In experiments, UniFP showed precise force control, adaptive force tracking, and compliant impedance control, outperforming traditional methods [10][11].

Group 5: Imitation Learning
- UniFP's integration with imitation learning significantly enhances the robot's learning efficiency on contact-intensive tasks, improving success rates by roughly 39.5% over traditional methods [11][13].
- The research demonstrated UniFP's versatility across different robot forms and tasks, confirming its generalizability [13][14].
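The impedance-control principle that UniFP builds on has a compact classical form: the commanded force is a stiffness term plus a damping term around a desired motion. A minimal sketch on a unit point mass, with made-up gains and setpoint (UniFP itself learns this behavior rather than hand-coding it):

```python
# Classical impedance control, the principle UniFP builds on:
# commanded force F = K*(x_des - x) + D*(v_des - v).
# Gains, the unit mass, and the setpoint are illustrative assumptions.
K, D, M, DT = 50.0, 10.0, 1.0, 0.01

def impedance_force(x, v, x_des, v_des=0.0):
    return K * (x_des - x) + D * (v_des - v)

# Simulate a unit point mass tracking a step to x_des = 0.2 m.
x, v = 0.0, 0.0
for _ in range(1000):   # 10 s of simulation at DT = 0.01 s
    f = impedance_force(x, v, 0.2)
    v += (f / M) * DT   # Euler integration of the dynamics
    x += v * DT
print(round(x, 3))  # settles at the 0.2 setpoint
```

Tuning K and D trades stiffness against compliance, which is exactly the force-versus-position trade-off the unified policy has to manage on contact.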
A Major Upgrade for Robot Perception: Lightweight Injection of Geometric Priors Lifts Success Rates by 31%
36Kr · 2025-09-28 12:09
Core Insights
- The article discusses the challenges in enabling AI to truly "understand" the 3D world, particularly in the context of vision-language-action (VLA) models that rely on 2D image-text data [1][2].

Group 1: VLA Model Limitations
- Current VLA models, which rely primarily on pre-trained vision-language models, lack the 3D spatial understanding needed for real-world operation [1].
- Existing enhancement methods based on explicit depth input face deployment difficulties and depth-noise issues [1].

Group 2: Evo-0 Model Introduction
- Shanghai Jiao Tong University and the University of Cambridge proposed a lightweight method called Evo-0 to enhance the spatial understanding of VLA models by implicitly injecting 3D geometric priors, without requiring explicit depth input or additional sensors [2].
- Evo-0 utilizes the Visual Geometry Grounded Transformer (VGGT) to extract 3D structural information from multi-view RGB images, significantly improving spatial perception capabilities [2][3].

Group 3: Model Architecture and Training
- Evo-0 integrates VGGT as a spatial encoder, introducing 3D tokens that carry depth context and cross-view spatial correspondences [3].
- A cross-attention fusion module merges 2D visual tokens with the 3D tokens, enhancing the model's understanding of spatial structure and object layout [3][6].
- Training is efficient: only the fusion module, a LoRA adaptation layer, and the action expert are fine-tuned, reducing computational cost [6].

Group 4: Experimental Results
- On RLBench simulation tasks, Evo-0 achieved an average success-rate improvement of over 28.88% compared to baseline models, excelling particularly on tasks requiring complex spatial relationships [10][11].
- Evo-0's robustness was tested under five different interference conditions, consistently outperforming the baseline model π₀ (pi0) [12][15].

Group 5: Conclusion
- Evo-0's key innovation lies in extracting rich spatial semantics through VGGT, bypassing depth-estimation errors and sensor requirements and thus strengthening the spatial modeling capabilities of VLA models [16].
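The cross-attention fusion step can be sketched with scalar arithmetic: each 2D visual token queries the 3D geometry tokens and takes an attention-weighted average of them. The token values, dimensions, and identity Q/K/V projections below are illustrative assumptions (a real fusion module learns those projections):

```python
import math

# Minimal single-head cross-attention in the spirit of Evo-0's fusion
# module: 2D visual tokens attend over 3D geometry tokens. Dimensions,
# token values, and identity Q/K/V projections are assumptions.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    d = len(queries[0])
    out = []
    for q in queries:                       # one 2D token per query
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]            # scaled dot-product with 3D tokens
        attn = softmax(scores)
        out.append([sum(a * v[j] for a, v in zip(attn, values))
                    for j in range(d)])
    return out

tokens_2d = [[1.0, 0.0]]                    # a single visual token
tokens_3d = [[1.0, 0.0], [0.0, 1.0]]        # two geometry tokens
fused = cross_attention(tokens_2d, tokens_3d, tokens_3d)
print([round(x, 3) for x in fused[0]])
```

The visual token ends up as a convex mixture of the geometry tokens, weighted toward the geometry token it aligns with; that is the sense in which 3D structure gets injected without an explicit depth channel.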
Joint-Training PhD Admissions at Eastern Institute of Technology (Ningbo)! Robot Manipulation / Embodied Intelligence / Robot Learning and More
自动驾驶之心· 2025-08-21 09:04
Core Viewpoint
- The article discusses the collaboration between Eastern Institute of Technology (Ningbo) and prestigious institutions, Shanghai Jiao Tong University and the University of Science and Technology of China, to recruit doctoral students in robotics, emphasizing a dual-mentorship model and a focus on cutting-edge research in robotics and AI [1][2].

Group 1: Program Structure
- Students register at either Shanghai Jiao Tong University or the University of Science and Technology of China for the first year, followed by research work at Eastern Institute of Technology under dual supervision [1].
- Graduates will receive a doctoral degree and diploma from either Shanghai Jiao Tong University or the University of Science and Technology of China [1].

Group 2: Research Focus and Support
- The research areas include robotics, control, and AI, with specific topics such as contact-rich manipulation, embodied intelligence, agile robot control, and robot learning [2].
- The lab provides ample research funding and administrative support and encourages a balanced lifestyle for students, including physical exercise [2].

Group 3: Community and Networking
- The article promotes a community platform for knowledge sharing in embodied intelligence, aiming to grow from 2,000 to 10,000 members within two years and facilitating discussions on technical and career-related topics [3][5].
- The community offers resources such as technical roadmaps, job opportunities, and access to industry experts, enhancing networking and collaboration among members [5][18].

Group 4: Educational Resources
- The community has compiled extensive resources, including over 30 technical roadmaps, open-source projects, and datasets relevant to embodied intelligence and robotics [17][21][31].
- Members can access a variety of learning materials, including books and research reports, to support their academic and professional development [27][24].
New from CMU: Cross-Embodiment World Models Enable Few-Shot Robot Learning
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- The article discusses a novel approach to training visuomotor policies for robots by leveraging existing low-cost data sources, significantly reducing the need for expensive real-world data collection [2][11].

Group 1: Methodology
- The proposed method is based on two key insights:
  1. Embodiment-agnostic world-model pretraining uses optic flow as the action representation, allowing pretraining on cross-embodiment datasets followed by fine-tuning with minimal target-embodiment data [3][12].
  2. The Latent Policy Steering (LPS) method improves policy outputs by searching the world model's latent space for better action sequences [3][12].

Group 2: Experimental Results
- Real-world experiments showed that combining the policy with a world model pretrained on existing datasets led to significant performance gains: over 50% relative improvement with 30 demonstrations and over 20% relative improvement with 50 demonstrations [3][9].

Group 3: Challenges and Solutions
- The article highlights the embodiment gaps that arise when pretraining models across different robots and emphasizes that world models are better suited to cross-embodiment pretraining followed by fine-tuning for new embodiments [11][12].
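The steering idea can be caricatured as sample-and-score: perturb the policy's proposed action sequence, roll each candidate through the learned world model, and keep the best-scoring one. The additive toy dynamics, goal reward, and noise scale below are illustrative assumptions, not the paper's latent space:

```python
import random

# Sample-and-score steering in the spirit of Latent Policy Steering:
# perturb the policy's proposed action sequence, evaluate each
# candidate with a world model, keep the best. The additive dynamics,
# goal reward, and noise scale are illustrative assumptions.
random.seed(1)

GOAL = 2.0

def world_model(state, actions):
    # Hypothetical learned dynamics: the state accumulates the actions.
    for a in actions:
        state += a
    return state

def score(final_state):
    return -abs(final_state - GOAL)   # closer to the goal is better

def steer(state, base_actions, n_candidates=64, noise=0.3):
    candidates = [base_actions] + [
        [a + random.gauss(0.0, noise) for a in base_actions]
        for _ in range(n_candidates - 1)
    ]
    return max(candidates, key=lambda acts: score(world_model(state, acts)))

base = [0.5, 0.5, 0.5]              # policy proposal; it undershoots GOAL
best = steer(0.0, base)
print(round(world_model(0.0, best), 2))
```

Because the unperturbed proposal is always among the candidates, the steered plan can never score worse than the policy's own output under the world model; the gain comes entirely at test time, with no policy retraining.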