Imitation Learning
A survey covering nearly 300 works! The development of manipulation tasks viewed through "high-level planning and low-level control"
具身智能之心· 2026-01-06 00:32
In the field of embodied intelligence, robot manipulation is a core challenge that is being transformed by the rapid progress of vision, language, and multimodal learning. The emergence of large foundation models has greatly improved robots' perception and semantic representation abilities, enabling them to complete tasks in unstructured environments from natural language instructions. This survey, jointly written by Xi'an Jiaotong University, the Hong Kong University of Science and Technology (Guangzhou), and several other universities, systematically reviews learning-based robot manipulation methods under a unified "high-level planning + low-level control" framework, identifies current technical bottlenecks and future directions, and provides a comprehensive, structured reference for research in the field.

Paper: Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
Paper link: https://arxiv.org/pdf/2512.22983
Project link: https://github.com/BaiShuangha ...
A 10,000-character deep dive: what pain points remain in VLA architectures and models?
具身智能之心· 2025-12-30 01:11
Our previous roundtable on VLA models plus real-robot deployment was very well received across the industry. The platform team has been editing the transcript of the conversation, and today we share the first part, on VLA architectures and models.

Zhang Qiang: Thank you for the introduction. Hello everyone, I am Zhang Qiang, from the Beijing Humanoid Robot Center. My research direction and background are humanoid robots, which I have been working on since around 2021. I have done research on the Fourier and GR-1 robots, on Embodied robots, and on our current Tiangong platform. My main research directions are motion control, VLA, and humanoid-robot-based world models and embodied large models. I hope everyone will follow our work. I am very glad to accept the invitation from 具身智能之心 and to discuss these questions together with the other guests today. Thank you!

Host: Great, then let us formally begin. Welcome, everyone, to the 具身智能之心 round ...
Some thoughts on applying reinforcement learning to autonomous driving
自动驾驶之心· 2025-12-23 00:53
Core Viewpoint
- The article discusses applying reinforcement learning (RL) fine-tuning to trajectory planning in autonomous driving, emphasizing the transition from open-loop to closed-loop training to make the trained models more effective [3][4].

Group 1: Training Methodology
- Mainstream learning-based planning modules typically use imitation learning, which can struggle with out-of-distribution scenarios during real-world testing [3].
- A closed-loop training approach is proposed that simulates real-vehicle testing environments, making it more effective than open-loop training [4].
- The article introduces a network structure based on Waymo's earlier work MotionLM, which outputs trajectories autoregressively so that causal relationships are maintained [4][6].

Group 2: Input and Output Structure
- The network's input is scene-centered, summarizing static information over a specified time frame rather than relying on the current frame alone, which helps prevent the vehicle from navigating outside the perceived road [6].
- Many imitation learning methods combine single-frame perception with several seconds of ground-truth (GT) data, which can lead to causal inconsistencies if the perception range is limited [7].

Group 3: Reward Function and Training Phases
- The training process consists of two phases, pretraining and reinforcement learning, with a simple reward function that balances efficiency and safety by considering both GT fitting and collision avoidance [11].
- The reward is normalized across all samples and time steps, which allows the critic network to be omitted, similar to the GRPO method [13] (a minimal sketch follows this summary).

Group 4: Challenges and Future Directions
- Many imitation learning methods introduce auxiliary losses that can lead to undesirable model outputs, highlighting the limitations of open-loop training [14].
- The core value of reinforcement learning lies in closed-loop learning, which can significantly enhance model capabilities even with smaller datasets [14].
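To make the critic-free reward normalization concrete, here is a minimal NumPy sketch of a GRPO-style advantage computation, assuming rewards are collected per candidate trajectory and per time step. The function name, array shapes, and random stand-in rewards are illustrative assumptions, not the article's implementation.

```python
import numpy as np

def grpo_style_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize rewards against the statistics of the whole sampled group
    (all candidate trajectories, all time steps). The group itself serves as
    the baseline, so no separate critic network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical usage: 8 candidate trajectories rolled out for the same scene,
# each with 50 per-step rewards mixing GT fitting and collision avoidance.
rewards = 0.1 * np.random.randn(8, 50)
advantages = grpo_style_advantages(rewards)   # shape (8, 50)
```

Dropping the critic this way trades some variance for simplicity, which is workable when many candidate trajectories are sampled per scene.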
AAAI 2026 Oral | Robots that "learn a trade by watching people"? A single demonstration is enough to learn a new task!
具身智能之心· 2025-12-12 01:22
Core Insights
- The article discusses a novel approach to robot learning from human demonstration, emphasizing the importance of fine-grained action alignment between human and robot movements [3][4][8].
- The proposed method, Human2Robot, utilizes a new dataset (H&R) and a two-stage framework to enhance robot learning capabilities, enabling one-shot generalization to new tasks [3][4][9].

Summary by Sections

Introduction
- The article introduces the limitations of existing methods that rely on coarse alignment of human-robot video pairs, which often leads to a lack of understanding of the fine-grained actions necessary for task generalization [3][8].

Methodology
- A new dataset, H&R, consisting of 2,600 synchronized human and robot action videos, is introduced to facilitate better learning [9].
- The Human2Robot framework consists of two main stages, a Video Prediction Model (VPM) and an Action Decoder [12][16] (a structural sketch follows this summary).

Video Prediction Model (VPM)
- The VPM generates robot action videos based on human demonstrations, allowing the model to learn detailed action dynamics [13][14].
- The model captures key information about the robot's shape and human hand movements through a Spatial UNet and a Spatial-Temporal UNet [15].

Action Decoder
- The Action Decoder translates the generated video features into specific robot movements, enabling real-time task execution without needing continuous video input [16][20].

Experimental Results
- Human2Robot outperforms existing baseline methods, maintaining a success-rate improvement of over 10-20% across various tasks and demonstrating its effectiveness in leveraging detailed human video conditions [20][27].
- The introduction of KNN in the Human2Robot framework shows that it can still perform well even without direct demonstration input, indicating robust task execution capabilities [20][27].

Generalization Capability
- Human2Robot exhibits strong generalization across different tasks, including new positions and object instances, owing to the clear action correspondences established by the H&R dataset [27].

Ablation Studies
- The effectiveness of the VPM is validated through experiments showing that relying solely on human video input leads to poor performance, highlighting the necessity of the video-generation process for reliable action mapping [25][26].
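A minimal PyTorch-flavored sketch of the two-stage data flow described above, purely to make the structure concrete. The class interfaces, the stand-in networks (a linear encoder and a GRU in place of the paper's Spatial/Spatial-Temporal UNets), the tensor shapes, and the 7-DoF action dimension are all assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VideoPredictionModel(nn.Module):
    """Stage 1 (assumed interface): map a human demonstration clip to
    per-frame features of the predicted robot-action video."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Stand-in for the paper's Spatial / Spatial-Temporal UNets.
        self.encoder = nn.Sequential(nn.Flatten(2), nn.LazyLinear(feat_dim))

    def forward(self, human_video: torch.Tensor) -> torch.Tensor:
        # human_video: (batch, frames, C, H, W) -> (batch, frames, feat_dim)
        return self.encoder(human_video)

class ActionDecoder(nn.Module):
    """Stage 2 (assumed interface): translate the predicted video features
    into a sequence of low-level robot actions."""
    def __init__(self, feat_dim: int = 256, action_dim: int = 7):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.out = nn.Linear(feat_dim, action_dim)

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.temporal(video_feats)
        return self.out(h)            # (batch, frames, action_dim)

# Hypothetical end-to-end pass: one human demo clip in, an action sequence out.
vpm, decoder = VideoPredictionModel(), ActionDecoder()
demo = torch.randn(1, 16, 3, 64, 64)  # a 16-frame human demonstration
actions = decoder(vpm(demo))
print(actions.shape)                  # torch.Size([1, 16, 7])
```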
Li Auto shares its closed-loop reinforcement learning training framework for autonomous driving
理想TOP2· 2025-11-27 16:10
Core Viewpoint
- The article discusses advancements in autonomous driving through the AD-R1 framework, which uses closed-loop reinforcement learning to enhance the safety and robustness of end-to-end autonomous driving systems, addressing the limitations of existing world models in predicting dangerous outcomes [2][4].

Group 1: Closed-Loop vs. Open-Loop Systems
- Open-loop systems rely on offline data and static playback, while closed-loop systems interact dynamically with the environment, allowing real-time adjustments to the vehicle's trajectory [1].
- The AD-R1 framework represents a significant step in closed-loop reinforcement learning for autonomous driving [1].

Group 2: Challenges in Imitation Learning
- Imitation learning faces two main challenges: distribution shift due to unseen long-tail scenarios in the real world, and the lack of negative feedback, which makes it difficult for the AI to learn from mistakes [3].
- Optimistic bias is identified as a systemic flaw in reinforcement learning for autonomous driving, where models may imagine unrealistically safe outcomes despite unsafe actions [3].

Group 3: AD-R1 Framework Components
- The AD-R1 framework includes two core components: an impartial world model and reinforcement learning based on future imaginings [4].
- The impartial world model employs counterfactual data synthesis to teach the model the consequences of unsafe driving behaviors [4].

Group 4: Model Training and Evaluation
- The training process involves sampling candidate trajectories, imagining future scenarios with the impartial world model, scoring them based on predicted outcomes, and updating the policy using the GRPO algorithm [8] (a skeleton of this loop follows this summary).
- The framework enables detailed reward calculations through 3D/4D voxel outputs, improving the evaluation of collision severity and ensuring vehicle stability on the road [8].

Group 5: Additional Features
- Trajectory-aware gating is implemented so the model focuses on relevant features along the driving path, while an ego-trajectory fidelity loss penalizes deviations from the input control commands [6].
- The framework also includes volume collision penalties and vertical clearance checks to enhance safety in complex environments [8].
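The four training steps lend themselves to a small control-flow skeleton, shown below. This is only a sketch under stated assumptions: the impartial world model is stubbed with a toy scoring function, and the sampling noise, trajectory shape, and group size are invented for illustration; it is not Li Auto's implementation of AD-R1.

```python
import numpy as np

def score_imagined_future(trajectory: np.ndarray) -> float:
    """Placeholder for the impartial world model: imagine the future this
    trajectory leads to and score it. A real system would roll out 3D/4D
    voxel predictions and assess collision severity and road clearance."""
    lateral_deviation = float(np.sum(trajectory[:, 1] ** 2))  # toy stand-in
    return -lateral_deviation

def train_step(policy_trajectory: np.ndarray, num_candidates: int = 8,
               noise: float = 0.1):
    """One GRPO-style step: sample candidates around the policy output,
    score each imagined future, and normalize rewards within the group."""
    candidates = [policy_trajectory + noise * np.random.randn(*policy_trajectory.shape)
                  for _ in range(num_candidates)]
    rewards = np.array([score_imagined_future(t) for t in candidates])
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # The advantages would then weight a policy-gradient update per candidate.
    return candidates, advantages

ego_traj = np.zeros((30, 2))   # 30 waypoints of (x, y) for the ego vehicle
cands, advs = train_step(ego_traj)
```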
Led by industry algorithm experts! A small-class course on production-oriented end-to-end autonomous driving
自动驾驶之心· 2025-11-21 00:04
Core Insights
- The article emphasizes the importance of end-to-end production in the automotive industry, highlighting the scarcity of qualified talent in this area [1][3].
- A newly designed advanced course on end-to-end production has been developed to address the industry's needs, focusing on practical applications and real-world scenarios [3][5].

Course Overview
- The course covers essential algorithms such as one-stage and two-stage end-to-end frameworks, reinforcement learning applications, and trajectory optimization techniques [5][10].
- It aims to provide hands-on experience and insights into production challenges, making it suitable for individuals looking to advance or transition in their careers [5][18].

Course Structure
- Chapter 1 introduces the overview of end-to-end tasks, focusing on the integration of perception and control algorithms [10].
- Chapter 2 discusses the two-stage end-to-end algorithm framework, including its modeling and information transfer methods [11].
- Chapter 3 covers the one-stage end-to-end algorithm framework, emphasizing its advantages in information transmission [12].
- Chapter 4 focuses on the application of navigation information in autonomous driving, detailing map formats and encoding methods [13].
- Chapter 5 introduces reinforcement learning algorithms, highlighting their necessity alongside imitation learning [14].
- Chapter 6 provides practical experience in trajectory output optimization, combining imitation and reinforcement learning [15].
- Chapter 7 discusses fallback strategies for trajectory smoothing and reliability in production [16].
- Chapter 8 shares production experiences from various perspectives, including data and model optimization [17].

Target Audience
- The course is designed for advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming skills [18][19].

Course Logistics
- The course starts on November 30 and spans three months, featuring offline video lectures and online Q&A sessions [20].
A China-U.S. robotics debate has just erupted
Hua Er Jie Jian Wen· 2025-11-18 08:41
Core Viewpoint
- A video showcasing a humanoid robot from a Chinese startup, MindOn Tech, has sparked a global debate regarding its authenticity, with claims of "no acceleration, no remote control" being challenged by skeptics in the U.S. [1][4][10]

Group 1: Video and Technology
- The video features a humanoid robot performing tasks such as watering plants, throwing garbage, and playing with children, demonstrating impressive fluidity in its movements [2][4].
- MindOn Tech claims that the robot operates autonomously without any external control, which has led to significant interest and skepticism [4][10].

Group 2: Skepticism and Responses
- Brett Adcock, CEO of Figure AI, expressed doubts about the video's authenticity, suggesting it may involve pre-recorded movements without real-time perception [5][7].
- Adcock has previously criticized another Chinese robotics company, UBTECH, for allegedly using computer-generated imagery in their demonstrations [8][10].

Group 3: Support for Authenticity
- Supporters of MindOn Tech have provided backup footage to validate the video's claims, arguing that the robot's actions are feasible based on existing academic research [11][15].
- Mike Kalil, a U.S. tech blogger, argues that the robot's capabilities are a result of integrating advanced research in imitation and reinforcement learning, indicating a significant engineering achievement [15].

Group 4: Implications for the Industry
- If MindOn Tech's software can deliver genuine functionality on cost-effective hardware like Unitree's G1, it could pose a serious threat to established players like Figure AI, 1X Technologies, and Tesla [17][18].
- The current trend among U.S. companies focuses on vertical integration, developing both the AI software and the hardware, which may be challenged by MindOn Tech's approach [18][19].

Group 5: Potential Market Shift
- MindOn Tech's model suggests a decoupling of AI software and hardware, akin to the "Android model," which could disrupt the competitive landscape of humanoid robotics [19][20].
- The competition may shift from hardware capabilities to the intelligence of the AI, potentially leading to a more open and flexible market environment [20][21].
- This debate over the video's authenticity reflects a broader clash of technological approaches and business models, indicating a significant shift in the robotics industry [21].
HuggingFace and Oxford University's new tutorial open-sources a SOTA resource library!
具身智能之心· 2025-10-27 00:02
Core Viewpoint
- The article emphasizes the significant advancements in robotics, particularly in robot learning, driven by the development of large models and multi-modal AI technologies, which have transformed traditional robotics into a more learning-based paradigm [3][4].

Group 1: Introduction to Robot Learning
- The article introduces a comprehensive tutorial on modern robot learning, covering foundational principles of reinforcement learning and imitation learning, leading up to general-purpose, language-conditioned models [4][12].
- HuggingFace and Oxford University researchers have created a valuable resource for newcomers to the field, providing an accessible guide to robot learning [3][4].

Group 2: Classic Robotics
- Classic robotics relies on explicit modeling through kinematics and control planning, while learning-based methods utilize deep reinforcement learning and expert demonstrations for implicit modeling [15].
- Traditional robotic systems follow a modular pipeline, including perception, state estimation, planning, and control [16].

Group 3: Learning-Based Robotics
- Learning-based robotics integrates perception and control more closely, adapts to tasks and embodiments, and reduces the need for expert modeling [26].
- The tutorial highlights the challenges of safety and efficiency in real-world applications, particularly during the initial training phases, and discusses advanced techniques like simulation training and domain randomization to mitigate risks [34][35].

Group 4: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showcasing significant potential in various scenarios [28].
- The tutorial discusses the complexity of integrating multiple system components and the limitations of traditional physics-based models, which often oversimplify real-world phenomena [30].

Group 5: Imitation Learning
- Imitation learning offers a more direct learning path for robots by replicating expert actions through behavior cloning, avoiding complex reward-function design [41] (a minimal sketch follows this summary).
- The tutorial addresses challenges such as compounding errors and handling multi-modal behaviors in expert demonstrations [41][42].

Group 6: Advanced Techniques in Imitation Learning
- The article introduces advanced imitation learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which effectively model multi-modal data [43][45].
- Diffusion Policy demonstrates strong performance in various tasks with minimal demonstration data, requiring only 50-150 demonstrations for training [45].

Group 7: General Robot Policies
- The tutorial envisions the development of general robot policies capable of operating across tasks and devices, inspired by large-scale open robot datasets and powerful vision-language models [52][53].
- Two cutting-edge vision-language-action (VLA) models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise control commands [53][56].

Group 8: Model Efficiency
- SmolVLA represents a trend toward model miniaturization and open-sourcing, achieving high performance with significantly reduced parameter counts and memory consumption compared to π₀ [56][58].
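As a reference point for the behavior-cloning idea mentioned above, the following is a minimal PyTorch sketch: regress expert actions from observations with a plain supervised loss. The network sizes, observation/action dimensions, and the random stand-in dataset are assumptions for illustration, not the tutorial's code.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7   # assumed observation and action dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                       nn.Linear(128, 128), nn.ReLU(),
                       nn.Linear(128, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a dataset of (observation, expert action) pairs.
obs = torch.randn(1024, obs_dim)
expert_actions = torch.randn(1024, act_dim)

for epoch in range(10):
    pred = policy(obs)
    loss = nn.functional.mse_loss(pred, expert_actions)  # clone the expert
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because errors made at test time push the robot into states absent from the demonstrations, this plain recipe suffers the compounding-error problem noted above, which the generative-model variants (ACT, Diffusion Policy) help address.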
A hands-on introduction to robot learning: HuggingFace and Oxford University's new tutorial open-sources a SOTA resource library
机器之心· 2025-10-26 07:00
Core Viewpoint
- The article emphasizes the significant advancements in the field of robotics, particularly in robot learning, driven by the development of artificial intelligence technologies such as large models and multi-modal models. This shift has transformed traditional robotics into a learning-based paradigm, opening new potential for autonomous decision-making robots [2].

Group 1: Introduction to Robot Learning
- The article highlights the evolution of robotics from explicit modeling to implicit modeling, marking a fundamental change in how motion is generated: traditional robotics relied on explicit modeling, while learning-based methods use deep reinforcement learning and learning from expert demonstrations [15].
- A comprehensive tutorial provided by HuggingFace and researchers from Oxford University serves as a valuable resource for newcomers to modern robot learning, covering foundational principles of reinforcement learning and imitation learning [3][4].

Group 2: Learning-Based Robotics
- Learning-based robotics simplifies the path from perception to action by training a unified high-level controller that can directly handle high-dimensional, unstructured perception-motion information without relying on a dynamics model [33].
- The tutorial addresses challenges in real-world applications, such as safety and efficiency issues during initial training and the high cost of trial and error in physical environments, and introduces advanced techniques like simulator training and domain randomization to mitigate these risks [34][35].

Group 3: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showcasing significant potential across various scenarios [28].
- The tutorial discusses the offline-to-online reinforcement learning framework, which enhances sample efficiency and safety by utilizing pre-collected expert data. The HIL-SERL method exemplifies this approach, enabling robots to master complex real-world tasks with near-100% success rates in just 1-2 hours of training [36][39] (a toy sketch follows this summary).

Group 4: Imitation Learning
- Imitation learning offers a more direct learning path for robots by replicating expert actions through behavior cloning, avoiding complex reward-function design and keeping training safe [41].
- The tutorial presents advanced imitation learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which effectively model multi-modal data by learning the latent distribution of expert behaviors [42][43].

Group 5: Universal Robot Policies
- The article envisions the future of robotics in developing universal robot policies capable of operating across tasks and devices, inspired by the emergence of large-scale open robot datasets and powerful vision-language models (VLMs) [52].
- Two cutting-edge VLA models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise robot control commands, with SmolVLA being a compact, open-source model that significantly lowers the barrier to application [53][56].
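To illustrate the offline-to-online idea behind methods like HIL-SERL, here is a toy sketch of one common pattern: seed training with pre-collected expert transitions, then mix incoming online experience into each sampled batch. The buffer sizes, the half-and-half mix, and the placeholder transitions are assumptions for illustration, not the HIL-SERL implementation.

```python
import random
from collections import deque

# Pre-collected expert transitions (observation, action, reward placeholders).
offline_buffer = [("expert_obs", "expert_act", 1.0) for _ in range(1000)]
online_buffer = deque(maxlen=10_000)   # filled as the robot acts

def sample_batch(batch_size: int = 64):
    """Draw half of each batch from expert data and half from online
    experience, falling back to expert data while the online buffer is small."""
    n_online = min(batch_size // 2, len(online_buffer))
    batch = random.sample(list(online_buffer), n_online)
    batch += random.sample(offline_buffer, batch_size - n_online)
    return batch

# Hypothetical loop: new transitions stream in from the robot between updates.
for step in range(100):
    online_buffer.append((f"obs_{step}", f"act_{step}", 0.0))
    batch = sample_batch()   # would feed an off-policy RL update
```

Anchoring every batch in expert data keeps early updates safe and sample-efficient, which is one reason such systems can reach high success rates within an hour or two of real-robot training.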
DexCanvas: must embodied data really always lack one of scale, realism, and force sensing?
具身智能之心· 2025-10-10 00:02
Core Viewpoint
- The article discusses the challenges and advancements in dexterous manipulation in robotics, highlighting the need for high-quality, multi-modal data to improve robotic grasping capabilities, and introduces the DexCanvas dataset as a solution [1][15].

Group 1: Challenges in Dexterous Manipulation
- Dexterous manipulation remains a significant challenge due to the need for precise control, high-dimensional motion planning, and real-time adaptation to dynamic environments [2][11].
- Existing hardware for dexterous manipulation falls into two categories, two-finger grippers and multi-finger humanoid hands, with the latter better suited to complex tasks due to their higher degrees of freedom [2][3].
- Current learning methods for dexterous manipulation include imitation learning and reinforcement learning, each with its own advantages and limitations regarding data requirements and training complexity [4][9].

Group 2: Data Collection and Quality Issues
- Data collection for dexterous manipulation is expensive and often lacks tactile and force information, and existing datasets are insufficient for large-scale pre-training [9][10].
- The article emphasizes the trade-off in data collection: achieving scale, realism, and tactile feedback simultaneously is challenging [6][7].
- The DexCanvas dataset addresses the lack of force and tactile information in existing datasets, providing a comprehensive solution for high-quality data collection [17][21].

Group 3: DexCanvas Dataset Introduction
- DexCanvas is a large-scale dataset launched by Lingqiao Intelligent Technology, designed to bridge the gap between cognitive and physical intelligence in robotics [15][16].
- The dataset includes complete multi-finger force/contact annotations optimized for systems with over 20 degrees of freedom, significantly enhancing data quality [17][21] (a hypothetical record layout follows this summary).
- DexCanvas offers a structured framework for data collection based on 22 types of human hand operation modes, integrating over 1,000 hours of real human demonstration data and 100,000 hours of physically simulated data [21][22].

Group 4: Data Generation and Enhancement
- The dataset generation process involves capturing human demonstrations with high precision and using physical simulation to recover the missing force-control data [25][27].
- DexCanvas expands the dataset by altering object properties and initial conditions, resulting in a significant increase in data volume while maintaining force-control information [28][29].
- Unlike pure simulation, DexCanvas is based on real human demonstrations, allowing better generalization across different robotic platforms and tasks [30].

Group 5: Industry Impact and Future Prospects
- The introduction of DexCanvas is expected to accelerate advancements in robotics by providing the physical-interaction data that existing datasets have lacked [32].
- The article expresses anticipation for the open-sourcing of the dataset to further support research and development in related areas [32].
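To make the kind of annotation described above tangible, here is a hypothetical record layout for a demonstration carrying per-finger force and contact data for a 20+-DoF hand. Every field name, shape, and value is an illustrative assumption; the article does not show DexCanvas's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DexFrame:
    timestamp: float
    joint_angles: List[float]           # e.g. 22 hand joint positions
    fingertip_forces: List[float]       # per-finger contact-force magnitudes
    contact_points: List[List[float]]   # 3D contact locations on the object
    object_pose: List[float]            # object position + quaternion

@dataclass
class DexDemonstration:
    manipulation_type: str              # one of the 22 human hand operation modes
    source: str                         # "human_mocap" or "physics_sim"
    frames: List[DexFrame] = field(default_factory=list)

# Hypothetical usage: one simulated precision-pinch demonstration.
demo = DexDemonstration(manipulation_type="precision_pinch", source="physics_sim")
demo.frames.append(DexFrame(0.0, [0.0] * 22, [0.0] * 5, [],
                            [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
```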