Imitation Learning
No Need to Write Thousands of Lines of Code: A Robot Learns to Set the Table by Watching Human Movements
Ke Ji Ri Bao· 2026-02-16 01:22
Core Insights
- A research team from Carlos III University of Madrid has developed an innovative robot capable of learning to set the table by observing human actions, marking a significant advance in the development of home service robots [1][2]
- The robot, named ADAM, can perform various household tasks such as delivering cups or medications, assisting with clothing, and basic kitchen organization, and is primarily aimed at supporting the elderly or those needing assistance [1][2]

Group 1
- The new robot combines imitation learning with a mathematical framework called "Gaussian belief propagation," allowing it to learn basic movements from human demonstration and coordinate its two arms in real time [2]
- The robot's workflow consists of three stages: perception, reasoning, and action, using 2D/3D laser sensors and RGB-D cameras to sense the environment, process information, and generate coordinated arm-movement commands [2]
- The research addresses the growing need for technology to assist an aging population, as the proportion of elderly individuals increases and caregiving resources become strained [2]
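To make "Gaussian belief propagation" concrete, here is a minimal sketch of exact message passing on a toy two-variable Gaussian factor graph. Everything here is a hypothetical stand-in: the real ADAM system passes messages between arm controllers over much richer state, and none of these numbers or variable names come from the article.

```python
# Minimal Gaussian belief propagation (GBP) sketch: two scalar variables with
# Gaussian priors, coupled by a smoothness factor exp(-0.5 * lam_f * (x1-x2)^2).
# The graph is a tree, so a single sweep of messages is exact.

def gbp_pair(prior1, prior2, lam_f):
    """prior1, prior2: (mean, precision) pairs. Returns posterior means."""
    (m1, p1), (m2, p2) = prior1, prior2
    # Information form: eta = precision * mean
    eta1, eta2 = p1 * m1, p2 * m2
    # Factor-to-variable messages: absorb the other variable's prior into the
    # pairwise factor, then marginalize it out (Schur complement).
    lam_to2 = lam_f - lam_f**2 / (lam_f + p1)
    eta_to2 = lam_f * eta1 / (lam_f + p1)
    lam_to1 = lam_f - lam_f**2 / (lam_f + p2)
    eta_to1 = lam_f * eta2 / (lam_f + p2)
    # Belief = prior + incoming message; convert information form back to means.
    mean1 = (eta1 + eta_to1) / (p1 + lam_to1)
    mean2 = (eta2 + eta_to2) / (p2 + lam_to2)
    return mean1, mean2

# Two "arm target" estimates pulled toward agreement by the coupling factor:
m1, m2 = gbp_pair((0.0, 1.0), (4.0, 1.0), lam_f=1.0)
print(round(m1, 4), round(m2, 4))  # the factor pulls 0 and 4 toward each other
```

The appeal for real-time coordination is that each message involves only a variable and its neighbors, so the computation distributes naturally across the two arms.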
Reinforcement Learning Is Determining the Ceiling of Intelligent Driving
36Ke· 2026-02-10 04:45
Core Insights
- The development of intelligent driving is not a linear technological curve but the result of the interplay between technical paradigms, engineering constraints, and real-world scenarios [1]
- As the industry moves beyond the proof-of-concept stage, single technical terms can no longer explain real differences in capability [2]
- Factors such as computing power, data quality, system architecture, and engineering stability are determining the upper and lower limits of intelligent driving [3]

Group 1: Evolution of Learning Techniques
- Recent discussions in intelligent-driving technology reveal a trend in which various paths, such as end-to-end, VLA, and world models, converge on reinforcement learning [5]
- Reinforcement learning is transitioning from a "technical option" to a "mandatory option" in the industry [7]
- The emergence of products like AlphaGo and ChatGPT has highlighted the effectiveness of letting AI learn through trial and error as the fastest evolutionary method [8][9]

Group 2: Learning Methodologies
- Understanding reinforcement learning requires a grasp of imitation learning, which was previously favored in intelligent driving [11]
- Imitation learning allows AI to learn from human driving data but has limitations, such as inheriting bad habits and struggling with unfamiliar situations [14][16]
- Reinforcement learning, as demonstrated by AlphaGo, allows AI to explore new strategies through self-play, leading to performance beyond human intuition [17]

Group 3: Reinforcement Learning Mechanisms
- Reinforcement learning operates by trial and error, with the model learning to drive well through a feedback cycle [26]
- The design of reward functions is crucial, as it translates driving performance into quantifiable scores [30]
- Balancing conflicting objectives, such as safety versus efficiency, is essential in reward-function design [32]

Group 4: World Models and Advanced Learning
- The integration of world models with reinforcement learning enhances the training environment, allowing AI to simulate real-world scenarios [42][49]
- High-fidelity virtual environments enable AI to consider the long-term consequences of actions, improving decision-making [50]
- The coupling of world models and reinforcement learning creates a feedback loop that accelerates model iteration and performance [52]

Group 5: Industry Trends and Future Directions
- The importance of data is being redefined, with a shift toward the ability to model the world rather than relying on raw data alone [56]
- Companies are focusing on enhancing the "modeling capacity" of their systems, which is crucial for intelligent driving [60]
- The evolution of intelligent-driving systems is moving toward a stage where AI can independently understand environments and refine strategies, marking a significant advance for the industry [62]
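The points above about reward design and balancing safety against efficiency can be illustrated with a toy scalar reward. This is a hedged sketch: the term names, weights, and the 10 m gap threshold are illustrative assumptions, not any vendor's actual reward function.

```python
# Hypothetical per-timestep driving reward trading off efficiency (progress),
# safety (following distance), and comfort (lane keeping). Higher is better.

def driving_reward(speed, speed_limit, min_gap_m, lane_offset_m,
                   w_progress=1.0, w_safety=5.0, w_comfort=0.5):
    # Efficiency: reward progress, but give no credit for exceeding the limit.
    progress = min(speed, speed_limit) / speed_limit
    # Safety: penalty grows as the gap to the lead vehicle shrinks below 10 m.
    safety_penalty = max(0.0, (10.0 - min_gap_m) / 10.0)
    # Comfort / lane keeping: penalize drifting from the lane center.
    comfort_penalty = abs(lane_offset_m)
    return (w_progress * progress
            - w_safety * safety_penalty
            - w_comfort * comfort_penalty)

# A fast but tailgating policy scores worse than a slightly slower, safer one:
risky = driving_reward(speed=30, speed_limit=30, min_gap_m=4, lane_offset_m=0.1)
safe = driving_reward(speed=25, speed_limit=30, min_gap_m=20, lane_offset_m=0.1)
print(risky < safe)  # True
```

The conflicting-objectives point shows up directly in the weights: shrinking `w_safety` relative to `w_progress` produces a policy that tailgates to go faster, which is exactly the tuning problem the article describes.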
The "Curves" and "Lane Changes" Behind the AI Racing World Record
Xin Lang Cai Jing· 2026-01-24 05:10
Core Insights
- The AI racing team from Tsinghua University set a world record by completing the 10.77 km Tianmen Mountain course in 16 minutes 10.838 seconds, showcasing advances in AI-driven autonomous racing technology [1][3]

Group 1: Technical Challenges and Innovations
- The Tianmen Mountain course presents a "composite extreme" testing environment, with satellite-signal interruptions, steep slopes, and numerous sharp turns requiring the AI to make precise decisions within milliseconds [3]
- The team developed a dynamic local-map loading algorithm to address the shortcomings of traditional fully loaded 3D point-cloud maps, enabling real-time high-precision positioning [3][4]
- Data collection was enhanced through vehicle-cloud collaboration and a combination of virtual and real-world data, integrating factors like corner-entry angles and road conditions into the AI model [3]

Group 2: Learning and Development Pathways
- Since 2018, the Tsinghua research team has focused on a new end-to-end autonomous driving approach centered on reinforcement learning, significantly reducing training costs compared with traditional methods that rely on vast amounts of real-vehicle data [4]
- The team introduced China's first fully neural-network-based end-to-end autonomous driving system, a significant technological breakthrough for the industry [4]

Group 3: Real-World Application and Future Directions
- The success at Tianmen Mountain serves as a critical test for autonomous technology, underscoring the need to validate AI algorithms in real and extreme scenarios to ensure their effectiveness and robustness [5]
- The perception-positioning fusion technology allows vehicles to achieve highly real-time, high-precision trajectory estimation, enhancing stability in critical situations [5]
- Despite rapid advances in autonomous driving technology, a notable gap remains between AI capabilities and human performance in extreme road conditions, leaving ample room for future research and innovation [5]
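The dynamic local-map loading idea can be sketched simply: instead of keeping the full 3D point-cloud map in memory, hold only the tiles within a radius of the current pose, loading and evicting as the vehicle moves. The tile size, radius, cache structure, and in-memory "store" below are all assumptions for illustration; the Tsinghua team's actual algorithm is not public in this article.

```python
# Toy tile cache for a large map: keep only tiles near the vehicle in memory.

import math

TILE_SIZE = 50.0  # meters per square tile (assumed)

def tile_of(x, y):
    return (math.floor(x / TILE_SIZE), math.floor(y / TILE_SIZE))

def tiles_in_radius(x, y, radius):
    """Square block of tile indices covering a circle of the given radius."""
    r = math.ceil(radius / TILE_SIZE)
    cx, cy = tile_of(x, y)
    return {(cx + dx, cy + dy) for dx in range(-r, r + 1)
                               for dy in range(-r, r + 1)}

class LocalMap:
    def __init__(self, loader, radius=100.0):
        self.loader = loader   # callable: tile index -> point cloud
        self.radius = radius
        self.cache = {}        # tile index -> loaded points

    def update(self, x, y):
        """Load tiles entering the radius, evict tiles that left it."""
        wanted = tiles_in_radius(x, y, self.radius)
        for t in wanted - self.cache.keys():
            self.cache[t] = self.loader(t)
        for t in set(self.cache) - wanted:
            del self.cache[t]
        return sorted(self.cache)

fake_store = lambda tile: f"points@{tile}"   # stand-in for disk I/O
lm = LocalMap(fake_store, radius=60.0)
print(len(lm.update(0.0, 0.0)))    # tiles loaded around the origin
print(len(lm.update(500.0, 0.0)))  # cache shifted; far tiles evicted
```

The payoff is bounded memory and load latency per update, regardless of total course length, which is what makes real-time high-precision positioning feasible on a 10+ km mountain road.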
Hongyang Li's Team Presents PlannerRFT: A New Diffusion Trajectory-Planning Approach That Improves Performance in Complex Driving Scenarios (Tongji & HKU)
自动驾驶之心· 2026-01-21 09:16
Paper authors | Hongchen Li et al. Editor | 自动驾驶之心

Work from Tongji University, 上海创智学院, HKU's OpenDriveLab, and other teams: PlannerRFT, a diffusion planner built on closed-loop reinforcement learning and efficient fine-tuning. The key points:

Diffusion-model-based planners have become a highly promising approach for generating human-like trajectories in autonomous driving. Recent work incorporates reinforcement fine-tuning into diffusion planners through reward-guided optimization in a generate-evaluate loop to improve their robustness. However, these methods struggle to produce multimodal, scene-adaptive trajectories, which limits how efficiently informative rewards can be exploited during fine-tuning.

To address this, the HKU OpenDriveLab and Tongji University team proposes PlannerRFT, a sample-efficient reinforcement fine-tuning framework for diffusion-based planners. PlannerRFT adopts a dual-branch optimization strategy that, without changing the original inference pipeline, simultaneously optimizes the trajectory distribution and adaptively steers the denoising process toward more promising exploration directions. To support large-scale parallel learning, the authors developed the nuMax simulator, whose trajectory rollouts are 10x faster than native nuPlan. Extensive experiments show that Pla ...
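The generate-evaluate loop that reward-guided fine-tuning builds on can be sketched in a few lines: sample several candidate trajectories, score each with a reward, and turn the scores into normalized weights that a fine-tuning step would use. This is a deliberately simplified illustration; the toy planner, reward, and softmax weighting below are assumptions, and PlannerRFT's dual-branch optimization is not reproduced here.

```python
# Toy generate-evaluate loop: sample trajectories, score them, weight them.

import math, random

random.seed(0)

def sample_trajectory(horizon=5):
    """Stand-in 'planner': a 1-D trajectory of noisy lateral offsets."""
    return [random.gauss(0.0, 0.5) for _ in range(horizon)]

def reward(traj, lane_half_width=1.0):
    """Prefer trajectories that stay near the lane center and change slowly."""
    off_lane = sum(1 for p in traj if abs(p) > lane_half_width)
    jerk = sum(abs(b - a) for a, b in zip(traj, traj[1:]))
    return -off_lane - 0.1 * jerk

def evaluate_batch(k=8):
    trajs = [sample_trajectory() for _ in range(k)]
    rewards = [reward(t) for t in trajs]
    # Softmax weights: a fine-tuning loss would up-weight high-reward samples.
    m = max(rewards)
    exps = [math.exp(r - m) for r in rewards]
    z = sum(exps)
    weights = [e / z for e in exps]
    return trajs, rewards, weights

trajs, rewards, weights = evaluate_batch()
best = max(range(len(rewards)), key=rewards.__getitem__)
print(weights[best] == max(weights))  # highest-reward sample weighs most
```

The efficiency problem the paper targets is visible even here: if the sampler collapses to near-identical trajectories, all rewards are similar and the weights carry almost no learning signal, which is why steering exploration toward diverse, promising modes matters.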
Can Your Model Really Compete? A Long-Tail Scenario Benchmark for Manipulation Tasks Arrives
具身智能之心· 2026-01-20 00:33
Core Viewpoint
- The article introduces the GM-100 benchmark, a diverse set of 100 tasks designed to address the limitations of existing datasets and task designs in evaluating robotic capabilities [1][4]

Group 1: Background and Motivation
- Rapid progress in robot learning has produced many datasets and task designs, but most focus on common tasks, leaving complex and rare tasks poorly covered [3][5]
- Existing datasets, such as Open X-Embodiment and Agibot, concentrate on common actions like "pick and grasp," producing significant biases in trained models and limiting their applicability in real-world scenarios [3][5]

Group 2: GM-100 Benchmark
- GM-100 consists of 100 carefully designed tasks covering varied interaction scenarios and long-tail behaviors, aiming to comprehensively assess robotic agents' capabilities [4][11]
- The tasks are grounded in systematic analysis and insights from human-action understanding, ensuring they are executable yet sufficiently challenging to differentiate model performance [2][4]

Group 3: Task Design and Data Collection
- The task-design process analyzed prior research to eliminate redundancies and categorize tasks, revealing a significant bias toward common activities [5][9]
- A diverse task set was generated using large language models, with human experts making the final selection to ensure high-quality, feasible tasks under current hardware constraints [10][11]
- Data for GM-100 was collected via teleoperation, yielding a medium-sized dataset of over 13,000 trajectories [13][16]

Group 4: Evaluation Metrics and Results
- Baseline models were evaluated on GM-100 using several metrics, including Success Rate (SR), Partial Success Rate (PSR), and action-prediction error, to provide a comprehensive performance assessment [22]
- Overall success rates were low, highlighting the inherent difficulty of the tasks and the limitations of the training data [22]
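The metrics named above can be computed as follows, under assumed definitions: SR as the fraction of episodes fully completed, PSR as the mean fraction of subgoals achieved per episode, and action error as the mean absolute error against reference actions. GM-100's exact definitions may differ; this only illustrates the arithmetic.

```python
# Toy implementations of SR, PSR, and action-prediction error.

def success_rate(episodes):
    """episodes: list of (subgoals_done, subgoals_total) per rollout."""
    return sum(done == total for done, total in episodes) / len(episodes)

def partial_success_rate(episodes):
    """Mean fraction of subgoals achieved, crediting partial progress."""
    return sum(done / total for done, total in episodes) / len(episodes)

def action_mae(pred, ref):
    """Mean absolute error over per-step action vectors."""
    diffs = [abs(p - r) for ps, rs in zip(pred, ref) for p, r in zip(ps, rs)]
    return sum(diffs) / len(diffs)

episodes = [(3, 3), (1, 3), (0, 3), (2, 2)]      # toy rollout results
print(success_rate(episodes))                    # 0.5
print(round(partial_success_rate(episodes), 3))  # 0.583
```

PSR is the interesting one for long-tail benchmarks: when full successes are rare (as the low overall success rates suggest), partial credit still separates models that make progress from models that fail immediately.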
Midstream Intelligent-Driving Vendors Are Rapidly Snapping Up End-to-End Talent...
自动驾驶之心· 2026-01-16 02:58
Core Viewpoint
- The article discusses technological anxiety in the intelligent-driving sector, particularly among midstream manufacturers, highlighting a slowdown in cutting-edge development and a trend toward standardized mass-production solutions [1][2]

Group 1: Industry Trends
- Mass production of cutting-edge technologies is expected to begin in 2026, with current intelligent-driving technology stagnating [2]
- The overall market for passenger vehicles priced above 200,000 yuan is around 7 million units, yet the leading new players have not captured even one third of this volume [2]
- The maturity of end-to-end technology is seen as a prerequisite for larger-scale mass production, especially as L3 regulations advance this year [2]

Group 2: Educational Initiatives
- A course titled "Practical Class for End-to-End Mass Production" has been launched, focusing on the technical capabilities required for mass production in intelligent driving [2]
- The course emphasizes practical applications and is limited to a small cohort, with only 8 spots remaining [2]

Group 3: Course Content Overview
The course covers several aspects of end-to-end algorithms:
- Overview of end-to-end tasks, merging perception tasks, and designing learning-based control algorithms [7]
- Two-stage end-to-end algorithm frameworks, including modeling and information transfer between perception and planning [8]
- One-stage end-to-end algorithms that allow lossless information transfer, enhancing performance [9]
- Application of navigation information in autonomous driving, including map formats and encoding methods [10]
- Introduction to reinforcement learning algorithms to complement imitation learning for driving behavior [11]
- Optimization of trajectory outputs through practical projects involving imitation and reinforcement learning [12]
- Post-processing logic for trajectory smoothing to ensure stability and reliability in mass production [13]
- Mass-production experience shared from multiple perspectives, including data, models, and rules [14]

Group 4: Target Audience
- The course targets advanced learners with a foundational understanding of autonomous-driving algorithms, reinforcement learning, and programming [15]
- Participants are expected to have access to a GPU (an RTX 4090 or better is recommended) and familiarity with common algorithm frameworks [18]
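The trajectory-smoothing post-processing mentioned in the course outline can be illustrated with the simplest possible filter, a moving average over planned waypoints. Production stacks use more careful methods (spline fitting, QP-based smoothing under dynamic constraints); this sketch only shows why post-processing reduces jitter in the raw model output.

```python
# Moving-average smoother over a list of (x, y) waypoints.

def smooth_trajectory(points, window=3):
    """Average each point with its neighbors; windows shrink at the ends."""
    half = window // 2
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out

raw = [(0, 0), (1, 0.5), (2, -0.4), (3, 0.6), (4, 0)]  # jittery planned path
smoothed = smooth_trajectory(raw)
# Lateral jitter (sum of |Δy|) should shrink after smoothing:
jitter = lambda pts: sum(abs(b[1] - a[1]) for a, b in zip(pts, pts[1:]))
print(jitter(smoothed) < jitter(raw))
```

The trade-off this exposes is the one production teams tune: a larger window gives a smoother, more comfortable path but lags the model's intended maneuver, which matters for stability guarantees in mass production.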
pi0.5, Long Atop the Leaderboard, Has Been Knocked Off by a Chinese Model
具身智能之心· 2026-01-12 00:03
Core Viewpoint
- The article highlights the breakthrough of the "Spirit v1.5" model developed by the Qianxun Intelligent team, which has surpassed the international benchmark model pi0.5, marking a significant advance for China in embodied intelligence models [2]

Performance Comparison
- On the RoboChallenge leaderboard, Spirit v1.5 leads with a score of 66.09 and a success rate of 50.33%, followed by pi0.5 with a score of 61.84 and a success rate of 42.67% [4]

Data Collection Challenges
- Relying on "clean" training data leads to low diversity and poor scalability; clean data often lacks the complexity of real-world scenarios, hindering the model's ability to generalize [5][7]

Training Methodology
- Spirit v1.5 does not depend on highly curated "clean" demonstration data. Instead, it uses a diverse data-collection paradigm that naturally integrates multiple sub-tasks and atomic skills, enhancing the model's adaptability to real-world complexity [8][14]

Transfer Efficiency
- Experiments indicate that models pre-trained on diverse data transfer to new tasks significantly more efficiently than those trained on traditional demonstration data, requiring less compute to reach similar performance [9][12]

Scaling Findings
- As the scale of diverse experience increases, transfer efficiency improves and validation error on new tasks keeps decreasing, suggesting that task diversity matters more than the number of single-task demonstrations [13][16]

Paradigm Shift in Pre-training
- Spirit v1.5 represents a fundamental shift in robot learning, moving away from reliance on highly curated datasets. The findings suggest that unstructured diversity is a better teacher for robust pre-training, enabling models to develop a foundational "physical intuition" for adapting to real-world environments [14]
A Survey of Nearly 300 Works: The Evolution of Manipulation Tasks Through "High-Level Planning and Low-Level Control"
具身智能之心· 2026-01-06 00:32
Core Insights
- The survey covers the transformative advances in robotic manipulation driven by rapid progress in visual, language, and multimodal learning, emphasizing the role of large foundation models in enhancing robots' perception and semantic representation capabilities [1][2]

Group 1: High-Level Planning
- High-level planning clarifies action intentions, organizes sequences, and allocates environmental attention, providing structured guidance for low-level execution [4]
- Its core components are task decomposition and decision guidance, integrating multimodal information to answer "what to do" and "in what order" [4]
- Task planning based on large language models (LLMs) maps natural language to task steps, with methods like SayCan and Grounded Decoding enhancing skill selection and planning [5]
- Multimodal large language models (MLLMs) break the limits of text-only input by integrating visual and language reasoning, with models like PaLM-E and VILA showing superior performance on embodied tasks [8]
- Code generation converts plans into executable programs, improving the precision of language-based plans through methods like Code as Policies and Demo2Code [9]
- Motion planning uses LLMs and VLMs to generate continuous motion targets, linking high-level reasoning with low-level trajectory optimization [10]
- Affordance learning establishes intrinsic associations between perception and action across geometric, visual, semantic, and multimodal dimensions [11]
- 3D scene representation turns environmental perception into structured action proposals, bridging perception and action through techniques like Gaussian splatting [12]

Group 2: Low-Level Learning Control
- Low-level control translates high-level plans into precise physical actions, addressing the "how to do it" side of robotic manipulation [14]
- Learning strategies for skill acquisition fall into three main types, including pre-training and model-free reinforcement learning [16]
- Input modeling defines how robots perceive the world, emphasizing the integration of multimodal signals through reinforcement learning and imitation learning [18]
- Visual-action models use both 2D and 3D visual inputs to enhance action generation, while vision-language-action models integrate semantic, spatial, and temporal information [19]
- Additional modalities such as tactile and auditory signals improve robustness in contact-rich manipulation scenarios [20]

Group 3: Challenges and Future Directions
- Despite significant technological advances, robotic manipulation faces four core challenges: the lack of universal architectures, data and simulation bottlenecks, insufficient multimodal physical interaction, and safety and collaboration issues [23][27][28][29]
- Future research directions include developing a "robotic brain" with flexible modal interfaces, autonomous data-collection mechanisms, enhanced multimodal physical interaction, and safety in human-robot collaboration [30]
- The review emphasizes the need for a unified framework integrating high-level planning and low-level control, focused on overcoming data-efficiency, physical-interaction, and safety-collaboration bottlenecks to move robotic manipulation from the laboratory to real-world applications [31]
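The high-level/low-level split the survey organizes around can be made concrete with a toy pipeline: a planner (here a hard-coded stand-in for an LLM) decomposes an instruction into skill steps ("what to do, in what order"), and a low-level layer dispatches each step to a primitive controller ("how to do it"). All skill names and the instruction library are hypothetical.

```python
# Toy two-layer manipulation pipeline: task decomposition + skill dispatch.

def plan(instruction):
    """Stand-in for LLM task decomposition into an ordered skill sequence."""
    library = {
        "put the cup in the sink": [
            ("move_to", "cup"), ("grasp", "cup"),
            ("move_to", "sink"), ("release", "cup"),
        ],
    }
    return library[instruction]

def execute(steps):
    """Low-level layer: map each (skill, argument) step to a primitive."""
    primitives = {
        "move_to": lambda obj: f"moved to {obj}",
        "grasp":   lambda obj: f"grasped {obj}",
        "release": lambda obj: f"released {obj}",
    }
    return [primitives[skill](arg) for skill, arg in steps]

print(execute(plan("put the cup in the sink")))
```

The survey's challenges map onto this interface directly: the "universal architecture" question is what the `(skill, argument)` contract between the two layers should be, and affordance learning is about grounding arguments like `"cup"` in perception.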
A 10,000-Word Deep Dive: What Pain Points Remain in VLA Architectures and Models?
具身智能之心· 2025-12-30 01:11
Editor | 具身智能之心

The previous roundtable on VLA models and real-robot deployment received widespread praise across the industry. The platform team has been transcribing the conversation, and today we share the first part, on "VLA architectures and models."

Zhang Qiang: Thank you for the introduction. Hello everyone, I'm Zhang Qiang, from the Beijing Humanoid Robot Center. My research direction and background are humanoid robots, which I have worked on since around 2021, doing research on the Fourier and GR-1 robots, on Embodied robots, and on our current Tiangong platform. My main research directions are motion control, VLA, and humanoid-based world models and embodied foundation models. I hope you will follow our work. I'm very glad to accept the invitation from 具身智能之心 and to discuss these questions with the other guests today. Thank you!

Host: Great, then let's officially begin. Welcome, everyone, to the 具身智能之心 roundtable ...
Some Thoughts on Applying Reinforcement Learning to Autonomous Driving
自动驾驶之心· 2025-12-23 00:53
Core Viewpoint
- The article discusses reinforcement learning (RL) fine-tuning for trajectory planning in autonomous driving, emphasizing the transition from open-loop to closed-loop training to improve training effectiveness [3][4]

Group 1: Training Methodology
- Mainstream learning-based planning modules typically use imitation learning, which can struggle with out-of-distribution scenarios during real-world testing [3]
- A closed-loop training approach is proposed that simulates real-vehicle test environments, making it more effective than open-loop training [4]
- The article introduces a network structure based on Waymo's earlier work MotionLM, which outputs trajectories autoregressively so that causal relationships are maintained [4][6]

Group 2: Input and Output Structure
- The network's input is scene-centered, summarizing static information over a specified time window rather than relying on the current frame alone, which helps keep the vehicle from navigating outside the perceived road [6]
- Many imitation-learning methods combine single-frame perception with several seconds of ground-truth (GT) data, which can introduce causal inconsistencies when the perception range is limited [7]

Group 3: Reward Function and Training Phases
- Training consists of two phases, pretraining and reinforcement learning, with a simple reward function that balances efficiency and safety by considering both GT fitting and collision avoidance [11]
- Rewards are normalized across all samples and time steps, allowing the critic network to be omitted, similar to the GRPO method [13]

Group 4: Challenges and Future Directions
- Many imitation-learning methods introduce auxiliary losses that can lead to undesirable model outputs, underscoring the limitations of open-loop training [14]
- The core value of reinforcement learning lies in closed-loop learning, which can significantly enhance model capability even with smaller datasets [14]
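The critic-free normalization described above, in the spirit of GRPO, can be sketched as standardizing each rollout's reward against the group's own mean and standard deviation, so the group itself serves as the baseline instead of a learned value network. The reward values below are illustrative.

```python
# Group-relative advantage: (r - mean) / (std + eps), no critic needed.

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: per-sample scalar returns for one group of rollouts
    of the same scene. Returns standardized advantages."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for, say, 4 trajectory samples of the same scene:
adv = group_relative_advantages([1.0, 0.5, -0.5, -1.0])
print([round(a, 3) for a in adv])  # zero-mean; above-average samples positive
```

Because the advantages are zero-mean within each group, the policy gradient pushes probability mass toward the trajectories that beat their own group's average, which is exactly what makes the separate critic network unnecessary.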