Imitation Learning
No need to write thousands of lines of code: a robot learns to set the table by watching human movements
Ke Ji Ri Bao (Science and Technology Daily) · 2026-02-16 01:22
Core Insights
- A research team from Carlos III University of Madrid has developed an innovative robot capable of learning to set the table by observing human actions, marking a significant advancement in the development of home service robots [1][2]
- The robot, named ADAM, can perform various household tasks such as delivering cups or medications, assisting with clothing, and basic kitchen organization, primarily aimed at supporting the elderly or those needing assistance [1][2]

Group 1
- The new robot utilizes a combination of imitation learning and a mathematical framework called "Gaussian belief propagation," allowing it to learn basic movements through human demonstration and achieve real-time coordination between its arms [2]
- The robot's workflow consists of three stages: perception, reasoning, and action, using 2D/3D laser sensors and RGB-D cameras to sense the environment, process information, and generate coordinated arm movement commands [2]
- The research addresses the growing need for technology to assist an aging population, as the proportion of elderly individuals increases and caregiving resources become strained [2]
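The "Gaussian belief propagation" framework mentioned above can be illustrated with a toy example. The sketch below is not the ADAM team's implementation: it coordinates two scalar "arm setpoints" x0 and x1, each with its own Gaussian prior, through one soft factor enforcing x1 - x0 = d, by passing messages in information (canonical) form. All variable names, priors, and constants are invented for illustration.

```python
import numpy as np

def gbp_two_arms(m, p, lam, d):
    """Posterior means for two scalar setpoints with priors N(m[i], 1/p[i])
    and a soft coordination factor x1 - x0 = d of precision lam.
    Factor-to-variable messages in information form; exact in one pass here,
    since two variables and one factor form a tree."""
    eta_f = np.array([-lam * d, lam * d])  # factor's information vector
    means = []
    for j in (0, 1):
        i = 1 - j                          # the other variable
        s = lam / (lam + p[i])             # Schur-complement ratio
        msg_prec = lam * (1.0 - s)         # precision sent to x_j
        msg_eta = eta_f[j] + s * (eta_f[i] + m[i] * p[i])
        means.append((m[j] * p[j] + msg_eta) / (p[j] + msg_prec))
    return np.array(means)

# Sanity check against the direct joint solve in information form.
m, p, lam, d = [0.0, 2.0], [4.0, 1.0], 10.0, 0.5
Lam = np.diag(p) + lam * np.array([[1.0, -1.0], [-1.0, 1.0]])
eta = np.array(m) * np.array(p) + np.array([-lam * d, lam * d])
means = gbp_two_arms(m, p, lam, d)         # agrees with np.linalg.solve(Lam, eta)
```

The appeal for dual-arm coordination is that each message is local and cheap, so the same update rule scales to many coupled variables updated in real time.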
Reinforcement learning is setting the ceiling of intelligent driving
36Kr · 2026-02-10 04:45
Core Insights
- The development of intelligent driving is not a linear technological curve but a result of the interplay between various technical paradigms, engineering constraints, and real-world scenarios [1]
- As the industry moves beyond the proof-of-concept stage, single technical terms can no longer explain the real differences in capabilities [2]
- Factors such as computing power, data quality, system architecture, and engineering stability are determining the upper and lower limits of intelligent driving [3]

Group 1: Evolution of Learning Techniques
- Recent discussions in intelligent driving technology reveal a trend where various paths, such as end-to-end, VLA, and world models, converge on the concept of reinforcement learning [5]
- Reinforcement learning is transitioning from a "technical option" to a "mandatory option" in the industry [7]
- The emergence of products like AlphaGo and ChatGPT has highlighted the effectiveness of allowing AI to learn through trial and error as the fastest evolutionary method [8][9]

Group 2: Learning Methodologies
- Understanding reinforcement learning requires a grasp of imitation learning, which was previously favored in intelligent driving [11]
- Imitation learning allows AI to learn from human driving data but has limitations, such as inheriting bad habits and struggling with unfamiliar situations [14][16]
- Reinforcement learning, as demonstrated by AlphaGo, allows AI to explore new strategies through self-play, leading to superior performance beyond human intuition [17]

Group 3: Reinforcement Learning Mechanisms
- Reinforcement learning operates on a trial-and-error basis, where the model learns to drive well through a cycle of feedback [26]
- The design of reward functions is crucial, as it translates driving performance into quantifiable scores [30]
- Balancing conflicting objectives, such as safety versus efficiency, is essential in reward function design [32]

Group 4: World Models and Advanced Learning
- The integration of world models with reinforcement learning enhances the training environment, allowing AI to simulate real-world scenarios [42][49]
- High-fidelity virtual environments enable AI to consider long-term consequences of actions, improving decision-making [50]
- The coupling of world models and reinforcement learning creates a feedback loop that accelerates model iteration and performance [52]

Group 5: Industry Trends and Future Directions
- The importance of data is being redefined, with a shift towards the ability to model the world rather than just relying on raw data [56]
- Companies are focusing on enhancing the "modeling capacity" of their systems, which is crucial for intelligent driving [60]
- The evolution of intelligent driving systems is moving towards a stage where AI can independently understand environments and refine strategies, marking a significant advancement in the industry [62]
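The reward-design tension described in Group 3 can be made concrete with a toy weighted reward. The function below is a hedged sketch, not any vendor's actual reward: the terms (progress, jerk penalty, headway-gap penalty, collision) and all weights are invented for illustration.

```python
def driving_reward(progress_m, jerk_m_s3, min_gap_m, collided,
                   w_progress=1.0, w_comfort=0.1, w_gap=0.5):
    """Toy per-step driving reward: efficiency (distance covered) traded
    off against comfort (jerk) and safety (headway gap, collisions).
    All terms and weights are illustrative, not from the article."""
    if collided:
        return -100.0                      # terminal safety penalty dominates
    reward = w_progress * progress_m       # efficiency: distance covered
    reward -= w_comfort * abs(jerk_m_s3)   # comfort: penalize harsh jerk
    if min_gap_m < 2.0:                    # safety: penalize tight headway
        reward -= w_gap * (2.0 - min_gap_m)
    return reward
```

Raising `w_gap` pushes the learned policy towards conservatism; raising `w_progress` towards assertiveness. That tuning knob is exactly the safety-versus-efficiency conflict the article points to.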
The "corners" and "lane changes" behind the AI racing world record
Xin Lang Cai Jing (Sina Finance) · 2026-01-24 05:10
Core Insights
- The AI racing team from Tsinghua University set a world record by completing the 10.77 km Tianmen Mountain course in 16 minutes and 10.838 seconds, showcasing advancements in AI-driven autonomous racing technology [1][3].

Group 1: Technical Challenges and Innovations
- The Tianmen Mountain course presents a "composite extreme" testing environment due to satellite signal interruptions, steep slopes, and numerous sharp turns, requiring AI to make precise decisions in milliseconds [3].
- The team developed a dynamic local map loading algorithm to address issues with traditional full-load 3D point cloud maps, enabling real-time high-precision positioning [3][4].
- Data collection methods were enhanced through vehicle-cloud collaboration and a combination of virtual and real-world data, integrating factors like corner entry angles and road conditions into the AI model [3].

Group 2: Learning and Development Pathways
- Since 2018, the Tsinghua research team has focused on a new end-to-end autonomous driving approach centered on reinforcement learning, significantly reducing training costs compared to traditional methods reliant on vast amounts of real vehicle data [4].
- The team introduced China's first fully neural-network-based end-to-end autonomous driving system, marking a significant technological breakthrough in the industry [4].

Group 3: Real-World Application and Future Directions
- The success at Tianmen Mountain serves as a critical test for autonomous technology, emphasizing the need for AI algorithms to be validated in real and extreme scenarios to ensure their effectiveness and robustness [5].
- The developed perception-positioning fusion technology allows vehicles to achieve high real-time and high-precision trajectory estimation, enhancing stability in critical situations [5].
- Despite rapid advancements in autonomous driving technology, there remains a notable gap between AI capabilities and human performance in extreme road conditions, indicating ample opportunities for future research and innovation [5].
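The "dynamic local map loading" idea in Group 1, streaming only the map tiles near the vehicle instead of holding the full 3D point cloud in memory, can be sketched as a tile cache. This is a guess at the general technique, not Tsinghua's algorithm; the tile size, radius, and `load_tile` placeholder are all made up.

```python
import math

class LocalMapCache:
    """Keep only point-cloud tiles within `radius` metres of the vehicle.
    `load_tile` stands in for reading a tile from disk; here it returns
    the tile key so the example stays self-contained."""
    def __init__(self, tile_size=50.0, radius=150.0):
        self.tile_size = tile_size
        self.radius = radius
        self.loaded = {}                      # (ix, iy) -> tile payload

    def _tiles_near(self, x, y):
        r = int(math.ceil(self.radius / self.tile_size))
        cx, cy = int(x // self.tile_size), int(y // self.tile_size)
        return {(cx + dx, cy + dy)
                for dx in range(-r, r + 1) for dy in range(-r, r + 1)
                if math.hypot(dx, dy) * self.tile_size <= self.radius}

    def update(self, x, y):
        want = self._tiles_near(x, y)
        for key in set(self.loaded) - want:   # evict tiles left behind
            del self.loaded[key]
        for key in want - set(self.loaded):   # load tiles entering range
            self.loaded[key] = self.load_tile(key)

    def load_tile(self, key):
        return key                            # placeholder for disk I/O

cache = LocalMapCache()
cache.update(0.0, 0.0)
n0 = len(cache.loaded)
cache.update(1000.0, 0.0)                     # long jump: full tile turnover
```

Memory stays bounded by the fixed tile budget around the vehicle regardless of course length, which is the point of replacing a full-load map.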
Hongyang Li's team presents PlannerRFT: a new diffusion trajectory-planning approach that boosts performance in complex driving scenarios (Tongji & HKU)
自动驾驶之心 (Heart of Autonomous Driving) · 2026-01-21 09:16
Core Viewpoint
- The article discusses the development of PlannerRFT, a closed-loop and sample-efficient fine-tuning framework for diffusion model planners in autonomous driving, which significantly enhances closed-loop performance and safety in complex driving scenarios [4][48].

Group 1: Background and Motivation
- Diffusion model planners have emerged as a powerful probabilistic paradigm for generating human-like driving trajectories in dynamic environments, but they face challenges such as distribution shift and goal misalignment, limiting their robustness and reliability in real-world applications [4][5].
- Reinforcement learning (RL) offers a potential solution by leveraging simulation data and simple rewards for scaling, with recent advancements in the generation-evaluation fine-tuning (RFT) paradigm balancing training efficiency and closed-loop planning performance [4][5].

Group 2: PlannerRFT Framework
- PlannerRFT introduces a dual-branch optimization strategy that enhances the trajectory distribution and adaptively guides the denoising process towards more promising exploration directions without altering the original inference flow [5][14].
- The framework employs a GPU-accelerated simulator, nuMax, which is ten times faster than the original nuPlan simulator, supporting large-scale parallel learning [6][24].

Group 3: Key Innovations
- To achieve multi-modality, PlannerRFT incorporates an energy-based classifier guidance mechanism that injects residual offsets during the denoising process, enabling the model to generate diverse operational trajectories [8][15].
- An adaptive exploration strategy adjusts the guidance scale based on scene context, making the trajectory generation process more perception-aware [8][18].

Group 4: Performance Evaluation
- Extensive evaluations on the nuPlan benchmark demonstrate that PlannerRFT achieves state-of-the-art performance, significantly improving safety and robustness in complex driving scenarios compared to baseline models [9][35].
- The framework shows notable enhancements in handling failure scenarios, such as collisions and lane departures, indicating its effectiveness in improving driving safety [9][35].

Group 5: Experimental Insights
- The article highlights the importance of training data distribution, revealing that a balanced dataset combining collision and low-score scenarios yields the best results, while training solely on complex scenarios can hinder the planner's ability to handle routine driving actions [41][42].
- The survival reward mechanism is emphasized as a crucial factor in maintaining performance in challenging environments, encouraging the planner to delay failure events [43][28].
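The "energy-based classifier guidance" in Group 3, injecting residual offsets during denoising, follows the general pattern of guided diffusion sampling: after each base denoising update, nudge the sample down the gradient of an energy function with a per-step scale. The sketch below uses plain shrinkage as a stand-in for a learned denoiser and a quadratic lane-keeping energy purely for illustration; it is not PlannerRFT's model, energy, or schedule.

```python
import numpy as np

def energy_guided_denoise(x, steps, energy_grad, scale):
    """Generic guided reverse process: each step applies a base denoising
    update (here: shrinkage towards 0, standing in for a learned denoiser)
    and then a residual offset along the negative energy gradient."""
    for t in range(steps, 0, -1):
        x = 0.9 * x                          # placeholder base denoiser step
        x = x - scale(t) * energy_grad(x)    # guidance: residual offset
    return x

# Toy energy: squared lateral offset from a lane centre at y = 4.0.
center = 4.0
grad = lambda x: 2.0 * (x - center)
rng = np.random.default_rng(0)
traj = rng.normal(5.0, 1.0, size=8)          # noisy lateral offsets
out = energy_guided_denoise(traj, steps=20, energy_grad=grad,
                            scale=lambda t: 0.05)
baseline = traj * 0.9 ** 20                  # same denoiser, no guidance
```

Because the offset is added after each base step rather than baked into the network, the original inference flow is untouched, which matches the property the article highlights.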
Can your model really hold up? A long-tail scenario benchmark for manipulation tasks is here
具身智能之心 (Heart of Embodied Intelligence) · 2026-01-20 00:33
Core Viewpoint
- The article discusses the introduction of the GM-100 benchmark test, which aims to enhance the evaluation of robotic capabilities through a diverse set of 100 tasks designed to address the limitations of existing datasets and task designs in the field of robotics [1][4].

Group 1: Background and Motivation
- The rapid development of robotic learning has led to the emergence of various datasets and task designs, but many focus on common tasks, resulting in a lack of coverage for complex and rare tasks [3][5].
- Existing datasets, such as Open X-Embodiment and Agibot, primarily concentrate on common actions like "pick and grasp," leading to significant biases in trained models and limiting their applicability in real-world scenarios [3][5].

Group 2: GM-100 Benchmark Test
- The GM-100 benchmark consists of 100 carefully designed tasks that encompass various interaction scenarios and long-tail behaviors, aiming to provide a comprehensive assessment of robotic agents' capabilities [4][11].
- The tasks are developed based on systematic analysis and insights from human action understanding, ensuring they are executable and sufficiently challenging to differentiate the performance of various models [2][4].

Group 3: Task Design and Data Collection
- The task design process involved analyzing previous research to eliminate redundancies and categorize tasks, revealing a significant bias towards common activities [5][9].
- A diverse set of tasks was generated using large language models, with human experts involved in the final selection to ensure high-quality and feasible tasks for current hardware constraints [10][11].
- Data collection for GM-100 was conducted through teleoperation, resulting in a medium-sized dataset with over 13,000 trajectories [13][16].

Group 4: Evaluation Metrics and Results
- The evaluation of different baseline models on GM-100 tasks utilized several metrics, including Success Rate (SR), Partial Success Rate (PSR), and action prediction error, to provide a comprehensive performance assessment [22].
- The results indicated that the overall success rate was low, highlighting the inherent challenges of the tasks and the limitations of the training data [22].
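The SR and PSR metrics in Group 4 can be computed as below. This is a plausible reading of the metric names (full-task success versus fraction of subgoals completed), not the benchmark's official evaluation code, and the episode format is invented for the example.

```python
def success_metrics(episodes):
    """episodes: list of (subgoals_done, subgoals_total) per rollout.
    SR counts a rollout only if every subgoal succeeded; PSR credits
    partial progress as the completed fraction."""
    sr = sum(done == total for done, total in episodes) / len(episodes)
    psr = sum(done / total for done, total in episodes) / len(episodes)
    return sr, psr

rollouts = [(3, 3), (1, 3), (0, 3), (2, 2)]   # toy results
sr, psr = success_metrics(rollouts)
```

On long-tail tasks PSR is the more informative of the two: models that get partway through a rare task still separate from models that fail immediately, even while SR sits near zero for both.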
Midstream intelligent driving suppliers are racing to snap up end-to-end talent......
自动驾驶之心 (Heart of Autonomous Driving) · 2026-01-16 02:58
Core Viewpoint
- The article discusses the technological anxiety in the intelligent driving sector, particularly among midstream manufacturers, highlighting a slowdown in cutting-edge technology development and a trend towards standardized mass-production solutions [1][2].

Group 1: Industry Trends
- The mass production of cutting-edge technologies is expected to begin in 2026, with current advancements in intelligent driving technology stagnating [2].
- The overall market for passenger vehicles priced above 200,000 yuan is around 7 million units, but leading new entrants have not captured even one-third of this volume [2].
- The maturity of end-to-end technology is seen as a prerequisite for larger-scale mass production, especially with the advancement of L3 regulations this year [2].

Group 2: Educational Initiatives
- A course titled "Practical Class for End-to-End Mass Production" has been launched, focusing on the technical capabilities needed for mass production in intelligent driving [2].
- The course emphasizes practical applications and is limited to a small number of participants, with only 8 spots remaining [2].

Group 3: Course Content Overview
- The course covers various aspects of end-to-end algorithms, including:
  - Overview of end-to-end tasks, merging perception tasks, and designing learning-based control algorithms [7]
  - Two-stage end-to-end algorithm frameworks, including modeling and information transfer between perception and planning [8]
  - One-stage end-to-end algorithms that allow for lossless information transfer, enhancing performance [9]
  - The application of navigation information in autonomous driving, including map formats and encoding methods [10]
  - Introduction to reinforcement learning algorithms to complement imitation learning in driving behavior [11]
  - Optimization of trajectory outputs through practical projects involving imitation and reinforcement learning [12]
  - Post-processing logic for trajectory smoothing to ensure stability and reliability in mass production [13]
  - Sharing of mass-production experience from multiple perspectives, including data, models, and rules [14]

Group 4: Target Audience
- The course is aimed at advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming skills [15].
- Participants are expected to have access to a GPU (an RTX 4090 or better is recommended) and familiarity with various algorithm frameworks [18].
pi0.5, long atop the leaderboard, has been dethroned by a Chinese model!!!
具身智能之心 (Heart of Embodied Intelligence) · 2026-01-12 00:03
Core Viewpoint
- The article highlights the breakthrough of the "Spirit v1.5" model developed by the Qianxun Intelligent team, which has surpassed the international benchmark model pi0.5, marking a significant advancement for China in the field of embodied intelligence models [2].

Performance Comparison
- In the RoboChallenge ranking, Spirit v1.5 leads with a score of 66.09 and a success rate of 50.33%, followed by pi0.5 with a score of 61.84 and a success rate of 42.67% [4].

Data Collection Challenges
- The article discusses the limitations of relying on "clean" data for training models, which can lead to low diversity and scalability issues; clean data often lacks the complexity of real-world scenarios, hindering the model's ability to generalize [5][7].

Training Methodology
- Spirit v1.5 employs a training methodology that does not depend on highly curated "clean" demonstration data. Instead, it utilizes a diverse data collection paradigm that allows for the natural integration of multiple sub-tasks and atomic skills, enhancing the model's adaptability to real-world complexities [8][14].

Transfer Efficiency
- Experimental results indicate that models pre-trained on diverse data exhibit significantly higher transfer efficiency on new tasks compared to those trained on traditional demonstration data, requiring less computational resources to achieve similar performance [9][12].

Scaling Findings
- As the scale of diverse experiences increases, the model's transfer efficiency improves, leading to a continuous decrease in validation error for new tasks. This suggests that task diversity is more critical than the number of single-task demonstrations [13][16].

Paradigm Shift in Pre-training
- Spirit v1.5 represents a fundamental shift in the field of robotic learning, moving away from the reliance on highly curated datasets. The findings suggest that unstructured diversity serves as a better teacher for robust pre-training, enabling models to develop a foundational "physical intuition" for better adaptability in real-world environments [14].
A survey of nearly 300 works! The development of manipulation tasks through the lens of "high-level planning and low-level control"
具身智能之心 (Heart of Embodied Intelligence) · 2026-01-06 00:32
Core Insights
- The article discusses the transformative advancements in robotic manipulation driven by the rapid development of visual, language, and multimodal learning, emphasizing the role of large foundation models in enhancing robots' perception and semantic representation capabilities [1][2].

Group 1: High-Level Planning
- High-level planning is responsible for clarifying action intentions, organizing sequences, and allocating environmental attention, providing structured guidance for low-level execution [4].
- The core components of high-level planning include task decomposition and decision guidance, integrating multimodal information to address "what to do" and "in what order" [4].
- Task planning based on large language models (LLMs) maps natural language to task steps, with methods like SayCan and Grounded Decoding enhancing execution skill selection and planning capabilities [5].
- Multimodal large language models (MLLMs) break the limitations of pure text input by integrating visual and language reasoning, with models like PaLM-E and VILA demonstrating superior performance in embodied tasks [8].
- Code generation techniques convert planning into executable programs, improving the precision of language-based plans through methods like Code as Policies and Demo2Code [9].
- Motion planning utilizes LLMs and VLMs to generate continuous motion targets, linking high-level reasoning with low-level trajectory optimization [10].
- Affordance learning focuses on establishing intrinsic associations between perception and action across geometric, visual, semantic, and multimodal dimensions [11].
- 3D scene representation transforms environmental perception into structured action proposals, bridging perception and action through techniques like Gaussian splatting [12].

Group 2: Low-Level Learning Control
- Low-level control translates high-level planning into precise physical actions, addressing the "how to do it" aspect of robotic manipulation [14].
- Learning strategies for skill acquisition are categorized into three main types, including pre-training and model-free reinforcement learning [16].
- Input modeling defines how robots perceive the world, emphasizing the integration of multimodal signals through reinforcement learning and imitation learning [18].
- Visual-action models utilize both 2D and 3D visual inputs to enhance action generation, while vision-language-action models integrate semantic, spatial, and temporal information [19].
- Additional modalities like tactile and auditory signals improve robustness in contact-rich manipulation scenarios [20].

Group 3: Challenges and Future Directions
- Despite significant technological advancements, robotic manipulation faces four core challenges: the lack of universal architectures, data and simulation bottlenecks, insufficient multimodal physical interaction, and safety and collaboration issues [23][27][28][29].
- Future research directions include developing a "robotic brain" with flexible modal interfaces, establishing autonomous data collection mechanisms, enhancing multimodal physical interaction, and ensuring safety in human-robot collaboration [30].
- The review emphasizes the need for a unified framework that integrates high-level planning and low-level control, with a focus on overcoming data efficiency, physical interaction, and safety collaboration bottlenecks to facilitate the transition of robotic manipulation from laboratory settings to real-world applications [31].
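The high-level/low-level split the survey organizes around can be sketched as a minimal interface: a planner decomposes an instruction into named skills ("what to do, in what order"), and an executor dispatches each skill to a low-level routine ("how to do it"). The keyword-matching planner and the skill names below are placeholders standing in for an LLM-based planner such as SayCan; nothing here comes from the surveyed systems.

```python
from typing import Callable, Dict, List

def plan(instruction: str) -> List[str]:
    """Stand-in for an LLM planner: keyword rules instead of a model."""
    steps: List[str] = []
    if "cup" in instruction:
        steps += ["locate(cup)", "grasp(cup)"]
    if "table" in instruction:
        steps += ["move_to(table)", "place(cup)"]
    return steps

# Low level: skill name -> control routine (placeholders for real controllers).
SKILLS: Dict[str, Callable[[], str]] = {
    "locate": lambda: "ran perception",
    "grasp": lambda: "closed gripper",
    "move_to": lambda: "followed trajectory",
    "place": lambda: "opened gripper",
}

def execute(steps: List[str]) -> List[str]:
    log = []
    for step in steps:
        name = step.split("(")[0]            # dispatch on the skill name
        log.append(f"{step}: {SKILLS[name]()}")
    return log

log = execute(plan("put the cup on the table"))
```

The narrow string interface between the two layers is what lets either side be swapped independently, which is why the survey can treat planning and control as separate research tracks.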
A 10,000-word deep dive: what pain points remain in VLA architectures and models?
具身智能之心 (Heart of Embodied Intelligence) · 2025-12-30 01:11
Core Viewpoint
- The article discusses the advancements and challenges in the field of embodied intelligence, particularly focusing on the VLA (Vision-Language-Action) model and its implications for robotics and autonomous driving [13][14][35].

Group 1: VLA Model and Architecture
- The VLA model architecture has become relatively standardized, with a trend towards modularization, allowing for various implementations while maintaining core functionalities [14][15].
- Current challenges include the VLA's generalization capabilities, which are not yet sufficient for practical applications, indicating a need for improved data quality and quantity [16][17].
- The integration of additional modalities, such as tactile feedback, is seen as crucial for enhancing the VLA's performance and generalization [17][18].

Group 2: Expert Insights
- Experts from various backgrounds, including autonomous driving and robotics, emphasize the importance of transferring knowledge and practices from autonomous driving to embodied intelligence [8][9][10].
- The discussion highlights the need for a unified model in the future, although current implementations remain modular to address specific tasks effectively [22][24][36].
- The role of reinforcement learning (RL) is underscored, with experts suggesting that RL could significantly enhance the capabilities of VLA models, especially in learning from diverse data sources [30][31][32].

Group 3: Future Directions
- Future innovations in VLA may focus on improving 3D representations and exploring new training paradigms that combine reinforcement learning with imitation learning [43][48].
- The integration of world models with VLA is proposed as a key area for development, aiming to enhance predictive capabilities and understanding of physical interactions in 3D environments [49][50].
- Experts agree that while the VLA framework is standardizing, there is still room for exploration and improvement, particularly in addressing the limitations of current models [41][42].
Some thoughts on applying reinforcement learning in autonomous driving
自动驾驶之心 (Heart of Autonomous Driving) · 2025-12-23 00:53
Core Viewpoint
- The article discusses the application of reinforcement learning (RL) fine-tuning in trajectory planning for autonomous driving, emphasizing the transition from open-loop to closed-loop training methods to enhance the effectiveness of training models [3][4].

Group 1: Training Methodology
- Mainstream learning-based planning modules typically use imitation learning, which can struggle with out-of-distribution scenarios during real-world testing [3].
- A closed-loop training approach is proposed, which simulates real vehicle testing environments, making it more effective than open-loop training [4].
- The article introduces a network structure based on Waymo's previous work, MotionLM, which outputs trajectories in an autoregressive manner, ensuring causal relationships are maintained [4][6].

Group 2: Input and Output Structure
- The network's input is designed to be scene-centered, summarizing static information over a specified time frame rather than relying on the current frame alone, which helps prevent the vehicle from navigating outside the perceived road [6].
- Many imitation learning methods combine single-frame perception with several seconds of ground-truth (GT) data, which can lead to causal inconsistencies if the perception range is limited [7].

Group 3: Reward Function and Training Phases
- The training process consists of two phases, pretraining and reinforcement learning, with a simple reward function that balances efficiency and safety by considering both GT fitting and collision avoidance [11].
- The reward is normalized across all samples and time steps, allowing the critic network to be omitted, similar to the GRPO method [13].

Group 4: Challenges and Future Directions
- Many imitation learning methods introduce auxiliary losses that can lead to undesirable model outputs, highlighting the limitations of open-loop training [14].
- The core value of reinforcement learning lies in closed-loop learning, which can significantly enhance model capabilities even with smaller datasets [14].
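The critic-free normalization in Group 3, standardizing rewards across a group of sampled rollouts as in GRPO, amounts to the few lines below. This is a hedged sketch of the general recipe, not the article author's code; the epsilon and the per-scene grouping are illustrative assumptions.

```python
import numpy as np

def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: for a group of rollouts sampled from the
    same scene, standardize total rewards so that no learned critic
    (value baseline) is needed. Input/output shape: (num_rollouts,)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Rollouts that drove further without collision get positive advantage.
adv = group_normalized_advantages([2.0, 0.5, -1.0, 3.5])
```

Because the group mean acts as the baseline, above-average rollouts are reinforced and below-average ones suppressed without training a separate value network, which is what lets the approach work with smaller datasets.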