Imitation Learning

Latest from HKU & Tsinghua! Strong Generalization for Dynamic Object Manipulation from Only a Few Demonstrations!
具身智能之心· 2025-08-21 00:03
Author: Zhuoling Li et al. | Editor: 具身智能之心

Motivation and Background - Dynamic object manipulation (e.g., handling products on conveyor-belt assembly lines) is key to improving industrial manufacturing efficiency, but traditional methods must be engineered separately for each scenario, making them time-consuming and poor at generalizing. Imitation learning, which trains robot policies from expert demonstrations, is a potential solution, yet existing methods rely on large amounts of demonstration data, and collecting demonstrations in dynamic scenes is extremely costly. This work explores: can strong generalization for dynamic object manipulation be achieved from only a small number of demonstrations?

Core Contributions
- Analyzes the challenges of dynamic object manipulation and the limitations of existing methods
- Proposes an entropy-based theoretical framework that quantifies the optimization process of imitation learning and guides the design of a generalizable manipulation system with low data requirements
- Develops the GEM (Generalizable Entropy-based Manipulation) system, which combines object-centric geometric perception with hybrid action control to achieve strong generalization in dynamic object manipulation
- Validates GEM's effectiveness in a real-world scenario (canteen tableware collection), reaching a success rate above 97% without any on-site demonstrations ...
Li Auto VLA Test-Drive Impressions from August 8, 2025 (Including Group Members Who Have Tried Tesla's North American FSD)
理想TOP2· 2025-08-12 13:50
Core Insights - The article discusses the performance and user experience of Li Auto's VLA (Vision-Language-Action) system compared to Tesla's FSD (Full Self-Driving) system, highlighting that while VLA shows promise, it still falls short of the seamless experience provided by FSD in certain scenarios [1][2][3].

Experience Evaluation
- The experience is divided into three parts: driving in a controlled environment with no driver present, a one-hour public road test, and a two-hour self-selected route test [1].
- Feedback from users indicates that the VLA system provides a comfortable and efficient experience, particularly in controlled environments, but its performance in more complex road scenarios remains to be fully evaluated [2][3].

User Feedback
- Users noted a significant difference in the braking experience of VLA, describing it as smooth and seamless compared to traditional driving, which enhances the perception of safety and comfort [3][4].
- The article emphasizes that the initial goal for autonomous driving systems should be to outperform 80% of average drivers before aiming for higher benchmarks [4][5].

Iteration Potential
- The VLA system is believed to have substantial room for improvement over its predecessor, VLM, with potential advances in four key areas: simulation data efficiency, maximizing existing hardware capabilities, enhancing model performance through reinforcement learning, and improving the voice-control experience [6][7].
- The shift to reinforcement learning for VLA allows targeted optimization for specific driving challenges, which was a limitation of previous models [8][9].

User Experience and Product Development
- The importance of user experience is highlighted, with the assertion that in the AI era, product experience can be as crucial as technical capability [10].
- The voice control feature of VLA is seen as a significant enhancement, allowing for personalized driving experiences based on user preferences, which could improve overall satisfaction [10].
Questioning VLA Models and Calling AI Nowhere Near Enough? Practitioners Respond to Unitree's Wang Xingxing
第一财经· 2025-08-11 14:51
Author: Liu Jia, 第一财经 (Yicai)

At the World Robot Conference, Unitree CEO Wang Xingxing put forward a series of "non-consensus" views in one go. He is skeptical of VLA (Vision-Language-Action) models, calling them a "relatively simplistic architecture"; he also said the robotics industry pays too much attention to data, that hardware, including dexterous hands, is not good enough but is adequate, and that the industry's biggest problem is that the AI for embodied intelligence falls far short.

Wang's views have continued to spark discussion in the industry. At today's World Robot Conference, the reporter noticed that Jiang Lei, chief scientist of the National and Local Co-built Humanoid Robot Innovation Center, mentioned Wang three times in a speech of nearly 20 minutes.

On Wang's view that "hardware is adequate but large models are not," Jiang shared his takeaways from exchanges with companies such as Alibaba and Huawei: "We cannot find a really good body," and he admitted that the industry still cannot make use of full-parameter models today; a robot's brain, cerebellum, and limbs need deep coordination. Wang questions VLA and is trying to drive robot tasks with video generation; Jiang acknowledged that "the perception-cognition-decision-execution loop has not yet been closed" and called for rebuilding the VLA model and seeking a new paradigm. Wang also said that the scaling law of RL (reinforcement learning) for robots is a direction well worth pursuing; Jiang agreed, noting that reinforcement learning and imitation learning ...
Hands-On | Trajectory Planning with Deep Reinforcement Learning (with Code Walkthrough)
自动驾驶之心· 2025-07-29 23:32
Author: Vision | Editor: 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1933268710770074901

Background - A year after the industry began hyping end-to-end autonomous driving, it has recently started promoting new technical paradigms such as VLA and reinforcement learning. The VLA concept comes from embodied intelligence, a field that has been extremely hot over the past year, and in essence it is not clearly distinct from end-to-end autonomous driving. This article focuses on the reinforcement learning paradigm. Reinforcement learning appeared in the early days of robotics, but because of its low training efficiency and high complexity it never saw wide industrial adoption. With AlphaZero's Go matches around 2018, the release of ChatGPT's RLHF in 2023, and the launch of DeepSeek-R1's online reasoning in early 2025, reinforcement learning has shown broader potential across industries and technical fields. Out of technical curiosity, and based on the past two weeks of study of the relevant fundamentals, this article describes what reinforcement learning means through the eyes of someone with a computer vision (CV) background, so many of the concept analogies below may ...
An In-Depth Long-Form Summary of End-to-End Autonomous Driving
自动驾驶之心· 2025-07-23 09:56
Core Viewpoint - The article discusses the current development status of end-to-end autonomous driving algorithms, comparing them with traditional algorithms and highlighting their advantages and limitations [1][3][53].

Summary by Sections

Traditional vs. End-to-End Algorithms
- Traditional autonomous driving algorithms follow a pipeline of perception, prediction, and planning, where each module has distinct inputs and outputs [3].
- End-to-end algorithms take raw sensor data as input and directly output path points, simplifying the process and reducing error accumulation [3][5].
- Traditional algorithms are easier to debug and offer some interpretability, but they suffer from cumulative error because complete accuracy cannot be guaranteed in the perception and prediction modules [3][5].

Limitations of End-to-End Algorithms
- End-to-end algorithms have limited ability to handle corner cases, as they rely heavily on data-driven methods [7][8].
- The use of imitation learning in these algorithms can make it difficult to learn optimal ground truth and to handle exceptional cases [53].
- Current end-to-end paradigms include imitation learning (behavior cloning and inverse reinforcement learning) and reinforcement learning, with evaluation methods categorized into open-loop and closed-loop [8].

Current Implementations
- The ST-P3 algorithm is highlighted as an early end-to-end work, utilizing a framework that includes perception, prediction, and planning modules [10][11].
- ST-P3's innovations include a perception module with an ego-centric cumulative alignment technique and a prediction module with a dual-path prediction mechanism [11][13].
- ST-P3's planning phase optimizes predicted trajectories by incorporating traffic-light information [14][15].

Advanced Techniques
- The UniAD system employs a full-Transformer framework for end-to-end autonomous driving, integrating multiple tasks to enhance performance [23][25].
- The TrackFormer framework focuses on the collaborative updating of track queries and detect queries to improve prediction accuracy [26].
- The VAD (Vectorized Autonomous Driving) method introduces vectorized representations for better structural information and faster computation in trajectory planning [32][33].

Future Directions
- End-to-end algorithms still primarily rely on imitation-learning frameworks, whose inherent limitations need further exploration [53].
- Introducing more constraints and multi-modal planning methods aims to address trajectory-prediction instability and improve model performance [49][52].
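Behavior cloning, the imitation-learning paradigm named above, is at its core supervised regression from logged states to expert actions. The toy below is purely illustrative, not any specific planner's implementation: the linear policy, `W_expert`, the 2-D state space, and the learning rate are all invented for the sketch.

```python
import numpy as np

# Minimal behavior-cloning sketch: fit a policy a = W s by gradient
# descent on the mean-squared error against expert state-action pairs.

rng = np.random.default_rng(0)
W_expert = np.array([[1.0, -0.5], [0.3, 2.0]])  # hypothetical expert policy

states = rng.normal(size=(256, 2))              # logged driving states
actions = states @ W_expert.T                   # expert action labels

W = np.zeros((2, 2))                            # learned policy weights
lr = 0.1
for _ in range(500):                            # plain gradient descent on MSE
    pred = states @ W.T
    grad = (pred - actions).T @ states / len(states)
    W -= lr * grad

print(np.allclose(W, W_expert, atol=1e-3))      # learned policy ≈ expert
```

The cumulative-error critique in the section follows directly from this setup: the regression only sees expert-visited states, so small action errors at test time drift the car into states the training distribution never covered.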
Hierarchical VLA or Fully End-to-End VLA: Which Direction Is Better for Publishing Papers?
自动驾驶之心· 2025-07-23 07:32
Core Viewpoint - The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to the exploration of Vision-Language-Action (VLA) models, suggesting that there are still many opportunities for research in this area [1][2].

Group 1: VLA Research Topics
- The VLA model represents a new paradigm in autonomous driving, integrating vision, language, and action to enhance decision-making capabilities [2][3].
- The evolution of autonomous driving technology can be categorized into three phases: traditional modular architecture, pure visual end-to-end systems, and the emergence of VLA models [2][3].
- VLA models aim to improve interpretability and reliability by allowing the model to explain its decisions in natural language, thus increasing transparency and trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key theoretical knowledge in VLA and develop practical skills in model design and implementation [6][7].
- Participants will engage in a 12-week online group research phase followed by 2 weeks of paper guidance, culminating in a 10-week maintenance period for their research papers [6].
- The course will provide insights into classic and cutting-edge papers, coding implementations, and writing methodologies, ultimately assisting participants in producing a research paper draft [6][12].

Group 3: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and basic programming skills [5][9].
- Participants are expected to have access to high-performance computing resources, ideally with multiple high-end GPUs, to facilitate their research [13][14].
- A preliminary assessment will be conducted to tailor the course content to the individual needs of participants, ensuring a focused learning experience [15].

Group 4: Course Highlights and Outcomes
- The course features a "2+1" teaching model, providing comprehensive support from experienced instructors and research mentors [15].
- Participants will gain a thorough understanding of the research process, writing techniques, and submission strategies, enhancing their academic and professional profiles [15][20].
- The expected outcomes include a research paper draft, project completion certificates, and potential recommendation letters based on performance [15].
Beyond VLA: A Roundup of Embodied + VA Work
自动驾驶之心· 2025-07-14 10:36
Core Insights - The article focuses on advancements in embodied intelligence and robotic manipulation, highlighting various research projects and methodologies aimed at improving robot learning and performance in real-world tasks [2][3][4].

Group 1: 2025 Research Highlights
- Numerous projects are set for 2025, including "Steering Your Diffusion Policy with Latent Space Reinforcement Learning" and "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation," which aim to enhance robotic capabilities in manipulation and interaction [2].
- The "BEHAVIOR Robot Suite" aims to streamline real-world whole-body manipulation for everyday household activities, indicating a focus on practical applications of robotic technology [2].
- "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations" emphasizes the potential for robots to learn complex tasks from minimal demonstrations, showcasing advancements in imitation learning [2].

Group 2: Methodological Innovations
- Innovative methodologies include "Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning," which aims to improve the adaptability of robots across environments [2].
- "Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion" highlights the focus on enhancing dexterity in robotic hands, crucial for complex manipulation tasks [4].
- "Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation" indicates a trend toward training robots on synthetic data, which can significantly reduce the need for real-world data collection [7].

Group 3: Future Directions
- The research agenda for 2024 and beyond includes projects such as "Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching," suggesting a shift toward advanced data representations for improved learning outcomes [9].
- "Zero-Shot Framework from Image Generation World Model to Robotic Manipulation" points to a future in which robots generalize from visual data without task-specific training, enhancing their versatility [9].
- The emphasis on "Human-to-Robot Data Augmentation for Robot Pre-training from Videos" reflects growing interest in leveraging human demonstrations to improve robotic learning efficiency [7].
Breaking RL's Limits with Action Chunking: Berkeley Introduces Imitation Learning, Surpassing Offline/Online SOTA
机器之心· 2025-07-14 04:08
Core Insights
- Reinforcement Learning (RL) has achieved significant results across various fields, but its performance in tasks with long time spans and sparse rewards remains unsatisfactory [1][2]
- Traditional RL methods often struggle with exploration efficiency in such tasks, as rewards arrive only after long action sequences, making it difficult to find effective strategies in a reasonable timeframe [3][10]

Method Overview
- Introducing Imitation Learning (IL) concepts into RL can improve performance, particularly in scenarios with large state and action spaces where designing reward functions is challenging [4]
- The proposed Q-chunking method incorporates action chunking into Temporal-Difference (TD) based RL, addressing two core issues: enhancing exploration efficiency through temporally coherent action sequences, and achieving faster value propagation without the bias introduced by traditional n-step returns [5][12]

Implementation Details
- Q-chunking extends standard Q-learning to a time-extended action space, allowing the policy to predict sequences of actions over multiple steps rather than single-step actions [15]
- A behavior constraint keeps the learned policy close to the offline data distribution, which is crucial for effective exploration and utilization of offline data [18][19]

Experimental Results
- Q-chunking was tested on six sparse-reward robotic manipulation tasks, demonstrating competitive performance in offline phases and high sample efficiency in online phases, particularly on challenging tasks [23][25]
- Ablation studies showed that Q-chunking outperformed its variants and traditional n-step return baselines, highlighting the importance of learning in a time-extended action space [27]
- Analysis indicated that action chunking leads to more temporally coherent actions, resulting in better state coverage and exploration efficiency [28][32]
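The time-extended backup at the heart of Q-chunking can be illustrated with a tabular toy. This is a heavily simplified stand-in, not the paper's actor-critic implementation: the sparse-reward chain, chunk length `H`, and all hyperparameters are invented for the sketch. The point it demonstrates is that Q-values are learned over whole length-`H` action chunks, so each backup propagates reward `H` primitive steps at once.

```python
import numpy as np
from itertools import product

# Tabular sketch of chunked Q-learning on a sparse-reward chain:
# Q(s, chunk) is indexed by a state and a whole length-H action sequence.
N, H, GAMMA, ALPHA = 8, 3, 0.9, 0.5
chunks = list(product([0, 1], repeat=H))        # all 2^H primitive-action chunks
Q = np.zeros((N, len(chunks)))                  # Q(state, chunk)

def rollout(s, chunk):
    """Execute a chunk of moves (0=left, 1=right); terminate on the goal."""
    for i, a in enumerate(chunk):
        s = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
        if s == N - 1:
            return s, GAMMA ** i, True          # sparse reward, discounted in-chunk
    return s, 0.0, False

rng = np.random.default_rng(0)
for _ in range(3000):                           # epsilon-greedy chunked Q-learning
    s = int(rng.integers(0, N - 1))
    c = int(rng.integers(len(chunks))) if rng.random() < 0.3 else int(Q[s].argmax())
    s2, ret, done = rollout(s, chunks[c])
    # H-step backup: bootstrap over a whole chunk, no n-step off-policy bias
    target = ret if done else ret + GAMMA ** H * Q[s2].max()
    Q[s, c] += ALPHA * (target - Q[s, c])

print(chunks[int(Q[0].argmax())])               # greedy chunk from the start state
```

In this toy the greedy chunk from the far end of the chain is "move right three times," and value reaches it in roughly (N-1)/H backups instead of N-1, which is the propagation speedup the summary describes.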
Human2LocoMan: Learning Versatile Quadrupedal Manipulation through Human Pretraining
自动驾驶之心· 2025-07-04 10:27
Core Insights
- The article presents Human2LocoMan, a novel framework for enhancing quadrupedal robots' manipulation capabilities through human pretraining, addressing the challenges of autonomous multi-functional operation in complex environments [5][9][38]
- The framework utilizes a modular cross-entity transformer architecture (MXT) to facilitate effective data collection and transfer learning from human demonstrations to robotic policies, demonstrating significant performance improvements across tasks [10][36]

Group 1: Framework and Methodology
- Human2LocoMan integrates teleoperation and data-collection systems to bridge the action space between humans and quadrupedal robots, enabling efficient acquisition of high-quality datasets [9][38]
- The system employs extended reality (XR) technology to capture human actions and translate them into robot movements, enlarging the robot's workspace and perception capabilities [9][12]
- The modular MXT design shares a common transformer backbone while maintaining entity-specific markers, facilitating effective policy transfer across different robotic entities [16][37]

Group 2: Experimental Results
- Experiments on six challenging household tasks showed an average success-rate improvement of 41.9%, and an 82.7% improvement in out-of-distribution (OOD) scenarios, when using human data for pretraining [6][10]
- The framework demonstrated robust generalization, maintaining high performance even with limited robot data and significantly improving task execution in both ID and OOD scenarios [37][38]
- The modular MXT design outperformed traditional methods, indicating its effectiveness in leveraging human data for enhanced robotic learning and performance [33][36]

Group 3: Data Collection and Efficiency
- Human2LocoMan enables efficient data collection, gathering over 50 robot trajectories and 200 human trajectories within 30 minutes, showcasing its potential for rapid data acquisition in complex tasks [30]
- The framework supports a variety of operation modes, including single- and dual-hand tasks, and is adaptable to different object types and scenarios, enhancing its applicability across domains [30][36]
Carnegie Mellon! Human2LocoMan: Learning Versatile Quadrupedal Manipulation through Human Pretraining
具身智能之心· 2025-07-03 13:36
Core Insights
- The article presents Human2LocoMan, a novel framework for enhancing quadrupedal robot manipulation through human pretraining, addressing the challenges of autonomous multi-functional operation in complex environments [4][38]
- The framework utilizes a modular cross-entity Transformer architecture (MXT) to facilitate effective data collection and transfer learning from human demonstrations to robotic policies [8][38]

Group 1: Framework and Methodology
- Human2LocoMan collects human data via extended reality (XR) technology, mapping human actions to robot movements and thereby enhancing the robot's operational capabilities [7][10]
- A unified reference frame aligns actions between humans and the LocoMan robot, addressing the significant differences in dynamics and control systems between the two entities [12][10]
- The MXT architecture shares a common Transformer backbone while maintaining entity-specific markers, enabling effective transfer learning across different robotic platforms [16][8]

Group 2: Experimental Results
- Experiments demonstrated an average success-rate improvement of 41.9%, and a 79.7% enhancement in out-of-distribution (OOD) scenarios, compared with baseline methods [4][8]
- Pretraining with human data resulted in a 38.6% overall success-rate increase and an 82.7% improvement in OOD scenarios, showcasing the effectiveness of human data in enhancing robotic performance [8][38]
- Data collection was efficient, with over 50 robot trajectories and 200 human trajectories collected within 30 minutes, indicating the framework's potential for rapid data acquisition [26][38]

Group 3: Comparative Analysis
- The MXT architecture outperformed state-of-the-art (SOTA) imitation-learning methods across tasks, demonstrating superior success rates and task scores, particularly in scenarios with limited data [30][34]
- The modular design of MXT facilitated better generalization and reduced overfitting compared with other architectures, such as HPT, which struggled with severe overfitting [36][39]
- The framework's sustained performance on long-sequence tasks indicates its robustness and effectiveness in real-world applications [36][38]
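The cross-entity layout described above, a shared backbone surrounded by per-embodiment input/output modules, can be sketched schematically. This is a deliberately crude stand-in, not MXT's actual architecture: the linear "trunk," the layer widths, and the 42/12 observation dimensions are placeholders chosen only to show the wiring.

```python
import numpy as np

# Schematic cross-entity policy: one shared trunk, entity-specific heads.
rng = np.random.default_rng(0)
D = 16                                            # shared latent width (placeholder)

def linear(n_in, n_out):
    """A random linear layer standing in for a trained module."""
    return rng.normal(scale=0.1, size=(n_out, n_in))

trunk = linear(D, D)                              # shared backbone, reused by all entities
heads = {                                         # entity-specific in/out projections
    "human":   {"in": linear(42, D), "out": linear(D, 42)},  # hypothetical human dims
    "locoman": {"in": linear(12, D), "out": linear(D, 12)},  # hypothetical robot dims
}

def policy(entity, obs):
    """Project with the entity's head, run the shared trunk, project back out."""
    h = heads[entity]
    return h["out"] @ np.tanh(trunk @ (h["in"] @ obs))

human_act = policy("human", np.ones(42))          # human-pretraining pathway
robot_act = policy("locoman", np.ones(12))        # transfer pathway, same trunk
print(human_act.shape, robot_act.shape)
```

The design choice this mirrors is the one the summary credits for the transfer gains: pretraining on human data updates the shared trunk, and only the thin entity-specific modules must be fit to the robot's own (scarcer) trajectories.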