Workflow
Imitation Learning
DexCanvas: Must Embodied Data Really Always Be Missing One of Scale, Realism, and Force Sensing?
具身智能之心· 2025-10-10 00:02
Why is dexterous grasping so hard? Over the past two years, the embodied-AI field has made notable progress at the cognition, perception, and planning levels, yet getting robots to perform fine hand manipulation in the physical world and execute complex dexterous operations the way humans do remains a very hard problem. The field has largely cracked human language understanding, object and scene recognition, and planning of concrete task steps, but flexible grasping and force-aware regulation still leave many open problems. In real scenes, dexterous grasping faces challenges such as precise control, high-dimensional motion planning, and real-time adaptation to dynamic environments; the complexity of the task demands robust mechanical design and advanced control algorithms.

The hardware behind dexterous manipulation is mainly the dexterous hand, which falls into two classes: two-finger grippers and multi-finger anthropomorphic hands. Two-finger grippers are widely used for their reliability, simplicity, and ease of control, but they usually offer only a single degree of freedom and struggle with complex tasks. This is why human-like dexterous hands with 20+ degrees of freedom came into being; these anthropomorphic hands are better suited to interacting with objects and environments designed for humans.

1) Existing dexterous grasping and data collection schemes. Major robotics companies at home and abroad keep releasing massive datasets, with million-scale trajectories and thousands of hours of demonstrations, yet these lack force-control information. Dexterous-hand data never seems to escape this rule: of scale, realism, and force sensing, you can only pick two. The way the data is acquired means you cannot have all three at once! Current learning methods for dexterous grasping fall into two classes: reinforcement learning and imitation learning. Imitation learning requires no complex world model or reward design ...
NeurIPS 2025 Spotlight | With Just One Demonstration, the DexFlyWheel Framework Teaches Robots to "Generate Their Own Data"
机器之心· 2025-10-09 04:43
When we talk about robotic dexterous manipulation, data scarcity is the sword of Damocles hanging overhead. While large models and autonomous driving keep "emerging" powerful capabilities from massive data, dexterous manipulation is still stuck at a data bottleneck. Project page: https://DexFlyWheel.github.io

Research background: why is dexterous-hand data generation so hard? As embodied intelligence develops rapidly, robot datasets covering diverse scenes and tasks keep appearing, but manipulation datasets for five-finger dexterous hands remain scarce. Several key reasons lie behind this:

1. Traditional methods fail. Generation pipelines built for two-finger grippers essentially do not transfer to dexterous hands. Heuristic planning cannot cope with high-dimensional action optimization, and although LLMs can provide semantic guidance, they struggle to produce fine-grained five-finger control trajectories.

2. Human teleoperation is costly. Teleoperation rigs can collect dexterous-hand data effectively, but they demand large amounts of labor, time, and resources; scalability is low, so diverse, large-scale datasets are hard to build.

3. Pure reinforcement learning is inefficient. RL alone can train successful policies and iterate on successful trajectories, but it often produces unnatural hand motions and jittery arm movement, and its low exploration efficiency makes high-quality trajectories hard to generate at scale.

Recently, Peking University and Harbin Institute of Technology, together with PsiBot (灵初智能), proposed DexFlyWheel, the first self-reinforcing data-generation framework for dexterous manipulation. The framework ...
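The excerpt cuts off before describing the framework itself, but the "flywheel" name points at a generic self-reinforcing loop: seed a policy from a handful of demonstrations, roll it out, keep only the successful trajectories, and retrain on the grown dataset. The sketch below shows only that generic loop under stated assumptions; every callable is a user-supplied stand-in, not anything from the DexFlyWheel paper.

```python
from typing import Any, Callable, List

def data_flywheel(
    seed_demos: List[Any],
    train_policy: Callable[[List[Any]], Any],   # e.g. behavior cloning on the dataset
    rollout: Callable[[Any], Any],              # run the current policy, return a trajectory
    is_success: Callable[[Any], bool],          # success filter for collected trajectories
    n_rounds: int = 5,
    rollouts_per_round: int = 200,
):
    """Generic self-reinforcing data loop; all callables are hypothetical stand-ins."""
    dataset = list(seed_demos)                  # start from one or a few demonstrations
    policy = train_policy(dataset)
    for _ in range(n_rounds):
        new_trajs = [rollout(policy) for _ in range(rollouts_per_round)]
        dataset += [t for t in new_trajs if is_success(t)]   # the dataset grows itself
        policy = train_policy(dataset)          # retrain on the enlarged dataset
    return policy, dataset
```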
Imitation Learning Can't Be Truly End-to-End?
自动驾驶之心· 2025-10-08 23:33
Author | BigBite    Source | BigBite思维随笔 (Big Bite Small Talk)    Original post: 模仿学习无法真正端到端

New technical buzzwords keep appearing in the autonomous driving industry. While everyone argues over whether VLA is better or world models are more advanced, they overlook the fact that, compared with model architecture, it is the training method that determines how well a capability actually works. In essence, both VLA and world-behavior models are just concrete model structures for realizing end-to-end driving. Yet as more and more leading companies pour effort into the end-to-end paradigm, the top teams are gradually discovering that imitation learning alone cannot deliver truly end-to-end autonomous driving!

So where exactly do the problems and limitations of imitation learning in autonomous driving lie?

Imitation learning assumes the expert data is optimal. The underlying assumption of imitation learning is that every training trajectory provides the ground-truth optimal action for its state, so behavior closer to the training data is ...
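The optimality assumption is easiest to see in the training objective itself: behavior cloning simply regresses the policy output toward whatever the logged driver did, so every recorded action is implicitly treated as the best possible action in that state. A minimal PyTorch sketch follows; the toy network, feature size, and action dimension are placeholders, not any production planner.

```python
import torch
import torch.nn as nn

# Toy policy: maps a flattened state/feature vector to a planned action (e.g. steer, accel).
policy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(states: torch.Tensor, expert_actions: torch.Tensor) -> float:
    """One imitation-learning update: the logged action is treated as the optimal target."""
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, expert_actions)  # penalizes any deviation from the log
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch: 32 logged states and the human driver's actions in those states.
loss = bc_step(torch.randn(32, 128), torch.randn(32, 2))
```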
After All This Time, VLA May Still Be Delivering More Emotional Value Than Anything Else......
自动驾驶之心· 2025-09-20 16:03
Core Insights
- The article discusses the current state of end-to-end (E2E) technology in academia and industry, highlighting the differences in approach and data availability between the two sectors [1][4][5]
- It emphasizes the importance of data iteration speed in AI model development, arguing that slow data iteration holds back technological progress [2][4]
- It also explores the role of reinforcement learning in enhancing vision-language-action (VLA) models, particularly in scenarios with no single correct answer [6][7][9][10]

Summary by Sections
End-to-End Technology
- Academia is seeing a proliferation of end-to-end methodologies, with a wide range of approaches emerging [1]
- Industry is more pragmatic: computational limits rule out some popular models, but it benefits from vast amounts of data [4]
- The success of models like ChatGPT is attributed to the internet's ability to supply extensive data; the same holds for the automotive industry, where companies can readily gather massive driving data [4]

Data and Technology Iteration
- As the technology evolves rapidly, dataset iteration must keep pace; otherwise it will impede progress [2]
- Research teams increasingly publish datasets alongside their papers to sustain high-impact output [3]

Reinforcement Learning and VLA
- Reinforcement learning suits problems that have no single correct answer, only characteristics of better and worse answers [7]
- Its training process identifies strong solutions from reward signals, reducing the need for extensive demonstration data [9]
- While the short-term payoff of VLA applications is uncertain, the long-term potential is widely recognized [10][11]

Future of VLA
- The importance of a VLA model extends beyond raw performance metrics; data availability and training strategy are just as crucial [12]
- The community is encouraged to discuss the development and challenges of autonomous driving technology [5][13][16]
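To make the "reward instead of demonstration" point above concrete, here is a minimal REINFORCE-style update: the policy is improved from a scalar reward attached to its own rollouts rather than from labeled expert actions. It is a generic sketch with made-up dimensions, not the training recipe of any specific VLA system.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))  # 4 discrete actions
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def reinforce_step(states, actions, rewards):
    """Policy-gradient update: actions that led to higher reward become more likely."""
    logits = policy(states)
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = rewards - rewards.mean()          # simple baseline to reduce variance
    loss = -(chosen * advantages).mean()           # no expert labels anywhere in this loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy rollout batch: states, the actions the policy itself took, and their returns.
loss = reinforce_step(torch.randn(8, 16), torch.randint(0, 4, (8,)), torch.randn(8))
```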
Today's Autonomous-Driving VLA Still Has Many Modules That Need Optimizing...
自动驾驶之心· 2025-09-18 11:00
1. The era of traditional modular architectures: Early autonomous-driving systems (L2-L4) generally used modular designs, with each module (object detection, trajectory prediction, path planning) developed and optimized independently. Strengths: clear logic, modules can be debugged and validated independently, good interpretability. Bottlenecks: error accumulation, where small upstream errors propagate and amplify stage by stage and distort the final decision; information loss, since the structured data passed between modules (3D boxes, trajectory points) discards much of the rich detail in the raw sensor signal; the limits of rules, since heavy reliance on hand-crafted rules and parameters makes complex, long-tail traffic scenarios (corner cases) hard to handle.

2. The rise of vision-only end-to-end (imitation learning): Represented by NVIDIA's DAVE-2 and Wayve, researchers tried to use deep neural networks, trained by imitation learning, to learn the "pixels-to-behavior" mapping directly from human drivers' videos and control data. Strengths: a simpler system architecture that learns complex driving policies from data automatically, without tedious rule design. Bottlenecks: the "black box" problem and poor interpretability, since the model's decision process is opaque and it is hard to understand why it takes a specific action, a fatal flaw for safety-critical driving; causal confusion (Causal ... VLA is without doubt this year's autonomous dr ...
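As an illustration of the "pixels-to-behavior" mapping described above (in the spirit of DAVE-2-style systems, not a reproduction of any of them), here is a toy convolutional policy trained to regress steering from camera frames; the architecture, image size, and loss are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class PixelToControl(nn.Module):
    """Toy vision-only end-to-end policy: camera image in, steering command out."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(48, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, img):
        return self.head(self.backbone(img))

model = PixelToControl()
frames = torch.randn(4, 3, 66, 200)        # dummy batch of camera frames
steer_labels = torch.randn(4, 1)           # the human driver's steering at those frames
loss = nn.functional.mse_loss(model(frames), steer_labels)   # imitation objective
```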
Latest from Westlake University! ARFM: Combining the Strengths of VLA Imitation Learning and Reinforcement Learning
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article discusses the limitations of current vision-language-action (VLA) models on complex tasks and introduces Adaptive Reinforcement Flow Matching (ARFM), which enhances their performance by combining reinforcement learning (RL) signals with the advantages of flow matching [1][2][4]

Summary by Sections
Current Status of VLA Models
- Flow-matching-based VLA models perform well on general robotic manipulation tasks, as validated by large-scale pre-trained systems such as RT-1 and PaLM-E, but their reliance on imitation learning limits action precision in complex downstream tasks [4][5]

Existing Solutions and Limitations
- Previous attempts to fine-tune VLA models with offline RL methods, such as ReinboT, have had limited effect because they guide action prediction only indirectly, highlighting the need for more effective offline RL fine-tuning methods [4][5]

Main Contributions
- ARFM is a novel offline RL post-training approach designed specifically for VLA flow models; it addresses the difficulty of extracting signal from mixed-quality data and improves the efficiency of offline RL fine-tuning [6][7]

Methodological Innovation
- ARFM adds an adaptive scaling factor to the loss function to balance the RL advantage signal against gradient variance, yielding better generalization, robustness to disturbances, and few-shot learning [6][8]

Experimental Validation
- Extensive experiments on the LIBERO simulation benchmark and a UR5 robotic arm show that ARFM outperforms existing methods in generalization, robustness to dynamic disturbances, and few-shot efficiency [6][8][29]

Core Algorithm Design
- The framework is built around an energy-weighted loss that integrates RL signals and an adaptive mechanism that keeps training stable, overcoming the limitations of plain imitation learning and existing offline RL fine-tuning [8][11]

Experimental Setup
- Experiments use the LIBERO benchmark, which comprises four core task suites, plus real-world UR5 manipulation tasks under varying conditions [29][30]

Key Experimental Results
- ARFM shows superior multi-task learning, robustness to action perturbations, few-shot efficiency, and continual-learning ability compared with baseline models, confirming its practical value for real-world robots [32][35][38]

Conclusion
- ARFM balances retaining the RL advantage signal against controlling flow-loss gradient variance, improving VLA flow models across tasks and conditions and demonstrating real-world applicability [49][47]
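The summary says ARFM injects RL signals through an energy-weighted loss with an adaptive scaling factor. Without the paper's exact formulation, the general idea can be sketched as a flow-matching regression whose per-sample loss is weighted by an exponentiated advantage; the weighting scheme, the fixed scaling factor beta, the network, and all tensor shapes below are illustrative assumptions, not ARFM itself.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity-field network: predicts the flow v(x_t, t, obs) for action generation."""
    def __init__(self, act_dim=7, obs_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(act_dim + obs_dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))
    def forward(self, x_t, t, obs):
        return self.net(torch.cat([x_t, t, obs], dim=-1))

def energy_weighted_fm_loss(model, obs, expert_actions, advantages, beta=1.0):
    """Flow-matching loss where high-advantage samples are up-weighted (illustrative only)."""
    x1 = expert_actions
    x0 = torch.randn_like(x1)                       # noise sample
    t = torch.rand(x1.size(0), 1)
    x_t = (1 - t) * x0 + t * x1                     # linear interpolation path
    target_v = x1 - x0                              # target velocity along that path
    pred_v = model(x_t, t, obs)
    per_sample = ((pred_v - target_v) ** 2).mean(dim=-1)
    weights = torch.softmax(beta * advantages, dim=0) * len(advantages)  # energy weights
    return (weights.detach() * per_sample).mean()

model = VelocityNet()
loss = energy_weighted_fm_loss(model, torch.randn(16, 32), torch.randn(16, 7), torch.randn(16))
```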
The Technical Roadmap of Embodied Intelligence, Seen Through Nearly 1,000 Papers!
自动驾驶之心· 2025-09-07 23:34
Every time embodied intelligence comes up, we see plenty of papers proclaiming "breakthroughs and innovations", yet few works string the whole technical roadmap together so that readers clearly see how the field has developed, what problems it has hit, and where it is heading. How does robotic manipulation get a robot arm to "imitate" humans precisely? How does multimodal fusion put an agent "in the scene"? How does reinforcement learning drive a system to evolve on its own? And how do teleoperation and data collection break the limits of distance? These key threads of embodied intelligence deserve a careful review. Today we bring you several of the field's richer survey papers and unpack the development logic of each direction.

Robotic manipulation
Reference paper: The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey
Paper link: https://arxiv.org/abs/2507.11840
Affiliation: Zhe ...
A 10,000-Word Summary of End-to-End Autonomous Driving: Dissecting Three Technical Routes (UniAD / GenAD / Hydra MDP)
自动驾驶之心· 2025-09-01 23:32
Core Viewpoint
- The article reviews the current state of end-to-end autonomous driving algorithms, comparing them with traditional pipelines and highlighting their advantages and limitations [3][5][6]

Group 1: Traditional vs. End-to-End Algorithms
- Traditional stacks follow a perception-prediction-planning pipeline in which each module has distinct inputs and outputs [5][6]
- The perception module consumes sensor data and outputs bounding boxes for the prediction module, which in turn outputs trajectories for the planning module [6]
- End-to-end algorithms instead take raw sensor data as input and directly output path points, simplifying the pipeline and reducing error accumulation [6][10]

Group 2: Limitations of End-to-End Algorithms
- End-to-end algorithms still lack interpretability and safety guarantees and suffer from causal confusion [12][57]
- Their reliance on imitation learning limits how well they handle corner cases, since rare scenarios may be treated as noise [11][57]
- Noise in the ground truth itself can lead to suboptimal learning, because human driving data does not always represent the best possible action [11][57]

Group 3: Current End-to-End Algorithm Implementations
- ST-P3 is highlighted as an early end-to-end system focused on spatiotemporal learning, with three core modules: perception, prediction, and planning [14][15]
- Its innovations include an ego-centric cumulative alignment technique in perception, a dual-path prediction mechanism, and a planning module that incorporates prior information for trajectory optimization [15][19][20]

Group 4: Advanced Techniques in End-to-End Algorithms
- The UniAD framework takes a multi-task approach, adding five auxiliary tasks to boost performance and address the limits of plain modular stacking [24][25]
- It uses a full Transformer architecture for planning, integrating several interaction modules to improve trajectory prediction and planning accuracy [26][29]
- VAD (Vectorized Autonomous Driving) uses vectorized representations to better express the structure of map elements, improving computational speed and efficiency [32][33]

Group 5: Future Directions and Challenges
- Further research is needed to overcome the limits of current end-to-end algorithms, particularly in optimizing the learning process and handling exceptional cases [57]
- Multi-modal planning and multi-model learning approaches aim to improve the stability and quality of trajectory prediction [56][57]
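As a deliberately simplified illustration of the "full Transformer architecture for planning" idea attributed to UniAD above, the sketch below decodes a fixed number of ego waypoints from BEV features with learned queries. It shows the generic query-based pattern only; the layer counts, dimensions, and head are assumptions, not UniAD's actual implementation.

```python
import torch
import torch.nn as nn

class QueryWaypointPlanner(nn.Module):
    """Toy query-based planner: learned queries attend over BEV features to emit waypoints."""
    def __init__(self, d_model=128, n_waypoints=6):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_waypoints, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)           # (x, y) offset per future waypoint

    def forward(self, bev_tokens):                   # bev_tokens: (B, N_tokens, d_model)
        b = bev_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.decoder(q, bev_tokens)        # cross-attend queries over BEV features
        return self.head(decoded)                    # (B, n_waypoints, 2) planned trajectory

planner = QueryWaypointPlanner()
traj = planner(torch.randn(2, 200, 128))             # dummy batch of flattened BEV tokens
```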
Trajectory Planning Based on Deep Reinforcement Learning
自动驾驶之心· 2025-08-28 23:32
Core Viewpoint
- The article surveys the advances and potential of reinforcement learning (RL) in autonomous driving, tracing its evolution and comparing it with supervised learning and imitation learning [4][7][8]

Summary by Sections
Background
- Industry attention has recently turned to new technical paradigms such as VLA and reinforcement learning, with interest in RL growing after milestones like AlphaZero and ChatGPT [4]

Supervised Learning
- In autonomous driving, perception tasks such as object detection are framed as supervised learning, where a model is trained on labeled data to map inputs to outputs [5]

Imitation Learning
- Imitation learning trains models to replicate observed behavior, much as a child learns from adults; it is the primary learning objective in end-to-end autonomous driving [6]

Reinforcement Learning
- RL differs from imitation learning in that it learns by interacting with the environment, using feedback on task outcomes to optimize the model; it is particularly relevant for sequential decision-making in driving [7]

Inverse Reinforcement Learning
- Inverse RL addresses the difficulty of specifying reward functions for complex tasks by learning a reward model from user feedback, which can then guide training of the main model [8]

Basic Concepts of Reinforcement Learning
- Key concepts include policies, rewards, and value functions, which are essential to understanding how RL operates in driving contexts [14][15][16]

Markov Decision Process
- The Markov decision process is presented as the framework for modeling sequential tasks, applicable to a range of driving scenarios [10]

Common Algorithms
- Dynamic programming, Monte Carlo methods, and temporal-difference learning are covered as the foundations of reinforcement learning [26][30]

Policy Optimization
- On-policy and off-policy algorithms are distinguished, with their respective trade-offs in training stability and data utilization [27][28]

Advanced Reinforcement Learning Techniques
- Techniques such as DQN, TRPO, and PPO are introduced as ways to improve training stability and efficiency [41][55]

Application in Autonomous Driving
- Reward design and closed-loop training are emphasized: the vehicle's actions change the environment, so sophisticated modeling is required [60][61]

Conclusion
- RL algorithms and their application to autonomous driving are developing rapidly, and readers are encouraged to engage with the technology hands-on [62]
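To ground the temporal-difference idea listed among the basic algorithms above, here is a minimal tabular Q-learning update on a toy MDP, a precursor of the DQN family mentioned in the summary. The state and action counts, step size, and transition are arbitrary illustration values.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1          # step size, discount, exploration rate
rng = np.random.default_rng(0)

def td_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward the bootstrapped TD target."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    """Behavior policy: explore with probability epsilon, otherwise act greedily."""
    return int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())

# Dummy transition: in state 2, the chosen action gave reward 1.0 and led to state 3.
td_update(s=2, a=epsilon_greedy(2), r=1.0, s_next=3, done=False)
```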
Latest from HKU & Tsinghua! Strong Generalization for Dynamic Object Manipulation from Only a Few Demonstrations!
具身智能之心· 2025-08-21 00:03
Group 1
- The article addresses the challenges of dynamic object manipulation in industrial manufacturing and proposes GEM (Generalizable Entropy-based Manipulation), a system that achieves strong generalization from minimal demonstration data [3][6]
- GEM combines target-centered geometric perception with mixed action control to cut data requirements while maintaining high success rates in dynamic environments [6][15]
- The system has been validated in real-world settings, achieving a success rate above 97% across more than 10,000 operations without on-site demonstrations [6][44]

Group 2
- Dynamic object manipulation demands higher precision and real-time responsiveness than static manipulation, making it a harder task [8]
- Existing methods need extensive demonstration data and scale poorly because data collection in dynamic environments is costly [11][13]
- The proposed entropy-based framework quantifies the optimization process of imitation learning, aiming to minimize the data needed for effective generalization [13][15]

Group 3
- GEM is designed to lower observation entropy and action conditional entropy, the quantities that drive data requirements [15][16]
- The hardware platform uses adjustable-speed conveyor belts and RGB-D cameras to track and manipulate objects [20][21]
- Key components include a memory encoder that improves performance by integrating historical observations and a mixed action control mechanism that simplifies the dynamic aspects of the task [29][39]

Group 4
- Experiments show GEM outperforming seven mainstream methods in both simulation and the real world, with an 85% average success rate [30][31]
- The system remains robust across conveyor speeds and object geometries, keeping high success rates even on unseen objects [38][39]
- Deployed in a cafeteria setting, GEM handled food residue and fast-moving items with a 97.2% success rate [42][44]
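The "observation entropy" and "action conditional entropy" mentioned above can be made concrete with a simple Gaussian estimate: under a joint-Gaussian assumption, H(A | O) = H(A, O) - H(O), and lowering either term means fewer demonstrations are needed to pin down the observation-to-action mapping. The estimator below is a generic illustration of that bookkeeping on toy data, not GEM's own formulation.

```python
import numpy as np

def gaussian_entropy(x: np.ndarray) -> float:
    """Differential entropy of samples x (n, d) under a multivariate-Gaussian assumption."""
    d = x.shape[1]
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)   # regularize for numerical stability
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 8))                            # toy observation features
act = 0.5 * obs[:, :3] + 0.1 * rng.normal(size=(500, 3))   # toy actions correlated with obs

H_o = gaussian_entropy(obs)
H_ao = gaussian_entropy(np.concatenate([act, obs], axis=1))
H_a_given_o = H_ao - H_o                                    # action conditional entropy H(A | O)
print(f"H(O) = {H_o:.2f}, H(A | O) = {H_a_given_o:.2f}")
```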