Workflow
Imitation Learning
DexCanvas: Must Embodied Data Really Always Be Missing One of Scale, Realism, and Force Sensing?
具身智能之心· 2025-10-10 00:02
Why is dexterous grasping so hard? Over the past two years, the embodied-AI field has made notable progress at the cognition, perception, and planning levels, yet getting robots to perform fine hand manipulation in the physical world and execute complex dexterous operations the way humans do remains a very hard problem. The field has largely cracked human language understanding, object and scene recognition, and planning of concrete task steps, but flexible grasping and force-aware regulation still leave many open problems. In real scenes, dexterous grasping faces challenges such as precise control, high-dimensional motion planning, and real-time adaptation to dynamic environments; the complexity of the task demands robust mechanical design and advanced control algorithms.

The hardware behind dexterous manipulation is mainly the dexterous hand, which falls into two classes: two-finger grippers and multi-finger anthropomorphic hands. Two-finger grippers are widely used for their reliability, simplicity, and ease of control, but they usually offer only a single degree of freedom and struggle with complex tasks. This is why human-like dexterous hands with 20+ degrees of freedom came into being; these anthropomorphic hands are better suited to interacting with objects and environments designed for humans.

1) Existing dexterous grasping and data collection schemes. Major robotics companies at home and abroad keep releasing massive datasets, with million-scale trajectories and thousands of hours of demonstrations, yet these lack force-control information. Dexterous-hand data never seems to escape this rule: of scale, realism, and force sensing, you can only pick two. The way the data is acquired means you cannot have all three at once! Current learning methods for dexterous grasping fall into two classes: reinforcement learning and imitation learning. Imitation learning requires no complex world model or reward design ...
NeurIPS 2025 Spotlight | With Just One Demonstration, the DexFlyWheel Framework Teaches Robots to "Generate Their Own Data"
机器之心· 2025-10-09 04:43
When we talk about robotic dexterous manipulation, data scarcity is the sword of Damocles hanging overhead. While large models and autonomous driving keep "emerging" powerful capabilities from massive data, dexterous manipulation is still stuck at a data bottleneck. Project page: https://DexFlyWheel.github.io

Research background: why is dexterous-hand data generation so hard? As embodied intelligence develops rapidly, robot datasets covering diverse scenes and tasks keep appearing, but manipulation datasets for five-finger dexterous hands remain scarce. Several key reasons lie behind this:

1. Traditional methods fail. Generation pipelines built for two-finger grippers essentially do not transfer to dexterous hands. Heuristic planning cannot cope with high-dimensional action optimization, and although LLMs can provide semantic guidance, they struggle to produce fine-grained five-finger control trajectories.

2. Human teleoperation is costly. Teleoperation rigs can collect dexterous-hand data effectively, but they demand large amounts of labor, time, and resources; scalability is low, so diverse, large-scale datasets are hard to build.

3. Pure reinforcement learning is inefficient. RL alone can train successful policies and iterate on successful trajectories, but it often produces unnatural hand motions and jittery arm movement, and its low exploration efficiency makes high-quality trajectories hard to generate at scale.

Recently, Peking University and Harbin Institute of Technology, together with PsiBot (灵初智能), proposed DexFlyWheel, the first self-reinforcing data-generation framework for dexterous manipulation. The framework ...
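The excerpt cuts off before describing the framework itself, but the "flywheel" name points at a generic self-reinforcing loop: seed a policy from a handful of demonstrations, roll it out, keep only the successful trajectories, and retrain on the grown dataset. The sketch below shows only that generic loop under stated assumptions; every callable is a user-supplied stand-in, not anything from the DexFlyWheel paper.

```python
from typing import Any, Callable, List

def data_flywheel(
    seed_demos: List[Any],
    train_policy: Callable[[List[Any]], Any],   # e.g. behavior cloning on the dataset
    rollout: Callable[[Any], Any],              # run the current policy, return a trajectory
    is_success: Callable[[Any], bool],          # success filter for collected trajectories
    n_rounds: int = 5,
    rollouts_per_round: int = 200,
):
    """Generic self-reinforcing data loop; all callables are hypothetical stand-ins."""
    dataset = list(seed_demos)                  # start from one or a few demonstrations
    policy = train_policy(dataset)
    for _ in range(n_rounds):
        new_trajs = [rollout(policy) for _ in range(rollouts_per_round)]
        dataset += [t for t in new_trajs if is_success(t)]   # the dataset grows itself
        policy = train_policy(dataset)          # retrain on the enlarged dataset
    return policy, dataset
```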
Imitation Learning Can't Be Truly End-to-End?
自动驾驶之心· 2025-10-08 23:33
Author | BigBite    Source | BigBite思维随笔 (Big Bite Small Talk)    Original post: 模仿学习无法真正端到端

New technical buzzwords keep appearing in the autonomous driving industry. While everyone argues over whether VLA is better or world models are more advanced, they overlook the fact that, compared with model architecture, it is the training method that determines how well a capability actually works. In essence, both VLA and world-behavior models are just concrete model structures for realizing end-to-end driving. Yet as more and more leading companies pour effort into the end-to-end paradigm, the top teams are gradually discovering that imitation learning alone cannot deliver truly end-to-end autonomous driving!

So where exactly do the problems and limitations of imitation learning in autonomous driving lie?

Imitation learning assumes the expert data is optimal. The underlying assumption of imitation learning is that every training trajectory provides the ground-truth optimal action for its state, so behavior closer to the training data is ...
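The optimality assumption is easiest to see in the training objective itself: behavior cloning simply regresses the policy output toward whatever the logged driver did, so every recorded action is implicitly treated as the best possible action in that state. A minimal PyTorch sketch follows; the toy network, feature size, and action dimension are placeholders, not any production planner.

```python
import torch
import torch.nn as nn

# Toy policy: maps a flattened state/feature vector to a planned action (e.g. steer, accel).
policy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(states: torch.Tensor, expert_actions: torch.Tensor) -> float:
    """One imitation-learning update: the logged action is treated as the optimal target."""
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, expert_actions)  # penalizes any deviation from the log
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch: 32 logged states and the human driver's actions in those states.
loss = bc_step(torch.randn(32, 128), torch.randn(32, 2))
```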
After All This Time, VLA May Still Be Delivering More Emotional Value Than Anything Else......
自动驾驶之心· 2025-09-20 16:03
Core Insights
- The article discusses the current state of end-to-end (E2E) technology in academia and industry, highlighting the differences in approach and data availability between the two sectors [1][4][5]
- It emphasizes the importance of data iteration speed in AI model development, arguing that slow data iteration holds back technological progress [2][4]
- It also explores the role of reinforcement learning in enhancing vision-language-action (VLA) models, particularly in scenarios with no single correct answer [6][7][9][10]

Summary by Sections
End-to-End Technology
- Academia is seeing a proliferation of end-to-end methodologies, with a wide range of approaches emerging [1]
- Industry is more pragmatic: computational limits rule out some popular models, but it benefits from vast amounts of data [4]
- The success of models like ChatGPT is attributed to the internet's ability to supply extensive data; the same holds for the automotive industry, where companies can readily gather massive driving data [4]

Data and Technology Iteration
- As the technology evolves rapidly, dataset iteration must keep pace; otherwise it will impede progress [2]
- Research teams increasingly publish datasets alongside their papers to sustain high-impact output [3]

Reinforcement Learning and VLA
- Reinforcement learning suits problems that have no single correct answer, only characteristics of better and worse answers [7]
- Its training process identifies strong solutions from reward signals, reducing the need for extensive demonstration data [9]
- While the short-term payoff of VLA applications is uncertain, the long-term potential is widely recognized [10][11]

Future of VLA
- The importance of a VLA model extends beyond raw performance metrics; data availability and training strategy are just as crucial [12]
- The community is encouraged to discuss the development and challenges of autonomous driving technology [5][13][16]
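To make the "reward instead of demonstration" point above concrete, here is a minimal REINFORCE-style update: the policy is improved from a scalar reward attached to its own rollouts rather than from labeled expert actions. It is a generic sketch with made-up dimensions, not the training recipe of any specific VLA system.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))  # 4 discrete actions
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def reinforce_step(states, actions, rewards):
    """Policy-gradient update: actions that led to higher reward become more likely."""
    logits = policy(states)
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = rewards - rewards.mean()          # simple baseline to reduce variance
    loss = -(chosen * advantages).mean()           # no expert labels anywhere in this loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy rollout batch: states, the actions the policy itself took, and their returns.
loss = reinforce_step(torch.randn(8, 16), torch.randint(0, 4, (8,)), torch.randn(8))
```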
Today's Autonomous-Driving VLA Still Has Many Modules That Need Optimizing...
自动驾驶之心· 2025-09-18 11:00
1. The era of traditional modular architectures: Early autonomous-driving systems (L2-L4) generally used modular designs, with each module (object detection, trajectory prediction, path planning) developed and optimized independently. Strengths: clear logic, modules can be debugged and validated independently, good interpretability. Bottlenecks: error accumulation, where small upstream errors propagate and amplify stage by stage and distort the final decision; information loss, since the structured data passed between modules (3D boxes, trajectory points) discards much of the rich detail in the raw sensor signal; the limits of rules, since heavy reliance on hand-crafted rules and parameters makes complex, long-tail traffic scenarios (corner cases) hard to handle.

2. The rise of vision-only end-to-end (imitation learning): Represented by NVIDIA's DAVE-2 and Wayve, researchers tried to use deep neural networks, trained by imitation learning, to learn the "pixels-to-behavior" mapping directly from human drivers' videos and control data. Strengths: a simpler system architecture that learns complex driving policies from data automatically, without tedious rule design. Bottlenecks: the "black box" problem and poor interpretability, since the model's decision process is opaque and it is hard to understand why it takes a specific action, a fatal flaw for safety-critical driving; causal confusion (Causal ... VLA is without doubt this year's autonomous dr ...
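As an illustration of the "pixels-to-behavior" mapping described above (in the spirit of DAVE-2-style systems, not a reproduction of any of them), here is a toy convolutional policy trained to regress steering from camera frames; the architecture, image size, and loss are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class PixelToControl(nn.Module):
    """Toy vision-only end-to-end policy: camera image in, steering command out."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(48, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, img):
        return self.head(self.backbone(img))

model = PixelToControl()
frames = torch.randn(4, 3, 66, 200)        # dummy batch of camera frames
steer_labels = torch.randn(4, 1)           # the human driver's steering at those frames
loss = nn.functional.mse_loss(model(frames), steer_labels)   # imitation objective
```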
Latest from Westlake University! ARFM: Combining the Strengths of VLA Imitation Learning and Reinforcement Learning
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article discusses the limitations of current vision-language-action (VLA) models on complex tasks and introduces Adaptive Reinforcement Flow Matching (ARFM), which enhances their performance by combining reinforcement learning (RL) signals with the advantages of flow matching [1][2][4]

Summary by Sections
Current Status of VLA Models
- Flow-matching-based VLA models perform well on general robotic manipulation tasks, as validated by large-scale pre-trained systems such as RT-1 and PaLM-E, but their reliance on imitation learning limits action precision in complex downstream tasks [4][5]

Existing Solutions and Limitations
- Previous attempts to fine-tune VLA models with offline RL methods, such as ReinboT, have had limited effect because they guide action prediction only indirectly, highlighting the need for more effective offline RL fine-tuning methods [4][5]

Main Contributions
- ARFM is a novel offline RL post-training approach designed specifically for VLA flow models; it addresses the difficulty of extracting signal from mixed-quality data and improves the efficiency of offline RL fine-tuning [6][7]

Methodological Innovation
- ARFM adds an adaptive scaling factor to the loss function to balance the RL advantage signal against gradient variance, yielding better generalization, robustness to disturbances, and few-shot learning [6][8]

Experimental Validation
- Extensive experiments on the LIBERO simulation benchmark and a UR5 robotic arm show that ARFM outperforms existing methods in generalization, robustness to dynamic disturbances, and few-shot efficiency [6][8][29]

Core Algorithm Design
- The framework is built around an energy-weighted loss that integrates RL signals and an adaptive mechanism that keeps training stable, overcoming the limitations of plain imitation learning and existing offline RL fine-tuning [8][11]

Experimental Setup
- Experiments use the LIBERO benchmark, which comprises four core task suites, plus real-world UR5 manipulation tasks under varying conditions [29][30]

Key Experimental Results
- ARFM shows superior multi-task learning, robustness to action perturbations, few-shot efficiency, and continual-learning ability compared with baseline models, confirming its practical value for real-world robots [32][35][38]

Conclusion
- ARFM balances retaining the RL advantage signal against controlling flow-loss gradient variance, improving VLA flow models across tasks and conditions and demonstrating real-world applicability [49][47]
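The summary says ARFM injects RL signals through an energy-weighted loss with an adaptive scaling factor. Without the paper's exact formulation, the general idea can be sketched as a flow-matching regression whose per-sample loss is weighted by an exponentiated advantage; the weighting scheme, the fixed scaling factor beta, the network, and all tensor shapes below are illustrative assumptions, not ARFM itself.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity-field network: predicts the flow v(x_t, t, obs) for action generation."""
    def __init__(self, act_dim=7, obs_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(act_dim + obs_dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))
    def forward(self, x_t, t, obs):
        return self.net(torch.cat([x_t, t, obs], dim=-1))

def energy_weighted_fm_loss(model, obs, expert_actions, advantages, beta=1.0):
    """Flow-matching loss where high-advantage samples are up-weighted (illustrative only)."""
    x1 = expert_actions
    x0 = torch.randn_like(x1)                       # noise sample
    t = torch.rand(x1.size(0), 1)
    x_t = (1 - t) * x0 + t * x1                     # linear interpolation path
    target_v = x1 - x0                              # target velocity along that path
    pred_v = model(x_t, t, obs)
    per_sample = ((pred_v - target_v) ** 2).mean(dim=-1)
    weights = torch.softmax(beta * advantages, dim=0) * len(advantages)  # energy weights
    return (weights.detach() * per_sample).mean()

model = VelocityNet()
loss = energy_weighted_fm_loss(model, torch.randn(16, 32), torch.randn(16, 7), torch.randn(16))
```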
The Technical Roadmap of Embodied Intelligence, Seen Through Nearly 1,000 Papers!
自动驾驶之心· 2025-09-07 23:34
Every time embodied intelligence comes up, we see plenty of papers proclaiming "breakthroughs and innovations", yet few works string the whole technical roadmap together so that readers clearly see how the field has developed, what problems it has hit, and where it is heading. How does robotic manipulation get a robot arm to "imitate" humans precisely? How does multimodal fusion put an agent "in the scene"? How does reinforcement learning drive a system to evolve on its own? And how do teleoperation and data collection break the limits of distance? These key threads of embodied intelligence deserve a careful review. Today we bring you several of the field's richer survey papers and unpack the development logic of each direction.

Robotic manipulation
Reference paper: The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey
Paper link: https://arxiv.org/abs/2507.11840
Affiliation: Zhe ...
A 10,000-Word Summary of End-to-End Autonomous Driving: Dissecting Three Technical Routes (UniAD / GenAD / Hydra MDP)
自动驾驶之心· 2025-09-01 23:32
Core Viewpoint
- The article reviews the current state of end-to-end autonomous driving algorithms, comparing them with traditional pipelines and highlighting their advantages and limitations [3][5][6]

Group 1: Traditional vs. End-to-End Algorithms
- Traditional stacks follow a perception-prediction-planning pipeline in which each module has distinct inputs and outputs [5][6]
- The perception module consumes sensor data and outputs bounding boxes for the prediction module, which in turn outputs trajectories for the planning module [6]
- End-to-end algorithms instead take raw sensor data as input and directly output path points, simplifying the pipeline and reducing error accumulation [6][10]

Group 2: Limitations of End-to-End Algorithms
- End-to-end algorithms still lack interpretability and safety guarantees and suffer from causal confusion [12][57]
- Their reliance on imitation learning limits how well they handle corner cases, since rare scenarios may be treated as noise [11][57]
- Noise in the ground truth itself can lead to suboptimal learning, because human driving data does not always represent the best possible action [11][57]

Group 3: Current End-to-End Algorithm Implementations
- ST-P3 is highlighted as an early end-to-end system focused on spatiotemporal learning, with three core modules: perception, prediction, and planning [14][15]
- Its innovations include an ego-centric cumulative alignment technique in perception, a dual-path prediction mechanism, and a planning module that incorporates prior information for trajectory optimization [15][19][20]

Group 4: Advanced Techniques in End-to-End Algorithms
- The UniAD framework takes a multi-task approach, adding five auxiliary tasks to boost performance and address the limits of plain modular stacking [24][25]
- It uses a full Transformer architecture for planning, integrating several interaction modules to improve trajectory prediction and planning accuracy [26][29]
- VAD (Vectorized Autonomous Driving) uses vectorized representations to better express the structure of map elements, improving computational speed and efficiency [32][33]

Group 5: Future Directions and Challenges
- Further research is needed to overcome the limits of current end-to-end algorithms, particularly in optimizing the learning process and handling exceptional cases [57]
- Multi-modal planning and multi-model learning approaches aim to improve the stability and quality of trajectory prediction [56][57]
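As a deliberately simplified illustration of the "full Transformer architecture for planning" idea attributed to UniAD above, the sketch below decodes a fixed number of ego waypoints from BEV features with learned queries. It shows the generic query-based pattern only; the layer counts, dimensions, and head are assumptions, not UniAD's actual implementation.

```python
import torch
import torch.nn as nn

class QueryWaypointPlanner(nn.Module):
    """Toy query-based planner: learned queries attend over BEV features to emit waypoints."""
    def __init__(self, d_model=128, n_waypoints=6):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_waypoints, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)           # (x, y) offset per future waypoint

    def forward(self, bev_tokens):                   # bev_tokens: (B, N_tokens, d_model)
        b = bev_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.decoder(q, bev_tokens)        # cross-attend queries over BEV features
        return self.head(decoded)                    # (B, n_waypoints, 2) planned trajectory

planner = QueryWaypointPlanner()
traj = planner(torch.randn(2, 200, 128))             # dummy batch of flattened BEV tokens
```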
Trajectory Planning Based on Deep Reinforcement Learning
自动驾驶之心· 2025-08-28 23:32
Core Viewpoint
- The article surveys the advances and potential of reinforcement learning (RL) in autonomous driving, tracing its evolution and comparing it with supervised learning and imitation learning [4][7][8]

Summary by Sections
Background
- Industry attention has recently turned to new technical paradigms such as VLA and reinforcement learning, with interest in RL growing after milestones like AlphaZero and ChatGPT [4]

Supervised Learning
- In autonomous driving, perception tasks such as object detection are framed as supervised learning, where a model is trained on labeled data to map inputs to outputs [5]

Imitation Learning
- Imitation learning trains models to replicate observed behavior, much as a child learns from adults; it is the primary learning objective in end-to-end autonomous driving [6]

Reinforcement Learning
- RL differs from imitation learning in that it learns by interacting with the environment, using feedback on task outcomes to optimize the model; it is particularly relevant for sequential decision-making in driving [7]

Inverse Reinforcement Learning
- Inverse RL addresses the difficulty of specifying reward functions for complex tasks by learning a reward model from user feedback, which can then guide training of the main model [8]

Basic Concepts of Reinforcement Learning
- Key concepts include policies, rewards, and value functions, which are essential to understanding how RL operates in driving contexts [14][15][16]

Markov Decision Process
- The Markov decision process is presented as the framework for modeling sequential tasks, applicable to a range of driving scenarios [10]

Common Algorithms
- Dynamic programming, Monte Carlo methods, and temporal-difference learning are covered as the foundations of reinforcement learning [26][30]

Policy Optimization
- On-policy and off-policy algorithms are distinguished, with their respective trade-offs in training stability and data utilization [27][28]

Advanced Reinforcement Learning Techniques
- Techniques such as DQN, TRPO, and PPO are introduced as ways to improve training stability and efficiency [41][55]

Application in Autonomous Driving
- Reward design and closed-loop training are emphasized: the vehicle's actions change the environment, so sophisticated modeling is required [60][61]

Conclusion
- RL algorithms and their application to autonomous driving are developing rapidly, and readers are encouraged to engage with the technology hands-on [62]
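To ground the temporal-difference idea listed among the basic algorithms above, here is a minimal tabular Q-learning update on a toy MDP, a precursor of the DQN family mentioned in the summary. The state and action counts, step size, and transition are arbitrary illustration values.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1          # step size, discount, exploration rate
rng = np.random.default_rng(0)

def td_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward the bootstrapped TD target."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    """Behavior policy: explore with probability epsilon, otherwise act greedily."""
    return int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())

# Dummy transition: in state 2, the chosen action gave reward 1.0 and led to state 3.
td_update(s=2, a=epsilon_greedy(2), r=1.0, s_next=3, done=False)
```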
Latest from HKU & Tsinghua! Strong Generalization for Dynamic Object Manipulation from Only a Few Demonstrations!
具身智能之心· 2025-08-21 00:03
Group 1
- The article addresses the challenges of dynamic object manipulation in industrial manufacturing and proposes GEM (Generalizable Entropy-based Manipulation), a system that achieves strong generalization from minimal demonstration data [3][6]
- GEM combines target-centered geometric perception with mixed action control to cut data requirements while maintaining high success rates in dynamic environments [6][15]
- The system has been validated in real-world settings, achieving a success rate above 97% across more than 10,000 operations without on-site demonstrations [6][44]

Group 2
- Dynamic object manipulation demands higher precision and real-time responsiveness than static manipulation, making it a harder task [8]
- Existing methods need extensive demonstration data and scale poorly because data collection in dynamic environments is costly [11][13]
- The proposed entropy-based framework quantifies the optimization process of imitation learning, aiming to minimize the data needed for effective generalization [13][15]

Group 3
- GEM is designed to lower observation entropy and action conditional entropy, the quantities that drive data requirements [15][16]
- The hardware platform uses adjustable-speed conveyor belts and RGB-D cameras to track and manipulate objects [20][21]
- Key components include a memory encoder that improves performance by integrating historical observations and a mixed action control mechanism that simplifies the dynamic aspects of the task [29][39]

Group 4
- Experiments show GEM outperforming seven mainstream methods in both simulation and the real world, with an 85% average success rate [30][31]
- The system remains robust across conveyor speeds and object geometries, keeping high success rates even on unseen objects [38][39]
- Deployed in a cafeteria setting, GEM handled food residue and fast-moving items with a 97.2% success rate [42][44]
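The "observation entropy" and "action conditional entropy" mentioned above can be made concrete with a simple Gaussian estimate: under a joint-Gaussian assumption, H(A | O) = H(A, O) - H(O), and lowering either term means fewer demonstrations are needed to pin down the observation-to-action mapping. The estimator below is a generic illustration of that bookkeeping on toy data, not GEM's own formulation.

```python
import numpy as np

def gaussian_entropy(x: np.ndarray) -> float:
    """Differential entropy of samples x (n, d) under a multivariate-Gaussian assumption."""
    d = x.shape[1]
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)   # regularize for numerical stability
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 8))                            # toy observation features
act = 0.5 * obs[:, :3] + 0.1 * rng.normal(size=(500, 3))   # toy actions correlated with obs

H_o = gaussian_entropy(obs)
H_ao = gaussian_entropy(np.concatenate([act, obs], axis=1))
H_a_given_o = H_ao - H_o                                    # action conditional entropy H(A | O)
print(f"H(O) = {H_o:.2f}, H(A | O) = {H_a_given_o:.2f}")
```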