Imitation Learning
AAAI 2026 Oral | Robots Can Learn by Watching Humans: A Single Demonstration Teaches a New Task!
具身智能之心· 2025-12-12 01:22
Core Insights
- The article presents a novel approach to robot learning from human demonstration, emphasizing fine-grained action alignment between human and robot movements [3][4][8].
- The proposed method, Human2Robot, combines a new dataset (H&R) with a two-stage framework to strengthen robot learning, enabling one-shot generalization to new tasks [3][4][9].

Summary by Sections
Introduction
- Existing methods rely on coarse alignment of human-robot video pairs, which leaves them without the fine-grained understanding of actions needed for task generalization [3][8].
Methodology
- A new dataset, H&R, consisting of 2,600 synchronized human and robot action videos, is introduced to support better learning [9].
- The Human2Robot framework consists of two main stages: a Video Prediction Model (VPM) and an Action Decoder [12][16].
Video Prediction Model (VPM)
- The VPM generates robot action videos from human demonstrations, allowing the model to learn detailed action dynamics [13][14].
- The model captures key information about the robot's shape and the human hand's movements through a Spatial UNet and a Spatial-Temporal UNet [15].
Action Decoder
- The Action Decoder translates generated video features into concrete robot movements, enabling real-time task execution without continuous video input [16][20].
Experimental Results
- Human2Robot outperforms existing baselines by 10-20% in success rate across a range of tasks, demonstrating the value of conditioning on detailed human video [20][27].
- With KNN retrieval added, Human2Robot still performs well even without a direct demonstration as input, indicating robust task execution [20][27].
Generalization Capability
- Human2Robot generalizes strongly across tasks, including new positions and object instances, thanks to the clear action correspondences established by the H&R dataset [27].
Ablation Studies
- Experiments show that relying solely on human video input leads to poor performance, validating the VPM and the necessity of the video-generation step for reliable action mapping [25][26].
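The KNN variant described above retrieves a stored demonstration instead of requiring a live one. A minimal sketch of that retrieval step, assuming embeddings have already been computed by some encoder (the embedding dimension and distance metric here are illustrative, not the paper's):

```python
import numpy as np

def knn_retrieve(query_embedding, demo_embeddings, k=1):
    """Return indices of the k nearest stored demonstration embeddings (L2 distance)."""
    dists = np.linalg.norm(demo_embeddings - query_embedding, axis=1)
    return np.argsort(dists)[:k]

# Toy example: 4 stored demonstration embeddings, one query observation.
demos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([0.9, 0.1])
nearest = knn_retrieve(query, demos, k=1)  # index of the closest demonstration
```

The retrieved demonstration then plays the role the live human video would otherwise play as input to the VPM.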
Li Auto Shares a Closed-Loop Reinforcement Learning Training Framework for Autonomous Driving
理想TOP2· 2025-11-27 16:10
Core Viewpoint
- The article covers advances in autonomous driving via the AD-R1 framework, which uses closed-loop reinforcement learning to improve the safety and robustness of end-to-end systems, addressing the failure of existing world models to predict dangerous outcomes [2][4].

Group 1: Closed-Loop vs. Open-Loop Systems
- Open-loop systems rely on offline data and static playback, while closed-loop systems interact dynamically with the environment, allowing real-time adjustment of the vehicle's trajectory [1].
- The AD-R1 framework represents a significant step in closed-loop reinforcement learning for autonomous driving [1].

Group 2: Challenges in Imitation Learning
- Imitation learning faces two main challenges: distribution shift caused by unseen long-tail scenarios in the real world, and the lack of negative feedback, which makes it hard for the model to learn from mistakes [3].
- Optimistic bias is identified as a systemic flaw in reinforcement learning for autonomous driving: world models may generate unrealistically safe outcomes for unsafe actions [3].

Group 3: AD-R1 Framework Components
- AD-R1 has two core components: an impartial world model, and reinforcement learning over imagined futures [4].
- The impartial world model uses counterfactual data synthesis to teach the model the consequences of unsafe driving behaviors [4].

Group 4: Model Training and Evaluation
- Training proceeds by sampling candidate trajectories, imagining future scenarios with the impartial world model, scoring the predicted outcomes, and updating the policy with the GRPO algorithm [8].
- 3D/4D voxel outputs enable fine-grained reward calculation, improving the evaluation of collision severity and keeping the vehicle stable on the road [8].

Group 5: Additional Features
- Trajectory-aware gating keeps the model focused on features along the driving path, while an ego-trajectory fidelity loss penalizes deviations from the input control commands [6].
- The framework also includes volume collision penalties and vertical clearance checks to improve safety in complex environments [8].
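The scoring-and-update loop above hinges on GRPO's group-relative advantage: each sampled trajectory's reward is normalized against the mean and standard deviation of its own sampled group. A minimal sketch of that normalization, with stub rewards standing in for the world model's scores (the reward values are illustrative, not AD-R1's):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each candidate's reward
    against the mean/std of its sampled group (the core of GRPO)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Score K=4 imagined rollouts (stub rewards from a world model:
# negative for predicted collisions, positive for safe progress).
rewards = [1.0, -2.0, 0.5, 0.5]
adv = grpo_advantages(rewards)
# Trajectories with above-average reward get positive advantage and are
# reinforced; below-average ones are suppressed in the policy update.
```

Because the advantages are zero-mean within each group, the update needs no separately learned value baseline.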
Led by Industry Algorithm Experts: A Small-Group Course on Production-Oriented End-to-End Autonomous Driving
自动驾驶之心· 2025-11-21 00:04
Core Insights
- The article stresses the importance of end-to-end production work in the automotive industry and the scarcity of qualified talent in this area [1][3].
- A newly designed advanced course on end-to-end production has been developed to meet industry needs, focusing on practical applications and real-world scenarios [3][5].

Course Overview
- The course covers essential algorithms such as one-stage and two-stage end-to-end frameworks, reinforcement learning applications, and trajectory optimization techniques [5][10].
- It aims to provide hands-on experience and insight into production challenges, making it suitable for those looking to advance or change careers [5][18].

Course Structure
- Chapter 1 gives an overview of end-to-end tasks, focusing on the integration of perception and control algorithms [10].
- Chapter 2 covers the two-stage end-to-end algorithm framework, including its modeling and information-transfer methods [11].
- Chapter 3 covers the one-stage end-to-end algorithm framework, emphasizing its advantages in information transmission [12].
- Chapter 4 focuses on using navigation information in autonomous driving, detailing map formats and encoding methods [13].
- Chapter 5 introduces reinforcement learning algorithms and why they are needed alongside imitation learning [14].
- Chapter 6 provides hands-on practice in trajectory output optimization, combining imitation and reinforcement learning [15].
- Chapter 7 discusses fallback strategies for trajectory smoothing and reliability in production [16].
- Chapter 8 shares production experience from several perspectives, including data and model optimization [17].

Target Audience
- The course is designed for advanced learners with a foundation in autonomous driving algorithms, reinforcement learning, and programming [18][19].

Course Logistics
- The course starts on November 30 and runs for three months, featuring offline video lectures and online Q&A sessions [20].
A China-US Robotics Debate Has Just Erupted
Hua Er Jie Jian Wen· 2025-11-18 08:41
Core Viewpoint
- A video of a humanoid robot from the Chinese startup MindOn Tech has sparked a global debate over its authenticity, with the claim of "no acceleration, no remote control" challenged by skeptics in the U.S. [1][4][10]

Group 1: Video and Technology
- The video shows the humanoid robot watering plants, throwing out garbage, and playing with children, with impressively fluid movements [2][4].
- MindOn Tech claims the robot operates autonomously without any external control, drawing both significant interest and skepticism [4][10].

Group 2: Skepticism and Responses
- Brett Adcock, CEO of Figure AI, questioned the video's authenticity, suggesting it may involve pre-recorded movements without real-time perception [5][7].
- Adcock has previously accused another Chinese robotics company, UBTECH, of using computer-generated imagery in its demonstrations [8][10].

Group 3: Support for Authenticity
- Supporters of MindOn Tech have released backup footage to substantiate the video, arguing that the robot's actions are feasible given existing academic research [11][15].
- Mike Kalil, a U.S. tech blogger, argues the robot's capabilities come from integrating advanced research in imitation and reinforcement learning, a significant engineering achievement [15].

Group 4: Implications for the Industry
- If MindOn Tech's software can deliver genuine functionality on cost-effective hardware like Unitree's G1, it could pose a serious threat to established players such as Figure AI, 1X Technologies, and Tesla [17][18].
- The current trend among U.S. companies is vertical integration, developing both the AI software and the hardware, an approach MindOn Tech's model may challenge [18][19].

Group 5: Potential Market Shift
- MindOn Tech's model suggests decoupling AI software from hardware, akin to the "Android model," which could disrupt the competitive landscape of humanoid robotics [19][20].
- Competition may shift from hardware capability to AI intelligence, potentially leading to a more open and flexible market [20][21].
- The debate over the video's authenticity reflects a broader clash of technological approaches and business models, signaling a significant shift in the robotics industry [21].
HuggingFace and Oxford University Release a New Tutorial with an Open-Source SOTA Resource Library!
具身智能之心· 2025-10-27 00:02
Core Viewpoint
- The article highlights major advances in robotics, particularly robot learning, driven by large models and multi-modal AI, which have shifted traditional robotics toward a learning-based paradigm [3][4].

Group 1: Introduction to Robot Learning
- The article introduces a comprehensive tutorial on modern robot learning, covering the foundations of reinforcement learning and imitation learning and leading up to general-purpose, language-conditioned models [4][12].
- HuggingFace and Oxford University researchers have created a valuable, accessible guide to robot learning for newcomers to the field [3][4].

Group 2: Classic Robotics
- Classic robotics relies on explicit modeling through kinematics and control planning, while learning-based methods use deep reinforcement learning and expert demonstrations for implicit modeling [15].
- Traditional robotic systems follow a modular pipeline: perception, state estimation, planning, and control [16].

Group 3: Learning-Based Robotics
- Learning-based robotics couples perception and control more tightly, adapts to tasks and embodiments, and reduces the need for expert modeling [26].
- The tutorial highlights safety and efficiency challenges in real-world deployment, especially during early training, and discusses techniques such as simulation training and domain randomization to mitigate risk [34][35].

Group 4: Reinforcement Learning
- Reinforcement learning lets robots learn optimal behavior autonomously through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the complexity of integrating many system components and the limits of traditional physics-based models, which often oversimplify real-world phenomena [30].

Group 5: Imitation Learning
- Imitation learning offers a more direct path: the robot replicates expert actions through behavior cloning, avoiding complex reward-function design [41].
- The tutorial addresses challenges such as compounding errors and handling multi-modal behaviors in expert demonstrations [41][42].

Group 6: Advanced Techniques in Imitation Learning
- The article introduces advanced imitation learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data effectively [43][45].
- Diffusion Policy performs strongly on a range of tasks with minimal data, requiring only 50-150 demonstrations for training [45].

Group 7: General Robot Policies
- The tutorial envisions general robot policies that operate across tasks and devices, enabled by large-scale open robot datasets and powerful vision-language models [52][53].
- Two cutting-edge vision-language-action (VLA) models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise control commands [53][56].

Group 8: Model Efficiency
- SmolVLA reflects a trend toward model miniaturization and open-sourcing, achieving high performance with far fewer parameters and much less memory than π₀ [56][58].
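Behavior cloning, mentioned above as the most direct form of imitation learning, reduces to supervised regression from states to expert actions. A minimal sketch under a toy linear expert (the dataset and policy class here are illustrative, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expert dataset: 2-D states and actions from a linear expert a = W* s.
W_true = np.array([[1.0, -0.5]])
states = rng.normal(size=(200, 2))
actions = states @ W_true.T

# Behavior cloning: fit a policy by regressing actions on states
# (least squares minimizes exactly the MSE imitation loss).
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
pred = states @ W_bc
mse = float(np.mean((pred - actions) ** 2))
```

The compounding-error problem noted above arises because this loss only measures error on expert-visited states; once the policy drifts off that distribution at test time, nothing in the objective constrains its behavior.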
A Hands-On Introduction to Robot Learning: HuggingFace and Oxford University's New Tutorial with an Open-Source SOTA Resource Library
机器之心· 2025-10-26 07:00
Core Viewpoint
- The article highlights significant advances in robotics, particularly robot learning, driven by AI technologies such as large models and multi-modal models. This shift has moved traditional robotics toward a learning-based paradigm, opening new potential for autonomously deciding robots [2].

Group 1: Introduction to Robot Learning
- The article traces the evolution of robotics from explicit to implicit modeling, a fundamental change in how motion is generated: traditional robotics relied on explicit models, while learning-based methods use deep reinforcement learning and learning from expert demonstrations [15].
- A comprehensive tutorial from HuggingFace and Oxford University researchers serves as a valuable resource for newcomers to modern robot learning, covering the foundations of reinforcement learning and imitation learning [3][4].

Group 2: Learning-Based Robotics
- Learning-based robotics simplifies the perception-to-action pipeline by training a unified high-level controller that directly handles high-dimensional, unstructured perception-motor information without relying on a dynamics model [33].
- The tutorial addresses real-world challenges such as safety and efficiency during early training and the high cost of trial and error in physical environments, introducing techniques like simulator training and domain randomization to mitigate these risks [34][35].

Group 3: Reinforcement Learning
- Reinforcement learning lets robots learn optimal behavior autonomously through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the "offline-to-online" reinforcement learning framework, which improves sample efficiency and safety by using pre-collected expert data. The HIL-SERL method exemplifies this approach, enabling robots to master complex real-world tasks with near-100% success rates after just 1-2 hours of training [36][39].

Group 4: Imitation Learning
- Imitation learning offers a more direct path: the robot replicates expert actions through behavior cloning, avoiding complex reward-function design and keeping training safe [41].
- The tutorial presents advanced imitation learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data effectively by learning the latent distribution of expert behaviors [42][43].

Group 5: Universal Robot Policies
- The article envisions the future of robotics in universal robot policies that operate across tasks and devices, enabled by large-scale open robot datasets and powerful vision-language models (VLMs) [52].
- Two cutting-edge VLA models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise robot control commands, with SmolVLA being a compact open-source model that significantly lowers the barrier to application [53][56].
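The offline-to-online idea described above is often realized by seeding a replay buffer with pre-collected expert transitions and mixing them with freshly gathered online experience. A minimal sketch of such a mixed buffer, with my own (hypothetical) class and field names rather than HIL-SERL's actual implementation:

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer seeded with offline expert transitions; online
    transitions are appended and sampling draws from the mixture."""
    def __init__(self, offline_transitions, capacity=10_000):
        self.offline = list(offline_transitions)   # kept for the whole run
        self.online = deque(maxlen=capacity)       # grows during interaction

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        pool = self.offline + list(self.online)
        return random.sample(pool, min(batch_size, len(pool)))

# Seed with one expert (state, action, reward) tuple, then add online data.
buf = MixedReplayBuffer(offline_transitions=[("s0", "a0", 1.0)])
buf.add(("s1", "a1", 0.0))
batch = buf.sample(2)
```

Keeping the expert data permanently in the pool is what lets early training stay safe and sample-efficient before the online data dominates.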
DexCanvas: Must Embodied Data Always Sacrifice One of Scale, Realism, and Force Sensing?
具身智能之心· 2025-10-10 00:02
Core Viewpoint
- The article discusses the challenges and advances in dexterous manipulation, highlighting the need for high-quality multi-modal data to improve robotic grasping and introducing the DexCanvas dataset as a solution [1][15].

Group 1: Challenges in Dexterous Manipulation
- Dexterous manipulation remains a significant challenge, requiring precise control, high-dimensional motion planning, and real-time adaptation to dynamic environments [2][11].
- Dexterous-manipulation hardware falls into two categories: two-finger grippers and multi-finger humanoid hands, the latter being better suited to complex tasks thanks to higher degrees of freedom [2][3].
- Current learning methods include imitation learning and reinforcement learning, each with its own advantages and limits in data requirements and training complexity [4][9].

Group 2: Data Collection and Quality Issues
- Data collection for dexterous manipulation is expensive and often lacks tactile and force information; existing datasets are insufficient for large-scale pre-training [9][10].
- The article emphasizes the trade-off in data collection: achieving scale, realism, and tactile feedback simultaneously is hard [6][7].
- DexCanvas addresses the missing force and tactile information in existing datasets, offering a comprehensive pipeline for high-quality data collection [17][21].

Group 3: DexCanvas Dataset Introduction
- DexCanvas is a large-scale dataset launched by Lingqiao Intelligent Technology, designed to bridge the gap between cognitive and physical intelligence in robotics [15][16].
- The dataset includes complete multi-finger force/contact annotations optimized for systems with over 20 degrees of freedom, significantly raising data quality [17][21].
- DexCanvas organizes data collection around 22 types of human hand manipulation modes, integrating over 1,000 hours of real human demonstrations and 100,000 hours of physically simulated data [21][22].

Group 4: Data Generation and Enhancement
- The generation pipeline captures human demonstrations with high precision and uses physical simulation to recover the missing force-control data [25][27].
- DexCanvas expands the dataset by varying object properties and initial conditions, greatly increasing data volume while preserving force-control information [28][29].
- Unlike pure simulation, DexCanvas is grounded in real human demonstrations, allowing better generalization across robotic platforms and tasks [30].

Group 5: Industry Impact and Future Prospects
- DexCanvas is expected to accelerate robotics research by supplying the physical-interaction data that existing datasets lack [32].
- The article anticipates the open-sourcing of the dataset to further advance research and development in related areas [32].
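The expansion step described in Group 4 (varying object properties and initial conditions around each real demonstration) can be sketched as a simple property sweep; the property names and values below are my own illustrative assumptions, and the simulated replay that would validate each variant is stubbed out:

```python
import itertools

def augment(demo, masses, frictions, scales):
    """Expand one recorded demonstration into many simulated variants
    by sweeping object properties (replay in a simulator is stubbed out)."""
    variants = []
    for m, f, s in itertools.product(masses, frictions, scales):
        variants.append({**demo, "mass": m, "friction": f, "scale": s})
    return variants

base = {"trajectory": "grasp_cube", "forces": [0.1, 0.4, 0.2]}
out = augment(base, masses=[0.1, 0.2], frictions=[0.5, 0.8],
              scales=[1.0, 1.1, 1.2])  # 2 * 2 * 3 = 12 variants
```

Each variant keeps the human motion as a reference while the simulator recomputes contact forces under the new physical properties, which is how scale can grow without discarding force information.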
NeurIPS 2025 Spotlight | With Just One Demonstration, the DexFlyWheel Framework Lets Robots "Generate Their Own Data"
机器之心· 2025-10-09 04:43
Core Insights
- The article introduces DexFlyWheel, a self-enhancing data-generation framework aimed at the data-scarcity problem in dexterous manipulation, a long-standing challenge in robotics [3][12].

Research Background
- Generating dexterous-manipulation data is hard for several reasons:
1. Traditional methods fail to generalize from simpler gripper designs to dexterous hands, and heuristic planning struggles with high-dimensional action optimization [7].
2. The high cost of manual teaching limits the scale and diversity of datasets [8].
3. Pure reinforcement learning is inefficient, often producing unnatural motions with low exploration efficiency [9].
4. Existing datasets focus mainly on grasping, limiting applicability to other fine-manipulation scenarios [8].
5. Trajectory-replay methods offer limited diversity, since they can only apply spatial transformations within predefined scenes [8].

DexFlyWheel Framework
- DexFlyWheel generates diverse dexterous-manipulation data from a single demonstration, reducing reliance on large datasets [12][14].
- The framework rests on two core ideas:
1. Combining imitation learning with residual reinforcement learning to redefine the role of demonstrations, allowing learned trajectories to transfer efficiently to new scenarios [14].
2. Establishing a self-improvement loop between data and models, so that data and policy performance improve together [17].

Experimental Results
- The framework showed significant gains in data generation and policy performance:
1. Data diversity expanded dramatically, from 1 demonstration to 500 generated trajectories, with scene variety up 214x and an average of 20 object types [27].
2. Policy generalization improved, with success rates rising from 16.5% to 81.9% on challenging test sets [28].
3. DexFlyWheel outperformed baselines, reaching an 89.8% data-generation success rate and producing 500 diverse trajectories in just 2.4 hours, far faster than human demonstration or trajectory replay [31].

Conclusion
- DexFlyWheel addresses long-standing data scarcity in dexterous manipulation with a self-improving data-generation paradigm, significantly cutting collection costs while improving generation efficiency and diversity [39].
- The framework is positioned as a key step toward practical dexterous manipulation and general-purpose robots [39].
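The "imitation plus residual reinforcement learning" idea above is typically realized by adding a small, bounded RL correction on top of the imitation base action. A minimal sketch under that common formulation (the policies and bound here are illustrative stubs, not DexFlyWheel's actual networks):

```python
import numpy as np

def residual_action(state, base_policy, residual_policy, bound=0.1):
    """Final action = imitation base action + a small learned correction,
    clipped so the residual cannot stray far from the demonstration."""
    base = base_policy(state)
    delta = np.clip(residual_policy(state), -bound, bound)
    return base + delta

base_policy = lambda s: np.array([0.5, 0.5])        # from one demonstration
residual_policy = lambda s: np.array([0.3, -0.05])  # RL correction (untrained stub)
a = residual_action(np.zeros(2), base_policy, residual_policy)
```

Bounding the residual keeps exploration anchored to the natural demonstrated motion while still letting RL adapt the trajectory to new scenes.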
Can Imitation Learning Never Be Truly End-to-End?
自动驾驶之心· 2025-10-08 23:33
Core Viewpoint
- The article argues that in autonomous driving, training methods matter more than model architectures such as VLA or world models, and highlights the limits of imitation learning for truly end-to-end autonomous driving [2][14].

Limitations of Imitation Learning
- Imitation learning assumes expert data is optimal, but in driving there is no single perfect behavior: human drivers have diverse styles and strategies [3][4].
- The training data lacks consistency and optimality, so models learn vague, imprecise driving patterns rather than clear, logical strategies [3][4].
- Imitation learning cannot distinguish critical decision-making scenarios from ordinary ones, so models may make fatal errors at crucial moments [5][6].

Key Scene Identification
- The article discusses the importance of identifying key scenes in driving, where the precision of the model's output is critical, especially in complex scenarios [7][8].
- It borrows the concept of "advantage" from reinforcement learning to define key states: those where the optimal action significantly outperforms the alternatives [7].

Out-of-Distribution (OOD) Issues
- Open-loop imitation learning accumulates errors, driving the model into states outside the training distribution and degrading performance [8][10][12].
- Models trained purely by imitation may struggle in critical situations, such as changing lanes in time, because they rely on suboptimal behaviors learned from human data [13].

Conclusion
- Technological progress hinges on identifying key routes and bottlenecks rather than following trends; new methods beyond imitation learning are needed to address its limits [14].
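The "advantage" criterion for key states described above can be made concrete: a state is critical when the best action's value clearly dominates the runner-up, and ordinary when all actions score about the same. A toy sketch assuming a tabular Q-function with made-up state names and values:

```python
import numpy as np

def key_states(q_table, threshold=0.5):
    """Flag states where the best action's margin over the runner-up is
    large: errors there are costly, while ordinary states are forgiving."""
    flagged = []
    for state, q_values in q_table.items():
        q = np.sort(np.asarray(q_values, dtype=float))[::-1]
        if q[0] - q[1] > threshold:
            flagged.append(state)
    return flagged

q_table = {
    "cruising":       [1.0, 0.95, 0.9],   # any action is roughly fine
    "merge_deadline": [1.0, -2.0, -3.0],  # only one action avoids failure
}
critical = key_states(q_table)
```

An imitation objective weights both states equally; an advantage-aware objective would upweight the flagged states, which is the article's core argument for going beyond pure imitation.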
After All This Time, VLA May Still Offer More Emotional Value Than Substance...
自动驾驶之心· 2025-09-20 16:03
Core Insights
- The article surveys the state of end-to-end (E2E) technology in academia and industry, highlighting differences in approach and data availability between the two [1][4][5].
- It stresses the importance of data-iteration speed in AI model development: slow data iteration can hold back technological progress [2][4].
- It also examines the role of reinforcement learning in improving vision-language-action (VLA) models, particularly for problems with no single correct answer [6][7][9][10].

Summary by Sections
End-to-End Technology
- Academia is seeing a proliferation of end-to-end methodologies, with many approaches emerging [1].
- Industry is more pragmatic: computational limits rule out some popular models, but vast amounts of data are available [4].
- The success of models like ChatGPT is attributed to the internet's abundance of data, and the same holds for the automotive industry, where companies can readily gather massive driving data [4].

Data and Technology Iteration
- As technology evolves rapidly, datasets must iterate at the same pace; otherwise progress stalls [2].
- Research teams increasingly publish datasets alongside their papers to sustain high-impact output [3].

Reinforcement Learning and VLA
- Reinforcement learning suits problems with no single correct answer, only characteristics that distinguish good answers from bad ones [7].
- The training process identifies strong solutions via reward signals, reducing the need for extensive demonstration data [9].
- While short-term results of VLA applications remain uncertain, the long-term potential is widely recognized [10][11].

Future of VLA
- The importance of algorithms in VLA models goes beyond raw performance metrics; factors such as data availability and training strategy are also crucial [12].
- The community is encouraged to discuss the development and challenges of autonomous driving technologies [5][13][16].
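The point above about reward signals replacing a single "correct answer" is often operationalized as best-of-N selection: sample several candidate outputs, score each with a reward function, and keep the highest-scoring one for training. A minimal sketch with a made-up smoothness reward (the reward function and candidates are illustrative assumptions):

```python
def best_of_n(candidates, reward_fn):
    """Rank sampled outputs by a reward function instead of matching a
    single 'correct' answer; the argmax candidate is kept for training."""
    scored = [(reward_fn(c), c) for c in candidates]
    return max(scored)[1]

# Stub reward: prefer smooth trajectories (small total deviation).
reward_fn = lambda traj: -sum(abs(x) for x in traj)
candidates = [[0.9, -0.7, 0.8], [0.1, 0.0, -0.1], [0.5, 0.5, 0.5]]
best = best_of_n(candidates, reward_fn)
```

Only the relative ordering induced by the reward matters, which is why this works even when no ground-truth trajectory exists.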