Robot Learning
HuggingFace and Oxford University Release a New Tutorial and Open-Source SOTA Resource Library!
具身智能之心· 2025-10-27 00:02
Core Viewpoint
- The article emphasizes the significant advancements in robotics, particularly in robot learning, driven by the development of large models and multi-modal AI technologies, which have transformed traditional robotics into a more learning-based paradigm [3][4].

Group 1: Introduction to Robot Learning
- The article introduces a comprehensive tutorial on modern robot learning, covering foundational principles of reinforcement learning and imitation learning and leading up to general-purpose, language-conditioned models [4][12].
- HuggingFace and Oxford University researchers have created a valuable resource for newcomers to the field, providing an accessible guide to robot learning [3][4].

Group 2: Classic Robotics
- Classic robotics relies on explicit modeling through kinematics and control planning, while learning-based methods use deep reinforcement learning and expert demonstrations for implicit modeling [15].
- Traditional robotic systems follow a modular pipeline of perception, state estimation, planning, and control [16].

Group 3: Learning-Based Robotics
- Learning-based robotics integrates perception and control more closely, adapts across tasks and embodiments, and reduces the need for expert modeling [26].
- The tutorial highlights the safety and efficiency challenges of real-world deployment, particularly during the initial training phases, and discusses techniques such as simulation training and domain randomization to mitigate these risks [34][35].

Group 4: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showing significant potential across a range of scenarios [28].
- The tutorial discusses the complexity of integrating multiple system components and the limitations of traditional physics-based models, which often oversimplify real-world phenomena [30].

Group 5: Imitation Learning
- Imitation learning offers a more direct learning path for robots by replicating expert actions through behavior cloning, avoiding complex reward-function design (a minimal sketch follows this entry) [41].
- The tutorial addresses challenges such as compounding errors and handling multi-modal behaviors in expert demonstrations [41][42].

Group 6: Advanced Techniques in Imitation Learning
- The article introduces advanced imitation learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which effectively model multi-modal data [43][45].
- Diffusion Policy demonstrates strong performance across a variety of tasks with minimal demonstration data, requiring only 50-150 demonstrations for training [45].

Group 7: General Robot Policies
- The tutorial envisions the development of general robot policies capable of operating across tasks and devices, inspired by large-scale open robot datasets and powerful vision-language models [52][53].
- Two cutting-edge vision-language-action (VLA) models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise control commands [53][56].

Group 8: Model Efficiency
- SmolVLA represents a trend toward model miniaturization and open-sourcing, achieving high performance with significantly lower parameter counts and memory consumption than π₀ [56][58].
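The tutorial's imitation-learning thread starts from behavior cloning: train a policy to reproduce expert actions with a supervised loss. Below is a minimal, self-contained behavior-cloning sketch in PyTorch on synthetic data; the network, dimensions, and training loop are illustrative assumptions, not code from the HuggingFace/Oxford tutorial or the LeRobot library.

```python
# Minimal behavior-cloning sketch (illustrative only; shapes and data are synthetic).
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

# Synthetic "expert demonstrations": (observation, action) pairs.
obs_dim, act_dim, n = 10, 4, 512
demo_obs = torch.randn(n, obs_dim)
demo_act = torch.randn(n, act_dim)

policy = MLPPolicy(obs_dim, act_dim)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)  # regress onto the expert's actions
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the observations would be images or proprioceptive states and the data would come from teleoperated demonstrations rather than random tensors.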
A Hands-On Introduction to Robot Learning: HuggingFace and Oxford University Release a New Tutorial and Open-Source SOTA Resource Library
机器之心· 2025-10-26 07:00
Core Viewpoint
- The article emphasizes the significant advancements in robotics, particularly in robot learning, driven by AI technologies such as large models and multi-modal models. This shift has transformed traditional robotics into a learning-based paradigm, opening new potential for autonomous decision-making robots [2].

Group 1: Introduction to Robot Learning
- The article highlights the evolution of robotics from explicit to implicit modeling, a fundamental change in how motion is generated. Traditional robotics relied on explicit modeling, while learning-based methods use deep reinforcement learning and learning from expert demonstrations for implicit modeling [15].
- A comprehensive tutorial from HuggingFace and researchers at Oxford University serves as a valuable resource for newcomers to modern robot learning, covering foundational principles of reinforcement learning and imitation learning [3][4].

Group 2: Learning-Based Robotics
- Learning-based robotics simplifies the perception-to-action pipeline by training a unified high-level controller that can directly handle high-dimensional, unstructured perception-motion information without relying on a dynamics model [33].
- The tutorial addresses challenges in real-world applications, such as safety and efficiency during the initial training phases and the high cost of trial and error in physical environments, and introduces techniques such as simulator training and domain randomization to mitigate these risks [34][35].

Group 3: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showing significant potential across a range of scenarios [28].
- The tutorial discusses the "offline-to-online" reinforcement learning framework, which improves sample efficiency and safety by using pre-collected expert data. The HIL-SERL method exemplifies this approach, enabling robots to master complex real-world tasks with near-100% success rates after only 1-2 hours of training (a minimal sketch of the offline-to-online idea follows this entry) [36][39].

Group 4: Imitation Learning
- Imitation learning offers a more direct learning path for robots by replicating expert actions through behavior cloning, avoiding complex reward-function design and keeping training safe [41].
- The tutorial presents advanced imitation learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which effectively model multi-modal data by learning the latent distribution of expert behaviors [42][43].

Group 5: Universal Robot Policies
- The article envisions the future of robotics in developing universal robot policies capable of operating across tasks and devices, inspired by the emergence of large-scale open robot datasets and powerful vision-language models (VLMs) [52].
- Two cutting-edge VLA models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise robot control commands, with SmolVLA being a compact, open-source model that significantly lowers the barrier to adoption [53][56].
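The "offline-to-online" framework mentioned above seeds training with pre-collected expert data and then mixes in transitions gathered during interaction. The sketch below shows one simple way to express that idea as a replay buffer seeded with demonstrations; the class name, sampling ratio, and transition format are assumptions for illustration, not HIL-SERL's actual implementation.

```python
# Offline-to-online replay buffer sketch: expert demonstrations seed the buffer, and
# each training batch mixes offline and freshly collected online transitions.
import random
from collections import deque

class SeededReplayBuffer:
    def __init__(self, offline_transitions, capacity=100_000, offline_ratio=0.5):
        self.offline = list(offline_transitions)   # pre-collected expert data
        self.online = deque(maxlen=capacity)       # transitions gathered during training
        self.offline_ratio = offline_ratio

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        n_off = int(batch_size * self.offline_ratio) if self.online else batch_size
        n_on = batch_size - n_off
        batch = random.sample(self.offline, min(n_off, len(self.offline)))
        if self.online:
            batch += random.sample(list(self.online), min(n_on, len(self.online)))
        return batch

# Usage: transitions here are (obs, action, reward, next_obs, done) tuples.
expert = [((0.0,), 1, 1.0, (0.1,), False) for _ in range(100)]
buf = SeededReplayBuffer(expert)
buf.add(((0.1,), 0, 0.0, (0.2,), False))
batch = buf.sample(32)
```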
Performance Gains Without Retraining! HKU Team Proposes the GPC Framework for Robot "Policy Composition"
机器之心· 2025-10-19 09:17
Core Viewpoint
- The article introduces the General Policy Composition (GPC) framework, a novel, training-free approach that improves robot control policies by dynamically combining multiple pre-trained models at test time, overcoming the limitations of traditional training methods [2][5][7].

Summary by Sections

Improving Policy Performance
- GPC represents a paradigm shift: policy performance is improved not through additional training but by composing existing policies [6][15].

Innovative Theoretical Foundation
- The framework rests on two key theoretical findings:
  1. Functional-level improvement: convex combinations of the decision scores of multiple pre-trained policies can yield a combined score more accurate than any single policy's (a sketch of this composition appears after this entry) [9].
  2. System-level stability: improvements in single-step errors propagate along the entire trajectory, leading to overall performance gains [10].

General "Policy Composer"
- GPC's core advantage is its plug-and-play nature, allowing various robot policies to be composed seamlessly without retraining [14][15].

Heterogeneous Policy Flexibility
- GPC can flexibly combine policies across different architectures and modalities, balancing information from the various conditions to produce stable, coherent action trajectories [17][19].

Weight Search for Optimal Composition
- GPC's weight-search mechanism customizes the optimal weight configuration for each task, underscoring how much the weight distribution matters for getting the most out of the composed policy [22][23].

Experimental Validation
- GPC demonstrates superior performance in both simulation and real-world environments, with success-rate improvements over single-policy baselines of up to 7.55% on simulation tasks and 5-10% on real-world tasks [28][30].

Key Findings from Experiments
- Three core findings highlight GPC's versatility:
  1. Composing policies of moderate accuracy can yield higher accuracy than any single one of them [29].
  2. A weak policy in the mix can drag down overall performance, so contributing policies must be chosen carefully [29].
  3. Performance is maximized when stronger policies are given greater weight in the composition [29].
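The functional-level result describes composing policies by taking a convex combination of their decision scores and then searching for a good weighting. The sketch below shows that mechanic in NumPy under simplifying assumptions; compose_scores, grid_search_weights, and the dummy evaluation routine are hypothetical names, not the GPC authors' code.

```python
# Test-time policy composition sketch: convex combination of per-policy scores,
# with a simple grid search over the mixing weights.
import itertools
import numpy as np

def compose_scores(scores, weights):
    """Convex combination of decision scores from several pre-trained policies."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # enforce non-negative weights summing to 1
    return sum(w * s for w, s in zip(weights, scores))

def grid_search_weights(scores, evaluate, steps=5):
    """Try weight configurations on a grid and keep the one the evaluation prefers."""
    grid = np.linspace(0.0, 1.0, steps)
    best_w, best_val = None, -np.inf
    for w in itertools.product(grid, repeat=len(scores)):
        if sum(w) == 0:
            continue
        val = evaluate(w)                      # e.g., success rate on a validation set
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val

# Example: combine the per-action scores of two dummy policies for one state,
# then search for the weighting a (dummy) validation routine prefers.
s1, s2 = np.array([0.2, 0.8]), np.array([0.6, 0.4])
combined = compose_scores([s1, s2], weights=[0.7, 0.3])
dummy_eval = lambda w: float(compose_scores([s1, s2], w)[1])  # pretend action 1 is "correct"
best_w, best_val = grid_search_weights([s1, s2], dummy_eval)
```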
通研院 Team Wins the CoRL 2025 Outstanding Paper Award: UniFP Breaks Through the Force-Position Control Challenge for Legged Robots, the First Chinese Team to Receive the Honor
机器人大讲堂· 2025-10-12 02:08
Core Insights
- The article discusses the significance of the Conference on Robot Learning (CoRL) and highlights a Chinese research team winning the Outstanding Paper Award for their work on UniFP, a unified control algorithm for legged robots [1][3].

Group 1: Conference Overview
- CoRL is a leading academic conference in AI and robotics, showcasing cutting-edge research in robot learning [1].
- In 2025, CoRL received nearly 1,000 submissions, of which 264 papers were accepted after rigorous review [1].

Group 2: UniFP Algorithm
- UniFP (Unified Force and Position Control Policy) is the first algorithm in legged robotics to unify force and position control within a single framework, overcoming traditional limitations [3][4].
- The algorithm builds on biomechanics-inspired impedance control, allowing robots to respond to environmental forces much as human muscles do (a minimal impedance-control sketch follows this entry) [3][4].

Group 3: Control Framework
- The UniFP framework consists of three core modules (observation encoder, state estimator, and actuator), forming a complete perception, decision-making, and execution control loop [7].
- The observation encoder processes historical state information, while the state estimator infers key states that cannot be measured directly, enabling "sensorless force perception" [7][8].

Group 4: Performance Validation
- The research team validated UniFP in various simulated contact scenarios and then on the Unitree B2-Z1 quadruped robot, demonstrating impressive multi-functional capabilities [8][10].
- In experiments, UniFP showed precise force control, adaptive force tracking, and compliant impedance control, outperforming traditional methods [10][11].

Group 5: Imitation Learning
- Integrating UniFP with imitation learning significantly improves learning efficiency on contact-rich tasks, raising success rates by roughly 39.5% over traditional methods [11][13].
- The research demonstrated UniFP's versatility across different robot forms and tasks, confirming its generalizability [13][14].
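The article attributes UniFP's unification of force and position control to biomechanics-inspired impedance control, where a single law, F = K(x_des - x) + D(v_des - v), trades stiffness against compliance. Here is a minimal NumPy sketch of that classical law; the gains and dimensions are illustrative assumptions and are not taken from the UniFP paper.

```python
# Classical impedance-control law: stiffness term plus damping term on the end effector.
import numpy as np

def impedance_command(x, v, x_des, v_des=None,
                      K=np.diag([200.0, 200.0, 200.0]),   # stiffness gains (N/m)
                      D=np.diag([20.0, 20.0, 20.0])):     # damping gains (N*s/m)
    """Commanded end-effector force from position and velocity errors."""
    if v_des is None:
        v_des = np.zeros_like(v)
    return K @ (x_des - x) + D @ (v_des - v)

# Stiff gains give position-dominated behavior; soft gains give compliant, force-yielding
# behavior. This trade-off is what a unified force-position policy must manage.
x = np.array([0.30, 0.00, 0.20])      # current end-effector position (m)
v = np.zeros(3)                        # current end-effector velocity (m/s)
x_des = np.array([0.32, 0.00, 0.20])   # desired position (m)
f_cmd = impedance_command(x, v, x_des)
```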
A Major Upgrade for Robot Perception: Lightweight Injection of Geometric Priors Boosts Success Rates by 31%
36Ke· 2025-09-28 12:09
Core Insights
- The article discusses the challenges of enabling AI to truly "understand" the 3D world, particularly for vision-language-action (VLA) models that rely on 2D image-text data [1][2].

Group 1: VLA Model Limitations
- Current VLA models, built primarily on pre-trained vision-language models, lack the 3D spatial understanding needed for real-world manipulation [1].
- Existing enhancement methods that rely on explicit depth input face deployment difficulties and noisy depth estimates [1].

Group 2: Evo-0 Model Introduction
- Shanghai Jiao Tong University and the University of Cambridge propose Evo-0, a lightweight method that strengthens the spatial understanding of VLA models by implicitly injecting 3D geometric priors, without explicit depth input or additional sensors [2].
- Evo-0 uses the Visual Geometry Grounded Transformer (VGGT) to extract 3D structural information from multi-view RGB images, significantly improving spatial perception [2][3].

Group 3: Model Architecture and Training
- Evo-0 integrates VGGT as a spatial encoder, introducing 3D tokens that carry depth context and cross-view spatial correspondence [3].
- A cross-attention fusion module merges the 2D visual tokens with these 3D tokens, improving the model's understanding of spatial structure and object layout (a sketch of such a fusion module follows this entry) [3][6].
- Training remains efficient because only the fusion module, the LoRA adaptation layers, and the action expert are fine-tuned, reducing computational cost [6].

Group 4: Experimental Results
- On RLBench simulation tasks, Evo-0 improved the average success rate by more than 28.88% over baseline models, excelling in particular on tasks requiring complex spatial relationships [10][11].
- Evo-0's robustness was tested under five different interference conditions, and it consistently outperformed the baseline model pi0 [12][15].

Group 5: Conclusion
- Evo-0's key innovation is extracting rich spatial semantics through VGGT, bypassing depth-estimation errors and sensor requirements and thereby strengthening the spatial modeling capability of VLA models [16].
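The fusion step described above merges 2D visual tokens with 3D geometry tokens via cross-attention. The module below is a generic PyTorch sketch of such a fusion layer (2D tokens as queries, 3D tokens as keys and values, with a residual connection and LayerNorm); the dimensions and structure are assumptions for illustration, not the Evo-0 authors' implementation.

```python
# Generic cross-attention fusion of 2D visual tokens with 3D geometry tokens.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens_2d, tokens_3d):
        # Queries come from the 2D visual tokens; keys/values from the 3D geometry tokens.
        fused, _ = self.attn(query=tokens_2d, key=tokens_3d, value=tokens_3d)
        return self.norm(tokens_2d + fused)   # residual keeps the original 2D features intact

# Example: a batch of 2 samples, 196 patch tokens fused with 64 geometry tokens.
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 196, 512), torch.randn(2, 64, 512))
```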
Ningbo Dongfang University of Technology Is Recruiting Joint-Training Direct-PhD Students! Directions Include Robot Manipulation, Embodied Intelligence, and Robot Learning
自动驾驶之心· 2025-08-21 09:04
Core Viewpoint
- The article discusses the collaboration between Ningbo Dongfang University of Technology and prestigious institutions like Shanghai Jiao Tong University and University of Science and Technology of China to recruit doctoral students in the field of robotics, emphasizing a dual mentorship model and a focus on cutting-edge research in robotics and AI [1][2].

Group 1: Program Structure
- The program allows students to register at either Shanghai Jiao Tong University or University of Science and Technology of China for the first year, followed by research work at Dongfang University under dual supervision [1].
- Graduates will receive a doctoral degree and diploma from either Shanghai Jiao Tong University or University of Science and Technology of China [1].

Group 2: Research Focus and Support
- The research areas include robotics, control, and AI, with specific topics such as contact-rich manipulation, embodied intelligence, agile robot control, and robot learning [2].
- The lab provides ample research funding and administrative support, and encourages a balanced lifestyle for students, including physical exercise [2].

Group 3: Community and Networking
- The article promotes a community platform for knowledge sharing in embodied intelligence, aiming to grow from 2,000 to 10,000 members within two years, facilitating discussions on various technical and career-related topics [3][5].
- The community offers resources such as technical routes, job opportunities, and access to industry experts, enhancing networking and collaboration among members [5][18].

Group 4: Educational Resources
- The community has compiled extensive resources, including over 30 technical routes, open-source projects, and datasets relevant to embodied intelligence and robotics [17][21][31].
- Members can access a variety of learning materials, including books and research reports, to support their academic and professional development [27][24].
New from CMU! Cross-Embodiment World Models Enable Few-Shot Robot Learning
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- The article discusses a novel approach to training visuomotor policies for robots by leveraging existing low-cost data sources, significantly reducing the need for expensive real-world data collection [2][11].

Group 1: Methodology
- The proposed method is based on two key insights (a toy sketch of the second appears after this entry):
  1. Embodiment-agnostic world-model pretraining that uses optic flow as the action representation, allowing pretraining on cross-embodiment datasets followed by fine-tuning with minimal data from the target embodiment [3][12].
  2. Latent Policy Steering (LPS), which improves policy outputs by searching the latent space of the world model for better action sequences [3][12].

Group 2: Experimental Results
- Real-world experiments showed that combining the policy with a world model pretrained on existing datasets led to significant gains: more than 50% relative improvement with 30 demonstrations and more than 20% relative improvement with 50 demonstrations [3][9].

Group 3: Challenges and Solutions
- The article highlights the embodiment gap that arises when pretraining models across different robots, and argues that world models are better suited to cross-embodiment pretraining followed by fine-tuning for a new embodiment [11][12].
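Latent Policy Steering, as summarized above, searches the world model's latent space for a better action sequence than the policy's default output. The sketch below captures that idea with random-shooting search over stand-in stub modules; every function here (policy, world_model, score_fn) is a placeholder assumption, not the paper's code.

```python
# Latent-space steering sketch: sample candidate action sequences, roll each through a
# learned world model in latent space, and keep the sequence the model scores highest.
import torch

def steer(policy, world_model, score_fn, z0, horizon=8, n_candidates=16):
    best_seq, best_score = None, float("-inf")
    for _ in range(n_candidates):
        z, actions = z0, []
        for _ in range(horizon):
            a = policy(z)                 # sample an action given the current latent state
            z = world_model(z, a)         # predict the next latent state
            actions.append(a)
        s = score_fn(z)                   # e.g., predicted task progress at the final state
        if s > best_score:
            best_seq, best_score = actions, s
    return best_seq

# Stand-in stubs so the sketch runs end to end.
latent_dim, act_dim = 32, 4
policy = lambda z: torch.tanh(torch.randn(act_dim))
world_model = lambda z, a: z + 0.01 * torch.randn(latent_dim)
score_fn = lambda z: -z.norm().item()
plan = steer(policy, world_model, score_fn, z0=torch.zeros(latent_dim))
```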
Major Market-Moving Events: After 10 Years, the A-Share Margin Trading Balance Returns to 2 Trillion Yuan; Technical Documents on Computing-Power Pooling and Network Security for the National Integrated Computing Network Open for Public Comment
Mei Ri Jing Ji Xin Wen· 2025-08-07 00:05
Group 1
- The National Data Standardization Technical Committee has publicly solicited opinions on two technical documents related to the National Integrated Computing Network, marking the transition from planning to implementation [1]
- The Ministry of Industry and Information Technology expressed willingness to collaborate with APEC member economies to promote digital and AI innovation applications [2]
- The A-share market's margin trading balance has reached 2 trillion yuan, the highest since July 2015, indicating increased trading activity [3]

Group 2
- The Ministry of Transport, Ministry of Finance, and Ministry of Natural Resources have issued a new rural road enhancement action plan, focusing on innovative financing models and encouraging participation from financial institutions [4][9]
- The National Development and Reform Commission has introduced a management method for central budget investment in training bases, emphasizing support for emerging fields with talent shortages and traditional industries with strong employment absorption [5]
- The China Photovoltaic Industry Association is collecting opinions on the draft amendment to the Price Law, aiming to reflect the demands of the photovoltaic industry [6]

Group 3
- Heilongjiang Province has implemented 20 policy measures to support the high-quality development of the high-end intelligent agricultural machinery industry [7]
- Shanghai's financial regulatory authorities have introduced measures to promote the development of commercial health insurance, including tax deductions and optimized financing [8]
10% of the Training Data Beats 100%: A Major Breakthrough in Robot Learning
机器之心· 2025-06-11 03:54
Core Viewpoint
- The ViSA-Flow framework represents a revolutionary approach to robot skill learning, significantly improving learning efficiency in data-scarce settings by extracting semantic action flows from large-scale human videos [4][36].

Group 1: Research Background and Challenges
- Traditional robot imitation learning methods require extensive, meticulously curated datasets, which are costly to collect, creating a bottleneck for developing robots capable of diverse real-world tasks [7].
- Humans, by contrast, learn new skills remarkably well through observation alone, focusing on semantically relevant components while filtering out irrelevant background information [8].

Group 2: Key Innovations
- The core innovation of ViSA-Flow is the semantic action flow, an intermediate representation that captures the essential spatiotemporal features of operator-object interactions while remaining unaffected by surface visual differences (a toy sketch of this representation appears after this entry) [11].
- Key components of the framework include:
  1. Semantic entity localization, using pre-trained vision-language models to describe and locate the operator and task-relevant objects [11].
  2. Hand-object interaction tracking, which maintains stable segmentation across frames [12].
  3. Flow-conditioned feature encoding, which generates rich feature vectors while preserving visual context [13].

Group 3: Experimental Evaluation
- On the CALVIN benchmark, ViSA-Flow outperformed all baseline methods while using only 10% of the annotated robot trajectories (1,768), achieving a 31.4% success rate on completing five consecutive tasks, nearly double that of the next best method [19].
- An average sequence length of 2.96 further demonstrates ViSA-Flow's effectiveness on long-horizon manipulation tasks [20].

Group 4: Ablation Studies
- Ablation studies show that removing semantic entity localization significantly reduces performance, while omitting the temporal tracking stage shortens the average success length [26].
- The full ViSA-Flow model achieved an 89.0% task-completion success rate, underscoring its robustness [21].

Group 5: Real-World Experiments
- Real-world evaluations covered both single-stage and long-horizon manipulation tasks, demonstrating that ViSA-Flow maintains performance across task complexities [23][30].
- The model's focus on the operator and task-relevant objects allows its spatial support to shift smoothly as scenes change [31].

Group 6: Technical Advantages and Limitations
- Advantages include data efficiency, cross-domain generalization, long-horizon stability, and semantic consistency in task execution [40].
- Limitations include the absence of explicit 3D geometric modeling, reliance on pre-trained components, and potential difficulty with tasks requiring precise physical interactions [40].

Group 7: Future Directions
- Future work may integrate physical modeling, reduce reliance on pre-trained components, combine the approach with reinforcement learning algorithms, and expand the pre-training datasets [40].

Group 8: Significance and Outlook
- ViSA-Flow represents a significant breakthrough in robot learning, showing that semantic representations extracted from large-scale human videos can drive skill acquisition [36].
- The framework bridges the gap between observing human demonstrations and robot execution, paving the way for more intelligent and efficient robot learning systems [37].
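ViSA-Flow's intermediate representation keeps only the motion of semantically relevant entities (the operator's hand and task-related objects) and discards background motion. The toy NumPy sketch below illustrates that filtering step; the tracked points and entity mask are synthetic, and a real system would obtain them from a vision-language detector and a point tracker, so none of this is the authors' code.

```python
# Toy sketch of a "semantic action flow": per-point motion between consecutive frames,
# restricted to points belonging to the operator's hand and task-relevant objects.
import numpy as np

def semantic_action_flow(points_t, points_t1, entity_mask):
    """Displacement of tracked points, keeping only semantically relevant entities."""
    flow = points_t1 - points_t                 # motion of each tracked point
    return flow[entity_mask]                    # drop background points

# 100 tracked points in pixel coordinates; the first 20 belong to hand/object entities.
pts_t = np.random.rand(100, 2) * 224
pts_t1 = pts_t + np.random.randn(100, 2)
mask = np.zeros(100, dtype=bool)
mask[:20] = True
flow_feat = semantic_action_flow(pts_t, pts_t1, mask).flatten()  # feature vector for a policy
```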
Musk: Optimus Humanoid Robot Will Walk on the Surface of Mars in 2027; Alibaba Cloud Releases Tongyi Lingma AI IDE, Able to Call More Than 3,000 Tools | AIGC Daily
创业邦· 2025-05-31 00:57
Group 1
- Elon Musk announced that the Optimus humanoid robot will walk on the surface of Mars by 2027, with SpaceX planning to launch the robot aboard a Starship by the end of next year [1]
- Alibaba Cloud released its first AI-native development environment tool, Tongyi Lingma AI IDE, which supports over 3,000 tools and has seen over 15 million plugin downloads, with numerous enterprises already integrating it [1]
- Figure's CEO Brett Adcock revealed a major restructuring, merging three independent teams into the AI team "Helix" to accelerate robot learning and market expansion [1]

Group 2
- The U.S. Department of Energy announced a partnership with NVIDIA and Dell to develop a next-generation flagship supercomputer, expected to be operational by 2026, named after Nobel Prize-winning biochemist Jennifer Doudna [1]