Robot Learning
BIGAI Team Wins the CoRL 2025 Outstanding Paper Award: UniFP Cracks the Force-Position Control Challenge for Legged Robots, the First Win by a Chinese Team
机器人大讲堂· 2025-10-12 02:08
Core Insights
- The article discusses the significance of the Conference on Robot Learning (CoRL) and highlights the achievement of a Chinese research team winning the Outstanding Paper Award for their work on UniFP, a unified control algorithm for legged robots [1][3].

Group 1: Conference Overview
- CoRL is a leading academic conference in AI and robotics, showcasing cutting-edge research in robot learning [1].
- In 2025, CoRL received nearly 1,000 submissions, with 264 papers accepted after rigorous review [1].

Group 2: UniFP Algorithm
- UniFP (Unified Force and Position Control Policy) is the first algorithm in legged robotics to unify force and position control within a single framework, overcoming traditional limitations [3][4].
- The algorithm is based on biomechanical impedance control principles, allowing robots to respond to environmental forces similarly to human muscle perception [3][4].

Group 3: Control Framework
- The UniFP framework consists of three core modules: observation encoder, state estimator, and actuator, forming a complete control loop of perception, decision-making, and execution [7].
- The observation encoder processes historical state information, while the state estimator infers unmeasurable key states, enabling "sensorless force perception" [7][8].

Group 4: Performance Validation
- The research team validated UniFP in various simulated contact scenarios and later on the Unitree B2-Z1 quadruped robot, demonstrating impressive multi-functional capabilities [8][10].
- In experiments, UniFP showed precise force control, adaptive force tracking, and compliant impedance control, outperforming traditional methods [10][11].

Group 5: Imitation Learning
- UniFP's integration with imitation learning significantly enhances the robot's learning efficiency in contact-intensive tasks, achieving a success rate improvement of approximately 39.5% over traditional methods [11][13].
- The research demonstrated UniFP's versatility across different robot forms and tasks, confirming its generalizability [13][14].
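The article credits UniFP's compliance to biomechanical impedance control. As background, a minimal single-joint sketch of the classic impedance law is shown below; this is not UniFP's learned policy, and the gains, unit inertia, and constant external push are illustrative assumptions:

```python
def impedance_torque(q, q_dot, q_des, K, D):
    """Joint-space impedance law: the commanded torque pulls the joint
    toward q_des like a spring-damper, so an external torque displaces
    it compliantly instead of being fought rigidly."""
    return K * (q_des - q) - D * q_dot

# Simulate one joint (unit inertia) under a constant external push.
dt, q, q_dot, tau_ext = 0.001, 0.0, 0.0, 1.0
for _ in range(5000):
    tau = impedance_torque(q, q_dot, q_des=0.0, K=50.0, D=10.0)
    q_dot += (tau + tau_ext) * dt   # semi-implicit Euler step
    q += q_dot * dt
# At equilibrium the virtual spring balances the push: q -> tau_ext / K = 0.02
```

Lowering K makes the joint yield more under the same push, which is the knob a unified force-position policy effectively modulates.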
A Major Upgrade for Robot Perception: Lightweight Injection of Geometric Priors Lifts Success Rates by 31%
36Kr· 2025-09-28 12:09
Core Insights
- The article discusses the challenges in enabling AI to truly "understand" the 3D world, particularly in the context of vision-language-action (VLA) models that rely on 2D image-text data [1][2].

Group 1: VLA Model Limitations
- Current VLA models lack the 3D spatial understanding needed for real-world operation, relying primarily on pre-trained visual language models [1].
- Existing enhancement methods based on explicit depth input face deployment difficulties and precision/noise issues [1].

Group 2: Evo-0 Model Introduction
- Shanghai Jiao Tong University and the University of Cambridge proposed a lightweight method called Evo-0 to enhance the spatial understanding of VLA models by implicitly injecting 3D geometric priors, without requiring explicit depth input or additional sensors [2].
- Evo-0 utilizes the Visual Geometry Grounded Transformer (VGGT) to extract 3D structural information from multi-view RGB images, significantly improving spatial perception capabilities [2][3].

Group 3: Model Architecture and Training
- Evo-0 integrates VGGT as a spatial encoder, introducing 3D tokens that carry depth context and cross-view spatial correspondence [3].
- A cross-attention fusion module merges the 2D visual tokens with the 3D tokens, enhancing the understanding of spatial structure and object layout [3][6].
- Training is efficient: only the fusion module, LoRA adaptation layer, and action expert are fine-tuned, reducing computational cost [6].

Group 4: Experimental Results
- In RLBench simulation tasks, Evo-0 achieved an average success rate improvement of over 28.88% compared to baseline models, excelling particularly in tasks requiring complex spatial relationships [10][11].
- Robustness was tested under five different interference conditions, with Evo-0 consistently outperforming the baseline model pi0 [12][15].

Group 5: Conclusion
- Evo-0's key innovation lies in extracting rich spatial semantics through VGGT, bypassing depth-estimation errors and sensor requirements, thereby enhancing the spatial modeling capabilities of VLA models [16].
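The cross-attention fusion described above can be illustrated with a toy single-head sketch in which the 2D visual tokens act as queries over the 3D geometry tokens. This is not the Evo-0 implementation; the shapes, random weights, and residual add are assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(tokens_2d, tokens_3d, Wq, Wk, Wv):
    """2D visual tokens query the 3D geometry tokens; the output keeps
    the shape of tokens_2d so it can be added back residually."""
    Q, K, V = tokens_2d @ Wq, tokens_3d @ Wk, tokens_3d @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return tokens_2d + attn @ V   # residual fusion

rng = np.random.default_rng(0)
d = 16
tokens_2d = rng.standard_normal((8, d))   # e.g. patch tokens from the VLM backbone
tokens_3d = rng.standard_normal((4, d))   # e.g. geometry tokens from a VGGT-style encoder
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused = cross_attention_fuse(tokens_2d, tokens_3d, Wq, Wk, Wv)
```

Because the fusion is residual and only this module (plus adapters) is trained, the pre-trained 2D backbone stays intact, which is consistent with the low training cost the article reports.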
Eastern Institute of Technology (Ningbo) Recruiting Jointly Trained Direct-PhD Students! Robot Manipulation / Embodied Intelligence / Robot Learning and Related Directions
自动驾驶之心· 2025-08-21 09:04
Core Viewpoint
- The article discusses the collaboration between Ningbo Dongfang University of Technology and prestigious institutions like Shanghai Jiao Tong University and University of Science and Technology of China to recruit doctoral students in the field of robotics, emphasizing a dual mentorship model and a focus on cutting-edge research in robotics and AI [1][2].

Group 1: Program Structure
- The program allows students to register at either Shanghai Jiao Tong University or University of Science and Technology of China for the first year, followed by research work at Dongfang University under dual supervision [1].
- Graduates will receive a doctoral degree and diploma from either Shanghai Jiao Tong University or University of Science and Technology of China [1].

Group 2: Research Focus and Support
- The research areas include robotics, control, and AI, with specific topics such as contact-rich manipulation, embodied intelligence, agile robot control, and robot learning [2].
- The lab provides ample research funding, administrative support, and encourages a balanced lifestyle for students, including physical exercise [2].

Group 3: Community and Networking
- The article promotes a community platform for knowledge sharing in embodied intelligence, aiming to grow from 2,000 to 10,000 members within two years, facilitating discussions on various technical and career-related topics [3][5].
- The community offers resources such as technical routes, job opportunities, and access to industry experts, enhancing networking and collaboration among members [5][18].

Group 4: Educational Resources
- The community has compiled extensive resources, including over 30 technical routes, open-source projects, and datasets relevant to embodied intelligence and robotics [17][21][31].
- Members can access a variety of learning materials, including books and research reports, to support their academic and professional development [27][24].
Latest from CMU! Cross-Embodiment World Models Power Few-Shot Robot Learning
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- The article discusses a novel approach to training visuomotor policies for robots by leveraging existing low-cost data sources, which significantly reduces the need for expensive real-world data collection [2][11].

Group 1: Methodology
- The proposed method is based on two key insights:
  1. Embodiment-agnostic world model pretraining uses optical flow as the action representation, allowing training on cross-embodiment datasets followed by fine-tuning with minimal target-embodiment data [3][12].
  2. The Latent Policy Steering (LPS) method improves policy outputs by searching for better action sequences in the latent space of the world model [3][12].

Group 2: Experimental Results
- Real-world experiments showed that combining the policy with a world model pretrained on existing datasets led to significant performance improvements: 30 demonstrations yielded over 50% relative improvement, and 50 demonstrations over 20% [3][9].

Group 3: Challenges and Solutions
- The article highlights the challenges posed by embodiment gaps when pretraining models across different robots, and argues that world models are better suited to cross-embodiment pretraining followed by fine-tuning for new embodiments [11][12].
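The latent-space search behind LPS can be caricatured as: perturb the policy's proposed action sequence, roll each candidate out in the world model, score the resulting latent, and keep the best. The sketch below uses stand-in `world_model` and `score_fn` definitions and is a toy analogue of the idea, not the paper's method:

```python
import numpy as np

def latent_policy_steering(policy_actions, world_model, score_fn,
                           n_candidates=32, noise=0.1, seed=0):
    """Sample perturbed action sequences, evaluate each by rolling it
    out through the world model, and return the best-scoring one.
    The unperturbed policy proposal is always kept as a fallback."""
    rng = np.random.default_rng(seed)
    best = policy_actions
    best_score = score_fn(world_model(policy_actions))
    for _ in range(n_candidates):
        candidate = policy_actions + noise * rng.standard_normal(policy_actions.shape)
        s = score_fn(world_model(candidate))
        if s > best_score:
            best, best_score = candidate, s
    return best

# Toy setup: the "world model" sums the actions into a final latent,
# and the score prefers latents near 1.0 in every dimension.
world_model = lambda actions: actions.sum(axis=0)
score_fn = lambda latent: -np.abs(latent - 1.0).sum()
proposal = np.zeros((5, 3))   # 5-step, 3-dim action sequence from the policy
steered = latent_policy_steering(proposal, world_model, score_fn)
```

Because the fallback is the original proposal, steering can only match or improve the score under the world model; the gain in practice depends on how well that model predicts the real environment.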
Major Market-Moving Events: After 10 Years, A-Share Margin Trading Balance Returns to 2 Trillion; Technical Documents on Computing Power Pooling and Computing-Network Security for the National Integrated Computing Network Open for Public Comment
Mei Ri Jing Ji Xin Wen· 2025-08-07 00:05
Group 1
- The National Data Standardization Technical Committee has publicly solicited opinions on two technical documents related to the National Integrated Computing Network, marking the transition from planning to implementation [1]
- The Ministry of Industry and Information Technology expressed willingness to collaborate with APEC member economies to promote digital and AI innovation applications [2]
- The A-share market's margin trading balance has reached 2 trillion yuan, the highest since July 2015, indicating increased trading activity [3]

Group 2
- The Ministry of Transport, Ministry of Finance, and Ministry of Natural Resources have issued a new rural road enhancement action plan, focusing on innovative financing models and encouraging participation from financial institutions [4][9]
- The National Development and Reform Commission has introduced a management method for central budget investment in training bases, emphasizing support for emerging fields with talent shortages and traditional industries with strong employment absorption [5]
- The China Photovoltaic Industry Association is collecting opinions on the draft amendment to the Price Law, aiming to reflect the demands of the photovoltaic industry [6]

Group 3
- Heilongjiang Province has implemented 20 policy measures to support the high-quality development of the high-end intelligent agricultural machinery industry [7]
- Shanghai's financial regulatory authorities have introduced measures to promote the development of commercial health insurance, including tax deductions and optimized financing [8]
10% of the Training Data Beats 100% of the Performance: A Major Breakthrough for Robot Learning
机器之心· 2025-06-11 03:54
Core Viewpoint
- The ViSA-Flow framework represents a revolutionary approach to robot skill learning, significantly enhancing learning efficiency in data-scarce situations by extracting semantic action flows from large-scale human videos [4][36].

Group 1: Research Background and Challenges
- Traditional robot imitation learning methods require extensive, meticulously curated datasets, which are costly to collect, creating a bottleneck for developing robots capable of diverse real-world tasks [7].
- Humans exhibit remarkable abilities to learn new skills through observation, focusing on semantically relevant components while filtering out irrelevant background information [8].

Group 2: Key Innovations
- The core innovation of the ViSA-Flow framework is the introduction of Semantic Action Flow as an intermediate representation, capturing the essential spatiotemporal features of operator-object interactions, unaffected by surface visual differences [11].
- Key components of the framework include:
  1. Semantic entity localization using pre-trained visual language models to describe and locate operators and task-related objects [11].
  2. Hand-object interaction tracking to maintain stable segmentation across frames [12].
  3. Flow-conditioned feature encoding to generate rich feature vectors while preserving visual context [13].

Group 3: Experimental Evaluation
- In the CALVIN benchmark tests, ViSA-Flow outperformed all baseline methods using only 10% of annotated robot trajectories (1,768), achieving a success rate of 31.4% in completing five consecutive tasks, nearly double that of the next best method [19].
- The average sequence length of 2.96 further demonstrates ViSA-Flow's effectiveness in handling long-duration operational tasks [20].

Group 4: Ablation Studies
- Ablation studies indicate that removing semantic entity localization significantly reduces performance, while omitting the time tracking phase decreases the average success length [26].
- The full ViSA-Flow model achieved a success rate of 89.0% in task completion, showcasing its robustness [21].

Group 5: Real-World Experiments
- Real-world evaluations of ViSA-Flow included single-stage and long-duration operational tasks, demonstrating its ability to maintain performance across varying task complexities [23][30].
- The model's focus on operator and task-related objects allows for smooth transitions in spatial support as scenes change [31].

Group 6: Technical Advantages and Limitations
- Advantages include data efficiency, cross-domain generalization, long-duration stability, and semantic consistency in task execution [40].
- Limitations involve the absence of explicit 3D geometric modeling, reliance on pre-trained components, and potential challenges in tasks requiring precise physical interactions [40].

Group 7: Future Directions
- Future developments may include integrating physical modeling, reducing reliance on pre-trained components, combining with reinforcement learning algorithms, and expanding pre-training datasets [40].

Group 8: Significance and Outlook
- ViSA-Flow represents a significant breakthrough in robot learning, proving the feasibility of extracting semantic representations from large-scale human videos for skill acquisition [36].
- The framework bridges the gap between human demonstration observation and robot execution, paving the way for more intelligent and efficient robotic learning systems [37].
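The core idea of filtering observed motion down to task-relevant entities can be sketched as follows. This is a toy analogue, not ViSA-Flow's pipeline: the entity labels, 2D keypoints, and hard label filter are illustrative assumptions standing in for the VLM-based localization and tracking stages:

```python
import numpy as np

def semantic_action_flow(kp_prev, kp_next, labels, task_entities):
    """Keep only keypoints belonging to task-relevant entities (e.g. the
    hand and the manipulated object) and return their frame-to-frame
    displacement, discarding all background motion."""
    mask = np.isin(labels, task_entities)
    return (kp_next - kp_prev)[mask]

labels = np.array(["hand", "cup", "table", "wall"])
kp_prev = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
kp_next = np.array([[0.1, 0.0], [1.0, 1.2], [2.0, 2.0], [3.5, 3.0]])
flow = semantic_action_flow(kp_prev, kp_next, labels, ["hand", "cup"])
# flow keeps only the hand and cup displacements; the static table and
# the wall's spurious motion are filtered out as background.
```

Representing demonstrations by such entity-level flow rather than raw pixels is what lets the same representation transfer from human videos to robot execution despite large appearance differences.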
Musk: Optimus Humanoid Robot to Walk on the Surface of Mars in 2027; Alibaba Cloud Releases Tongyi Lingma AI IDE with Access to Over 3,000 Tools | AIGC Daily
创业邦· 2025-05-31 00:57
Group 1
- Elon Musk announced that the Optimus humanoid robot will walk on the surface of Mars by 2027, with SpaceX planning to launch the robot aboard a Starship by the end of next year [1]
- Alibaba Cloud released its first AI-native development environment tool, Tongyi Lingma AI IDE, which supports over 3,000 tools and has seen over 15 million plugin downloads, with numerous enterprises already integrating it [1]
- Figure's CEO Brett Adcock revealed a major restructuring, merging three independent teams into the AI team "Helix" to accelerate robot learning and market expansion [1]

Group 2
- The U.S. Department of Energy announced a partnership with NVIDIA and Dell to develop a next-generation flagship supercomputer, expected to be operational by 2026, named after Nobel Prize-winning biochemist Jennifer Doudna [1]