Hierarchical Control
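CICC: Embodied Intelligence Moves Toward Data-Driven Approaches; High-Value Information Becomes the Core of Competition
智通财经网 (Zhitong Finance) · 2025-11-17 01:37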
Core Insights
- A CICC report finds that, in the short term, the layered architecture remains mainstream because it is controllable from an engineering standpoint, while VLA shows potential in complex tasks and human-machine interaction. The world model is viewed as a long-term direction because of its cross-device transfer capability [1]
Group 1: Embodied Intelligence Algorithms
- Layered control serves as the foundational architecture paradigm, using a two-tier structure in engineering practice [1]
- The VLA paradigm, built on VLM, improves generalization and interaction capabilities and is an active research direction [1]
- The world model supplies physical constraints through environmental modeling and prediction of future states; it is currently at a research-led stage [1]
Group 2: Embodied Intelligence Data
- Robotic data spans multimodal sources, and the industry is seeking low-cost data acquisition and high-efficiency application paths [2]
- Data acquisition methods include real machines, video (first-person and third-person), and simulation [2]
- Data security is a critical baseline: humanoid robot manufacturers face challenges in permission isolation, data encryption systems, and cross-border transmission policies [2]
Group 3: Hot Topics in Embodied Intelligence
- Scaling laws for robots have not yet produced an explosive breakthrough; insufficient real-world data production capacity and Sim2Real transfer are the key constraints [3]
- Benchmarking is driving the standardization of evaluation processes, as embodied robots still lack a recognized quantitative framework [3]
- Physical AI, which integrates physical knowledge with AI models, has progressed to applications in robotic operations [3]
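One Demonstration and the Robot Gets to Work? Peking University & BeingBeyond Joint Team Uses a "Hierarchical Cerebellum + Simulated Double" to Put G1 on the Job Zero-Shot
36Kr · 2025-11-14 02:36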
Core Insights
- The DemoHLM framework, proposed by a research team from Peking University and BeingBeyond, takes a new approach to humanoid robot loco-manipulation: it generates large volumes of training data in simulation from a single human demonstration, addressing key challenges of traditional methods [1][20].
Group 1: Challenges in Humanoid Robot Loco-Manipulation
- Humanoid robot loco-manipulation faces a "triple dilemma": existing solutions either stay confined to simulation or require extensive real-world teleoperation data, which makes them impractical for complex environments such as homes and industrial sites [3][6].
- Traditional methods suffer from low data efficiency, poor task generalization, and difficult sim-to-real transfer, leading to high costs and limited scalability [6][22].
Group 2: Innovations of DemoHLM
- DemoHLM's core innovation is its "layered control + single-demonstration data generation" approach, which keeps full-body motion stable while achieving generalization at minimal data cost [7][20].
- The framework uses a hierarchical control architecture that balances flexibility and stability by decoupling motion control from task decision-making [8][20].
Group 3: Data Generation Process
- DemoHLM generates diverse training data from just one demonstration. The process is automated in three stages (pre-manipulation, manipulation, and batch synthesis), which improves the generalization of the learned policy [9][20].
- Automating data generation removes the data-collection bottleneck of traditional imitation learning and significantly improves efficiency [9][20].
Group 4: Experimental Validation
- The framework was validated both in simulation and on a real Unitree G1 robot, showing stable performance across ten loco-manipulation tasks, with success rates rising significantly as the volume of synthetic data increased [10][15].
- As the number of synthetic data points increased from 100 to 5000, success rates on tasks such as "PushCube" and "OpenCabinet" improved dramatically, indicating that the data generation pipeline is effective [15][20].
Group 5: Industry Implications and Future Directions
- DemoHLM provides critical technical support for deploying humanoid robots in household, industrial, and service environments [19][20].
- Future research will explore mixed training with real data and multi-modal perception to improve robustness and address current limitations, such as reliance on simulation data and degraded performance under complex occlusion [19][22].
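The summary above does not spell out how one demonstration becomes thousands of samples. A minimal sketch of one common way to do this follows, assuming the demonstration is stored relative to the object and replayed against randomized object poses. This is an illustrative assumption, not DemoHLM's actual pipeline; the function names (`to_object`, `to_world`, `synthesize`), the 2-D poses, and the sampling ranges are all hypothetical.

```python
import math
import random

def to_world(object_pose, point_in_object):
    """Map a 2-D point from the object's frame into the world frame."""
    ox, oy, yaw = object_pose
    px, py = point_in_object
    c, s = math.cos(yaw), math.sin(yaw)
    return (ox + c * px - s * py, oy + s * px + c * py)

def to_object(object_pose, point_in_world):
    """Inverse of to_world: express a world-frame point in the object's frame."""
    ox, oy, yaw = object_pose
    dx, dy = point_in_world[0] - ox, point_in_world[1] - oy
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)

def synthesize(demo_world, demo_object_pose, num_samples, rng):
    """Replay one demonstration against randomly sampled object poses.

    demo_world: list of (x, y) hand waypoints recorded in the world frame.
    demo_object_pose: (x, y, yaw) of the object during the demonstration.
    Returns a list of (object_pose, waypoints_in_world) pairs.
    """
    # Store the demonstration relative to the object once.
    relative = [to_object(demo_object_pose, p) for p in demo_world]
    dataset = []
    for _ in range(num_samples):
        # Hypothetical sampling ranges for the new object placement.
        pose = (rng.uniform(-1.0, 1.0),
                rng.uniform(-1.0, 1.0),
                rng.uniform(-math.pi, math.pi))
        dataset.append((pose, [to_world(pose, p) for p in relative]))
    return dataset

if __name__ == "__main__":
    demo = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.5)]   # hand path ending at the object
    data = synthesize(demo, (1.0, 0.5, 0.3), 5000, random.Random(0))
    print(len(data), "synthetic trajectories from one demonstration")
```

In a full pipeline each replayed trajectory would presumably be executed in the simulator and kept only if the task succeeds; that filtering step is omitted here.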
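One Demonstration and the Robot Gets to Work? Peking University & BeingBeyond Joint Team Uses a "Hierarchical Cerebellum + Simulated Double" to Put G1 on the Job Zero-Shot
量子位 (QbitAI) · 2025-11-13 09:25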
Core Insights
- The article introduces the DemoHLM framework, which lets humanoid robots generate extensive training data in simulation from a single human demonstration, addressing key challenges in loco-manipulation [1][22].
Group 1: Challenges in Humanoid Robot Manipulation
- Humanoid robot manipulation faces a "triple dilemma": existing solutions either stay confined to simulation or require extensive real-world teleoperation data, which makes them impractical for complex environments such as homes and industrial sites [3][6].
- Traditional methods suffer from low data efficiency, poor task generalization, and difficult sim-to-real transfer, leading to high costs and limited scalability [6][20].
Group 2: Innovations of DemoHLM
- DemoHLM uses a hierarchical control architecture that separates motion control from task decision-making, improving both flexibility and stability [7][20].
- The framework's key innovation is generating a large amount of diverse training data from just one demonstration, which significantly improves data efficiency and generalization [8][20].
Group 3: Experimental Validation
- Comprehensive validation was carried out both in simulation (IsaacGym) and on the real Unitree G1 robot, covering ten manipulation tasks with notable success rates [9][19].
- As the synthetic data volume increased from 100 to 5000, task success rates improved significantly, demonstrating that the data generation pipeline is effective [14][20].
Group 4: Industry Implications and Future Directions
- DemoHLM provides critical technical support for the practical application of humanoid robots, reducing training costs and improving generalization across scenarios [19][20].
- The framework is designed to be compatible with future upgrades, such as tactile sensors and multi-camera perception, paving the way for more complex operating environments [21][20].
Boston Dynamics' Robot Dog Is Back, With "Five Legs" Working in Concert
36Kr · 2025-10-15 13:02
Core Insights
- Boston Dynamics' Spot robot can lift a 15 kg tire in just 3.7 seconds, showcasing advanced dynamic whole-body manipulation techniques [1][11]
- The robot's performance goes beyond what static assumptions would predict: by coordinating its movements it handles loads above its maximum lifting capacity [13]
Group 1: Dynamic Whole-Body Manipulation
- The method combines sampling and learning so that the robot can perform tasks requiring coordination of arm, legs, and torso [1][2]
- A hierarchical control approach divides the control problem into two layers: low-level control for balance and stability, and high-level control for task-specific strategies [2][14]
Group 2: Control Strategies
- The low-level controller uses reinforcement learning to command motor torques for stability, while the high-level controller uses sampling-based strategies for tasks such as tire alignment and stacking [2][7]
- The sampling controller simulates multiple future scenarios in parallel to identify the most effective actions for completing the task [3][5]
Group 3: Performance Metrics
- The robot achieved an average time of 5.9 seconds per tire, nearly matching human operating speed [11]
- Dynamic coordination lets the robot handle weights significantly above its peak lifting capability, expanding its operational range [13][14]
Group 4: Learning and Adaptation
- Training randomizes object properties to bridge the gap between simulation and the real world [10]
- An asymmetric actor-critic architecture is used in training, improving the robot's ability to adapt to complex dynamics and contact mechanics [8][10]
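The idea of a sampling controller that "simulates multiple future scenarios" can be illustrated with a toy example. The sketch below is a generic random-shooting planner on a 1-D point mass, not the controller used on Spot; the dynamics, cost weights, sample count, and horizon are all assumptions chosen for illustration, and the rollouts run sequentially rather than in parallel.

```python
import random

def step(state, action, dt=0.1):
    """Toy 1-D point-mass dynamics: state is (position, velocity)."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)

def rollout_cost(state, actions, target):
    """Simulate one candidate action sequence and accumulate its cost."""
    cost = 0.0
    for a in actions:
        state = step(state, a)
        cost += (state[0] - target) ** 2 + 0.01 * a ** 2
    return cost

def sample_best_action(state, target, rng,
                       num_samples=64, horizon=15, max_accel=2.0):
    """Score random action sequences by simulation; return (first action, cost).

    The all-zero sequence is always included as a baseline candidate, so the
    returned plan is never worse than doing nothing.
    """
    best_first = 0.0
    best_cost = rollout_cost(state, [0.0] * horizon, target)
    for _ in range(num_samples):
        actions = [rng.uniform(-max_accel, max_accel) for _ in range(horizon)]
        cost = rollout_cost(state, actions, target)
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first, best_cost

def run_controller(target=1.0, steps=80, seed=0):
    """Closed loop: replan at every step and execute only the first action."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    for _ in range(steps):
        action, _ = sample_best_action(state, target, rng)
        state = step(state, action)
    return state

if __name__ == "__main__":
    print("final (position, velocity):", run_controller())
```

Replanning at every step and executing only the first action is what lets such a controller react to disturbances; a real system would evaluate the candidate rollouts in a physics simulator, typically in parallel.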
Boston Dynamics' Robot Dog Is Back! "Five Legs" Working in Concert
量子位 (QbitAI) · 2025-10-15 10:20
Core Insights
- The article discusses advances in Boston Dynamics' Spot robot, which can lift and manipulate a 15 kg tire in just 3.7 seconds, showcasing its dynamic whole-body manipulation capabilities [3][31].
Group 1: Dynamic Whole-Body Manipulation
- The method combines sampling and learning for dynamic whole-body manipulation, using reinforcement learning and sampling-based control to enable coordinated tasks involving arm, legs, and torso [11][12].
- A hierarchical control approach divides the control problem into two complementary layers: a low layer that directly controls motor torques and a high layer that handles task-specific strategies [12][13].
Group 2: Task Execution and Control Strategies
- For tasks such as tire alignment and stacking, the system uses sampling-based control to simulate potential future scenarios and discover optimal strategies [14].
- Reinforcement learning is applied to maintain stability during rolling tasks, capturing the necessary dynamic features and reactive control mechanisms [15][26].
Group 3: Performance and Efficiency
- Spot's tire manipulation goes beyond what static assumptions would predict: the robot handles weights above its peak lifting capacity of 11 kg [35].
- Dynamic coordination of its movements lets the robot efficiently perform tasks that were previously limited to slower, static methods [36][33].
Group 4: Simplification of Control Problems
- Separating high-level and low-level control significantly simplifies the control problem: the high-level controller can focus on completing the task without reasoning about joint torques or stability constraints [37][38].
- The learned motion abstractions let the high-level controller operate in a simplified action space, which makes computation more tractable and task execution more efficient [38].
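The separation described in Group 4 can be sketched in a few lines. The example below is a toy 1-D system under stated assumptions, not the Spot controller: a proportional rule stands in for the sampling-based high layer, a simple velocity-tracking law stands in for the learned low layer, and the class names, gains, limits, and update rates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HighLevelPolicy:
    """Task layer: picks a velocity command; never reasons about forces."""
    goal: float
    max_speed: float = 0.5
    gain: float = 1.0

    def command(self, position: float) -> float:
        desired = self.gain * (self.goal - position)
        return max(-self.max_speed, min(self.max_speed, desired))

@dataclass
class LowLevelController:
    """Motion layer: turns a velocity command into a bounded force.

    A hand-written tracking law used here as a stand-in for a learned policy.
    """
    kd: float = 20.0
    max_force: float = 10.0

    def force(self, velocity: float, velocity_cmd: float) -> float:
        raw = self.kd * (velocity_cmd - velocity)
        return max(-self.max_force, min(self.max_force, raw))

def simulate(goal=1.0, seconds=10.0, dt=0.002, high_level_every=25, mass=1.0):
    """Run both layers at different rates on a 1-D point mass."""
    high, low = HighLevelPolicy(goal), LowLevelController()
    pos, vel, cmd = 0.0, 0.0, 0.0
    for i in range(round(seconds / dt)):
        if i % high_level_every == 0:    # high layer: every 25 ticks (20 Hz)
            cmd = high.command(pos)
        f = low.force(vel, cmd)          # low layer: every tick (500 Hz)
        vel += (f / mass) * dt
        pos += vel * dt
    return pos, vel

if __name__ == "__main__":
    print("final (position, velocity):", simulate())
```

The high layer's action space here is a single velocity command, which is the point of the abstraction: whatever plans at that level, whether a proportional rule or a sampling-based planner, searches over a small command space while force limits and tracking are handled below it.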