Visual-Language-Action (VLA) Model
Figure's robot can now do household chores on its own
财联社 (Cailian Press) · 2026-03-11 06:20
Core Viewpoint
- Figure is advancing the development of humanoid robots capable of performing household tasks autonomously, using its new AI system Helix 02 to improve object recognition and manipulation skills [3][10].

Group 1: Robot Capabilities
- The latest demonstration video shows the Figure 03 robot autonomously organizing a living room, highlighting improvements in object recognition and hand manipulation [3][6].
- The robot can spray disinfectant, clean surfaces, and organize items, demonstrating its ability to differentiate between various objects [6][9].

Group 2: Challenges in Household Robotics
- Household chores are significantly more complex for robots than standardized factory tasks because home environments are unpredictable, with cluttered spaces and frequent need for dual-hand operations [9].
- The variability of household items, such as soft towels and cushions, adds to the complexity of the tasks robots must handle [9].

Group 3: AI System and Learning Methodology
- Figure uses a single neural network to control the robot's movements from visual input, allowing it to learn from examples without being reprogrammed for each task [10].
- Helix 02, the latest AI system, serves as the robot's "intelligent brain", enabling it to understand and execute complex tasks through a visual-language-action model (a generic sketch of this kind of policy follows below) [10].

Group 4: Data Collection Initiatives
- Figure is undertaking "Project Go-Big", a project aimed at building the world's largest pre-training dataset for humanoid robots, and is collaborating with Brookfield, which manages over 100,000 residential units, to expedite data collection [11].
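As a rough illustration of the visual-language-action pattern described above, the minimal sketch below maps a camera frame and a tokenized instruction through a single network to a continuous action vector. The class name `TinyVLAPolicy`, the layer sizes, and the 7-DoF action output are illustrative assumptions; Figure has not published the internals of Helix 02, so this is a generic stand-in, not its architecture.

```python
# A minimal, generic sketch of a visual-language-action (VLA) policy:
# one network maps a camera image plus a text instruction to a robot action.
# This illustrates the general architecture class only, NOT Figure's Helix 02.
import torch
import torch.nn as nn

class TinyVLAPolicy(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, action_dim=7):
        super().__init__()
        # Vision encoder: a small CNN over RGB frames (stand-in for a larger backbone).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Language encoder: token embeddings mean-pooled into one vector.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Fusion head: predicts a continuous action (e.g. end-effector deltas + gripper).
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image, instruction_tokens):
        v = self.vision(image)                          # (B, embed_dim)
        l = self.embed(instruction_tokens).mean(dim=1)  # (B, embed_dim)
        return self.head(torch.cat([v, l], dim=-1))     # (B, action_dim)

# Usage: one forward pass on dummy inputs.
policy = TinyVLAPolicy()
img = torch.rand(1, 3, 128, 128)          # camera frame
tokens = torch.randint(0, 1000, (1, 8))   # tokenized instruction, e.g. "fold the towel"
action = policy(img, tokens)              # 7-DoF action vector
print(action.shape)                       # torch.Size([1, 7])
```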
Breaking the VLA model reasoning bottleneck: GigaAI, the Institute of Automation of the Chinese Academy of Sciences (CASIA), and Tsinghua University jointly release the latest VLA-R1 large model, with a 75% execution success rate in real-world scenarios
机器人大讲堂 (Robot Lecture Hall) · 2025-11-04 09:07
Core Insights
- The article discusses the significance of Visual-Language-Action (VLA) models in embodied artificial intelligence, highlighting their ability to generalize across tasks and environments so that robots can interact with the real world [1][3].

VLA Model Challenges
- Existing VLA models face two main challenges: a lack of explicit step-by-step reasoning, which leads to failures in instruction disambiguation, and insufficient systematic reinforcement of reasoning during post-training [2].

Introduction of VLA-R1
- VLA-R1 is a newly proposed reasoning-enhanced VLA model developed by GigaAI, CASIA, and Tsinghua University, which aims to bridge the gap between reasoning and execution through a structured framework [3].

VLA-CoT-13K Dataset
- The research team created the VLA-CoT-13K dataset, consisting of 13,000 labeled examples that provide an explicit "thinking chain" for each task, detailing the reasoning process that leads to the action plan (an illustrative record format is sketched after this summary) [5][7].

Reinforcement Learning Strategy
- VLA-R1 employs a post-training strategy based on verifiable rewards, using a group relative policy optimization algorithm to improve training efficiency (a sketch of the group-relative advantage computation appears after this summary) [9].

Reward Signals in Training
- The model incorporates three verifiable reward signals (hedged sketches of these checks also follow after this summary):
  - The area alignment reward measures the accuracy of the predicted operation region [12].
  - The trajectory consistency reward evaluates the smoothness and plausibility of the generated action trajectories [12].
  - The output format reward enforces structured, clear output, promoting a "think before act" approach [12][13].

Performance Evaluation
- VLA-R1 performed strongly across tests, achieving an IoU of 36.51 on in-domain tasks, a 17.78% improvement over the best baseline model, while maintaining strong performance in out-of-domain scenarios [14][15].

Robustness in Simulation
- In simulated environments, VLA-R1 achieved an average success rate of 55% on affordance perception tasks and 70% on trajectory execution tasks across different robot models [17].

Real-World Application
- In real-world evaluations, VLA-R1 achieved an average success rate of 62.5% on affordance perception and 75% on trajectory execution across challenging scenarios [19].

Future Directions
- Future research will focus on adapting the model to more complex robotic platforms and optimizing the reward mechanism to improve safety and robustness in real-world applications [20].
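To make the "thinking chain" idea concrete, here is a hedged sketch of what a single VLA-CoT-13K-style record could look like. The field names and example values are illustrative assumptions for this summary; they do not reproduce the dataset's actual schema.

```python
# Hypothetical example of a chain-of-thought-annotated VLA record.
# Field names and values are illustrative assumptions, not the real VLA-CoT-13K schema.
example_record = {
    "instruction": "Put the red cup on the tray",
    "image": "scene_00341.png",                      # observation frame (hypothetical path)
    "chain_of_thought": [
        "Locate the red cup among the objects on the table.",
        "Identify a collision-free grasp region on the cup handle.",
        "Plan a lift-and-place trajectory ending above the tray.",
    ],
    "affordance_box": [212, 148, 265, 230],          # predicted operation region (pixels)
    "trajectory": [[0.42, 0.10, 0.05],               # waypoint sequence in robot frame (m)
                   [0.42, 0.10, 0.25],
                   [0.60, -0.05, 0.25],
                   [0.60, -0.05, 0.12]],
}
```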
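The three verifiable reward signals lend themselves to simple programmatic checks. The sketch below shows one plausible way to score area alignment (IoU of the predicted operation region), trajectory consistency (waypoint error plus a jerkiness penalty), and output format (a "think before act" template). The function names, weights, and the exact smoothness metric are assumptions, not the paper's published definitions.

```python
# Hedged sketch of verifiable reward signals in the spirit of VLA-R1's
# post-training: exact definitions and weights are assumptions.
import re
import numpy as np

def area_reward(pred_box, gt_box):
    """IoU between predicted and ground-truth operation regions (x1, y1, x2, y2)."""
    x1 = max(pred_box[0], gt_box[0]); y1 = max(pred_box[1], gt_box[1])
    x2 = min(pred_box[2], gt_box[2]); y2 = min(pred_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area(pred_box) + area(gt_box) - inter
    return inter / union if union > 0 else 0.0

def trajectory_reward(pred_traj, gt_traj):
    """Reward closeness to the reference waypoints and penalize jerkiness.
    Assumes both trajectories have the same number of waypoints."""
    pred, gt = np.asarray(pred_traj, float), np.asarray(gt_traj, float)
    point_err = np.linalg.norm(pred - gt, axis=-1).mean()
    jerk = np.abs(np.diff(pred, n=2, axis=0)).mean() if len(pred) > 2 else 0.0
    return float(np.exp(-point_err) * np.exp(-jerk))

def format_reward(text):
    """1.0 if the output follows a 'think before act' template, else 0.0."""
    return 1.0 if re.search(r"<think>.*</think>\s*<answer>.*</answer>", text, re.S) else 0.0

def total_reward(sample):
    # Weighted sum of the three verifiable signals (weights are assumptions).
    return (1.0 * area_reward(sample["pred_box"], sample["gt_box"])
            + 1.0 * trajectory_reward(sample["pred_traj"], sample["gt_traj"])
            + 0.5 * format_reward(sample["text"]))
```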
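Group relative policy optimization dispenses with a learned value critic by normalizing rewards within a group of responses sampled for the same prompt. The snippet below sketches that group-relative advantage computation; it illustrates the general GRPO recipe rather than VLA-R1's actual training code.

```python
# Hedged sketch of the group-relative advantage at the heart of GRPO-style
# post-training; not VLA-R1's actual implementation.
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize per-group rewards to zero mean / unit std (GRPO-style)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Usage: rewards for one group of 4 responses sampled for the same instruction,
# e.g. scores produced by a verifiable reward function like the one above.
rewards = [0.82, 0.35, 0.60, 0.11]
advantages = group_relative_advantages(rewards)
print(advantages.round(3))  # higher-reward samples get positive advantage
```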