AAAI'26 Oral | HUST and Xiaomi propose a new paradigm: teaching robots "time management," boosting task efficiency by more than 30%!
具身智能之心·2025-11-26 10:00

Core Viewpoint
- The article covers work that brings operations research (OR) knowledge into embodied AI task planning. It introduces a new dataset (ORS3D-60K) and a model (GRANT) that improves task execution efficiency by 30.53% [2][22].
Group 1: Pain Points
- Current embodied AI systems struggle with task planning because they often assume tasks must be completed sequentially and cannot recognize tasks that could run in parallel [3][6].
- Without OR knowledge, robots cannot efficiently manage time and resources in complex 3D environments [6][8].
Group 2: Contributions
- The ORS3D-60K dataset comprises 4,376 real indoor scenes and 60,825 complex tasks. Its instructions average 311 words, making them significantly more complex than those in previous datasets [10][12].
- Each task in the dataset has been validated by an OR solver, and the dataset distinguishes tasks that require continuous attention from those that can run in the background, which enables optimal scheduling [12][22].
Group 3: Methodology
- The proposed GRANT model integrates a scheduling token mechanism (STM) that extends existing multimodal models: the model predicts task attributes and relies on an external optimization solver for efficient scheduling [16][19].
- GRANT's architecture includes a 3D scene encoder, a large language model (LLM), the STM, and a 3D localization head, combining language understanding with time management [19][22].
Group 4: Experimental Results
- Experiments on the ORS3D-60K dataset show that GRANT achieves state-of-the-art performance, with a 30.53% increase in task completion efficiency and a 1.38% improvement in 3D grounding accuracy [18][21].
- The model uses waiting periods within tasks to run other operations in parallel, reducing total task time from 74 minutes to 45 minutes, a 39% efficiency improvement [21].
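The scheduling idea behind these results can be illustrated with a toy example. The sketch below is not GRANT's solver or the ORS3D-60K data format: the `Subtask` fields, the single-agent model, the brute-force search, and the sample subtasks are all assumptions made for illustration. It only shows how overlapping background waits with attention-demanding work shortens total time, and how the reported 74-to-45-minute reduction works out to roughly 39%.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Subtask:
    name: str
    active: int    # minutes that need the agent's continuous attention
    wait: int = 0  # minutes that then run in the background (0 = not parallelizable)


def sequential_time(subtasks):
    """Baseline: do everything one after another, idling during each wait."""
    return sum(s.active + s.wait for s in subtasks)


def makespan(order):
    """Finish time when the agent works back to back and waits run in the background."""
    clock = 0   # time at which the agent becomes free again
    finish = 0  # time at which the last subtask (including its wait) completes
    for s in order:
        clock += s.active
        finish = max(finish, clock + s.wait)
    return finish


def best_schedule(subtasks):
    """Brute-force stand-in for an OR solver; only viable for a handful of subtasks."""
    return min(permutations(subtasks), key=makespan)


def efficiency_gain(baseline, optimized):
    """Relative time saved, in percent."""
    return (baseline - optimized) / baseline * 100


# Hypothetical subtasks, not taken from ORS3D-60K.
tasks = [
    Subtask("wipe table", active=10),
    Subtask("boil water", active=2, wait=8),
    Subtask("sort books", active=15),
    Subtask("start washing machine", active=5, wait=30),
]

baseline = sequential_time(tasks)
order = best_schedule(tasks)
optimized = makespan(order)

print([s.name for s in order])
print(f"{baseline} min sequential, {optimized} min with overlapped waits")
print(f"toy example gain: {efficiency_gain(baseline, optimized):.1f}%")
print(f"reported case (74 -> 45 min): {efficiency_gain(74, 45):.1f}%")
```

In this toy setup the sequential baseline is 70 minutes and the best ordering takes 35, because the longest background wait is started first and the remaining work fits inside it. Exhaustive search grows factorially with the number of subtasks, which is one reason a dedicated optimization solver is the more realistic choice at scale.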
Group 5: Summary and Outlook
- This research marks a shift in embodied AI from basic semantic understanding to advanced operational decision-making, aiming to create intelligent agents capable of efficient time management in real-world applications [22].