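AAAI 2026 Oral | Huazhong University of Science and Technology (HUST) and Xiaomi Propose a New Paradigm for Embodied AI: Teaching Robots "Time Management"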

Core Viewpoint
- The article discusses the integration of operations research (OR) into embodied AI to improve task execution efficiency. It highlights a new dataset (ORS3D-60K) and a model (GRANT) that improves robots' ability to perform tasks in parallel, achieving a 30.53% increase in task completion efficiency [2][22].

Group 1: Pain Points
- Current embodied AI systems often execute tasks sequentially and cannot recognize which tasks can be performed in parallel, which leads to inefficiency [3][5].
- Without operations research knowledge, robots cannot optimize task scheduling in complex 3D environments [5][6].

Group 2: Contributions
- The ORS3D-60K dataset is introduced, comprising 4,376 real indoor scenes and 60,825 complex tasks with an average instruction length of 311 words, making it significantly more complex than previous datasets [12][13].
- Each task in the dataset has been validated by an operations research solver and distinguishes between parallelizable and non-parallelizable subtasks, which makes optimal scheduling possible [13][22].

Group 3: Methodology
- The GRANT model is proposed. Its scheduling token mechanism lets the model predict task attributes and hand them to an external optimization solver for efficient scheduling [16][19].
- GRANT combines a 3D scene encoder, a large language model (LLM), a scheduling token mechanism, and a 3D grounding head to achieve optimal task execution [19].

Group 4: Experimental Results
- Experiments on the ORS3D-60K dataset show that GRANT achieves state-of-the-art performance, with a 30.53% increase in task completion efficiency and a 1.38% improvement in 3D grounding accuracy [18][21].
- The model uses the waiting periods within tasks to perform other actions, reducing total task time from 74 minutes to 45 minutes, a reduction of about 39% [21].

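The scheduling idea described above, filling the waiting period of one subtask with active work on another, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Subtask` fields, the function names, the example chores and their durations are hypothetical, and the longest-wait-first ordering is a simple stand-in rule, not the external optimization solver or the scheduling token mechanism used by GRANT. Only the 74-minute and 45-minute figures in the last helper come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """Hypothetical subtask model (not the ORS3D-60K schema)."""
    name: str
    active_min: int    # minutes the robot must actively work on the subtask
    wait_min: int = 0  # unattended minutes afterwards; > 0 means parallelizable


def sequential_time(tasks):
    """Total time when the robot idles through every waiting period."""
    return sum(t.active_min + t.wait_min for t in tasks)


def scheduled_time(tasks):
    """Total time when waiting periods overlap with other active work.

    Assumes a single robot: active work is serialized, waits run in the
    background. Subtasks with the longest wait are started first.
    """
    ordered = sorted(tasks, key=lambda t: t.wait_min, reverse=True)
    clock = 0      # when the robot finishes its current active work
    makespan = 0   # when the last subtask (including its wait) completes
    for t in ordered:
        clock += t.active_min
        makespan = max(makespan, clock + t.wait_min)
    return makespan


def time_saved_pct(before, after):
    """Percentage reduction in total time."""
    return 100.0 * (before - after) / before


# Illustrative household chores; durations are made up for this sketch.
tasks = [
    Subtask("wipe the table", active_min=15),
    Subtask("run the washing machine", active_min=3, wait_min=40),
    Subtask("sort the books", active_min=20),
    Subtask("boil water", active_min=2, wait_min=10),
]

if __name__ == "__main__":
    print("sequential:", sequential_time(tasks), "min")
    print("scheduled: ", scheduled_time(tasks), "min")
    # Figures reported in the article: 74 minutes reduced to 45 minutes.
    print("article example saved: %.1f%%" % time_saved_pct(74, 45))
```

With these made-up durations the sequential plan takes 90 minutes and the overlapped plan 43 minutes, because both chores with a waiting period are started before the chores that need continuous attention. The same helper applied to the article's figures gives (74 - 45) / 74, or about 39%.
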
Group 5: Summary and Outlook
- The research indicates a shift in embodied AI from basic semantic understanding to advanced operational decision-making, with the potential for real-world applications in robotics [22].
- The framework aims to bridge the gap between multimodal large models and optimization solvers, paving the way for robots that can manage time effectively in daily tasks [22].