Workflow
A Comprehensive Survey from Shanghai Jiao Tong University on Sensing Intelligence, Social Intelligence, and Motion Intelligence in Embodied Navigation
具身智能之心·2025-09-02 00:03

Core Insights
- The article presents the TOFRA framework, which decomposes the embodied navigation process into five key stages: Transition, Observation, Fusion, Reward-policy construction, and Action execution, providing a structured analysis for embodied navigation research [2][14] (a minimal pipeline sketch appears at the end of this summary)
- It systematically integrates research findings from computer vision, classical robotics, and bionics in the context of embodied navigation, highlighting the complementary nature of these fields in sensing intelligence, social intelligence, and motion intelligence [2][3]
- The article identifies four core challenges in embodied navigation: adaptive spatiotemporal scale, joint optimization, system integrity, and data and task generalization, which point to future research directions [2][3]

Group 1: Research Background
- Embodied Artificial Intelligence (EAI) emphasizes self-perception and interaction with humans or the environment as a pathway to Artificial General Intelligence (AGI) [2]
- The core feature of embodied navigation is its egocentric perception and distributed computing capability, in contrast with traditional navigation methods that rely on predefined maps or external localization [2][3]

Group 2: Intelligence Types
- Sensing Intelligence: achieved through multimodal egocentric perception, enabling spatial cognition without full reliance on pre-built global maps [3][4]
- Social Intelligence: enables understanding of high-level semantic instructions from humans, supporting complex task execution beyond predefined waypoints [10][11]
- Motion Intelligence: the ability to perform flexible, adaptive physical interactions in complex environments, not limited to fixed paths [10][11]

Group 3: TOFRA Framework
- Transition (T): predicts the next state from internal sensors, using methods that range from dynamics modeling to end-to-end neural networks [14][20]
- Observation (O): covers how robots perceive the environment through external sensors, forming an understanding of the external world [27][28]
- Fusion (F): combines internal state predictions with external perceptions to obtain an optimal state estimate, using classical Bayesian methods and neural networks [45][48] (a minimal Bayesian-fusion example is sketched after this summary)

Group 4: Action Execution
- Action execution involves the robot using motion skills to complete the action sequences generated by the policy, covering both basic skills and complex skill combinations [60][61]
- The article traces the evolution of action execution from basic motion skills to complex skill combinations and morphological cooperation, highlighting advances in motion intelligence [60][68]

Group 5: Application Scenarios
- The TOFRA framework is applied to three typical navigation scenarios: embodied autonomous driving, indoor navigation, and complex terrain navigation, detailing how to integrate the framework's stages into efficient navigation systems [74][75][76]
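The five-stage decomposition above maps naturally onto a perception-action loop. The following is only a minimal sketch of what a TOFRA-style loop could look like in code; the class, method, and variable names are hypothetical placeholders chosen for illustration and are not an API defined by the survey.

```python
# Minimal sketch of a TOFRA-style navigation loop (illustrative only).
# The stage names follow the survey's T/O/F/R/A decomposition; all class and
# method names here are hypothetical placeholders, not the survey's API.
from dataclasses import dataclass
import numpy as np


@dataclass
class State:
    mean: np.ndarray   # estimated pose/velocity, etc.
    cov: np.ndarray    # uncertainty of the estimate


class TofraAgent:
    def transition(self, state: State, control: np.ndarray) -> State:
        """T: predict the next state from internal sensing / dynamics."""
        raise NotImplementedError

    def observe(self) -> np.ndarray:
        """O: gather external perception (camera, LiDAR, ...)."""
        raise NotImplementedError

    def fuse(self, predicted: State, observation: np.ndarray) -> State:
        """F: merge prediction and observation into an optimal estimate."""
        raise NotImplementedError

    def plan(self, state: State, goal) -> np.ndarray:
        """R: reward/policy construction -> next action sequence."""
        raise NotImplementedError

    def act(self, action: np.ndarray) -> None:
        """A: execute the action with low-level motion skills."""
        raise NotImplementedError

    def step(self, state: State, control: np.ndarray, goal) -> State:
        predicted = self.transition(state, control)   # T
        z = self.observe()                            # O
        fused = self.fuse(predicted, z)               # F
        action = self.plan(fused, goal)               # R
        self.act(action)                              # A
        return fused
```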
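For the Fusion stage, the classical Bayesian methods the survey refers to are commonly realized with Kalman-style filtering. The sketch below is a generic textbook linear Kalman filter, shown only to make the predict-then-fuse structure concrete; the dynamics, observation, and noise matrices (F, B, H, Q, R) and the numbers in the usage example are illustrative assumptions, not taken from the survey.

```python
# A minimal linear Kalman filter, as one textbook instance of classical
# Bayesian fusion for the F (Fusion) stage. The specific models (F, B, H,
# Q, R) below are illustrative assumptions, not from the survey.
import numpy as np


def kf_predict(x, P, F, B, u, Q):
    """Transition step: propagate the state with an internal dynamics model."""
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def kf_update(x_pred, P_pred, z, H, R):
    """Fusion step: correct the prediction with an external observation z."""
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x_pred + K @ y
    P = (np.eye(len(x)) - K @ H) @ P_pred
    return x, P


# Example: 1-D position/velocity state, odometry-style prediction fused with
# a position-only external measurement (all numbers are made up).
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity dynamics
B = np.array([[0.5 * dt**2], [dt]])     # control: commanded acceleration
H = np.array([[1.0, 0.0]])              # we only observe position
Q = 0.01 * np.eye(2)                    # process noise
R = np.array([[0.05]])                  # measurement noise

x, P = np.zeros(2), np.eye(2)
x_pred, P_pred = kf_predict(x, P, F, B, np.array([0.2]), Q)
x, P = kf_update(x_pred, P_pred, z=np.array([0.011]), H=H, R=R)
```

Nonlinear variants (EKF/UKF) and learned fusion networks follow the same predict/update pattern, replacing the linear models with nonlinear or neural ones.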