New SOTA for Demand-Driven Robot Navigation, Success Rate Up 15%! Built Jointly by Zhejiang University & vivo
具身智能之心·2025-07-22 06:29

Core Viewpoint
- The article covers an advance in embodied intelligence: CogDDN, a framework developed by a research team from Zhejiang University and vivo AI Lab that enables robots to understand human needs and navigate complex environments autonomously [2][3][6].

Research Motivation
- As mobile robots become part of daily life, they need to understand human needs rather than just execute commands. For instance, a robot should autonomously seek out food when a person says they are hungry [6].
- Traditional navigation methods often struggle in unfamiliar environments because they rely on training with large amounts of data, which motivates a more generalizable approach that mimics human reasoning [7].

Framework Overview
- The CogDDN framework is based on the dual-process theory from psychology, combining heuristic (System 1) and analytical (System 2) decision-making processes to enhance navigation capabilities [9][10].
- The framework consists of three main components: a 3D perception module, a demand matching module, and a dual-process decision-making module [13].

3D Robot Perception Module
- The team used the UniMODE method for single-view 3D object detection, improving the robot's ability to navigate indoor environments without relying on multiple views or depth sensors [15].

Demand Matching Module
- This module aligns human needs with object characteristics, using supervised fine-tuning to make large language models (LLMs) more accurate at matching user requests with suitable objects [16].

Dual-Process Decision Making
- The heuristic process makes quick, intuitive decisions based on past experience, while the analytical process focuses on reflecting on errors and optimizing strategy [18][23].
- The Explore and Exploit modules within the heuristic process let the system adapt to new environments and reach navigation goals efficiently [19][20].
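
The split between the heuristic and analytical processes described above can be illustrated with a toy sketch. This is not CogDDN's actual code: the class, field, and action names below (`DualProcessAgent`, `Observation`, `lessons`, the action strings) are all hypothetical, and simple rules stand in for the LLM-based reasoning and the perception module. The sketch only shows the control flow: System 1 picks between Exploit and Explore, and System 2 reflects on a failed action and records a lesson that System 1 reuses later.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical per-step observation: labels of objects detected in view."""
    visible_objects: list[str]
    blocked: bool = False  # True if the last forward move hit an obstacle


@dataclass
class DualProcessAgent:
    """Toy stand-in for a dual-process navigator (assumed design, not the paper's)."""
    target_objects: set[str]                          # objects that satisfy the demand
    lessons: list[str] = field(default_factory=list)  # notes written by System 2

    def heuristic_step(self, obs: Observation) -> str:
        """System 1: fast choice between Exploit and Explore."""
        if self.target_objects & set(obs.visible_objects):
            return "exploit:move_to_target"   # a matching object is in view
        if obs.blocked and "avoid_forward_when_blocked" in self.lessons:
            return "explore:rotate_right"     # reuse a lesson from reflection
        return "explore:move_forward"

    def analytical_step(self, obs: Observation, action: str) -> None:
        """System 2: reflect on a failed action and record a lesson."""
        failed = obs.blocked and action == "explore:move_forward"
        if failed and "avoid_forward_when_blocked" not in self.lessons:
            self.lessons.append("avoid_forward_when_blocked")


# Usage: the demand "I am hungry" is assumed to have been matched to these objects.
agent = DualProcessAgent(target_objects={"apple", "bread"})
blocked = Observation(visible_objects=["sofa"], blocked=True)

first = agent.heuristic_step(blocked)    # no lesson yet, so it tries to move forward
agent.analytical_step(blocked, first)    # reflection records the mistake
second = agent.heuristic_step(blocked)   # the stored lesson now changes the choice
found = agent.heuristic_step(Observation(visible_objects=["table", "apple"]))
```

In this toy version the "lesson" is a single hard-coded string; in a system like the one described, the analytical process would produce richer reflections that feed back into the heuristic process.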

Experimental Results
- The performance of CogDDN was evaluated using the AI2Thor simulator and the ProcTHOR dataset, demonstrating a significant improvement over existing state-of-the-art methods, with a navigation success rate (NSR) of 38.3% and a success rate in unseen scenes of 34.5% [26][27].
- The removal of key components like the Exploit module and the chain of thought (CoT) significantly decreased system performance, highlighting their importance in decision-making [29][30].

Conclusion
- CogDDN represents a cognitive-driven navigation system that continuously learns, adapts, and optimizes its strategies, effectively simulating human-like reasoning in robots [33][34].
- Its dual-process capability enhances performance in demand-driven navigation tasks, laying a solid foundation for the advancement of intelligent robotic technologies [35].