New SOTA in demand-driven robot navigation, with success rate up 15%! A joint effort by Zhejiang University and vivo
具身智能之心· 2025-07-22 06:29
Core Viewpoint
- The article discusses the advancements in embodied intelligence, specifically focusing on the new framework CogDDN developed by a research team from Zhejiang University and vivo AI Lab, which enables robots to understand human needs and navigate complex environments autonomously [2][3][6].

Research Motivation
- The increasing integration of mobile robots into daily life necessitates their ability to understand human needs rather than just executing commands. For instance, a robot should autonomously seek food when a person expresses hunger [6].
- Traditional navigation methods often struggle in unfamiliar environments due to their reliance on extensive data training, prompting the need for a more generalizable approach that mimics human reasoning [7].

Framework Overview
- The CogDDN framework is based on the dual-process theory from psychology, combining heuristic (System 1) and analytical (System 2) decision-making processes to enhance navigation capabilities [9][10].
- The framework consists of three main components: a 3D perception module, a demand matching module, and a dual-process decision-making module [13].

3D Robot Perception Module
- The team utilized the UniMODE method for single-view 3D object detection, improving the robot's ability to navigate indoor environments without relying on multiple views or depth sensors [15].

Demand Matching Module
- This module aligns human needs with object characteristics, using supervised fine-tuning techniques to enhance the accuracy of large language models (LLMs) in matching user requests with suitable objects [16].

Dual-Process Decision Making
- The heuristic process allows for quick, intuitive decisions based on past experiences, while the analytical process focuses on error reflection and strategy optimization [18][23].
- The Explore and Exploit modules within the heuristic process enable the system to adapt to new environments and efficiently achieve navigation goals [19][20].
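The interplay between the heuristic and analytical processes described above can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the CogDDN implementation: the names `Observation`, `KnowledgeBase`, `heuristic_step`, and `analytical_step`, the action strings, and the rule-based reflection are all hypothetical stand-ins for what the article describes only at a high level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Observation:
    """Hypothetical observation: visible object names and a blocked-path flag."""
    visible_objects: List[str]
    blocked: bool = False


@dataclass
class KnowledgeBase:
    """Hypothetical store of lessons written by the analytical process."""
    lessons: Dict[Tuple[str, bool], str] = field(default_factory=dict)


def heuristic_step(obs: Observation, target: Optional[str], kb: KnowledgeBase) -> Tuple[str, str]:
    """System 1 (toy): pick a mode and an action quickly, reusing past lessons."""
    # Exploit when the target object is in view, otherwise Explore.
    mode = "exploit" if target in obs.visible_objects else "explore"
    # Prefer a stored lesson for this situation if one exists.
    lesson = kb.lessons.get((mode, obs.blocked))
    if lesson is not None:
        return mode, lesson
    # Illustrative default actions (assumed, not taken from the article).
    return mode, ("MoveAhead" if mode == "exploit" else "RotateRight")


def analytical_step(obs: Observation, mode: str, failed_action: str, kb: KnowledgeBase) -> str:
    """System 2 (toy): reflect on a failed action and record a correction."""
    # Assumed reflection rule: when blocked, turning replaces moving ahead.
    if obs.blocked and failed_action == "MoveAhead":
        corrected = "RotateLeft"
    else:
        corrected = failed_action
    kb.lessons[(mode, obs.blocked)] = corrected
    return corrected


kb = KnowledgeBase()
obs = Observation(visible_objects=["apple"], blocked=True)
mode, action = heuristic_step(obs, "apple", kb)    # fast decision, no lesson yet
analytical_step(obs, mode, action, kb)             # reflect on the failure
mode_after, action_after = heuristic_step(obs, "apple", kb)  # lesson is reused
```

In this sketch the analytical step runs only after a failure, and its output is consumed by later heuristic steps, which mirrors the "reflect, then reuse" division of labor the summary attributes to the two processes.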
Experimental Results
- CogDDN was evaluated using the AI2-THOR simulator and the ProcTHOR dataset, demonstrating a significant improvement over existing state-of-the-art methods, with a navigation success rate (NSR) of 38.3% and a success rate in unseen scenes of 34.5% [26][27].
- Removing key components such as the Exploit module and chain-of-thought (CoT) reasoning significantly decreased system performance, highlighting their importance in decision-making [29][30].

Conclusion
- CogDDN represents a cognitive-driven navigation system that continuously learns, adapts, and optimizes its strategies, effectively simulating human-like reasoning in robots [33][34].
- Its dual-process capability enhances performance in demand-driven navigation tasks, laying a solid foundation for the advancement of intelligent robotic technologies [35].
New SOTA in demand-driven robot navigation, with success rate up 15%! A joint effort by Zhejiang University and vivo
量子位· 2025-07-21 04:23
Core Viewpoint
- The research team from Zhejiang University and vivo AI Lab has made significant progress in developing a cognitive-driven navigation framework called CogDDN, which enables robots to understand human intentions and navigate complex environments autonomously [2][5][33].

Research Motivation
- As mobile robots become more integrated into daily life, there is a need for them to not only execute commands but also understand human needs, such as seeking food when a person feels hungry [5].
- Traditional demand-driven navigation methods rely heavily on extensive data training and struggle in unfamiliar environments or vague instructions, prompting the exploration of more generalizable navigation methods [6].

Framework Overview
- The CogDDN framework is based on the dual-process theory from psychology, combining heuristic (System 1) and analytical (System 2) decision-making processes to simulate human-like reasoning in navigation tasks [8][20].
- The framework consists of three main components: a 3D robot perception module, a demand matching module, and a dual-process decision-making module [13].

3D Robot Perception Module
- The team utilized the state-of-the-art single-view 3D detection method, UniMODE, to enhance the robot's three-dimensional perception capabilities in indoor navigation [15].

Demand Matching Module
- The demand matching module aligns objects with human needs based on shared characteristics, employing supervised fine-tuning techniques to improve the accuracy of recommendations in complex scenarios [16].

Dual-Process Decision Making
- The heuristic process allows for quick, intuitive decision-making, while the analytical process focuses on error reflection and strategy optimization [9][23].
- The heuristic process includes two sub-modules: Explore, which generates exploratory actions to scan the environment, and Exploit, which focuses on precise actions to achieve navigation goals [19].
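The idea of aligning objects with a need "based on shared characteristics" can be made concrete with a small example. The article attributes this step to a fine-tuned model; the sketch below deliberately swaps in plain set overlap (Jaccard similarity) as a toy stand-in, and the function name `match_demand` and the trait vocabulary are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_demand(demand_traits: Set[str], objects: Dict[str, Set[str]]) -> List[str]:
    """Rank objects by trait overlap with the demand (toy stand-in for an LLM)."""
    scored = [(jaccard(demand_traits, traits), name) for name, traits in objects.items()]
    # Best score first; ties are broken alphabetically so the order is stable.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    # Objects that share no trait with the demand are dropped.
    return [name for score, name in ranked if score > 0]


# Hypothetical traits for the demand "I am hungry" and for three detected objects.
hungry = {"edible"}
scene = {
    "apple": {"edible", "fruit"},
    "bread": {"edible"},
    "chair": {"sittable"},
}
candidates = match_demand(hungry, scene)
```

Here `candidates` contains `bread` and `apple` but not `chair`, since only the first two share a trait with the demand. A learned matcher would replace both the hand-written traits and the similarity function.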
Experimental Results
- In closed-loop navigation experiments using the AI2-THOR simulator, CogDDN outperformed existing state-of-the-art methods, achieving a navigation success rate (NSR) of 38.3% and a success weighted by path length (SPL) of 17.2% [26][27].
- The framework demonstrated superior adaptability and efficiency in unseen scenes compared to methods that rely solely on forward-facing camera inputs [28].

Continuous Learning and Adaptation
- The analytical process in CogDDN allows for iterative learning, where the system reflects on obstacles encountered during navigation and integrates this knowledge into its decision-making framework [24][31].
- The reflection mechanism significantly enhances the system's performance in future navigation tasks, showcasing its robust learning capabilities [32].

Conclusion
- CogDDN represents a significant advancement in cognitive-driven navigation systems, enabling robots to efficiently adapt and optimize their strategies in complex environments [33][34].
- The dual-process capability of CogDDN lays a solid foundation for the development of intelligent robotic technologies in demand-driven navigation tasks [35].
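For readers unfamiliar with the two metrics reported above, the following sketch shows how they are commonly computed. It assumes the standard definitions: NSR as the fraction of successful episodes, and SPL as the per-episode success flag multiplied by the ratio of the shortest path length to the larger of the shortest and actual path lengths, averaged over episodes. The article does not spell out its exact evaluation protocol, and the function names and sample numbers are illustrative only.

```python
from typing import Sequence


def navigation_success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes that ended in success."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(successes) / len(successes)


def success_weighted_by_path_length(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Average of success * shortest / max(actual, shortest) over episodes."""
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if not successes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest path lengths must be positive")
        if ok:
            # A failed episode contributes zero; a detour lowers the score.
            total += shortest / max(taken, shortest)
    return total / len(successes)


# Three made-up episodes: one detour, one failure, one optimal path.
outcomes = [True, False, True]
shortest = [2.0, 3.0, 4.0]
taken = [4.0, 3.0, 4.0]
nsr_value = navigation_success_rate(outcomes)
spl_value = success_weighted_by_path_length(outcomes, shortest, taken)
```

Because SPL penalizes detours, it can never exceed NSR on the same set of episodes, which is consistent with the reported SPL being lower than the reported NSR.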