Reflection Mechanism

End-to-end GUI agents achieve an "error-reflection-correction" closed loop for the first time, simulating the full human cognitive process
量子位 · 2025-06-11 08:07
Core Viewpoint
- The article introduces GUI-Reflection, a framework from the MMLab team at Nanyang Technological University that endows end-to-end multimodal GUI agents with self-reflection and error-correction capabilities, addressing the limitations of current training paradigms for automation tasks on devices such as smartphones and computers [1].

Group 1: GUI-Reflection Framework Overview
- GUI-Reflection systematically imparts self-reflection and error-correction abilities to multimodal GUI agents through three key stages: cognitive inspiration, behavior acquisition, and interactive reinforcement [6][27].
- During pre-training, the framework introduces the GUI-Reflection Task Suite, which exposes the model to reflection-related tasks and lays the groundwork for the subsequent training stages [2][7].

Group 2: Offline Supervised Fine-Tuning
- An automated data pipeline generates behavior data incorporating reflection and error correction from existing flawless trajectories, allowing the model to learn reflective behaviors effectively [3][8].
- Erroneous behaviors are created by modifying original task goals and inserting invalid operations into successful trajectories, so the model learns to reflect on mistakes and attempt new, correct actions [9][10].

Group 3: Online Training Phase
- A distributed mobile GUI learning environment featuring 11 apps and 215 task templates supports high-concurrency interaction, enhancing the model's adaptability to real-world scenarios [12].
- An automated iterative online reflection tuning algorithm optimizes the model's fault tolerance, recovery ability, and complex planning skills through multiple training iterations and dynamic sampling strategies [12].
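The error-injection step described in Group 2 can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `Step` dataclass, the `inject_reflection` helper, and the distractor actions are all hypothetical names chosen for the example. The idea is to take a flawless trajectory, insert an invalid action at a random position, and follow it with the original correct action annotated with a reflection thought.

```python
import random
from dataclasses import dataclass


@dataclass
class Step:
    action: str        # e.g. "tap(search)" -- illustrative action strings
    thought: str = ""  # the agent's reasoning / reflection text


def inject_reflection(trajectory, distractors, rng=None):
    """Turn a flawless trajectory into one with an
    error -> reflection -> correction pattern.

    At a random step, insert an invalid action drawn from
    `distractors`, then re-issue the original correct action
    annotated with a reflection thought."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(trajectory))
    wrong = Step(action=rng.choice(distractors),
                 thought="Attempting next step.")
    corrected = Step(
        action=trajectory[i].action,
        thought=("The previous action did not advance the task; "
                 "retrying with the correct action."),
    )
    return trajectory[:i] + [wrong, corrected] + trajectory[i + 1:]


# Usage: a 3-step flawless trajectory gains one erroneous step,
# immediately followed by its reflective correction.
clean = [Step("open(app)"), Step("tap(search)"), Step("type('hotel')")]
distractors = ["tap(wrong_button)", "scroll(up)"]
noisy = inject_reflection(clean, distractors)
```

In the real framework this generation is automated at scale over existing successful trajectories; the sketch only shows the structural transformation a single trajectory undergoes.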
Group 4: Experimental Results
- Introducing reflection-oriented task data during pre-training significantly improves performance on reflection-related tasks, even for smaller models, which achieve results comparable to closed-source large models [16].
- The GUI-Reflection framework reaches a 34.5% success rate on the AndroidWorld benchmark, validating the effectiveness of explicitly incorporating reflection mechanisms across multiple training stages [19][20].

Group 5: Conclusion
- GUI-Reflection injects a novel self-reflection capability into end-to-end multimodal GUI agents, creating an "error-reflection-correction" cognitive loop that enhances the model's robustness and flexibility when handling uncertainty in real-world environments [27].