Bye-Bye GUI: Chinese Academy of Sciences Team Unveils an "LLM-Friendly" Computer-Use Interface
36Kr · 2025-10-27 07:31
Core Insights
- LLM agents are currently held back by the traditional command-based (imperative) GUI, which makes task execution inefficient and lowers success rates [1][3][4]

Group 1: Issues with Current GUI
- A command-based GUI requires users to navigate through multiple menus and options, so LLMs cannot access application functionality directly [3][4]
- LLMs are weak at visual recognition and slow to respond, which is a poor fit for the command-based design of GUIs [4][5]
- LLMs must handle both strategic planning and detailed operational steps, which raises their cognitive load and error rates [4][9]

Group 2: Introduction of Declarative Interfaces
- The research proposes a shift from command-based to declarative interfaces (GOI), so that LLMs can focus on high-level task planning while the underlying navigation and interaction are automated [4][9][10]
- GOI separates strategy from mechanism: the LLM issues high-level commands and does not manage the details of GUI navigation [7][9]

Group 3: Implementation and Results
- GOI operates in two phases: offline modeling, which builds a UI navigation graph, and online execution, which uses simplified declarative commands [12][13]
- In experiments, the success rate rose from 44% to 74%, and over 61% of tasks were completed in a single call [15][16]
- With GOI, failures shifted from mechanism-related errors to strategy-related errors, indicating a reduction in low-level operational mistakes [18][20]

Group 4: Future Implications
- GOI suggests that future operating systems and applications need to incorporate LLM-friendly declarative interfaces, paving the way for more powerful AI agents [20]
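To make the two-phase idea concrete, here is a minimal Python sketch, not the research team's implementation. It assumes the offline phase yields a graph mapping each UI control to the controls reachable by clicking it, and that the online phase resolves a declarative request ("give me this control") into a click path with breadth-first search. The graph contents, the control names, and the `access` function are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical navigation graph, standing in for the output of offline modeling.
# Each key is a UI control; its list holds the controls that appear after clicking it.
NAV_GRAPH = {
    "main_window": ["File", "Insert", "Design"],
    "File": ["Save As", "Export"],
    "Insert": ["Table", "Picture"],
    "Design": ["Slide Size", "Format Background"],
    "Format Background": ["Solid Fill", "Gradient Fill"],
}


def find_path(graph, start, target):
    """Breadth-first search for the shortest click sequence from start to target."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def access(target, graph=NAV_GRAPH, root="main_window"):
    """Declarative entry point (assumed name): the caller states which control
    it wants, and the click path is worked out here rather than by the LLM."""
    path = find_path(graph, root, target)
    if path is None:
        raise KeyError(f"control not reachable: {target}")
    return path[1:]  # clicks to perform, excluding the root window


print(access("Gradient Fill"))
# ['Design', 'Format Background', 'Gradient Fill']
```

In this sketch the model makes one request naming the target, and the mechanism (which menus to open, in which order) is handled outside the model, which is the strategy/mechanism split the summary describes.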
Bye-Bye, GUI! Chinese Academy of Sciences Team Unveils an "LLM-Friendly" Computer-Use Interface
QbitAI (量子位) · 2025-10-27 05:37
Core Viewpoint
- The article discusses why current LLM agents struggle to automate computer operations, and attributes the main bottleneck to the traditional command-based (imperative) graphical user interface (GUI) that has been in use for over 40 years [2][4].

Group 1: Issues with Current LLM Agents
- Current LLM agents face two major pain points when handling complex tasks: low success rates and inefficiency [7].
- The command-based design of GUIs requires LLMs to perform both strategic planning and detailed operational steps, leading to inefficiency and increased cognitive load [6][9].
- Human users excel at visual recognition and quick decision-making, while LLMs struggle with visual information and respond more slowly [8].

Group 2: Proposed Solution - Declarative Interfaces
- The research team proposes a shift from command-based to declarative interfaces (GOI), so that LLMs can focus on high-level task planning while the underlying navigation and interaction are automated [10][12].
- GOI separates the strategy (what to do) from the mechanism (how to do it), so that LLMs only need to issue simple declarative commands [14][15].
- GOI is implemented in two phases: offline modeling, which builds a UI navigation graph, and online execution through a simplified interface [16][19].

Group 3: Experimental Results
- GOI significantly improved performance: with the GPT-5 model, the success rate rose from 44% to 74% [21].
- Failure analysis showed that with GOI, 81% of failures were due to strategic errors and not mechanism errors, indicating a reduction in low-level operational mistakes [24][25].

Group 4: Future Implications
- The research suggests that GOI provides a clear direction for designing interaction paradigms better suited to large models [27].
- It raises the question of whether future operating systems and applications should natively offer LLM-friendly declarative interfaces to facilitate the development of more powerful and versatile AI agents [28].
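For a sense of scale, the reported jump from 44% to 74% can be read either as an absolute gain or as a relative one. The short Python sketch below works out both from the two figures quoted in the summary; the helper name `improvement` is illustrative and nothing beyond those two percentages comes from the article.

```python
def improvement(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100
    return absolute, relative


absolute, relative = improvement(44, 74)
print(f"+{absolute} percentage points, {relative:.1f}% relative increase")
# +30 percentage points, 68.2% relative increase
```

The two readings answer different questions: 30 percentage points is how many more tasks out of 100 succeed, while roughly 68% is how much larger the success rate is compared with the baseline.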