LLM Agents

Bye-bye, GUI: Chinese Academy of Sciences team unveils an “LLM-friendly” computer-use interface
36Kr · 2025-10-27 07:31
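Having an LLM agent operate your computer for you sounds great in theory, but reality falls short. Almost every existing LLM agent runs into two core pain points:
- Low success rate: on any task that is even slightly complex, the agent "crashes and burns," often getting stuck at some step with no idea what to do next.
- Poor efficiency: completing even a simple task takes the agent dozens of rounds of back-and-forth with the system, which is slow and frustrating to watch.

Where exactly is the problem? Are today's large models simply not smart enough? New research from a team at the Institute of Software, Chinese Academy of Sciences, gives a surprising answer: the real bottleneck is the graphical user interface (GUI) that we have used, and known intimately, for more than 40 years.

For example, GUI controls are hidden behind layers of menus, tabs, and dialog boxes; to reach a control, the user must navigate by clicking menus, drop-down lists, and the like until the control appears on screen. Second, using many controls (such as scroll bars or text selection) requires repeatedly adjusting them and observing the feedback, producing a high-frequency "observe-act" loop. The team pointedly notes that this imperative design of the GUI rests on four hidden "key assumptions" about human users.

Converting the "imperative" GUI into a "declarative" one

Yes, that GUI: the one that took off in the 1980s and completely transformed how people interact with computers. It has always been tailor-made for humans, and its design philosophy runs almost directly counter to the capability profile of LLMs. The team identified the GUI's core ...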
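The cost of the observe-act loop described above can be illustrated with a toy model. This is a hypothetical sketch, not code from the research team: `Document`, `wheel_down`, `wheel_up`, `scroll_to`, and the two agent functions are invented names, and each "round trip" is assumed to stand for one agent action followed by one fresh look at the screen. An imperative agent scrolls a few lines at a time and re-checks the screen after every step, while a declarative interface accepts the goal ("show line 500") in a single call.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical toy viewer: shows `viewport` lines starting at line `top`."""
    num_lines: int
    viewport: int = 20
    top: int = 0

    def max_top(self) -> int:
        return max(self.num_lines - self.viewport, 0)

    def visible(self, line: int) -> bool:
        return self.top <= line < self.top + self.viewport

    # Imperative primitives: one wheel "notch" moves a fixed number of lines
    # (assumed step of 3; it must not exceed the viewport height).
    def wheel_down(self, step: int = 3) -> None:
        self.top = min(self.top + step, self.max_top())

    def wheel_up(self, step: int = 3) -> None:
        self.top = max(self.top - step, 0)

    # Declarative primitive: the caller states the goal, the interface handles the mechanism.
    def scroll_to(self, line: int) -> None:
        self.top = min(max(line, 0), self.max_top())


def imperative_agent(doc: Document, target: int) -> int:
    """Observe-act loop: the agent re-checks the screen after every scroll.
    Returns the number of observe-act cycles needed to bring `target` into view."""
    if not 0 <= target < doc.num_lines:
        raise ValueError("target line out of range")
    round_trips = 0
    while not doc.visible(target):  # observe
        if target < doc.top:
            doc.wheel_up()          # act
        else:
            doc.wheel_down()        # act
        round_trips += 1
    return round_trips


def declarative_agent(doc: Document, target: int) -> int:
    """A single declarative call; no intermediate observations are needed."""
    doc.scroll_to(target)
    return 1


if __name__ == "__main__":
    print(imperative_agent(Document(num_lines=1000), 500))   # 161
    print(declarative_agent(Document(num_lines=1000), 500))  # 1
```

Under these assumptions, bringing line 500 into view costs 161 observe-act cycles imperatively versus one declarative call; the exact numbers depend entirely on the invented step and viewport sizes.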
Bye-bye, GUI! Chinese Academy of Sciences team unveils an “LLM-friendly” computer-use interface
QbitAI · 2025-10-27 05:37
Core Viewpoint
- The article discusses the limitations of current LLM agents in automating computer operations, attributing the main bottleneck to the traditional imperative (command-style) graphical user interface (GUI) that has been in use for over 40 years [2][4].

Group 1: Issues with Current LLM Agents
- Current LLM agents face two major pain points: low success rates on complex tasks and poor efficiency, with even simple tasks requiring many rounds of interaction [7].
- The imperative design of GUIs forces LLMs to handle both strategic planning and low-level operational details, which lowers efficiency and increases cognitive load [6][9].
- Human users excel at visual recognition and quick decision-making, whereas LLMs struggle with visual information and respond more slowly [8].

Group 2: Proposed Solution - Declarative Interfaces
- The research team proposes moving from imperative to declarative interfaces (GOI), letting LLMs focus on high-level task planning while the underlying navigation and interaction are automated [10][12].
- GOI separates the strategy (what to do) from the mechanism (how to do it), so LLMs only need to issue simple declarative commands [14][15].
- GOI is implemented in two phases: offline modeling, which builds a UI navigation graph, and online execution through a simplified interface [16][19].

Group 3: Experimental Results
- Introducing GOI substantially improved performance: with the GPT-5 model, the success rate rose from 44% to 74% [21].
- Failure analysis showed that, with GOI in place, 81% of the remaining failures stemmed from strategic errors rather than mechanism errors, indicating that low-level operational mistakes were substantially reduced [24][25].

Group 4: Future Implications
- The research suggests that GOI points to a clear direction for designing interaction paradigms better suited to large models [27].
- It raises the question of whether future operating systems and applications should natively offer LLM-friendly declarative interfaces, making it easier to build more powerful and versatile AI agents [28].
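The two-phase design summarized above (offline modeling into a UI navigation graph, then online declarative execution) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the graph contents and the names `NAV_GRAPH`, `CONTROL_LOCATION`, `navigation_path`, and `access` are hypothetical, and breadth-first search is simply one reasonable way to recover a click sequence from such a graph. The LLM supplies only the strategy (which control it wants); the mechanism (the menu and dialog clicks needed to reach it) is derived from the graph.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# --- Offline phase (hypothetical result) ---------------------------------
# UI navigation graph: each UI state maps an action performed in it to the
# state that action leads to. A real system would build this by exploring the app.
NAV_GRAPH: Dict[str, Dict[str, str]] = {
    "main": {"click:Format menu": "format_menu", "click:Insert menu": "insert_menu"},
    "format_menu": {"click:Font...": "font_dialog", "press:Esc": "main"},
    "insert_menu": {"click:Table...": "table_dialog", "press:Esc": "main"},
    "font_dialog": {"click:Advanced tab": "font_advanced", "click:Cancel": "main"},
    "font_advanced": {"click:Cancel": "main"},
    "table_dialog": {"click:Cancel": "main"},
}

# Hypothetical index: the UI state in which each control becomes visible.
CONTROL_LOCATION: Dict[str, str] = {
    "Bold": "font_dialog",
    "Character spacing": "font_advanced",
    "Rows": "table_dialog",
}


# --- Online phase ---------------------------------------------------------
def navigation_path(start: str, goal: str) -> List[str]:
    """Breadth-first search for the shortest action sequence from start to goal."""
    came_from: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the predecessor links back to `start`, then reverse.
            actions: List[str] = []
            step = came_from[state]
            while step is not None:
                prev_state, action = step
                actions.append(action)
                step = came_from[prev_state]
            return actions[::-1]
        for action, next_state in NAV_GRAPH.get(state, {}).items():
            if next_state not in came_from:
                came_from[next_state] = (state, action)
                queue.append(next_state)
    raise LookupError(f"no path from {start!r} to {goal!r}")


def access(control: str, current_state: str = "main") -> List[str]:
    """Declarative call: the LLM names *what* it wants (a control);
    the interface derives *how* (the clicks) from the offline graph."""
    return navigation_path(current_state, CONTROL_LOCATION[control])


if __name__ == "__main__":
    print(access("Character spacing"))
    # ['click:Format menu', 'click:Font...', 'click:Advanced tab']
```

In this toy setup, `access("Character spacing")` yields the three-step path through the Format menu, the Font dialog, and its Advanced tab, so the LLM never has to plan those navigation steps itself.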


