System-Level GUI Agent
Why has the "Doubao Phone" gone viral on the strength of a super Agent? We hear what AI scholars have to say
36Kr· 2025-12-10 09:39
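AI on a phone has never felt this much like a real person. Over the past week, the phone that swept the tech world came not from any major hardware maker but was instead tied to ByteDance's Doubao. This engineering unit, which ships with the Doubao phone assistant, set the internet alight and gave many people their first real sense that Agents are already within reach. On an e-commerce platform the article refers to only as "某宝", the phone's price was bid up to nearly 5,000 yuan. The Doubao phone assistant, released early this month, is still a technical preview. Unlike most AI assistants, which exist as standalone apps, it embeds the AI Agent at the lowest level of the system, giving the phone an across-the-board breakthrough in on-device AI capability and bringing a new mode of interaction and a multimodal experience. In the eyes of many tech practitioners, the Doubao phone assistant has pushed the understanding of AI tools to a new level: it is no longer just an auxiliary tool or an add-on app but a "super butler" deeply bound to the phone's operating system. After all, with a single sentence, the Doubao phone assistant can genuinely execute complex instructions across apps. Beyond the capabilities common to Agents on other phones, such as ordering food, bookkeeping, and changing settings, it can handle relatively vague and complex long-chain requests. The Doubao phone assistant completes, without interruption, the multi-requirement, long-chain task of "marking restaurants on a map, finding museums, and booking tickets on a travel platform." Performance like this has people exclaiming: "Isn't this a bit too intelligent?" Meanwhile, around the Doubao phone assistant ...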
Why has the "Doubao Phone" gone viral on the strength of a super Agent? We hear what AI scholars have to say
机器之心· 2025-12-10 08:13
Core Viewpoint - The article discusses the emergence of the Doubao mobile assistant, which integrates AI capabilities deeply into the smartphone operating system, transforming the way users interact with their devices and enabling complex task execution across multiple applications [3][12][26].
Group 1: Doubao Mobile Assistant Overview
- The Doubao mobile assistant is currently in a technical preview phase and represents a significant advancement in AI integration within smartphones, functioning as a "super butler" rather than a standalone app [3][6].
- It allows users to execute complex commands across different apps with simple voice instructions, showcasing a new level of AI interaction [3][12].
- The assistant can perform multi-step tasks seamlessly, such as marking restaurants on a map, finding museums, and booking tickets on travel platforms [5][12].
Group 2: Challenges in Implementing System-Level AI Agents
- Implementing system-level AI agents like Doubao involves overcoming four main challenges: perception, planning, decision-making, and system-level integration [9][10].
- The perception layer requires the agent to recognize all interactive elements on the screen quickly and accurately, even amid dynamic distractions [9].
- The planning layer involves managing information flow across apps, maintaining logical continuity, and adapting to unexpected interruptions [10].
- The decision-making layer requires the agent to generalize across different interfaces and to execute a range of user interactions beyond simple clicks [10].
Group 3: Technical Innovations Behind Doubao
- Doubao takes a system-level integration approach, gaining Android system-level permissions while protecting user privacy through strict authorization protocols [12][13].
- The assistant uses a visual multimodal capability to understand screen content and user intent, allowing it to decide the next action autonomously [12][13].
- The underlying technology, UI-TARS, is an engine developed in-house by ByteDance, which enhances the assistant's performance and capabilities [16][24].
Group 4: Future Implications and Industry Perspectives
- The evolution of AI capabilities in smartphones is expected to shift the interaction paradigm from "users seeking services" to "services seeking users," leading to a more intuitive user experience [26][27].
- Experts believe that system-level GUI agents will become standard features in future mobile operating systems, enhancing the autonomy and intelligence of smartphones [26][27].
- Despite the promising advancements, challenges such as computational power, coordination of system-level agents, and security mechanisms remain to be addressed [27].
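As a rough illustration of the perception, planning, and decision-making layers and of the authorization step that the summary describes, the following Python sketch runs a toy agent loop. Every name in it (`ToyPhone`, `Element`, `Action`, `run_agent`, the `ask_user` callback) is hypothetical and invented for this example; it is not UI-TARS or the Doubao assistant's actual implementation, and the plan is a fixed list rather than the output of a model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Element:
    """One interactive element the perception step found on screen."""
    label: str
    kind: str  # e.g. "button" or "text_field"


@dataclass
class Action:
    """One GUI action in the plan."""
    verb: str                # "tap" or "type"
    target: str              # label of the element to act on
    text: str = ""           # payload for "type" actions
    sensitive: bool = False  # True if the step needs explicit user consent


@dataclass
class ToyPhone:
    """Hypothetical stand-in for a device: each screen is a list of elements."""
    screens: dict[str, list[Element]]
    current: str = "home"
    log: list[str] = field(default_factory=list)

    def perceive(self) -> list[Element]:
        # Perception: list every interactive element on the current screen.
        return self.screens[self.current]

    def execute(self, action: Action) -> None:
        # Record the action; tapping a label that names a screen navigates to it.
        self.log.append(f"{action.verb}:{action.target}")
        if action.verb == "tap" and action.target in self.screens:
            self.current = action.target


def run_agent(phone: ToyPhone, plan: list[Action],
              ask_user: Callable[[Action], bool]) -> list[str]:
    """Walk the plan step by step, re-perceiving the screen before each action."""
    for step in plan:
        visible = {e.label for e in phone.perceive()}
        if step.target not in visible:
            # Decision: the expected element is missing, so stop instead of guessing.
            phone.log.append(f"abort:{step.target} not visible")
            break
        if step.sensitive and not ask_user(step):
            # Authorization: sensitive steps run only with explicit consent.
            phone.log.append(f"declined:{step.target}")
            break
        phone.execute(step)
    return phone.log


demo_phone = ToyPhone(screens={
    "home": [Element("maps", "button"), Element("travel", "button")],
    "maps": [Element("search", "text_field"), Element("home", "button")],
    "travel": [Element("pay", "button")],
})
demo_plan = [
    Action("tap", "maps"),
    Action("type", "search", text="museum"),
    Action("tap", "home"),
    Action("tap", "travel"),
    Action("tap", "pay", sensitive=True),
]
# The user declines the payment step, so the run stops before "pay" is tapped.
demo_log = run_agent(demo_phone, demo_plan, ask_user=lambda action: False)
print(demo_log)
```

The point of the design is that the agent looks at the screen again before every step and treats consent as a hard gate: a missing element or a declined sensitive action ends the run rather than letting it continue on stale assumptions.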
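Inside the "Doubao Phone": core technology exploration long since open-sourced, nearly two years of GUI Agent groundwork, "the world's first true AI phone"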
36Kr· 2025-12-09 08:57
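More technical details have been confirmed about the "Doubao Phone," the current sensation whose first batch of 30,000 units sold out at once and whose price has doubled on the second-hand market. As it turns out, behind the technical preview of the Doubao phone assistant lies a long game that ByteDance has been playing in the "system-level GUI Agent" space for nearly two years. In the official demo, running on the nubia M153 engineering prototype, it can operate the phone on the user's behalf and automate tasks across apps. For example, given several instructions at once, it completes in one go complex tasks such as requesting leave on Feishu, submitting a travel request, and booking a high-speed rail ticket for the business trip. According to the latest information obtained by 量子位, this GUI operation capability is built on ByteDance's self-developed UI-TARS model. Developers are probably no strangers to this model series. The first generation drew heated discussion as soon as it was open-sourced and was judged to outperform the OpenAI Operator that had been revealed at the time (UI-TARS was released before Operator's official launch). PS: The catch is that Operator, once officially released, could only be used with a Pro subscription costing 200 US dollars a month... Continued evolution and application of the UI-TARS model: As early as January this year, ByteDance's Seed team and Tsinghua University jointly open-sourced the first-generation UI-TARS, laying the foundation for system-level AI Agents. Since then, the team has kept working along this line, iterating on and refining its capabilities. The team notes that a native Agent needs four core capabilities: perception, action, reasoning, and memory. The "Doubao Phone" uses ...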
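Inside the "Doubao Phone": core technology exploration long since open-sourced, nearly two years of GUI Agent groundwork, "the world's first true AI phone"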
量子位· 2025-12-09 07:37
Core Insights - The article discusses the rapid success and technological foundation of the "Doubao Phone" and its assistant, which has gained significant attention in the market due to its advanced capabilities in automating tasks on mobile devices [1][50].
Group 1: Product Overview
- The "Doubao Phone" sold out its initial stock of 30,000 units, with prices in the second-hand market doubling [1].
- The phone's assistant can automate complex tasks across applications, such as submitting leave requests and booking train tickets [4][5].
- The assistant is built on ByteDance's self-developed UI-TARS model, which has been optimized for mobile use [7][8].
Group 2: Technological Development
- The UI-TARS model has undergone significant iterations, with the initial version released in January 2025, followed by UI-TARS-1.5 and the latest UI-TARS-2, which enhances the agent's capabilities [11][23][34].
- UI-TARS-2 addresses issues related to data scalability and multi-round reinforcement learning, allowing for more autonomous interactions with graphical user interfaces [34][35].
- The model has shown superior performance in various benchmarks compared to competitors like OpenAI's models [27][28].
Group 3: User Experience and Feedback
- Users have reported high satisfaction with the assistant's ability to perform tasks efficiently, with one user describing it as the "world's first true AI smartphone" [69].
- The assistant's design includes a dual-mode system, allowing for both rapid responses and deeper reasoning capabilities [60][62].
- Concerns regarding privacy and security have been raised, but the company has emphasized that user consent is required for high-level permissions [50][51].
Group 4: Market Implications
- The success of the "Doubao Phone" indicates a shift towards AI-driven mobile technology, where devices can autonomously understand and execute user intentions [85].
- The product's development reflects a broader trend in the industry towards integrating advanced AI capabilities into everyday technology, potentially redefining user interaction with mobile devices [86].