GUI Agent
Starting from the Doubao phone: a vision and roadmap for on-device intelligence
AI前线· 2025-12-22 05:01
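Author | OpenBMB team
ByteDance's recently released Doubao phone assistant has sparked wide discussion in the industry. It is more than the debut of a new piece of smart hardware; it marks an important leap in how large models are applied, from "Chat" to genuine "Action". As researchers who have long worked on large models, we define the Doubao phone assistant as the industry's first system-level GUI Agent. It is no longer an isolated smart app but a "super hub" deeply coupled with the lower layers of the operating system, able to perceive and operate across apps. How should we view the Doubao phone assistant's present and future? We would like to take this opportunity to share our view of phone assistants, along with our vision and roadmap for the evolution of on-device intelligence.
Key technologies of the Doubao phone assistant: analysis and assessment
GUI Agent is undoubtedly the core technology of the Doubao phone assistant. To see through to its technical essence, we first need to review how GUI Agent technology moved from the lab into industry. Between 2023 and 2025, GUI Agent technology underwent a fundamental paradigm shift from "bolt-on frameworks" to "model-native agents": On the model side, judging from the current user experience and technical characteristics, the Doubao phone assistant adopts a device-cloud collaborative model architecture: The arrival of the Doubao phone assistant marks the point at which GUI Agents have finally left the lab "toy" stage and begun to have practical value. It reveals one fact: ...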
Reflections prompted by the Doubao phone: Agents vs. super apps, AI companies vs. phone makers
新财富· 2025-12-16 08:22
Core Viewpoint
- The launch of the Doubao mobile assistant by ByteDance represents a significant step towards the realization of system-level AI agents, challenging the dominance of super apps like WeChat and Alipay in the mobile ecosystem [2][14][27]
Group 1: Doubao Mobile Assistant Launch
- On December 1, ByteDance's Doubao team released a technical preview of the Doubao mobile assistant, which collaborates deeply with phone manufacturers at the operating system level to enable cross-application automation [2]
- The initial batch of 30,000 units sold out instantly, but within two days, major super apps like WeChat, Alipay, Taobao, and Meituan blocked the Doubao mobile assistant [3]
Group 2: AI Agent Development
- The Doubao mobile assistant demonstrates the feasibility of GUI agents, completing a closed-loop attempt for AI phones, but raises questions about its practical utility in real-world scenarios [5]
- The evolution of AI agents has transitioned from fixed scripts and rule engines to a stage where GUI intelligent agents can understand and operate across applications, as seen with advancements from companies like Microsoft and Anthropic [6][7]
Group 3: System-Level Agent vs. Super Apps
- The system-level agent can understand user intent and orchestrate multiple apps, moving the focus from an app-centric model to a user-intent-centric model [8][10]
- The core advantages of system-level agents include the ability to organize tasks across multiple apps and theoretical platform neutrality, alleviating long-standing issues like fragmented cross-app processes [11][12]
Group 4: Industry Dynamics and Conflicts
- The emergence of the Doubao mobile assistant has highlighted the conflict between system-level agents and super apps, with super apps responding defensively to protect their user entry points [14][15]
- The long-term outcome may not be the elimination of one model over the other, but rather a redefinition of power boundaries and responsibilities between system-level agents and super apps [17]
Group 5: Manufacturer Strategies
- Different manufacturers are adopting varied strategies regarding AI agents, with Huawei integrating agents into its operating system, Xiaomi focusing on ecosystem integration, and Apple maintaining a single official agent [19][23][24]
- The competitive landscape suggests a future where multiple agents coexist in the Android ecosystem, while iOS maintains a clearer structure with one official agent and several plugins [24][25]
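The Doubao phone has touched a raw nerve of the big-tech apps
36Kr· 2025-12-15 23:28
Behind the Doubao phone's brief moment in the spotlight: can Qwen, Yuanbao and their peers still become super AI entry points? China's internet ecosystem gave the Doubao AI phone just a day and a half of showtime. During that window, the Doubao phone could carry out relatively complex cross-app operations through a phone Agent integrated into the system, such as comparing food-delivery prices and placing an order, liking and commenting on the boss's WeChat Moments post, or passing Bilibili's "big exam". Apart from important actions such as payment, which still require a human, everything else could be left to the AI to run automatically in the background, largely matching the public's initial idea of a phone Agent. But the technology touched a raw nerve of the internet giants, and a ring of digital "walls" quickly rose in front of Doubao phone users. On the evening of the day after launch, users reported being "unable to log in to WeChat", with a message that the "login environment is abnormal" and that they needed to switch devices. After that, Alibaba apps such as Taobao, Xianyu and Damai frequently crashed or forced logouts, and bank apps such as Agricultural Bank of China and China Construction Bank halted logins and payments, citing a "risky environment". The Doubao phone can currently use WeChat, Alipay and other apps normally, but it has become just another phone: the built-in AI can no longer operate these apps, and the experience is much diminished. Industry insiders' responses to the episode have been fairly muted, even with a hint of "this was to be expected". Tencent responded: "We took no special action; it may have triggered security risk-control measures that already existed [1]." Senior executives at phone makers have also weighed in, ...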
Doubao "tears apart" the AI phone
投中网· 2025-12-13 06:49
Core Viewpoint
- The emergence of the Doubao phone, a collaboration between Doubao and Nubia, has disrupted the AI smartphone market, showcasing advanced capabilities while sparking significant controversy and debate within the industry [6][7].
Group 1: Product Overview
- The Doubao phone is technically a preview version of the Nubia M153, featuring deep integration of the Doubao assistant into its operating system, allowing for continuous operations beyond traditional voice assistants [6][7].
- The phone's market price surged from the original 3,499 yuan to as high as 36,000 yuan, reflecting a split sentiment between excitement and skepticism among consumers [6].
- It has been praised for its ability to perform complex tasks across multiple applications, such as answering questions on Bilibili and making purchases, leading to comparisons with human-like interaction [6][9].
Group 2: AI Smartphone Landscape
- The concept of "AI smartphones" gained traction in the second half of 2023, with major manufacturers like Samsung, Google, and Xiaomi emphasizing AI integration, but many offerings were criticized for lacking true innovation [8][9].
- Doubao's approach is more aggressive, enabling extensive cross-application operations that surpass the capabilities of existing AI smartphones, which are often limited to predefined tasks [9][11].
Group 3: Technical Pathways
- The industry is divided into two main technical pathways for AI smartphones: traditional methods relying on SDK interfaces and the newer GUI Agent approach, which allows direct interaction with the screen [9][10].
- Doubao's implementation of GUI Agent technology enables it to autonomously handle tasks without relying on app-specific interfaces, a significant advancement over earlier AI assistants [10][11].
Group 4: Industry Conflict
- The Doubao phone's capabilities have led to pushback from major apps like WeChat and Alipay, which have implemented restrictions to prevent automated operations, highlighting a clash between traditional app security measures and new AI functionalities [14][15].
- The core of the conflict lies in differing interpretations of user operation permissions, with apps prioritizing human-like interactions while AI systems advocate for user-authorized automation [15][16].
Group 5: Market Dynamics and Future Outlook
- The AI smartphone sector is becoming a battleground for tech companies vying for dominance in the AI era, with the potential to redefine user interaction and data flow [22][23].
- The emergence of the Doubao phone has prompted major tech firms to reassess their AI strategies, leading to a competitive landscape categorized into three tiers based on integration capabilities and market responsiveness [24][25][26].
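The two technical pathways named in Group 3 of this summary can be contrasted with a small toy sketch. Everything in it is hypothetical and for illustration only: `FakeApp`, `sdk_add_to_cart`, `screenshot` and `tap` are invented names, not any vendor's SDK or Doubao's actual implementation. It only shows why a GUI Agent can still act when an app exposes no interface for a task.

```python
from dataclasses import dataclass, field


@dataclass
class UIElement:
    label: str
    x: int
    y: int


@dataclass
class FakeApp:
    """Hypothetical stand-in for a third-party app; not a real API."""
    exposes_sdk: bool
    cart: list = field(default_factory=list)

    # Pathway 1: an interface the app developer chose to expose.
    def sdk_add_to_cart(self, item: str) -> None:
        if not self.exposes_sdk:
            raise PermissionError("app exposes no interface for this task")
        self.cart.append(item)

    # What any user (or GUI Agent) can see on screen.
    def screenshot(self) -> list:
        return [UIElement("Search", 10, 10), UIElement("Add to cart", 50, 200)]

    # What any user (or GUI Agent) can touch on screen.
    def tap(self, x: int, y: int, item: str) -> None:
        for el in self.screenshot():
            if (el.x, el.y) == (x, y) and el.label == "Add to cart":
                self.cart.append(item)


def via_sdk(app: FakeApp, item: str) -> None:
    """Interface pathway: works only if the app cooperates."""
    app.sdk_add_to_cart(item)


def via_gui(app: FakeApp, item: str) -> None:
    """GUI Agent pathway: perceive the screen, locate the target, act on it."""
    target = next(el for el in app.screenshot() if el.label == "Add to cart")
    app.tap(target.x, target.y, item)
```

In this toy setup, `via_gui(FakeApp(exposes_sdk=False), "tea")` succeeds while `via_sdk` raises `PermissionError`. That asymmetry is the same one the summary points to: the GUI route needs no cooperation from the app, which is also why apps treat it as automated operation.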
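An AI-era "internet protocol" arrives: will Doubao-style phones no longer need to fear being "banned"?
36Kr· 2025-12-12 08:36
The list of apps banning the "Doubao phone" (nubia M153) keeps growing. Beyond WeChat and Alipay, e-commerce platforms such as Pinduoduo and Taobao, along with more banking apps, have begun to restrict, to varying degrees, logging in and use on the Doubao phone. This is not a simple product rivalry. Say "compare prices and place the order for me", and the phone starts jumping between pages, recognizing the interface, tapping buttons, claiming coupons and checking out, all without relying on any official interface. The Doubao phone assistant follows the classic GUI Agent route: let the AI understand the phone's screen and directly simulate the user's operations on the GUI (graphical user interface). A similar case is Comet AI (from Perplexity, the well-known AI search startup), which received a stern warning from Amazon; that was still in the relatively open world of the Web, whereas the Doubao phone assistant faces an app world dominated by giants.
[Image: Perplexity's response to Amazon. Source: Perplexity]
The crux is that the internet ecosystem as a whole is not yet ready to absorb the "brute-force impact" of GUI Agents on system permissions, platform order and security boundaries. By comparison, the Agent model based on MCP (Model Context Protocol), while it cannot resolve every platform conflict of the AI era, does offer a "path to a win-win". On December 10 ...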
A post-2000s LLM intern "strips down" the Doubao phone: a thousand-word hands-on teardown
36Kr· 2025-12-10 06:50
Core Insights
- The "Doubao Phone" has gained immense popularity due to its advanced AI capabilities, allowing users to perform tasks like automatic price comparison, messaging, and travel planning within seconds [1][3][5].
Group 1: Technology and Features
- The phone operates on a dual-mode system, distinguishing between a fast, intuitive mode (System 1) and a slower, more reasoning-based mode (System 2) [10][14].
- It utilizes a hybrid perception routing system to handle complex tasks in noisy environments, demonstrating advanced visual understanding [16][19].
- The architecture allows for parallel runtime, enabling the AI to perform tasks in the background without interrupting other phone functions [21].
- The design incorporates a heuristic engineering approach that introduces fixed delays to enhance success rates in task execution [22][25].
- Privacy is addressed through a physical isolation mechanism, ensuring that sensitive information is not recorded or monitored [26][28].
Group 2: Performance and User Experience
- The AI exhibits resilience by adapting to failures and dynamically adjusting its approach to task completion [35][39].
- The phone's assistant is capable of executing complex operations, such as document processing and information retrieval, across multiple applications seamlessly [45][51].
- The rapid iteration of the UI-TARS model indicates a commitment to continuous improvement and adaptation to user needs [46].
Group 3: Industry Impact
- The emergence of the "Doubao Phone" signifies a shift towards OS-level integration of AI capabilities in mobile devices, potentially redefining the future of smartphones [55][57].
- The phone represents a significant advancement in GUI agents, merging AI with user interface interactions to create a more intuitive user experience [48][58].
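The dual-mode design and the fixed-delay heuristic described in Group 1 of this summary can be pictured with a short sketch. It is a guess at the general shape, not the phone's actual logic: the `familiar` flag used as the routing criterion, the two policy functions and the `settle_delay_s` parameter are all assumptions introduced here for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    familiar: bool  # assumption: previously seen UI patterns take the fast path


def fast_policy(step: Step) -> str:
    """System 1 stand-in: cheap, intuitive action selection."""
    return f"fast:{step.description}"


def slow_policy(step: Step) -> str:
    """System 2 stand-in: slower, deliberate reasoning."""
    return f"reasoned:{step.description}"


def route(step: Step) -> str:
    """Send familiar steps to the fast mode and everything else to the slow mode."""
    return fast_policy(step) if step.familiar else slow_policy(step)


def run(steps, settle_delay_s=0.5, sleep=time.sleep):
    """Execute steps in order, pausing a fixed time after each action.

    The fixed pause mirrors the heuristic in the summary: wait for the
    interface to settle before the next step, trading speed for success rate.
    """
    log = []
    for step in steps:
        log.append(route(step))
        sleep(settle_delay_s)
    return log
```

Calling `run([Step("tap search", True), Step("handle unexpected popup", False)], settle_delay_s=0.0)` sends the first step through the fast policy and the second through the slow one. Passing a different `sleep` callable makes the delay easy to stub out in tests.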
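Xu Xin becomes Zhang Yiming's "new shareholder", acquiring a stake in ByteDance at a 3.4 trillion valuation; Ren Zhengfei stresses that AI is about applications; Li Auto's AI glasses weigh just 36 g | AI Industry Weekly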
创业邦· 2025-12-07 01:08
Core Insights
- The article highlights significant developments in the AI industry, including new product launches, funding rounds, and strategic shifts among major companies [5][34].
Group 1: Company Developments
- Midea Group officially announced its humanoid robot strategy, focusing on three categories: humanoid robots, full humanoid robots, and super humanoid robots, aiming for high efficiency and low cost [7].
- Huawei's CEO Ren Zhengfei emphasized the importance of AI applications, contrasting China's focus on practical solutions with the U.S. pursuit of general AI [8].
- Li Auto launched its AI glasses, weighing only 36 grams with an 18-hour battery life, showcasing advancements in wearable technology [9].
- The humanoid robot T800 was released by Zhongqing, featuring a height of 1.73 m and a weight of 75 kg, with a performance cost only one-third of human labor [13].
- JD.com announced that its digital human live streaming service will be free for all merchants, enhancing its e-commerce capabilities [17].
Group 2: Funding and IPOs
- Qingwei Intelligent completed over 2 billion RMB in Series C financing, with plans to focus on next-generation reconfigurable chip development and initiate IPO preparations [18].
- Anthropic is preparing for a potential IPO, with a valuation expected to exceed 300 billion USD, indicating strong investor interest [12].
- HeShan Technology announced successful financing rounds totaling several hundred million RMB, with participation from 13 investors [20].
Group 3: AI Technology Advancements
- ByteDance released Vidi2, a multimodal large language model for video understanding, capable of processing hours of raw footage and generating complete video segments [19].
- OpenAI is developing a new AI model codenamed "Garlic" to compete with Google's Gemini 3, focusing on programming and logical reasoning tasks [29].
- Amazon unveiled its custom AI chip Trainium3, which is four times faster than its predecessor and can reduce AI model training costs by up to 50% [30].
Group 4: Regulatory and Ethical Developments
- Eight major e-commerce platforms, including JD.com and Meituan, signed a commitment to regulate AI technology applications, aiming to establish self-regulatory standards [21].
- Doubao Mobile Assistant announced plans to standardize AI operations on mobile devices, including restrictions on certain applications to prevent misuse [9].
Conservative Google, aggressive Doubao
36Kr· 2025-12-05 10:23
Core Viewpoint
- The development of AI technology must prioritize user rights and regulatory compliance, as exemplified by the contrasting approaches of Google and Doubao in their AI assistant functionalities [11][12].
Group 1: Doubao's AI Assistant
- Doubao's AI assistant has faced criticism for its aggressive approach, which bypasses established security protocols of major applications like WeChat and Alipay, raising concerns about user safety and compliance [7][9].
- The company has announced plans to adjust its AI capabilities, particularly in financial applications, to ensure a balance between technological advancement and user safety [4][5].
- Doubao's strategy of directly operating applications without adhering to existing security frameworks has led to backlash from both users and developers, highlighting the risks associated with such innovations [8][10].
Group 2: Industry Context and Comparisons
- The global AI Agent landscape is rapidly evolving, with major players like Microsoft, Google, and Amazon investing heavily in AI platforms, each leveraging their unique strengths [6].
- In contrast to Doubao's approach, established AI assistants like Siri and Google Assistant operate within strict API guidelines, ensuring user privacy and data security while avoiding conflicts with application developers [7][9].
- The AHA (Agent Hub Access) solution introduced by Alipay and OPPO exemplifies a more cautious and collaborative approach, focusing on secure and transparent interactions between AI assistants and applications [14][15].
Group 3: Regulatory and Security Considerations
- The financial sector imposes stringent regulations on data security and user privacy, making Doubao's bypassing of these protocols particularly contentious [9][10].
- The potential for legal liability and user trust issues arises if Doubao's AI operations lead to security breaches or unauthorized access to sensitive information [8][10].
- The industry's future will depend on establishing standards that respect user privacy and regulatory requirements, as seen in ongoing efforts by Chinese authorities to promote multi-agent interoperability standards [15].
Xiaomi Corp: views following the recent launch of the Doubao AI smartphone assistant
2025-12-05 06:35
1) From the perspective of AI smartphone agents: Goldman Sachs does and seeks to do business with companies covered in its research reports. As a result, investors should be aware that the firm may have a conflict of interest that could affect the objectivity of this report. Investors should consider this report as only a single factor in making their investment decision. For Reg AC certification and other important disclosures, see the Disclosure Appendix, or go to www.gs.com/research/hedge.html. Analysts ...
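Goodbye to the GUI Agent engineering-infrastructure nightmare: StepFun open-sources a 4B Agent model that runs on all Android devices, with one-click deployment for DIY builders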
量子位· 2025-11-30 06:45
Core Insights
- The article discusses the launch of GELab-Zero, an open-source GUI Agent model that allows for easy deployment and aims to enhance the scalability of mobile agents in various applications [1][8].
Group 1: Model Performance and Capabilities
- The 4B version of the GUI Agent model has achieved state-of-the-art (SOTA) performance across multiple GUI benchmarks on both mobile and desktop platforms [2][11].
- GELab-Zero-4B-preview outperforms other mainstream models, including larger parameter models like GUI-Owl-32B, demonstrating superior performance and easier deployment [13][11].
- The model is designed to handle complex tasks and vague instructions effectively, showcasing its versatility in various applications [19][24].
Group 2: Development and Deployment
- The article emphasizes the need to lower development and usage barriers for mobile agents, allowing developers to focus on value creation rather than infrastructure setup [7][30].
- GELab-Zero includes a complete technical architecture that enables one-click deployment, facilitating a seamless experience for developers [25][26].
- The model supports lightweight local inference, enabling it to run on consumer-grade hardware while maintaining low latency and privacy [26].
Group 3: Evaluation Standards
- The research team has established a new evaluation standard called AndroidDaily, which focuses on real-world applications and user scenarios, moving beyond traditional productivity benchmarks [5][31].
- AndroidDaily assesses the model across six core dimensions of modern life, including dining, travel, shopping, housing, information consumption, and entertainment [33].
- The evaluation framework includes both static and end-to-end testing methodologies to ensure comprehensive assessment of the model's capabilities [35][38].
Group 4: Future Directions
- The research team aims to continue optimizing model performance, expanding cross-platform support, and enriching the ecosystem of tools while adhering to principles of openness, control, and privacy [41].