GUI Agent
How Do Phone Makers and App Providers View the AI Phone Controversy? A2A Collaboration May Break the Deadlock
第一财经· 2026-01-12 13:37
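2026-01-12. Word count: 1,670; reading time: about 3 minutes. By Li Na, 第一财经. "A good agent needs both 'intelligence' and 'execution ability': it must deeply understand user intent and act on it." That is how Yuan Yuan, president of 阿里研究院 (Alibaba Research Institute), described the form agents should take at a recent seminar on "AI Agent Behavior Safety and Development". She also stressed that although agents carry long-term expectations for monetizing AI technology and upgrading the user experience, their deployment should not disruptively break existing governance interfaces and commercial order. Instead, within safe and controllable limits, it should drive the evolution of the industry ecosystem through deep collaboration, "ensuring that devices and applications share the dividends of model technology in symbiosis." Over the past year, the market has explored several paths around the question of whether AI can get things done on a person's behalf. Among them, some products characterized by "AI taking over phone operation" try to complete concrete tasks across apps, such as editing videos, booking tickets, and ordering takeout, by "reading the screen and simulating operations". This path is usually classified as GUI Agent. Technically, such products usually work by having the model understand the screen content and then simulate a human operating the interface. But this brings a whole set of questions that must be worked out afresh: Who grants it permissions? Who is responsible for its errors? Which services may it call? And who should constrain it? Jiang Yuchen, director of smart product R&D for OPPO ColorOS, told a 第一财经 reporter at a media briefing ...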
How Do Phone Makers and App Providers View the AI Phone Controversy? A2A Collaboration May Break the Deadlock
Di Yi Cai Jing· 2026-01-12 12:24
Core Insights
- The true challenge of intelligent agents is not just their ability to perform tasks but also their wisdom and execution capabilities, which require a deep understanding of user intent and actions [1].
- The development of intelligent agents should not disrupt existing governance frameworks and commercial orders but should promote industry evolution through deep collaboration while ensuring safety and control [1].

Group 1: Current Developments in AI Agents
- Various exploration paths have emerged in the past year regarding whether AI can replace human tasks, with products attempting to operate through screen understanding and simulated actions, categorized as GUI Agents [3].
- These products face significant challenges, including permission granting, accountability for errors, service invocation, and regulatory constraints [3].
- Experts suggest that the authorization of intelligent agents should be scene-specific, with critical operations requiring secondary confirmation, and that not all scenarios should be authorized by third-party platforms [3].

Group 2: Industry Perspectives on AI Implementation
- OPPO's perspective indicates that while products like the Doubao phone positively impact the industry, they are not the final form of AI phones but rather a method to operate existing GUI interfaces [4].
- The focus for phone manufacturers is on engineering and scalability, as any instability in system capabilities can lead to significant quality issues [4].
- The future of intelligent agents is expected to shift towards A2A (Agent-to-Agent) collaboration models, with the core value for manufacturers lying in their long-term understanding of users rather than just model parameters [4].

Group 3: Regulatory and Safety Considerations
- The current GUI approach can activate the industry but should not be the sole focus; a more optimal evolution path that balances safety and development should be explored [5].
- Apple's model is highlighted as a reference, establishing a collaborative mechanism between intelligent agents and apps through open APIs while ensuring safety boundaries [5].
- The introduction of A2A mechanisms and market-based credit systems is suggested to improve authorization processes and manage potential risks associated with disruptive innovations [5].
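
The scene-specific authorization idea in this summary (authorize by scene, and require a secondary confirmation for critical operations) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not any vendor's mechanism: the scene labels, the operation names, and the `authorize` function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # ask the user for a secondary confirmation
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    scene: str      # hypothetical scene label, e.g. "food_delivery"
    operation: str  # hypothetical operation name, e.g. "pay"


# Hypothetical policy data: scenes the user has authorized, and operations
# treated as critical enough to need a second confirmation.
AUTHORIZED_SCENES = {"food_delivery", "ticket_booking"}
CRITICAL_OPERATIONS = {"pay", "transfer"}


def authorize(action: Action) -> Decision:
    """Decide whether the agent may perform one action."""
    if action.scene not in AUTHORIZED_SCENES:
        return Decision.DENY      # scene was never authorized
    if action.operation in CRITICAL_OPERATIONS:
        return Decision.CONFIRM   # critical step: confirm with the user
    return Decision.ALLOW
```

Under this sketch, browsing a menu in an authorized scene is allowed outright, paying in the same scene triggers a confirmation, and any action in an unauthorized scene is denied.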
Starting from the Doubao Phone: The Vision and Roadmap for On-Device Intelligence
AI前线· 2025-12-22 05:01
Core Viewpoint
- The launch of Doubao Mobile Assistant by ByteDance marks a shift in the application paradigm of large models from "Chat" to "Action," establishing it as the first system-level GUI Agent in the industry [2][3].

Technical Analysis and Evaluation
- The core technology of Doubao Mobile Assistant is the GUI Agent, which evolved from an "external framework" to a "model-native intelligent agent" between 2023 and 2025. The early stage (2023-2024) relied on external frameworks, which limited the agent's capabilities because of their dependency on prompt engineering and external tools [4].
- The introduction of visual language models driven by imitation learning in 2024 marked a shift to model-native capabilities, allowing the agent to understand interfaces directly from pixel inputs and significantly improving adaptability to unstructured GUIs [5].
- By 2024-2025, reinforcement learning-driven visual language models became mainstream, enabling agents to autonomously execute tasks in dynamic environments. Doubao Mobile Assistant embodies this technological evolution [5][7].

Development History of GUI Agent
- Previous GUI Agents were often limited to demo stages due to reliance on Android accessibility services, which had significant drawbacks. Doubao Mobile Assistant overcomes these issues through a customized OS that allows for non-intrusive system-level control [7][8].
- The model architecture of Doubao Mobile Assistant employs a collaborative end-cloud model, indicating a shift from experimental to practical applications of GUI Agents [8].

Limitations and Future Outlook
- Doubao Mobile Assistant faces three major challenges: security risks associated with reliance on cloud-side models, insufficient autonomous task completion capabilities, and limited ecosystem coverage [9][10][11].
- The assistant currently operates as a passive tool, lacking personalized proactive service capabilities. Future developments must focus on enhancing privacy, environmental perception, complex decision-making, and personalized service [12][13].

Evolution of End-Side Intelligence
- The emergence of system-level GUI Agents presents a fundamental contradiction between the need for comprehensive operational visibility and user privacy concerns. A balance must be struck to ensure user data sovereignty while providing intelligent services [13][14].
- The future AI mobile ecosystem should adhere to the principle of "end-side native, cloud collaboration," ensuring that sensitive user data remains on-device while leveraging cloud capabilities for complex tasks [14][15].

Autonomous Intelligence and User Interaction
- Doubao Mobile Assistant's current capabilities are based on extensive data training, but future autonomous intelligence must enable agents to learn and adapt in dynamic environments, overcoming challenges in generalization, autonomy, and long-term interaction [22][24][25].
- The transition from passive execution to proactive service is essential for personal assistants to reduce user cognitive load and enhance user experience [29][30][31].

Industry Trends and Future Predictions
- In the short term (within one year), more mobile assistants are expected to launch, intensifying competition between application developers and hardware manufacturers [35].
- In the medium term (2-3 years), the concept of a "personal exclusive assistant" will solidify, with end-side models evolving to provide personalized experiences based on user data [36].
- In the long term (3-5 years), a new type of end-side hardware will emerge, integrating high-privacy operations and lightweight tasks, ensuring data sovereignty and rapid response times [38].
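
The "end-side native, cloud collaboration" principle described in this summary amounts to a routing rule: sensitive data stays on the device, and only complex, non-sensitive work goes to the cloud. The Python sketch below shows one way such a rule could look. The `Task` fields, the 1-10 complexity score, and the threshold are assumptions made for illustration and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    touches_sensitive_data: bool  # e.g. messages or payment details
    complexity: int               # hypothetical 1-10 difficulty score


# Hypothetical cut-off above which the on-device model is assumed too weak.
ON_DEVICE_MAX_COMPLEXITY = 5


def route(task: Task) -> str:
    """Return "device" or "cloud" for one task."""
    if task.touches_sensitive_data:
        # Sensitive data never leaves the device, whatever the complexity.
        return "device"
    if task.complexity > ON_DEVICE_MAX_COMPLEXITY:
        return "cloud"
    return "device"
```

Checking sensitivity first is a deliberate choice in this sketch: it keeps the privacy guarantee unconditional, at the cost of possibly weaker results on hard tasks that involve private data.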
Reflections Prompted by the Doubao Phone: Agent vs. Super App, AI Companies vs. Phone Makers
新财富· 2025-12-16 08:22
Core Viewpoint
- The launch of the Doubao mobile assistant by ByteDance represents a significant step towards the realization of system-level AI agents, challenging the dominance of super apps like WeChat and Alipay in the mobile ecosystem [2][14][27].

Group 1: Doubao Mobile Assistant Launch
- On December 1, ByteDance's Doubao team released a technical preview of the Doubao mobile assistant, which collaborates deeply with phone manufacturers at the operating system level to enable cross-application automation [2].
- The initial batch of 30,000 units sold out instantly, but within two days, major super apps like WeChat, Alipay, Taobao, and Meituan blocked the Doubao mobile assistant [3].

Group 2: AI Agent Development
- The Doubao mobile assistant demonstrates the feasibility of GUI agents, completing a closed-loop attempt for AI phones, but raises questions about its practical utility in real-world scenarios [5].
- The evolution of AI agents has transitioned from fixed scripts and rule engines to a stage where GUI intelligent agents can understand and operate across applications, as seen with advancements from companies like Microsoft and Anthropic [6][7].

Group 3: System-Level Agent vs. Super Apps
- The system-level agent can understand user intent and orchestrate multiple apps, moving the focus from an app-centric model to a user-intent-centric model [8][10].
- The core advantages of system-level agents include the ability to organize tasks across multiple apps and theoretical platform neutrality, alleviating long-standing issues like fragmented cross-app processes [11][12].

Group 4: Industry Dynamics and Conflicts
- The emergence of the Doubao mobile assistant has highlighted the conflict between system-level agents and super apps, with super apps responding defensively to protect their user entry points [14][15].
- The long-term outcome may not be the elimination of one model over the other, but rather a redefinition of power boundaries and responsibilities between system-level agents and super apps [17].

Group 5: Manufacturer Strategies
- Different manufacturers are adopting varied strategies regarding AI agents, with Huawei integrating agents into its operating system, Xiaomi focusing on ecosystem integration, and Apple maintaining a single official agent [19][23][24].
- The competitive landscape suggests a future where multiple agents coexist in the Android ecosystem, while iOS maintains a clearer structure with one official agent and several plugins [24][25].
The Doubao Phone Has Touched a Raw Nerve of the Big-Tech Apps
36Kr · 2025-12-15 23:28
Core Viewpoint
- The emergence of the Doubao AI phone has sparked significant interest in the potential of AI agents as a new entry point in the internet ecosystem, but it faces immediate backlash from major internet companies due to security and operational concerns [1][2][3].

Group 1: Doubao AI Phone and Its Features
- The Doubao AI phone allows users to perform complex cross-application operations through an integrated AI agent, enhancing user experience [1].
- Users reported issues with accessing major applications like WeChat and Alipay shortly after the phone's launch, indicating a significant reduction in functionality [2].
- The phone's initial appeal diminished as it could no longer use its AI capabilities effectively with popular apps, leading to a decline in user experience [2].

Group 2: Industry Response and Competition
- Industry insiders expressed a lack of surprise at the backlash against Doubao, with Tencent attributing the issues to existing security measures [3].
- The competition for the next generation of traffic entry points among internet giants is intensifying, with companies like Alibaba and Tencent scrambling to establish their AI applications [4][5].
- Doubao's rapid rise in daily active users (DAU) highlights its initial success, but subsequent declines in user engagement raise questions about its sustainability [6].

Group 3: The Shift in User Engagement and Advertising
- The dominance of major apps like Taobao and WeChat has led to a high concentration of user traffic, creating "traffic anxiety" among internet companies [4][5].
- The introduction of GUI agents, which can operate apps without user interaction, threatens traditional advertising revenue models by bypassing app usage [13][15].
- The growth of AI assistants among smartphone manufacturers indicates a shift in the value chain from internet companies to hardware manufacturers [16].

Group 4: Future Implications and Developments
- The release of the Doubao AI phone has prompted other companies to accelerate their development of AI agents, with a focus on creating competitive products [19][20].
- The open-sourcing of AI agent models could democratize access to this technology, potentially leading to a proliferation of personalized agents that challenge established players [21].
- The urgency for internet giants to adapt and innovate in response to the evolving landscape of AI applications is becoming increasingly critical [22].
Doubao "Tears Apart" the AI Phone
投中网· 2025-12-13 06:49
Core Viewpoint
- The emergence of the Doubao phone, a collaboration between Doubao and Nubia, has disrupted the AI smartphone market, showcasing advanced capabilities while sparking significant controversy and debate within the industry [6][7].

Group 1: Product Overview
- The Doubao phone is technically a preview version of the Nubia M153, featuring deep integration of the Doubao assistant into its operating system, allowing for continuous operations beyond traditional voice assistants [6][7].
- The phone's market price surged from the original 3,499 yuan to as high as 36,000 yuan, reflecting a split sentiment between excitement and skepticism among consumers [6].
- It has been praised for its ability to perform complex tasks across multiple applications, such as answering questions on Bilibili and making purchases, leading to comparisons with human-like interaction [6][9].

Group 2: AI Smartphone Landscape
- The concept of "AI smartphones" gained traction in the second half of 2023, with major manufacturers like Samsung, Google, and Xiaomi emphasizing AI integration, but many offerings were criticized for lacking true innovation [8][9].
- Doubao's approach is more aggressive, enabling extensive cross-application operations that surpass the capabilities of existing AI smartphones, which are often limited to predefined tasks [9][11].

Group 3: Technical Pathways
- The industry is divided into two main technical pathways for AI smartphones: traditional methods relying on SDK interfaces and the newer GUI Agent approach, which allows direct interaction with the screen [9][10].
- Doubao's implementation of GUI Agent technology enables it to autonomously handle tasks without relying on app-specific interfaces, a significant advancement over earlier AI assistants [10][11].

Group 4: Industry Conflict
- The Doubao phone's capabilities have led to pushback from major apps like WeChat and Alipay, which have implemented restrictions to prevent automated operations, highlighting a clash between traditional app security measures and new AI functionalities [14][15].
- The core of the conflict lies in differing interpretations of user operation permissions, with apps prioritizing human-like interactions while AI systems advocate for user-authorized automation [15][16].

Group 5: Market Dynamics and Future Outlook
- The AI smartphone sector is becoming a battleground for tech companies vying for dominance in the AI era, with the potential to redefine user interaction and data flow [22][23].
- The emergence of the Doubao phone has prompted major tech firms to reassess their AI strategies, leading to a competitive landscape categorized into three tiers based on integration capabilities and market responsiveness [24][25][26].
An AI "Internet Protocol" Arrives: Will Doubao-Style Phones No Longer Fear Being "Banned"?
36Kr · 2025-12-12 08:36
Core Viewpoint
- The article discusses the growing restrictions that applications are placing on the "Doubao Phone" (Nubia M153), highlighting a significant conflict between AI-driven tools and established app ecosystems, particularly regarding user access and operational permissions [1][13].

Group 1: Doubao Phone and GUI Agent
- The Doubao Phone is facing increasing bans from major applications like WeChat, Alipay, and various e-commerce platforms, limiting user access [1].
- The Doubao Phone Assistant employs a GUI Agent approach, allowing AI to interact with mobile interfaces without relying on official APIs, which raises concerns among major app providers [2][15].
- The conflict is not new; platforms like WeChat have previously opposed GUI-based AI interactions, indicating a broader resistance within the industry [13][15].

Group 2: MCP Protocol and Industry Standards
- The Model Context Protocol (MCP) has emerged as a potential solution to the challenges posed by GUI Agents, aiming to establish a standardized interface for AI interactions across platforms [4][5].
- MCP is gaining traction as a de facto standard, with major tech companies like OpenAI and Google integrating it into their systems, indicating a shift towards a more interoperable AI ecosystem [7][8].
- The donation of MCP to the Linux Foundation signifies a move towards a neutral governance structure, enhancing its credibility and adoption across the industry [8][9].

Group 3: Future of AI Interaction
- The article suggests that the future of AI will rely on a combination of GUI and MCP approaches, where GUI serves as a fallback in the current ecosystem while MCP establishes clearer operational boundaries for AI interactions [20][21].
- The transition to MCP will require significant changes in the internet ecosystem, but it promises a more structured and secure way for AI to interact with various platforms [19][20].
- Ultimately, the goal is to create a unified system where AI can operate seamlessly across different services while adhering to established rules and permissions [20][21].
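
The combined approach described in this summary, a structured protocol interface where a service offers one and GUI operation as the fallback, can be sketched as a simple dispatcher. The Python below is a hedged illustration only: it does not use any real MCP SDK, and the `structured_tools` registry, the `gui_fallback` callable, and `dispatch` itself are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical callable types; neither reflects a real MCP SDK.
StructuredTool = Callable[[dict], str]
GuiFallback = Callable[[dict], str]


def dispatch(
    service: str,
    args: dict,
    structured_tools: Dict[str, StructuredTool],
    gui_fallback: GuiFallback,
) -> Tuple[str, str]:
    """Return (path_used, result) for one request."""
    tool = structured_tools.get(service)
    if tool is not None:
        # The service exposes a structured interface: use it.
        return "structured", tool(args)
    # No structured interface registered: fall back to operating the GUI.
    return "gui", gui_fallback({"service": service, **args})
```

Returning the path alongside the result makes it easy to log how often the fallback is taken, which is one rough way to track how far a structured interface has spread across services.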
A Gen-Z LLM Intern "Strips Down" the Doubao Phone: A Thousand-Word Hands-On Test Reveals All
36Kr · 2025-12-10 06:50
Core Insights
- The "Doubao Phone" has gained immense popularity due to its advanced AI capabilities, allowing users to perform tasks like automatic price comparison, messaging, and travel planning within seconds [1][3][5].

Group 1: Technology and Features
- The phone operates on a dual-mode system, distinguishing between a fast, intuitive mode (System 1) and a slower, more reasoning-based mode (System 2) [10][14].
- It utilizes a hybrid perception routing system to handle complex tasks in noisy environments, demonstrating advanced visual understanding [16][19].
- The architecture allows for parallel runtime, enabling the AI to perform tasks in the background without interrupting other phone functions [21].
- The design incorporates a heuristic engineering approach that introduces fixed delays to enhance success rates in task execution [22][25].
- Privacy is addressed through a physical isolation mechanism, ensuring that sensitive information is not recorded or monitored [26][28].

Group 2: Performance and User Experience
- The AI exhibits resilience by adapting to failures and dynamically adjusting its approach to task completion [35][39].
- The phone's assistant is capable of executing complex operations, such as document processing and information retrieval, across multiple applications seamlessly [45][51].
- The rapid iteration of the UI-TARS model indicates a commitment to continuous improvement and adaptation to user needs [46].

Group 3: Industry Impact
- The emergence of the "Doubao Phone" signifies a shift towards OS-level integration of AI capabilities in mobile devices, potentially redefining the future of smartphones [55][57].
- The phone represents a significant advancement in GUI agents, merging AI with user interface interactions to create a more intuitive user experience [48][58].
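
Two of the mechanisms named in this summary, the fast/slow dual mode and the fixed delay after each action, can be illustrated in a few lines of Python. The summary does not say how the phone actually picks a mode or how long it waits, so the selection rule (all steps are known routines), the `known_routines` set, and the default pause below are assumptions made purely for illustration.

```python
import time
from typing import Callable, List, Set


def choose_mode(steps: List[str], known_routines: Set[str]) -> str:
    """Pick "system1" (fast) if every step is a known routine, else "system2"."""
    if all(step in known_routines for step in steps):
        return "system1"
    return "system2"


def run_steps(
    steps: List[str],
    perform: Callable[[str], str],
    settle_seconds: float = 0.5,  # hypothetical fixed pause length
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Perform each step, then wait a fixed time so the UI can settle."""
    results = []
    for step in steps:
        results.append(perform(step))
        sleep(settle_seconds)  # fixed delay after every action
    return results
```

The `sleep` parameter is injectable so the pacing can be checked without real waiting, for example by passing a list's `append` method to record the requested delays.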
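Xu Xin Becomes Zhang Yiming's "New Shareholder," Taking a Stake in ByteDance at a 3.4 Trillion Valuation; Ren Zhengfei Stresses That AI Is About Applications; Li Auto's AI Glasses Weigh Just 36 g | AI Industry Weekly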
创业邦· 2025-12-07 01:08
Core Insights
- The article highlights significant developments in the AI industry, including new product launches, funding rounds, and strategic shifts among major companies [5][34].

Group 1: Company Developments
- Midea Group officially announced its humanoid robot strategy, focusing on three categories: humanoid robots, full humanoid robots, and super humanoid robots, aiming for high efficiency and low cost [7].
- Huawei's CEO Ren Zhengfei emphasized the importance of AI applications, contrasting China's focus on practical solutions with the U.S. pursuit of general AI [8].
- Li Auto launched its AI glasses, weighing only 36 grams with an 18-hour battery life, showcasing advancements in wearable technology [9].
- The humanoid robot T800 was released by Zhongqing, featuring a height of 1.73 m and a weight of 75 kg, with a performance cost only one-third of human labor [13].
- JD.com announced that its digital human live streaming service will be free for all merchants, enhancing its e-commerce capabilities [17].

Group 2: Funding and IPOs
- Qingwei Intelligent completed over 2 billion RMB in Series C financing, with plans to focus on next-generation reconfigurable chip development and initiate IPO preparations [18].
- Anthropic is preparing for a potential IPO, with a valuation expected to exceed 300 billion USD, indicating strong investor interest [12].
- HeShan Technology announced successful financing rounds totaling several hundred million RMB, with participation from 13 investors [20].

Group 3: AI Technology Advancements
- ByteDance released Vidi2, a multimodal large language model for video understanding, capable of processing hours of raw footage and generating complete video segments [19].
- OpenAI is developing a new AI model codenamed "Garlic" to compete with Google's Gemini 3, focusing on programming and logical reasoning tasks [29].
- Amazon unveiled its custom AI chip Trainium3, which is four times faster than its predecessor and can reduce AI model training costs by up to 50% [30].

Group 4: Regulatory and Ethical Developments
- Eight major e-commerce platforms, including JD.com and Meituan, signed a commitment to regulate AI technology applications, aiming to establish self-regulatory standards [21].
- Doubao Mobile Assistant announced plans to standardize AI operations on mobile devices, including restrictions on certain applications to prevent misuse [9].
Conservative Google, Aggressive Doubao
36Kr · 2025-12-05 10:23
Core Viewpoint
- The development of AI technology must prioritize user rights and regulatory compliance, as exemplified by the contrasting approaches of Google and Doubao in their AI assistant functionalities [11][12].

Group 1: Doubao's AI Assistant
- Doubao's AI assistant has faced criticism for its aggressive approach, which bypasses established security protocols of major applications like WeChat and Alipay, raising concerns about user safety and compliance [7][9].
- The company has announced plans to adjust its AI capabilities, particularly in financial applications, to ensure a balance between technological advancement and user safety [4][5].
- Doubao's strategy of directly operating applications without adhering to existing security frameworks has led to backlash from both users and developers, highlighting the risks associated with such innovations [8][10].

Group 2: Industry Context and Comparisons
- The global AI Agent landscape is rapidly evolving, with major players like Microsoft, Google, and Amazon investing heavily in AI platforms, each leveraging their unique strengths [6].
- In contrast to Doubao's approach, established AI assistants like Siri and Google Assistant operate within strict API guidelines, ensuring user privacy and data security while avoiding conflicts with application developers [7][9].
- The AHA (Agent Hub Access) solution introduced by Alipay and OPPO exemplifies a more cautious and collaborative approach, focusing on secure and transparent interactions between AI assistants and applications [14][15].

Group 3: Regulatory and Security Considerations
- The financial sector imposes stringent regulations on data security and user privacy, making Doubao's bypassing of these protocols particularly contentious [9][10].
- The potential for legal liability and user trust issues arises if Doubao's AI operations lead to security breaches or unauthorized access to sensitive information [8][10].
- The industry's future will depend on establishing standards that respect user privacy and regulatory requirements, as seen in ongoing efforts by Chinese authorities to promote multi-agent interoperability standards [15].