GUI Agents (GUI智能体) - filings, earnings calls, financial reports, news

机器之心 (Synced) · 2026-02-18 12:51

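How fierce has the GUI agent race become lately? Claude, OpenAI Agent, and a range of open-source models have taken turns in the spotlight, but anyone who wants AI to be "an assistant that works reliably on phones and web pages" still faces three practical problems:

The "missing knowledge" problem: base large models still have a weak grasp of the GUI domain, and gaps such as obscure icons and the operating logic of niche apps need to be filled.

The "armchair strategist" problem: there is a gap between offline training data and real interactive environments, so actions that look reasonable offline fail as soon as they meet an online task.

The "multi-model coordination" problem: although specialist models for visual grounding, task planning, and other areas have each made progress, multi-model collaboration usually depends on complex frameworks and carries a high coordination cost.

Now Ant Group has released UI-Venus-1.5: an end-to-end GUI agent built on a "high performance, practical" design philosophy. A single model handles three scenarios, grounding, mobile, and web, and fully supports 40+ mainstream Chinese apps, bringing AI into users' daily lives.

Report title: UI-Venus-1.5 Technical Report
Technical report: https://arxiv.org/abs/2602.09082
Code: https://github.com/inclusionAI/UI-Venus
Model ...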

GUI Agents

Artificial Intelligence

UI-Venus-1.5

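Alibaba open-sources the "soul" of the AI phone: a GUI agent in four sizes from 2B to 235B, with device-cloud collaboration lifting the success rate by 33%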

量子位 (QbitAI) · 2025-12-31 00:55

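By Mengchen (梦晨), from Aofeisi | QbitAI (WeChat official account)

This system does more than tap the screen for you: it can proactively ask about requirements you left unclear, and it can call external APIs directly to bypass tedious interface operations. It even includes a device-cloud collaboration system, in which privacy-sensitive operations run locally and complex tasks are handed to the cloud.

The GUI agent, the "soul" of the AI phone, has now been open-sourced as a complete package. MAI-UI from Alibaba's Tongyi Lab comes with the paper, code, and models, and ships in four sizes at once, from a 2B on-device model to a 235B cloud model, to cover deployment across all scenarios.

The conventional approach requires switching back and forth between the SMS and map apps, copying and pasting addresses, and searching each route separately. With MCP tool calls, the agent can query the driving distance of both routes directly through the Amap (高德地图) API, get structured results in one call, and sharply reduce the number of steps.

User: "A rental agent sent me details of two apartments. I want to compare which one is a shorter drive from Alibaba Xixi Campus 6 (阿里西溪6园区) so I can decide which to rent. The company [...] is 「杭州市余杭区文一西路969号」; send the address of the closer apartment to my friend Mia" ...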
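The tool-call shortcut described above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `RouteResult`, `fake_driving_route`, and `pick_closest` are hypothetical names, the distances are made up, and the stub stands in for whatever map tool the agent would actually call. It is not MAI-UI's interface or the Amap API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RouteResult:
    """Structured result a map tool might return (hypothetical shape)."""
    origin: str
    destination: str
    distance_km: float


def fake_driving_route(origin: str, destination: str) -> RouteResult:
    """Hypothetical stand-in for a map tool call; distances are made up."""
    made_up_distances = {"Apartment A": 7.4, "Apartment B": 3.1}
    return RouteResult(origin, destination, made_up_distances[origin])


def pick_closest(
    candidates: List[str],
    destination: str,
    route_tool: Callable[[str, str], RouteResult],
) -> RouteResult:
    """Query the tool once per candidate and keep the shortest drive."""
    results = [route_tool(origin, destination) for origin in candidates]
    return min(results, key=lambda r: r.distance_km)


office = "杭州市余杭区文一西路969号"
best = pick_closest(["Apartment A", "Apartment B"], office, fake_driving_route)
print(f"Closest: {best.origin} ({best.distance_km} km)")
```

The point of the sketch is the shape of the interaction: two structured tool calls and one comparison replace the repeated app switching and copy-pasting that a pure screen-tapping agent would need.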

Performance approaches the strongest closed-source models: Tongyi Lab open-sources Mobile-Agent-v3, setting new SOTA on 10 GUI benchmarks

机器之心 (Synced) · 2025-09-02 03:44

Core Viewpoint
- The article highlights the launch of GUI-Owl and Mobile-Agent-v3, open-source models for GUI automation that outperform existing models across a range of environments [1][29].

Group 1: Key Achievements
- GUI-Owl achieves state-of-the-art (SOTA) performance on both Android and desktop platforms, with the 32B model surpassing top closed-source models in multiple evaluations [21][29].
- The models are designed to operate in a cloud environment, allowing dynamic task execution and data collection across multiple operating systems, including Android, Ubuntu, macOS, and Windows [11][29].

Group 2: Technical Innovations
- The system uses a self-evolving data production pipeline that minimizes human involvement in generating high-quality training data, allowing the models to optimize themselves iteratively [11][14].
- GUI-Owl's capabilities include UI element grounding, long-horizon task planning, and robust reasoning, enabling it to understand and execute complex tasks [16][20].

Group 3: Reinforcement Learning Framework
- A scalable reinforcement learning (RL) system improves the model's stability and adaptability in real-world environments, allowing it to learn continuously from its interactions [22][26].
- The Trajectory-aware Relative Policy Optimization (TRPO) algorithm addresses the sparse and delayed reward signals of GUI automation tasks, improving learning efficiency [26].

Group 4: Conclusion
- The release of GUI-Owl and Mobile-Agent-v3 is a significant advance in open-source GUI automation, providing a capable tool for various applications while reducing deployment and resource costs [29].
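The summary does not spell out how a trajectory-aware method copes with sparse, delayed rewards. One plausible reading, sketched below in Python, is to score each whole trajectory once, normalize those scores relative to the other trajectories in the same batch, and credit every step of a trajectory with its trajectory's normalized score. This is an assumption about the general shape of such an approach, not the algorithm as published; the function names and the sample rewards are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def trajectory_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize one end-of-episode reward per trajectory against the batch."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every trajectory scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_step_advantages(
    rewards: List[float], step_counts: List[int]
) -> List[List[float]]:
    """Give every step in a trajectory that trajectory's advantage."""
    advantages = trajectory_advantages(rewards)
    return [[adv] * n for adv, n in zip(advantages, step_counts)]


# Hypothetical batch: four attempts at one task, reward 1.0 only on success.
rewards = [1.0, 0.0, 0.0, 1.0]
step_counts = [3, 5, 2, 4]
step_advs = per_step_advantages(rewards, step_counts)
for i, advs in enumerate(step_advs):
    print(f"trajectory {i}: {len(advs)} steps, advantage {advs[0]:+.3f}")
```

Under this assumption, a long task that only reports success at the very end still yields a learning signal for every intermediate click or swipe, which is one way the sparse-reward problem mentioned in the summary could be eased.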