GUI Agents
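Alibaba open-sources the "soul" of the AI phone: a GUI agent in four sizes from 2B to 235B, with device-cloud collaboration boosting the success rate by 33%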
QbitAI · 2025-12-31 00:55
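Meng Chen, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)

This system does more than tap the screen for you. It proactively asks follow-up questions when your request is underspecified, and it can call external APIs directly instead of stepping through tedious UI operations.

It even comes with a device-cloud collaboration system: privacy-sensitive operations run locally on the phone, while complex tasks are handed off to the cloud.

The "soul" of the AI phone, the GUI agent, has now been fully open-sourced. MAI-UI, from Alibaba's Tongyi Lab, ships with the paper, code, and models: four sizes released at once, from a 2B on-device model up to a 235B cloud model, covering deployment needs across every scenario.

The traditional approach requires switching back and forth between the SMS app and the map app, copying and pasting addresses, and searching each route separately. With MCP tool calling, the agent can query the driving distance of both routes directly through the Amap (高德地图) API, get a structured result in a single call, and drastically cut the number of steps.

User: My rental agent sent me information on two apartments. I want to compare which one is a shorter drive from Alibaba Xixi Campus 6 so I can decide which one to rent. The office is at "No. 969 Wenyi West Road, Yuhang District, Hangzhou" (杭州市余杭区文一西路969号); send the address of the closer apartment to my friend Mia ...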
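The article does not say how MAI-UI decides what stays on the device. As a minimal, hypothetical sketch, the routing policy below encodes only the two rules stated in the excerpt: privacy-sensitive operations stay local, and complex tasks go to the cloud. `Task`, `route`, and the `MAX_ON_DEVICE_STEPS` threshold are illustrative assumptions, not MAI-UI's API.

```python
from dataclasses import dataclass

# Hypothetical cutoff: tasks expected to need more UI steps than this
# are treated as "complex" and handed to the large cloud model.
MAX_ON_DEVICE_STEPS = 8


@dataclass
class Task:
    """Hypothetical description of one user request."""
    instruction: str
    touches_private_data: bool  # e.g. SMS, contacts, payment screens
    estimated_steps: int        # rough estimate of UI actions needed


def route(task: Task) -> str:
    """Return "device" or "cloud" under a simple privacy-first policy."""
    if task.touches_private_data:
        # Privacy-sensitive work stays on the phone, even when it is long.
        return "device"
    if task.estimated_steps > MAX_ON_DEVICE_STEPS:
        # Long, non-sensitive tasks go to the larger cloud model.
        return "cloud"
    # Short, non-sensitive tasks are cheap enough to run locally.
    return "device"


if __name__ == "__main__":
    requests = [
        Task("Read the apartment addresses from the agent's SMS", True, 3),
        Task("Plan a multi-app weekend itinerary", False, 15),
    ]
    for t in requests:
        print(f"{t.instruction!r} -> {route(t)}")
```

Checking privacy first means a sensitive task is never sent to the cloud just because it is long; a production system would likely need a finer-grained, per-step decision rather than one label per task.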
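Performance approaching the strongest closed-source models: Tongyi Lab open-sources Mobile-Agent-v3, setting new SOTA results on 10 GUI benchmarks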
Synced (Jiqizhixin) · 2025-09-02 03:44
Core Viewpoint - The article highlights the launch of GUI-Owl, an open-source foundation model for GUI automation, and Mobile-Agent-v3, an agent framework built on it, showing performance superior to existing models and strong capabilities across a range of environments [1][29].
Group 1: Key Achievements
- GUI-Owl achieves state-of-the-art (SOTA) performance on both Android and desktop platforms, and the 32B model surpasses top closed-source models in multiple evaluations [21][29].
- The models run in a cloud environment that supports dynamic task execution and data collection across multiple operating systems, including Android, Ubuntu, macOS, and Windows [11][29].
Group 2: Technical Innovations
- A self-evolving data production chain generates high-quality training data with minimal human involvement, letting the models iteratively improve themselves [11][14].
- GUI-Owl's capabilities include UI element grounding, long-horizon task planning, and robust reasoning, which together let it understand and execute complex tasks [16][20].
Group 3: Reinforcement Learning Framework
- A scalable reinforcement learning (RL) system improves the model's stability and adaptability in real-world environments and lets it keep learning from its own interactions [22][26].
- The Trajectory-aware Relative Policy Optimization (TRPO) algorithm addresses the sparse, delayed reward signals typical of GUI automation tasks, improving learning efficiency [26].
Group 4: Conclusion
- The release of GUI-Owl and Mobile-Agent-v3 is a significant advance for open-source GUI automation, offering a capable tool for a range of applications while lowering deployment and resource costs [29].
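The summary names TRPO but gives no formula. As a hedged illustration of one common way to handle sparse, delayed rewards at the trajectory level, the sketch below assumes a group-relative scheme in the style of GRPO; it is not the paper's algorithm. Several trajectories are sampled for the same task, each gets a single end-of-task reward, rewards are normalized within the group, and each trajectory's normalized advantage is assigned to every step it took. The function name `trajectory_advantages` and the `eps` parameter are hypothetical. (The acronym TRPO is also used by the unrelated Trust Region Policy Optimization.)

```python
from statistics import mean, pstdev


def trajectory_advantages(rewards, steps_per_traj, eps=1e-6):
    """Group-relative, trajectory-level advantages (hypothetical helper).

    rewards:        one terminal reward per trajectory, all sampled for the
                    same task (e.g. 1.0 = task completed, 0.0 = failed).
    steps_per_traj: number of actions taken in each trajectory.
    Returns one list per trajectory, holding that trajectory's normalized
    advantage repeated once for every step it took.
    """
    if len(rewards) != len(steps_per_traj):
        raise ValueError("need exactly one step count per trajectory")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    advantages = []
    for reward, n_steps in zip(rewards, steps_per_traj):
        # Compare each outcome with the group average; eps avoids division
        # by zero when every trajectory received the same reward.
        adv = (reward - mu) / (sigma + eps)
        # The sparse end-of-task signal is spread evenly over all steps.
        advantages.append([adv] * n_steps)
    return advantages


if __name__ == "__main__":
    # Four rollouts of one task: two succeeded, two failed.
    advs = trajectory_advantages([1.0, 0.0, 0.0, 1.0], [5, 7, 4, 6])
    for i, a in enumerate(advs):
        print(f"trajectory {i}: {len(a)} steps, advantage {a[0]:+.3f}")
```

Because every step inherits the same advantage, steps in successful trajectories are reinforced and steps in failed ones are discouraged without any per-step reward labels; the trade-off is that useful steps inside a failed trajectory are penalized too.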