CogAgent
Intelligent agents are no longer "one-trick ponies": OpenAI, iFLYTEK, Qianwen, and others each show their strengths
AI研究所· 2026-01-26 09:33
Market Overview
- The Chinese intelligent agent market is projected to reach 7.84 billion yuan by 2025, with growth expected to exceed 70% in 2026, driven by demand from the manufacturing, energy, finance, and government sectors, which together account for over 70% of the market [1]
- The "Artificial Intelligence + Manufacturing" initiative aims to cultivate 1,000 high-level industrial intelligent agents, providing strong momentum for industry development [1]

Industry Dynamics
- Leading companies are accelerating their strategies in response to market and policy drivers, with OpenAI launching its Operator product in 2025 to simulate human computer operation for tasks such as ordering food and booking tickets [2]
- Alibaba's upgraded Qianwen can handle full-process collaboration for hotel and product inquiries, while Zhipu AI has introduced the Auto framework for intelligent agent development, facilitating the transition from mobile devices to intelligent AI terminals [2]
- Challenges such as reliance on single-modal interaction, high customization costs, and incomplete execution chains are hindering industry growth, prompting the search for more efficient solutions [2]

Technological Advancements
- The core capabilities of intelligent agents lie in environmental perception and demand understanding, with multi-modal fusion becoming a common choice among leading companies [4]
- Traditional agents often support only single-modal interaction, leading to perception errors in complex environments; Qianwen employs a multi-modal architecture that processes and understands varied inputs in a synchronized way [5]
- Zhipu AI's CogAgent enables interaction across the full GUI space, while OpenAI's Operator lets AI interact with graphical user interfaces by simulating human operations (a minimal sketch of this interaction loop follows the summary) [5]

Development Accessibility
- Scaling intelligent agents requires lowering development barriers, a key focus for leading companies [12]
- The Starry Intelligent Agent platform offers a native MaaS architecture with quick connections to over 50 high-quality open-source models, enabling developers to build agents without extensive programming knowledge [12]
- Companies are exploring diverse approaches to reducing development barriers, such as Alibaba's simplified application integration and Zhipu AI's focus on rapid empowerment of terminal devices [13]

Application and Ecosystem
- The value of intelligent agents must be demonstrated in concrete scenarios, so leading companies are focusing on vertical solutions [15]
- The Starry Intelligent Agent platform has diversified its application layout, targeting overseas markets in the Middle East and Southeast Asia and covering public services and infrastructure bidding [15]
- Other companies such as Alibaba and SenseTime are focusing on specific sectors, including consumer services and healthcare, to address core industry needs and improve operational efficiency [18]

Collaborative Innovation
- Sustainable development of the intelligent agent industry requires an open ecosystem, a consensus among leading companies [19]
- Starry Intelligent Agent draws on resources from iFLYTEK's open platform, which has over 10.26 million developers and reaches 4.28 billion terminal devices, creating a comprehensive ecosystem [19]
- Companies are fostering a virtuous cycle of "technological breakthroughs - scenario applications - ecosystem feedback" to drive large-scale development of the intelligent agent industry [19]

Future Outlook
- The intelligent agent industry is transitioning from technological exploration to large-scale implementation, driven by breakthroughs in multi-modal collaboration, lower development barriers, and improved ecosystem frameworks [21]
- Continuous technological iteration and ecosystem enhancement will further integrate intelligent agents into various industries, making them a core force for productivity improvement and industrial upgrading [21]
- Future development will emphasize scenario adaptability, ease of development, and ecosystem openness, with collaborative innovation between companies and developers as a key driver of industry progress [21]
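The GUI-driven agents described under Technological Advancements (CogAgent, Operator) share a common perceive-reason-act pattern: capture the screen, ask a multi-modal model for the next UI action, execute it, and repeat. The sketch below is a minimal, hypothetical illustration of that loop; `UIAction`, `query_model`, and the scripted plan are stand-ins of my own, not any vendor's actual API.

```python
# Minimal, hypothetical sketch of the perceive-reason-act loop behind
# GUI agents such as CogAgent and Operator. The model call is mocked with
# a scripted plan; a real system would query a multi-modal model instead.
from dataclasses import dataclass

@dataclass
class UIAction:
    kind: str           # "click", "type", or "done"
    x: int = 0          # screen coordinates for clicks
    y: int = 0
    text: str = ""      # text to type, if any

# A canned "plan" standing in for the model's step-by-step decisions.
SCRIPTED_PLAN = [
    UIAction("click", x=120, y=300),             # open the search box
    UIAction("type", text="hotel in Hangzhou"),  # enter the query
    UIAction("done"),                            # model reports completion
]

def capture_screen() -> bytes:
    return b"<png bytes>"  # placeholder; a real agent grabs actual pixels

def query_model(goal: str, screenshot: bytes, step: int) -> UIAction:
    # Stand-in for a multi-modal model call: returns the scripted action.
    return SCRIPTED_PLAN[min(step, len(SCRIPTED_PLAN) - 1)]

def execute(action: UIAction) -> None:
    print(f"executing: {action}")  # a real agent drives mouse/keyboard here

def run_agent(goal: str, max_steps: int = 50) -> None:
    """Observe the screen, ask the model for one action, act; stop on 'done'."""
    for step in range(max_steps):
        action = query_model(goal, capture_screen(), step)
        if action.kind == "done":
            print("task complete")
            return
        execute(action)

run_agent("book a hotel for Friday night")
```

In a real deployment the scripted plan would be replaced by live model calls and `execute` would drive the OS input layer; the control flow itself stays this simple.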
A New Paradigm for Robotic Manipulation: A Systematic Survey of VLA Models | Jinqiu Select
锦秋集· 2025-09-02 13:41
Core Insights
- The article discusses the emergence of Vision-Language-Action (VLA) models based on large Vision-Language Models (VLMs) as a transformative paradigm in robotic manipulation, addressing the limitations of traditional methods in unstructured environments [1][4][5]
- It highlights the need for a structured classification framework to mitigate research fragmentation in the rapidly evolving VLA field [2]

Group 1: New Paradigm in Robotic Manipulation
- Robotic manipulation is a core challenge at the intersection of robotics and embodied AI, requiring deep understanding of visual and semantic cues in complex environments [4]
- Traditional methods rely on predefined control strategies, which struggle in unstructured real-world scenarios, revealing limitations in scalability and generalization [4][5]
- The advent of large VLMs has provided a revolutionary approach, enabling robots to interpret high-level human instructions and generalize to unseen objects and scenes [5][10]

Group 2: VLA Model Definition and Classification
- VLA models are defined as systems that utilize a large VLM to understand visual observations and natural language instructions, followed by a reasoning process that generates robotic actions [6][7]
- VLA models are categorized into two main types, Monolithic Models and Hierarchical Models, each with distinct architectures and functionalities (see the sketch following this summary) [7][8]

Group 3: Monolithic Models
- Monolithic VLA models can be implemented in single-system or dual-system architectures, integrating perception and action generation into a unified framework [14][15]
- Single-system models process all modalities together, while dual-system models separate reflective reasoning from reactive behavior, enhancing efficiency [15][16]

Group 4: Hierarchical Models
- Hierarchical models consist of a planner and a policy, allowing for independent operation and modular design, which enhances flexibility in task execution [43]
- These models can be further divided into Planner-Only and Planner+Policy categories, with the former focusing solely on planning and the latter integrating action execution [43][44]

Group 5: Advancements in VLA Models
- Recent advancements in VLA models include enhancements in perception modalities, such as 3D and 4D perception, as well as the integration of tactile and auditory information [22][23][24]
- Efforts to improve reasoning capabilities and generalization abilities are crucial for enabling VLA models to perform complex tasks in diverse environments [25][26]

Group 6: Performance Optimization
- Performance optimization in VLA models focuses on enhancing inference efficiency through architectural adjustments, parameter optimization, and inference acceleration techniques [28][29][30]
- Dual-system models have emerged to balance deep reasoning with real-time action generation, facilitating smoother deployment in real-world scenarios [35]

Group 7: Future Directions
- Future research directions include the integration of memory mechanisms, 4D perception, efficient adaptation, and multi-agent collaboration to further enhance VLA model capabilities [1][6]
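To make the survey's Group 2-4 taxonomy concrete, here is a minimal sketch contrasting the two families: a monolithic model maps (observation, instruction) directly to actions inside one network, while a hierarchical model splits the job into a planner that emits sub-goals and a policy that turns each sub-goal into a motor command. All class and method names below are illustrative assumptions, not code from the survey.

```python
# Illustrative sketch of the survey's VLA taxonomy: monolithic vs.
# hierarchical (planner + policy). Names are hypothetical, not from the paper.
from abc import ABC, abstractmethod
from typing import List, Sequence

Action = List[float]  # e.g. a 7-DoF end-effector command


class VLAModel(ABC):
    """Common interface: visual observation + instruction -> actions."""
    @abstractmethod
    def act(self, image: bytes, instruction: str) -> Sequence[Action]:
        ...


class MonolithicVLA(VLAModel):
    """One VLM maps pixels and text straight to an action chunk."""
    def act(self, image: bytes, instruction: str) -> Sequence[Action]:
        # A single forward pass would produce the actions here; we return
        # a dummy chunk to keep the sketch runnable.
        return [[0.0] * 7]


class HierarchicalVLA(VLAModel):
    """Planner decomposes the task; policy executes each sub-goal."""
    def plan(self, image: bytes, instruction: str) -> List[str]:
        # The planner (a VLM) would emit sub-goals; dummy values here.
        return ["reach the cup", "grasp the cup", "lift the cup"]

    def policy(self, image: bytes, subgoal: str) -> Action:
        # The low-level policy turns one sub-goal into a motor command.
        return [0.0] * 7

    def act(self, image: bytes, instruction: str) -> Sequence[Action]:
        return [self.policy(image, g) for g in self.plan(image, instruction)]


for model in (MonolithicVLA(), HierarchicalVLA()):
    print(type(model).__name__, model.act(b"<frame>", "pick up the cup"))
```

The hierarchical variant's modularity is what the survey credits with flexibility: the planner and policy can be swapped or trained independently.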
Zhipu CEO Zhang Peng: Accelerating R&D on Agent models and products, hoping to soon operate computers and phones with a single sentence
IPO早知道· 2024-11-30 02:36
This article is an IPO Zaozhidao original. Author | Stone Jin. WeChat official account | ipozaozhidao

According to IPO Zaozhidao, Zhipu, one of the earliest large-model companies to explore Agents, announced several new developments on November 29: AutoGLM can autonomously execute long tasks of more than 50 steps and can carry out tasks across apps; AutoGLM introduces a "fully automatic" web-browsing experience, supporting "driverless" operation on dozens of websites; and GLM-PC, which operates a computer the way a human does, has begun internal testing, a technical exploration of general-purpose Agents built on a visual multi-modal model.

Specifically, the upgraded AutoGLM can take on complex tasks: 1. Ultra-long tasks: it understands ultra-long instructions and executes ultra-long tasks. 2. Cross-app: AutoGLM supports executing tasks across apps. 3. Short commands: AutoGLM supports custom short phrases for long tasks. 4. Casual mode: AutoGLM can proactively make decisions for you.

AutoGLM has also begun large-scale internal testing and will launch as a consumer-facing product as soon as possible. AutoGLM also announced a plan for "free Auto upgrades for 10 apps with hundred-million-scale user bases," inviting app partners to jointly explore their own new Auto scenarios.

In addition, Zhipu introduced a PC-based autonomous Agent: GLM-PC is the GLM team's technical exploration of a "driverless" PC, built on Zhipu's ...
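As a rough illustration of the "short commands" feature described above (a custom short phrase standing in for a long task), the toy below expands a registered phrase into a saved multi-step, cross-app plan and replays it in order. This is a hypothetical sketch under my own assumptions and does not reflect AutoGLM's actual implementation.

```python
# Hypothetical sketch of the "short command" idea: a custom phrase expands
# into a saved multi-step, cross-app task. Not Zhipu's actual design.
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (app, action description)

# User-registered phrase -> long task, spanning more than one app.
SHORT_COMMANDS: Dict[str, List[Step]] = {
    "morning coffee": [
        ("coffee_app", "open store page"),
        ("coffee_app", "order usual latte"),
        ("payment_app", "confirm payment"),
        ("messaging_app", "notify me when ready"),
    ],
}

def run_short_command(phrase: str) -> None:
    """Expand a short phrase into its saved steps and execute them in order."""
    steps = SHORT_COMMANDS.get(phrase)
    if steps is None:
        print(f"no task registered for: {phrase!r}")
        return
    for i, (app, action) in enumerate(steps, 1):
        # A real agent would drive the app UI here; we just log the step.
        print(f"step {i}: [{app}] {action}")

run_short_command("morning coffee")
```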