CogAgent

A New Paradigm for Robotic Manipulation: A Systematic Survey of VLA Models | Jinqiu Select
锦秋集 · 2025-09-02 13:41
Core Insights
- The article discusses the emergence of Vision-Language-Action (VLA) models based on large Vision-Language Models (VLMs) as a transformative paradigm in robotic manipulation, addressing the limitations of traditional methods in unstructured environments [1][4][5]
- It highlights the need for a structured classification framework to mitigate research fragmentation in the rapidly evolving VLA field [2]

Group 1: New Paradigm in Robotic Manipulation
- Robotic manipulation is a core challenge at the intersection of robotics and embodied AI, requiring deep understanding of visual and semantic cues in complex environments [4]
- Traditional methods rely on predefined control strategies, which struggle in unstructured real-world scenarios, revealing limitations in scalability and generalization [4][5]
- The advent of large VLMs has provided a revolutionary approach, enabling robots to interpret high-level human instructions and generalize to unseen objects and scenes [5][10]

Group 2: VLA Model Definition and Classification
- VLA models are defined as systems that utilize a large VLM to understand visual observations and natural language instructions, followed by a reasoning process that generates robotic actions [6][7]
- VLA models are categorized into two main types, Monolithic Models and Hierarchical Models, each with distinct architectures and functionalities [7][8] (a minimal code sketch of both families follows this list)

Group 3: Monolithic Models
- Monolithic VLA models can be implemented in single-system or dual-system architectures, integrating perception and action generation into a unified framework [14][15]
- Single-system models process all modalities together, while dual-system models separate reflective reasoning from reactive behavior, enhancing efficiency [15][16]

Group 4: Hierarchical Models
- Hierarchical models consist of a planner and a policy, allowing for independent operation and modular design, which enhances flexibility in task execution [43]
- These models can be further divided into Planner-Only and Planner+Policy categories, with the former focusing solely on planning and the latter integrating action execution [43][44]

Group 5: Advancements in VLA Models
- Recent advancements in VLA models include enhancements in perception modalities, such as 3D and 4D perception, as well as the integration of tactile and auditory information [22][23][24]
- Efforts to improve reasoning capabilities and generalization abilities are crucial for enabling VLA models to perform complex tasks in diverse environments [25][26]

Group 6: Performance Optimization
- Performance optimization in VLA models focuses on enhancing inference efficiency through architectural adjustments, parameter optimization, and inference acceleration techniques [28][29][30]
- Dual-system models have emerged to balance deep reasoning with real-time action generation, facilitating smoother deployment in real-world scenarios [35]

Group 7: Future Directions
- Future research directions include the integration of memory mechanisms, 4D perception, efficient adaptation, and multi-agent collaboration to further enhance VLA model capabilities [1][6]
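The monolithic vs. hierarchical split summarized in Groups 3 and 4 is easier to see as code. The Python sketch below is illustrative only: all class and function names (MonolithicVLA, PlannerVLM, LowLevelPolicy, run_hierarchical) are hypothetical placeholders, not the survey's or any specific model's API; it only shows where the single forward pass of a monolithic model differs from a planner that emits sub-goals for a separate low-level policy.

```python
"""Minimal sketch of the two VLA families: monolithic (one model maps
observation + instruction to an action) vs. hierarchical (a slow VLM planner
emits language sub-goals, a fast policy turns each sub-goal into motor commands).
All names are placeholders for illustration."""
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    rgb: bytes             # camera frame (placeholder; a real system would use a tensor)
    proprio: List[float]   # joint positions / gripper state


@dataclass
class Action:
    delta_pose: List[float]  # 6-DoF end-effector displacement
    gripper: float           # 0.0 = open, 1.0 = closed


class MonolithicVLA:
    """Single-system model: one network consumes all modalities and emits actions."""

    def act(self, obs: Observation, instruction: str) -> Action:
        # Placeholder for a forward pass of a large VLM fine-tuned to emit action tokens.
        return Action(delta_pose=[0.0] * 6, gripper=0.0)


class PlannerVLM:
    """High-level planner: decomposes an instruction into language sub-goals."""

    def plan(self, obs: Observation, instruction: str) -> List[str]:
        # Placeholder for VLM reasoning; a real planner would ground sub-goals in the scene.
        return [f"locate target for: {instruction}", "grasp target", "place target"]


class LowLevelPolicy:
    """Reactive policy: runs at control rate, conditioned on one sub-goal at a time."""

    def act(self, obs: Observation, subgoal: str) -> Action:
        return Action(delta_pose=[0.0] * 6, gripper=1.0 if "grasp" in subgoal else 0.0)


def run_hierarchical(obs: Observation, instruction: str) -> List[Action]:
    """Planner+Policy pipeline: plan once (or at low frequency), then execute sub-goals."""
    planner, policy = PlannerVLM(), LowLevelPolicy()
    return [policy.act(obs, g) for g in planner.plan(obs, instruction)]


if __name__ == "__main__":
    obs = Observation(rgb=b"", proprio=[0.0] * 7)
    print(MonolithicVLA().act(obs, "put the apple in the bowl"))
    print(run_hierarchical(obs, "put the apple in the bowl"))
```

A dual-system monolithic model (Group 6) sits between these extremes: structurally one model, but with a slow reasoning pass and a fast action head running at different rates.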
Zhipu CEO Zhang Peng: Accelerating Agent Model and Product R&D, Aiming to Enable One-Sentence Control of Computers and Phones as Soon as Possible
IPO早知道 · 2024-11-30 02:36
This article is an original piece by IPO早知道. Author | Stone Jin. WeChat official account | ipozaozhidao

According to IPO早知道, Zhipu, one of the earliest large-model companies to explore Agents, announced several new developments on November 29: AutoGLM can autonomously execute long operations of more than 50 steps and can carry out tasks across apps; AutoGLM introduces a new "fully automatic" browsing experience, supporting hands-free ("driverless") operation on dozens of websites; and GLM-PC, which operates a computer the way a human does, has started internal testing as a technical exploration of general-purpose Agents built on a visual multimodal model.

Specifically, the upgraded AutoGLM can take on complex tasks: 1. Ultra-long tasks: it understands very long instructions and executes very long tasks. 2. Cross-app: AutoGLM supports executing tasks across apps. 3. Short commands: AutoGLM supports custom short phrases for long tasks. 4. "Casual" mode: AutoGLM can proactively make decisions for you.

AutoGLM has also entered large-scale internal testing and will launch as a consumer-facing product as soon as possible. AutoGLM additionally announced a plan to offer free "Auto" upgrades to ten apps with hundreds of millions of users, inviting app partners to jointly explore new Auto scenarios of their own.

In addition, Zhipu introduced a PC-based autonomous Agent: GLM-PC is the GLM team's technical exploration of a "driverless" PC, built on Zhipu's ...
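The GLM-PC and AutoGLM descriptions above amount to a perception-action loop over screenshots driven by a visual multimodal model. The following Python sketch illustrates that loop only; every name in it (capture_screen, query_vlm, execute, UIAction) is a hypothetical placeholder, not Zhipu's actual API or protocol.

```python
"""Hypothetical skeleton of a GUI-agent loop in the spirit of AutoGLM / GLM-PC:
capture the screen, ask a vision-language model for the next UI action, execute
it, and repeat until the model signals completion. All names are placeholders."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class UIAction:
    kind: str                    # "click", "type", or "done"
    x: Optional[int] = None      # screen coordinates for clicks
    y: Optional[int] = None
    text: Optional[str] = None   # text to enter for "type" actions


def capture_screen() -> bytes:
    """Placeholder: return a screenshot of the current desktop or app."""
    return b""


def query_vlm(screenshot: bytes, instruction: str, step: int) -> UIAction:
    """Placeholder: a visual multimodal model maps (screenshot, instruction) to one UI action."""
    # A real agent would parse structured output from the model here.
    return UIAction(kind="done") if step >= 3 else UIAction(kind="click", x=100, y=200)


def execute(action: UIAction) -> None:
    """Placeholder: dispatch the action to the OS via mouse/keyboard automation."""
    print(f"executing {action}")


def run_agent(instruction: str, max_steps: int = 50) -> None:
    """Long-horizon loop; the 50+ step tasks mentioned above correspond to max_steps."""
    for step in range(max_steps):
        action = query_vlm(capture_screen(), instruction, step)
        if action.kind == "done":
            break
        execute(action)


if __name__ == "__main__":
    run_agent("open the browser and search for today's weather")
```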