Agent AI

Fei-Fei Li's Answer: After Large Models, Where Are Agents Headed?
Huxiu APP · 2025-09-07 02:51
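This article comes from the WeChat official account 划重点KeyPoints. Author: Lin Yi; editor: Zhongdian Jun; original title: "Fei-Fei Li's Answer: After Large Models, Where Are Agents Headed?"; header image: Visual China. 2025 is widely regarded as the first year of the Agent era, and related concepts, including intelligent agents, AI Agent, and Agentic AI, have kept gaining attention since the start of the year. Recently, a major Agent paper led by Fei-Fei Li has sparked broad discussion in the industry and remains a hot topic. Readers commented: "I practically read it on my knees" and "So clear that it held me captive for three hours." The 80-page survey, titled "Agent AI: Surveying the Horizons of Multimodal Interaction," was co-written by 14 experts from Stanford University and Microsoft, including Fei-Fei Li. Although the paper was first published at the end of last year, a look back at this year's Agent developments shows that the core strategies of major players such as Google, OpenAI, and Microsoft have almost all followed the capability stack the paper lays out; this in turn confirms the paper's foresight about the evolution path "from large models to Agents" ...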
Fei-Fei Li's Answer: After Large Models, Where Are Agents Headed?
Hu Xiu · 2025-09-05 00:34
Core Insights
- The article discusses the rising prominence of Agent AI, with 2025 being viewed as a pivotal year for this technology [1][2]
- A significant paper led by Fei-Fei Li titled "Agent AI: Surveying the Horizons of Multimodal Interaction" has sparked extensive discussion in the industry [3][6]

Summary by Sections

Overview of the Paper
- The paper, consisting of 80 pages, provides a clear framework for the somewhat chaotic field of Agent AI, integrating various technological strands into a new multimodal perspective [5][6]
- It emphasizes the evolution from large models to agents, reflecting the current strategies of major players like Google, OpenAI, and Microsoft [6]

New Paradigm of Agent AI
- The paper introduces a novel cognitive architecture for Agent AI, which is not merely a compilation of existing technologies but a forward-thinking approach to the development of Artificial General Intelligence (AGI) [9]
- It defines five core modules: Environment and Perception, Cognition, Action, Learning, and Memory, which together form an interactive cognitive loop [10][26]

Core Modules Explained
- **Environment and Perception**: Agents actively perceive information from their surroundings in a multimodal manner, incorporating various data types [12][13]
- **Cognition**: Acts as the processing center for agents, enabling complex activities such as reasoning and empathy [15][16]
- **Action**: Converts cognitive decisions into specific operational commands, affecting both physical and virtual environments [18][19]
- **Learning**: Highlights the continuous learning and self-evolution capabilities of agents through various mechanisms [20][21]
- **Memory**: Offers a structured system for long-term knowledge retention, allowing agents to leverage past experiences for new tasks [23][24]

Role of Large Models
- The framework's feasibility is attributed to the maturity of large foundational models, particularly LLMs and VLMs, which provide essential cognitive capabilities for agents [28][29]
- These models enable agents to decompose vague instructions into actionable tasks, significantly reducing the complexity of task programming [30][31]

Challenges and Ethical Considerations
- The paper identifies the issue of "hallucination" in models, where they may generate inaccurate content, posing risks in real-world interactions [32][33]
- It emphasizes the need for inclusivity in designing Agent AI, addressing biases in training data and ensuring ethical interactions [36][39]
- The importance of establishing regulatory frameworks for data privacy and security in Agent AI applications is also highlighted [38][39]

Application Potential
- The paper explores the vast application potential of Agent AI in gaming, robotics, and healthcare [40]
- In gaming, Agent AI can create dynamic NPCs that interact meaningfully with players, enhancing immersion [42][43]
- In robotics, agents can autonomously execute complex tasks based on simple verbal commands, streamlining user interaction [48][49]
- In healthcare, Agent AI can assist in preliminary diagnostics and patient monitoring, improving efficiency in resource-limited settings [54][57]

Future Directions
- The paper acknowledges that Agent AI is still in its early stages, facing challenges in integrating multiple modalities and creating general-purpose agents [58][60]
- It proposes new evaluation benchmarks to measure agent intelligence and guide future research [61]
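
The five-module cognitive loop summarized above (Environment and Perception, Cognition, Action, Learning, Memory) can be illustrated with a toy sketch. This is not the paper's implementation: the `ToyEnvironment`, `Memory`, and `Agent` classes, the counter task, and the reward rule are all hypothetical names and assumptions chosen only to show how the modules might feed one another in a single iteration.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Memory module: stores (observation, action, reward) episodes."""
    episodes: list = field(default_factory=list)

    def recall(self, observation):
        # Return the best-rewarded past action for this observation, if any.
        matches = [e for e in self.episodes if e[0] == observation]
        if not matches:
            return None
        return max(matches, key=lambda e: e[2])[1]

    def store(self, observation, action, reward):
        self.episodes.append((observation, action, reward))


class ToyEnvironment:
    """Hypothetical environment: a counter the agent must drive to a target."""

    def __init__(self, target=3):
        self.state = 0
        self.target = target

    def observe(self):
        return self.state

    def step(self, action):
        self.state += action
        # Reward is 0 at the target and more negative the further away.
        return -abs(self.target - self.state)


class Agent:
    def __init__(self):
        self.memory = Memory()

    def perceive(self, env):
        # Environment and Perception: read the current state.
        return env.observe()

    def decide(self, observation, goal):
        # Cognition: reuse a remembered action, else apply a simple rule.
        remembered = self.memory.recall(observation)
        if remembered is not None:
            return remembered
        if observation < goal:
            return 1
        if observation > goal:
            return -1
        return 0

    def act(self, env, action):
        # Action: apply the decision to the environment.
        return env.step(action)

    def learn(self, observation, action, reward):
        # Learning: write the outcome back into memory.
        self.memory.store(observation, action, reward)


def run_loop(agent, env, steps):
    """Run perceive -> decide -> act -> learn for a fixed number of steps."""
    for _ in range(steps):
        observation = agent.perceive(env)
        action = agent.decide(observation, env.target)
        reward = agent.act(env, action)
        agent.learn(observation, action, reward)
    return env.state
```

Calling `run_loop(Agent(), ToyEnvironment(target=3), 5)` walks the counter up to the target and then holds it there; on the last step the agent reuses the action it stored in memory rather than recomputing it, which is the loop-closing role the summary assigns to the Memory module.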
Fei-Fei Li's Answer: After Large Models, Where Are Agents Headed?
36Kr · 2025-09-04 08:28
Core Insights
- The latest paper by Fei-Fei Li delineates the boundaries and establishes paradigms for the currently trending field of Agents, with major players like Google, OpenAI, and Microsoft aligning their strategies with the proposed capability stack [1][4]
- The paper introduces a comprehensive cognitive loop architecture that encompasses perception, cognition, action, learning, and memory, forming a dynamic iterative system for intelligent agents, which is not only a technological integration but also a systematic vision for the future of AGI [1][5]
- Large models are identified as the core engine driving Agents, while environmental interaction is crucial for addressing issues of hallucination and bias, emphasizing the need for real or simulated feedback to calibrate reality and incorporate ethical and safety mechanisms [1][3][11]

Summary by Sections

1. Agent AI's Core: A New Cognitive Architecture
- The paper presents a novel Agent AI paradigm that is a forward-thinking consideration of the development path for AGI, rather than a mere assembly of existing technologies [5]
- It defines five core modules: Environment and Perception, Cognition, Action, Learning, and Memory, which together create a complete and interactive cognitive loop for intelligent agents [5][10]

2. How Large Models Drive Agent AI
- The framework of Agent AI is made possible by the maturity of large foundational models, particularly LLMs and VLMs, which serve as the basis for the cognitive capabilities of Agents [11][12]
- LLMs and VLMs have internalized vast amounts of common and specialized knowledge, enabling Agents to perform zero-shot planning effectively [12]
- The paper highlights the challenge of "hallucination," where models may generate inaccurate content, and proposes environmental interaction as a key anchor to mitigate this issue [13]

3. Application Potential of Agent AI
- The paper explores the significant application potential of Agent AI in three cutting-edge fields: gaming, robotics, and healthcare [14][19]
- In gaming, Agent AI can transform NPC behavior, allowing for meaningful interactions and dynamic adjustments based on player actions, enhancing immersion [15]
- In robotics, Agent AI enables users to issue commands in natural language, allowing robots to autonomously plan and execute complex tasks [17]
- In healthcare, Agent AI can serve as a medical chatbot for preliminary consultations and provide diagnostic suggestions, particularly in resource-limited settings [19][21]

4. Conclusion
- The paper acknowledges that Agent AI is still in its early stages and faces challenges in achieving deep integration across modalities and domains [22]
- It emphasizes the need for standardized evaluation metrics to guide development and measure technological progress in the field [22]
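
The idea of using environmental interaction as an anchor against hallucination, noted in section 2 above, can be sketched in a few lines. The example below is an illustrative assumption, not the paper's method: `stub_planner` is a hypothetical stand-in for an LLM planner that returns a fixed plan containing one hallucinated step, and `SimulatedScene` is a made-up environment that rejects any step it cannot ground in its actual contents.

```python
def stub_planner(instruction):
    """Hypothetical stand-in for an LLM planner.

    Returns a fixed plan that deliberately includes one hallucinated step.
    """
    return [
        ("pick", "cup"),
        ("pick", "unicorn"),  # hallucinated: no such object in the scene
        ("place", "cup"),
    ]


class SimulatedScene:
    """Made-up environment that only accepts steps it can ground."""

    def __init__(self, objects):
        self.objects = set(objects)
        self.held = set()

    def try_step(self, verb, obj):
        """Attempt one step and return (ok, feedback)."""
        if obj not in self.objects:
            return False, f"unknown object: {obj}"
        if verb == "pick":
            self.held.add(obj)
            return True, "ok"
        if verb == "place":
            if obj not in self.held:
                return False, f"not holding: {obj}"
            self.held.remove(obj)
            return True, "ok"
        return False, f"unknown action: {verb}"


def execute_with_feedback(instruction, scene, planner=stub_planner):
    """Run each planned step against the scene and keep only grounded ones."""
    executed, rejected = [], []
    for verb, obj in planner(instruction):
        ok, feedback = scene.try_step(verb, obj)
        if ok:
            executed.append((verb, obj))
        else:
            rejected.append(((verb, obj), feedback))
    return executed, rejected
```

With `SimulatedScene(["cup", "table"])`, the two cup steps execute and the unicorn step is rejected with feedback from the environment. In a fuller system that feedback would typically be passed back to the planner for replanning; this sketch stops at detection.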