From Prompt to Agent: The Core Logic of AI's Cognitive Leap
36Kr · 2026-01-19 02:30
Core Insights
- The article emphasizes the transition from "Prompt thinking" to "Agent thinking" in AI training, highlighting how this shift is reshaping work methodologies in large companies [1][22].

Group 1: Transition from Prompt to Agent Thinking
- Prompt thinking is likened to "literary creation," while Agent thinking is compared to "engineering management," indicating a fundamental change in approach [1][2].
- Many people approach Prompt writing as if they were interviewers, expecting perfect answers without a structured process, which leads to inefficiencies [2].
- Effective Agent design is structured, breaking complex tasks into manageable steps; this is more effective than crafting a single perfect Prompt [3].

Group 2: Core Elements of Agent Thinking
- Building a true Agent means translating workplace experience into executable code logic, exemplified by automating the writing of weekly reports [4].
- The first step in the Agent framework is logical planning: designing multi-step reasoning flows rather than simply issuing commands [5][6].
- Long-term memory is crucial for retaining context and preferences, enhancing an Agent's effectiveness in tasks [9][10].

Group 3: Tool Utilization in the Agent Framework
- Agents possess "administrator privileges," allowing them to act beyond mere text generation, such as sourcing data and calling functions [11][12].
- Generating a report involves multiple steps, including data retrieval, analysis, and visualization, showcasing the comprehensive capabilities of Agents [14][17][21].
- Agents can integrate structured data into reports, ensuring outputs are both accurate and contextually relevant [13][21].

Group 4: Pitfalls and Best Practices
- Companies have encountered various challenges in implementing Agent systems, leading to recommendations to avoid over-engineering and to ensure effective error-checking mechanisms [22][23].
- The article warns against excessive complexity in Agent design, which can increase costs and inefficiencies [23].
- It emphasizes setting confirmation points in the Agent workflow to mitigate cumulative errors [23].
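The Agent loop described above can be sketched in a few lines. This is a minimal illustration, not the article's actual system: the weekly-report task, the tool functions, and the `confirm` checkpoint are all hypothetical stand-ins for real data sources and a human reviewer.

```python
# Minimal sketch of the "Agent thinking" loop: plan the task as explicit
# steps, call tools for each step, carry context forward in memory, and
# pause at a confirmation point before the final output to limit
# cumulative errors. All tool names here are hypothetical illustrations.

def fetch_commits():          # hypothetical data-sourcing tool
    return ["fix login bug", "add export API"]

def summarize(items):         # hypothetical analysis tool
    return "; ".join(items)

def build_weekly_report(confirm=lambda draft: True):
    # Step 1: logical planning - an ordered list of steps, not one prompt.
    plan = ["gather", "summarize", "confirm", "format"]
    memory = {}               # context the agent carries across steps

    for step in plan:
        if step == "gather":
            memory["commits"] = fetch_commits()
        elif step == "summarize":
            memory["draft"] = summarize(memory["commits"])
        elif step == "confirm":
            # Confirmation point: a human (or checker) approves the draft
            # before it becomes the final report.
            if not confirm(memory["draft"]):
                raise RuntimeError("draft rejected at checkpoint")
        elif step == "format":
            memory["report"] = f"Weekly report: {memory['draft']}"
    return memory["report"]

print(build_weekly_report())
```

The confirmation step is the practical takeaway: without it, an error in `gather` silently propagates through every later step.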
The Strongest Open Source! "Punching GPT-5, Kicking Gemini-3.0": Why Has DeepSeek V3.2 Improved So Much?
华尔街见闻 (Wallstreetcn) · 2025-12-02 04:21
Core Insights
- DeepSeek has released two official models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale; the former achieves performance comparable to GPT-5, and the latter won gold medals in four international competitions [1][3].

Model Performance
- DeepSeek-V3.2 reaches the highest level of tool-invocation capability among current open-source models, significantly narrowing the gap with closed-source models [2].
- In benchmark tests, DeepSeek-V3.2 achieved a 93.1% pass rate on AIME 2025, closely trailing GPT-5's 94.6% and Gemini-3.0-Pro's 95.0% [20].

Training Strategy
- The model's improvement is attributed to a fundamental change in training strategy, moving from simple "direct tool invocation" to a more sophisticated "thinking + tool invocation" mechanism [9][11].
- DeepSeek built a new large-scale data-synthesis pipeline, generating over 1,800 environments and 85,000 complex instructions specifically for reinforcement learning [12].

Architectural Innovations
- The DeepSeek Sparse Attention (DSA) mechanism addresses efficiency bottlenecks in traditional attention, reducing complexity from O(L²) to O(Lk) while maintaining model performance [6][7].
- The architecture allows better context management, retaining relevant reasoning content across tool-related messages and avoiding inefficient repeated reasoning [14].

Competitive Landscape
- The release of DeepSeek-V3.2 signals a shift in the competitive landscape: the absolute technical monopoly of closed-source models is being challenged as open-source models gain first-tier competitiveness [20][22].
- This has three implications: lower costs and greater customization for developers, reduced reliance on overseas APIs for enterprises, and an industry shift from "who has the largest parameters" to "who has the strongest methods" [22].
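The O(L²) → O(Lk) claim can be made concrete with a toy sketch. This is an illustration of the general top-k sparse-attention idea, not DeepSeek's actual DSA implementation: each query attends only to its k highest-scoring keys, so per-query cost drops from O(L) to O(k).

```python
import math

# Toy top-k sparse attention (illustrative only, not DSA's real code).
# A production system would use a cheap "indexer" to pick the top-k keys,
# then run full attention only over that subset.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_attention(query, keys, values, k):
    # Score every key by dot product with the query.
    scores = [sum(q * c for q, c in zip(query, key)) for key in keys]
    # Keep only the k best-scoring positions.
    topk = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in topk])
    # Weighted sum of the selected values only: O(k), not O(L).
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, topk):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out
```

With k fixed and sequence length L growing, the selected-set attention stays constant per query, which is where the O(Lk) total cost comes from.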
"Seeing" and "Speaking" Alone Aren't Enough; Robots Must Also "Calculate"! Tool-Use + Reinforcement Learning: TIGeR Enables Precise Robot Manipulation
具身智能之心 (Embodied AI Heart) · 2025-10-11 16:02
Core Insights
- The article discusses the limitations of current Vision-Language Models (VLMs) in accurately interpreting and executing spatial commands in robotics, emphasizing the need for precise geometric reasoning and tool integration [2][5].

Group 1: TIGeR Framework
- The Tool-Integrated Geometric Reasoning (TIGeR) framework enhances VLMs by integrating tool usage and reinforcement learning to improve precise calculation in three-dimensional space [2][6].
- TIGeR allows AI models to move from qualitative perception to quantitative computation, addressing the core pain points of existing VLMs [2][7].

Group 2: Advantages of TIGeR
- TIGeR provides precise localization by integrating depth information and camera parameters, enabling accurate conversion of commands like "10 centimeters above" into three-dimensional coordinates [7].
- The framework supports multi-view unified reasoning, merging and reasoning over information from different perspectives within a consistent world coordinate system [7].
- The reasoning process is transparent: the tools used, the parameters input, and the results obtained are clearly shown, making it easier to debug and optimize [7].

Group 3: Training Process
- Training proceeds in two phases: supervised learning to teach basic tool usage and reasoning chains, followed by reinforcement learning to refine tool-usage skills through a hierarchical reward mechanism [8][10].
- The hierarchical reward evaluates not only the correctness of the final answer but also the accuracy of the process, including tool selection and parameter precision [8].

Group 4: Data Utilization
- The TIGeR-300K dataset, consisting of 300,000 samples, was created to train the model on geometric problems, ensuring both accuracy and diversity in the tasks covered [10][13].
- Dataset construction combined template-based generation with large-model rewriting to improve generalization and flexibility, ensuring the model can handle complex real-world instructions [13].

Group 5: Performance Metrics
- TIGeR outperforms other leading VLMs on spatial-understanding benchmarks, scoring 93.85 on 2D-Rel and 96.33 on 3D-Depth [10][14].
- Its performance across spatial-reasoning tasks demonstrates the ability to execute operations requiring precise three-dimensional positioning, which other models struggle to achieve [16].
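The "10 centimeters above" conversion mentioned above is the kind of computation a geometric tool handles. The sketch below shows the standard pinhole back-projection from a pixel plus depth to a 3D camera-frame point; the intrinsic values and the "+y points down" axis convention are illustrative assumptions, not TIGeR's actual tool API.

```python
# Pinhole back-projection: recover a 3D camera-frame point from a pixel
# (u, v), its metric depth z, and camera intrinsics (fx, fy, cx, cy),
# then apply a metric offset such as "10 cm above". Values here are
# illustrative; a real system reads intrinsics from calibration.

def pixel_to_camera_point(u, v, depth, fx, fy, cx, cy):
    # x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def offset_above(point, meters):
    # Assumed camera convention: +y points down, so "above" is -y.
    x, y, z = point
    return (x, y - meters, z)

# "Place the gripper 10 cm above the pixel (320, 240) seen at depth 0.5 m":
p = pixel_to_camera_point(320, 240, 0.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
target = offset_above(p, 0.10)
```

This is exactly the qualitative-to-quantitative step the article highlights: a VLM alone can only estimate "above", while the tool returns metric coordinates the robot controller can execute.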