Code Agent
DeepSeek-V3.1 Version Update
Di Yi Cai Jing· 2025-09-22 13:45
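DeepSeek-V3.1 has been updated to DeepSeek-V3.1-Terminus. According to the official public account, the update preserves the model's existing capabilities while addressing issues raised in user feedback: language consistency (reducing mixed Chinese-English output and occasional abnormal characters) and Agent capabilities (further optimizing the performance of Code Agent and Search Agent).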
The CodeAgent 2.0 Era Begins | GitTaskBench Redefines the Standard for Real-World Delivery by Code Agents
Jiqizhixin (机器之心)· 2025-08-30 10:06
Core Insights
- The article discusses the limitations of current AI coding benchmarks, which primarily focus on code generation and closed problems, neglecting real-world developer needs such as environment setup and dependency management [2]
- A new evaluation paradigm called GitTaskBench has been proposed by researchers from various prestigious institutions, aiming to assess the full lifecycle capabilities of code agents from repository understanding to project delivery [2][5]
- GitTaskBench incorporates economic benefits of "framework × model" into its evaluation metrics, providing valuable insights for academia, industry, and entrepreneurs [2]

Evaluation Framework
- GitTaskBench covers 7 modalities across 7 domains, with 24 subdomains and 54 real tasks, utilizing 18 backend repositories with an average of 204 files, 1,274.78 functions, and 52.63k lines of code [3]
- Each task is linked to a complete GitHub repository, natural language instructions, clear input/output formats, and task-specific automated evaluations [4]

Capability Assessment
- GitTaskBench evaluates code agents on three dimensions: autonomous environment setup, overall coding control, and task-oriented execution [8][9]
- The evaluation process includes repository selection, completeness verification, execution framework design, and automated assessment [10]

Economic Feasibility
- The concept of "cost-effectiveness" is introduced, quantifying the economic viability of agent solutions through metrics that reflect cost savings and efficiency improvements [12][13]
- The average net benefit (α value) of agents is calculated based on task completion, market value, quality coefficient, and operational costs [15]

Performance Results
- The performance of various frameworks and models is analyzed, revealing that OpenHands achieved the highest execution completion rate (ECR) of 72.22% and task pass rate (TPR) of 48.15% [15][16]
- GPT-4.1 demonstrated a strong performance with lower costs compared to Claude models, indicating a balance between effectiveness and cost [24]

Market Value Insights
- The article highlights that tasks with higher human market values yield greater positive alpha returns when successfully completed by agents [18]
- Conversely, tasks with lower market values, such as image processing, can lead to negative alpha if operational costs exceed certain thresholds [19][20]

Conclusion
- The choice of "framework × model" should consider effectiveness, cost, and API usage, with Claude series excelling in code tasks while GPT-4.1 offers cost-effective and stable performance [24]
- GitTaskBench can be utilized in various application scenarios, aiding in the evaluation of code agents across multiple modalities [25]
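
The summary above says the α value combines task completion, market value, a quality coefficient, and operational cost, but does not spell out the formula. A minimal sketch of one plausible reading, assuming α is the mean over tasks of (T × MV × Q) − C, with T as a pass/fail flag, MV as the human market value, Q as a quality coefficient in [0, 1], and C as the run cost. The class and function names and all sample figures are hypothetical. The `rate_percent` helper also shows that the reported ECR and TPR are consistent with 39 and 26 of the 54 tasks, although those counts are inferred here and not stated in the summary.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One agent run on one task (all field names are illustrative)."""
    passed: bool          # T: did the output pass the task's automated check?
    market_value: float   # MV: what a human would charge for the task
    quality: float        # Q: quality coefficient, assumed to lie in [0, 1]
    cost: float           # C: operational cost of the run (e.g. API spend)


def net_benefit(r: TaskResult) -> float:
    """Value delivered minus cost; a failed task delivers no value."""
    delivered = r.market_value * r.quality if r.passed else 0.0
    return delivered - r.cost


def alpha(results: list[TaskResult]) -> float:
    """Average net benefit across all attempted tasks."""
    if not results:
        raise ValueError("need at least one task result")
    return sum(net_benefit(r) for r in results) / len(results)


def rate_percent(count: int, total: int) -> float:
    """Share of tasks as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


# Made-up sample: one high-value success, one failure, one cheap task
# whose run cost exceeds the value it delivers.
runs = [
    TaskResult(passed=True, market_value=100.0, quality=0.8, cost=2.0),
    TaskResult(passed=False, market_value=100.0, quality=0.0, cost=3.0),
    TaskResult(passed=True, market_value=5.0, quality=0.5, cost=4.0),
]

print(alpha(runs))
print(rate_percent(39, 54), rate_percent(26, 54))
```

Under this reading, the third run illustrates the low-market-value case from the summary: the task passes, yet its net benefit is negative because the cost outweighs the value delivered.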
Huatai Securities: From Agent to Multi-Agent
2025-03-10 06:49
Summary of Conference Call on AI and Multi-Agent Systems

Industry Overview
- The conference focuses on the AI industry, particularly the development of chatbots and multi-agent systems, highlighting the transition from single agents to multi-agent systems as a significant trend in AI technology [2][3][7].

Key Points and Arguments
1. **Current State of AI Agents**: The development of AI agents is limited by model capabilities and engineering challenges. Despite high expectations for agents that can replace humans in complex tasks, no mature products have emerged yet [3][4].
2. **Manus Product**: The Manus product is not an innovative model but offers a new approach to achieving multi-tasking capabilities within existing model limitations. It has sparked interest in the industry for practical applications of agents [4][5].
3. **Multi-Agent Systems (MAS)**: MAS is a crucial direction in AI development, where multiple agents collaborate to compensate for individual limitations. This system enhances task automation and has shown promising results since the Manus product launch [5][15].
4. **Technological Breakthroughs in 2024**: Key advancements in AI technology include improvements in perception, definition, memory, planning, and action, laying the groundwork for more sophisticated multi-agent systems [6][10].
5. **Action Mechanisms**: Significant breakthroughs in the action phase include virtual machine forms that address data source access issues and agent orchestration capabilities that assign tasks to the most suitable agents [9][10].
6. **Progress in Large Models**: Large models have made notable progress in reasoning and action through methods like Chain of Thought (CoT) and ReAct (Reasoning + Acting), although human intervention remains common in enterprise applications [10][11].
7. **Code Agent Development**: Code agents have matured, capable of automating various coding tasks and expanding their application scenarios beyond just code generation [11][12].
8. **Data Access and Personalization**: The extent of data access is a critical factor in extending general scenarios, with companies like Apple and Tencent working on integrating personal behavior data for enhanced services [12][13].
9. **MCP Protocol**: The MCP protocol is designed for cloud systems to ensure standardized information sharing and task collaboration among agents, which is vital for the development of multi-agent systems [13][14].
10. **Enterprise Demand for MAS**: Companies have complex task orchestration needs, leading to significant interest in multi-agent architectures. Firms like Workday, ServiceNow, and Salesforce are exploring these systems to maximize their value [28][30].

Additional Important Insights
- **Future of Multi-Agent Technology**: Multi-agent technology is expected to evolve from individual agents to a network, becoming a vital part of the next generation of the internet. This technology will play an increasingly important role in consumer devices [29][30].
- **Open Source Frameworks**: Various open-source multi-agent frameworks are emerging, providing users with customizable solutions to meet their specific needs [25][27].
- **Coordination Mechanisms**: Multi-agent systems utilize both static and dynamic coordination mechanisms, with dynamic approaches becoming more prevalent in current applications [23][24].

This summary encapsulates the key discussions and insights from the conference call, emphasizing the current state and future potential of AI and multi-agent systems in the industry.
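
The call describes orchestration that assigns each task to the most suitable agent, and contrasts static with dynamic coordination, without giving implementation detail. A minimal sketch of the difference, assuming agents advertise a set of skills and that "most suitable" means the largest overlap with what a task requires. The agent names, skill labels, and routing table are hypothetical and do not come from any framework mentioned in the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """A worker agent described only by the skills it advertises."""
    name: str
    skills: frozenset[str]


# Static coordination: the task type is mapped to an agent ahead of time.
STATIC_ROUTES = {"coding": "coder", "research": "searcher"}


def route_static(task_type: str) -> str:
    """Look up the pre-assigned agent; unknown task types raise KeyError."""
    return STATIC_ROUTES[task_type]


def route_dynamic(required: set[str], agents: list[Agent]) -> Agent:
    """Dynamic coordination: pick the agent covering the most required
    skills at dispatch time. Ties go to the earliest agent in the list."""
    if not agents:
        raise ValueError("no agents registered")
    return max(agents, key=lambda a: len(a.skills & required))


agents = [
    Agent("coder", frozenset({"python", "git"})),
    Agent("searcher", frozenset({"web", "summarize"})),
]

print(route_static("coding"))
print(route_dynamic({"web", "summarize", "python"}, agents).name)
```

The static table is simple to audit but cannot handle a task type nobody anticipated, whereas the dynamic selector adapts to whatever skills a task declares, which may be one reason dynamic approaches are described as increasingly prevalent.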