Workflow
Multi-Agent Systems
Chain-of-Agents: OPPO Unveils a New Paradigm for General Agent Models, SOTA on Multiple Leaderboards, with Model, Code, and Data Fully Open-Sourced
机器之心· 2025-08-23 04:42
To address the bottlenecks listed below, this article proposes a new agent reasoning paradigm: Chain-of-Agents (CoA). Unlike traditional TIR models, which support only a single agent's think-act-observe loop, the CoA framework can flexibly define agents with multiple roles and tools and activate them dynamically inside a single model, achieving end-to-end multi-agent collaboration (a toy sketch of this idea follows this summary). The corresponding author, Zhou Wangchunshu, heads OPPO's Personalized AI Lab; his main research directions are AI personalization, autonomous agent evolution and reinforcement learning, and memory systems for large models and agents. The paper's core contributors all come from the AI agent team of OPPO's Personalized AI Lab.

In recent years, research represented by multi-agent systems (MAS) has made notable progress, demonstrating strong capabilities on complex problem-solving tasks such as deep research and coding assistance. Existing multi-agent frameworks complete complex tasks through collaboration among agents with well-defined roles and diverse tools, and show clear advantages. However, current MAS still face several key limitations:

- High computational overhead: frequent, redundant inter-agent communication and complex workflow designs make these systems inefficient.
- Generalization ability ...

Meanwhile, the recently emerging tool-integrated reasoning (TIR) models, which explicitly fold tool use into the reasoning process, have significantly improved the performance of single-agent frameworks (such as ReAct) on information-retrieval tasks. Traditional TIR models, however, cannot directly support native training and collaboration of multi-agent systems.
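To make the contrast above concrete, here is a minimal, self-contained Python sketch of the CoA idea: one backbone model drives a think-act-observe loop, but the acting role (and its tools) can change at every step. All role names, tool stubs, and the routing logic are illustrative assumptions, not OPPO's released implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative sketch only: role/tool names and the routing logic are
# assumptions, not taken from OPPO's Chain-of-Agents implementation.

@dataclass
class Role:
    name: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

def search_tool(query: str) -> str:
    return f"[stub] search results for: {query}"

def code_tool(src: str) -> str:
    return f"[stub] execution output for: {src}"

ROLES = {
    "researcher": Role("researcher", {"search": search_tool}),
    "coder": Role("coder", {"run_code": code_tool}),
}

def model_step(history: list[str]) -> tuple[str, str, str]:
    """Stub for the single backbone model: given the trajectory so far,
    emit (role, tool, tool_input). A real CoA model generates this
    end-to-end; here a two-step plan is hard-coded for illustration."""
    if not history:
        return ("researcher", "search", "multi-agent collaboration")
    return ("coder", "run_code", "print('synthesize findings')")

def run_chain_of_agents(task: str, max_steps: int = 2) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        role_name, tool_name, tool_input = model_step(history)
        observation = ROLES[role_name].tools[tool_name](tool_input)
        # One think-act-observe step, but the acting role can change
        # per step: the multi-agent part happens inside one model.
        history.append(f"{role_name}/{tool_name} -> {observation}")
    return history

if __name__ == "__main__":
    for step in run_chain_of_agents("survey MAS limitations"):
        print(step)
```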
Inside Story: OpenAI's Model Admitted It Couldn't Solve Problem 6, While a Three-Person Team Took IMO Gold in Two Months
36Kr· 2025-08-12 00:57
In just two months, OpenAI took its AI from struggling with elementary-school math problems to International Mathematical Olympiad (IMO) gold-medal level, on the back of breakthroughs in general-purpose AI techniques. Did OpenAI's ChatGPT really earn an IMO gold medal, or is this just OpenAI hype? What is the story behind it?

The core members of OpenAI's IMO team, Alexander Wei, Noam Brown, and Sheryl Hsu, appeared on Sequoia's Training Data podcast to share how they got an AI to IMO gold within two months. For one thing, not everyone inside OpenAI was optimistic: one researcher even offered a 2:1 bet that the model would not win, though he ultimately dropped the wager so as "not to hurt morale." From 1 to 5 a.m. on the day of the competition, Noam Brown snatched a nap while Alexander Wei frantically checked the model's generated proofs.

They also explained how they decided whether the AI had earned a gold medal. For grading they hired external IMO medalists; each proof was scored by three medalists, who reached consensus on its correctness. On that basis, they concluded the AI was indeed capable of winning IMO gold. They further revealed that the proofs read like an "alien language" and are not very readable; although they could have polished them, they chose to publish the raw outputs for the sake of transparency.

If you just want the highlights, see the key points below; for the behind-the-scenes story, ...
Behind GPT-5's Disappointing Debut: How OpenAI Is Adjusting Its Business Strategy | Jinqiu Select
锦秋集· 2025-08-08 15:38
GPT-5 was finally released officially last night. The launch has prompted plenty of speculation, with some arguing that it marks a strategic shift: OpenAI may be using GPT-5's relatively closed model lineup to drive stronger commercial monetization.

OpenAI officially claims that GPT-5 achieves "integrated reasoning", unifying fast responses and deep reasoning into a single one-stop experience, and that it delivers consolidated improvements across code generation, creative writing, multimodal capabilities, and tool use. However, we have not seen GPT-5 make a clear breakthrough on frontier benchmarks. OpenAI emphasizes that its development strategy focuses on practicality, stability, and consistency of experience in real-world usage rather than on chasing high scores, but community feedback has not been positive: many users criticize OpenAI for removing older models without offering a convincing replacement, saying GPT-5 cannot handle every task and even regresses in some scenarios.

On release day, several of OpenAI's top executives, President Greg Brockman, CFO Sarah Friar, and Chief Research Officer Mark Chen, gave an exclusive interview to TBPN in which they laid out OpenAI's current business strategy and brand thinking, revealing the reasoning behind GPT-5. Jinqiu Capital (WeChat account: 锦秋集; ID: jqcapital) believes the interview shows Op ...
Report on Core AI Achievements and Trends in H1 2025 - 量子位智库 (QbitAI Think Tank)
Sohu Caijing· 2025-08-01 04:37
Application Trends
- General-purpose Agent products are deeply integrating tool usage, capable of automating tasks that would take hours for humans and delivering richer content [1][13]
- Computer Use Agents (CUA) are being pushed to market, focusing on visual operations and merging with text-based deep-research Agents [1][14]
- Vertical scenarios are accelerating Agentization, with natural language control becoming part of workflows, and AI programming gaining market validation with rapid revenue growth [1][15][17]

Model Trends
- Reasoning capabilities are continuously improving, with significant advancements on mathematical and coding problems, and some models performing excellently in international competitions [1][20]
- Large model tools are enhancing their capabilities, integrating visual and text modalities and improving multi-modal reasoning abilities [1][22]
- Small models are accelerating in popularity, lowering deployment barriers, and model evaluation is evolving toward dynamic, practical task-oriented assessments [1][30]

Technical Trends
- Resource investment is shifting toward post-training and reinforcement learning, with the importance of reinforcement learning increasing and its future computing-power consumption potentially exceeding that of pre-training [1][33]
- Multi-agent systems are becoming a frontier paradigm, with online learning expected to be the next generation of learning methods, and rapid iteration and optimization of Transformer and hybrid architectures [1][33]
- Code verification is emerging as a frontier for enhancing AI programming automation, with system prompts significantly impacting user experience [1][33]

Industry Trends
- xAI's Grok 4 has entered the global top tier, demonstrating that large models lack a competitive moat [2]
- Computing power is becoming a key competitive factor, with leading players expanding their computing clusters to hundreds of thousands of cores [2]
- OpenAI's leading advantage is diminishing as Google and xAI catch up, with the gap between Chinese and American general-purpose large models narrowing, and China showing strong performance in multi-modal fields [2]
AI Agents (Part 8): Building Multi-Agent Systems
36Kr· 2025-07-27 23:12
Group 1
- The article discusses the value-creation potential of AI agents in workflows that are difficult to automate using traditional methods [3]
- AI agents consist of three core components (models, tools, and instructions), which are essential for their functionality [6][8]
- Model selection should be based on task complexity, with a focus on meeting performance benchmarks while optimizing for cost and latency [3][6]

Group 2
- Function calling is the primary method for large language models (LLMs) to interact with tools, enhancing the capabilities of AI agents [6][7]
- High-quality instructions are crucial for LLM-based applications, as they reduce ambiguity and improve decision-making [8][11]
- The orchestration of AI agents can be modeled as a graph, where agents are nodes and tool calls are edges, facilitating effective workflow execution [11][15]

Group 3
- The article outlines a supervisor mode for managing multiple specialized agents, allowing for task delegation and efficient workflow management [16][17] (a sketch of this pattern follows this summary)
- Custom handoff tools can be created to enhance the interaction between agents, allowing for tailored task assignments [33][34]
- A multi-layered supervisory structure is also possible, enabling the management of multiple teams of agents [31]
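As referenced in Group 3, the following is a hedged, framework-agnostic Python sketch of the supervisor pattern with a custom handoff tool. Every name here (`handoff`, `supervisor`, the worker agents) is hypothetical scaffolding for illustration, not an API from the article; a real supervisor would let an LLM choose the delegation order.

```python
from typing import Callable

# Hypothetical, framework-agnostic sketch of the supervisor pattern:
# a supervisor delegates subtasks to specialized worker agents via
# "handoff" calls. In the graph view described above, agents are
# nodes and each handoff call is an edge.

Agent = Callable[[str], str]

def research_agent(task: str) -> str:
    return f"[research agent] notes on: {task}"

def writer_agent(task: str) -> str:
    return f"[writer agent] draft for: {task}"

WORKERS: dict[str, Agent] = {
    "research": research_agent,
    "write": writer_agent,
}

def handoff(worker: str, task: str) -> str:
    """Custom handoff tool: route a tailored task to a named worker."""
    return WORKERS[worker](task)

def supervisor(goal: str) -> str:
    # The delegation plan is hard-coded to keep the sketch
    # self-contained; in practice an LLM would decide it.
    notes = handoff("research", goal)
    draft = handoff("write", f"{goal} (using: {notes})")
    return draft

if __name__ == "__main__":
    print(supervisor("multi-agent systems overview"))
```

A multi-layered supervisory structure, as the summary notes, would simply make `supervisor` itself a worker registered under a higher-level supervisor.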
How to Build Verifiable Agentic Workflows? MermaidFlow Opens a New Paradigm for Safe, Robust Agent Processes
机器之心· 2025-07-24 03:19
Core Viewpoint
- The article discusses advances in Multi-Agent Systems (MAS) and introduces the "Agentic Workflow" as a key concept for autonomous decision-making and collaboration among intelligent agents, highlighting the emergence of structured, verifiable workflow frameworks such as MermaidFlow [1][4][22]

Group 1: Introduction to Multi-Agent Systems
- The development of large language models is driving the evolution of AI agents from single capabilities to complex system collaborations, making MAS a focal point in both academia and industry [1]
- Leading teams, including Google and Shanghai AI Lab, are launching innovative Agentic Workflow projects to enhance the autonomy and intelligence of agent systems [2]

Group 2: Challenges in Current Systems
- Existing systems face significant challenges such as lack of rationality assurance, insufficient verifiability, and difficulty of intuitive expression, which hinder the reliable implementation and large-scale deployment of MAS [3]

Group 3: Introduction of MermaidFlow
- The MermaidFlow framework, developed by researchers from Singapore's A*STAR and Nanyang Technological University, aims to advance agent systems toward structured evolution and safe verifiability [4]
- Traditional workflow expressions often rely on imperative code such as Python scripts or JSON trees, leading to three core bottlenecks: opaque structure, verification difficulty, and debugging challenges [7][10]

Group 4: Advantages of MermaidFlow
- MermaidFlow introduces a structured graphical language that models agent behavior planning as a clear, verifiable flowchart, enhancing the interpretability and reliability of workflows [8][12] (see the sketch after this summary)
- The structured representation makes agent definitions, dependencies, and data flows clearly visible, facilitating debugging and optimization [11][14]

Group 5: Performance and Evolution
- MermaidFlow achieves a success rate above 90% in generating executable, structurally sound workflows, significantly improving the controllability and robustness of agent systems compared with traditional methods [18]
- The framework supports safe evolutionary optimization through a structured approach, allowing modular adjustments while ensuring compliance with semantic constraints [16][19]

Group 6: Conclusion
- As MAS and large-model AI continue to evolve, achieving structured, verifiable, and efficient workflows is crucial for agent research, with MermaidFlow providing foundational support for effective collaboration processes [22]
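MermaidFlow's actual syntax and verifier are not reproduced in the article, so the sketch below only illustrates the general idea of a declarative, statically checkable workflow graph: parse a tiny Mermaid-like edge list and reject plans containing cycles before anything executes. The syntax subset and the single check shown are assumptions about what such a verifier might enforce.

```python
from collections import defaultdict, deque

# Illustrative sketch of "workflow as a verifiable graph": parse a
# tiny Mermaid-like edge list, then statically check it is a DAG.
# The check below is an assumption, not MermaidFlow's actual rules.

WORKFLOW = """
plan --> retrieve
retrieve --> draft
draft --> review
review --> draft
"""

def parse_edges(src: str) -> dict[str, list[str]]:
    graph: dict[str, list[str]] = defaultdict(list)
    for line in src.strip().splitlines():
        a, b = (part.strip() for part in line.split("-->"))
        graph[a].append(b)
    return graph

def has_cycle(graph: dict[str, list[str]]) -> bool:
    # Kahn's algorithm: if a topological sort consumes every node,
    # the graph is acyclic; leftover nodes imply a cycle.
    nodes = set(graph) | {n for vs in graph.values() for n in vs}
    indeg = {n: 0 for n in nodes}
    for vs in graph.values():
        for n in vs:
            indeg[n] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for nxt in graph.get(node, []):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return seen != len(nodes)

if __name__ == "__main__":
    print("cycle detected:", has_cycle(parse_edges(WORKFLOW)))
```

Here the `review --> draft` edge forms a cycle, so the plan is rejected up front rather than failing mid-run; that "verify before execute" property is what the article attributes to structured workflow representations.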
A Comprehensive Guide to Context Engineering, Distilled from 1,400 Research Papers | Jinqiu Select
锦秋集· 2025-07-21 14:03
Core Insights
- The article discusses the emerging field of Context Engineering, emphasizing the need for a systematic theoretical framework to complement the practical experience shared by the Manus team [1][2]
- A comprehensive survey, "A Survey of Context Engineering for Large Language Models," analyzes over 1,400 research papers to establish a complete technical system for Context Engineering [1][2]

Context Engineering Components
- Context Engineering rests on three interrelated components (Context Retrieval and Generation, Context Processing, and Context Management), forming a complete framework for optimizing context in large models [2]
- The first component, Context Retrieval and Generation, focuses on engineering methods for effectively acquiring and constructing context for models, including practices such as Prompt Engineering, external knowledge retrieval, and dynamic context assembly [2]

Prompting Techniques
- Prompting is the starting point of model interaction; effective prompts can unlock deeper capabilities of the model [3]
- Zero-shot prompting provides direct instructions relying on pre-trained knowledge, while few-shot prompting supplies a few examples to guide the model toward the task requirements [4]

Advanced Reasoning Frameworks
- Complex tasks call for structured thinking: Chain-of-Thought (CoT) prompts models to reason step by step, significantly improving accuracy on complex tasks [5]
- Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT) further enhance reasoning by exploring multiple paths and dependencies, improving success rates on tasks requiring extensive exploration [5]

Self-Refinement Mechanisms
- Self-Refinement lets models iteratively improve their outputs through self-feedback, without requiring additional supervised training data [8][9]
- Techniques such as N-CRITICS and Agent-R enable models to evaluate and correct their reasoning paths in real time, enhancing output quality [10][11]

External Knowledge Retrieval
- External knowledge retrieval, particularly Retrieval-Augmented Generation (RAG), addresses the static nature of model knowledge by integrating dynamic information from external databases [12][13] (a minimal retrieve-and-assemble sketch follows this summary)
- Advanced RAG architectures introduce adaptive retrieval mechanisms and hierarchical processing strategies to enhance retrieval efficiency [14][15]

Context Processing Challenges
- Processing long contexts is computationally expensive because of the quadratic complexity of Transformer self-attention [28]
- Innovations such as State Space Models and linear attention aim to reduce computational complexity, allowing models to handle longer sequences more efficiently [29][30]

Context Management Strategies
- Effective context management is crucial for organizing, storing, and using information, addressing issues such as context overflow and collapse [46][47]
- Memory architectures inspired by operating systems and cognitive models are being developed to enhance the memory capabilities of language models [48][50]

Tool-Integrated Reasoning
- Tool-Integrated Reasoning turns language models from passive text generators into active agents that interact with the external world through function calling and integrated reasoning frameworks [91][92]
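To ground the RAG and "dynamic context assembly" items above, here is a minimal sketch of the retrieve-then-assemble step, using a toy bag-of-words retriever so it stays dependency-free. The corpus, scoring function, and prompt template are illustrative assumptions; production systems use dense embeddings and a vector index.

```python
import math
from collections import Counter

# Minimal sketch of RAG's retrieve-then-assemble step, assuming a toy
# bag-of-words retriever. Corpus and template are illustrative only.

CORPUS = [
    "Chain-of-thought prompting asks the model to reason step by step.",
    "Retrieval-augmented generation grounds answers in external documents.",
    "Tree-of-thoughts explores multiple reasoning paths in parallel.",
]

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def assemble_context(query: str, k: int = 2) -> str:
    q = bow(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:k])
    # Dynamic context assembly: retrieved passages plus the user query
    # become the prompt handed to the language model.
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

if __name__ == "__main__":
    print(assemble_context("how does retrieval-augmented generation work?"))
```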
Behind "Replicating Manus in Zero Days", This Post-95 Engineer Is Convinced: "General Agents Must Exist, and Agents Have a Scaling Law Too" | 万有引力
AI科技大本营· 2025-07-11 09:10
Core Viewpoint
- The emergence of AI Agents, particularly with the launch of Manus, has sparked a new wave of interest and debate in the AI community regarding the capabilities and future of these technologies [2][4]

Group 1: Development of AI Agents
- Manus has demonstrated the potential of AI Agents to automate complex tasks, evolving from mere language models to actionable digital assistants capable of self-repair and debugging [2][4]
- The CAMEL AI community has been working on Agent frameworks for two years, leading to the rapid development of the OWL project, which quickly gained traction in the open-source community [6][8]
- OWL achieved over 10,000 stars on GitHub within ten days of its release, indicating strong community interest and engagement [9][10]

Group 2: Community Engagement and Feedback
- The OWL project received extensive feedback from the community, resulting in rapid iterations and improvements based on user input [9][10]
- The initial version of OWL was limited to local IDE usage, but subsequent updates included a Web App to enhance the user experience, showcasing the power of community contributions [10][11]

Group 3: Technical Challenges and Innovations
- The development of OWL involved significant optimizations, including balancing performance and resource consumption, which were critical for user satisfaction [12][13]
- The introduction of tools such as the Browser Tool and the Terminal Tool Kit has expanded OWL's capabilities, allowing Agents to perform automated tasks and install dependencies independently [12][13]

Group 4: Scaling and Future Directions
- The concept of an "Agent Scaling Law" is being explored, suggesting that the number of Agents could correlate with system capabilities, similar to model parameters in traditional AI [20][21]
- The CAMEL team is investigating the potential for multi-agent systems to outperform single-agent systems on various tasks, with evidence supporting this hypothesis [21][22]

Group 5: Perspectives on General Agents
- There is ongoing debate about the feasibility of "general Agents": some believe in their potential, while others view them as an overhyped concept [2][4][33]
- The CAMEL framework is positioned as a versatile multi-agent system, allowing developers to tailor solutions to specific business needs, thus supporting the idea of general Agents [33][34]

Group 6: Industry Trends and Future Outlook
- The rise of protocols such as MCP and A2A is shaping the landscape for Agent development, with both seen as beneficial for streamlining integration and enhancing functionality [30][35]
- The industry anticipates a significant increase in Agent projects by 2025, with a focus on both general and specialized Agents, indicating a robust future for this technology [34][36]
Given a Team of Top-Tier AIs, How Do You Organize Them for Maximum Effect? UIUC Seeks the Answer with a New Multi-Agent Collaboration Benchmark
机器之心· 2025-07-09 04:23
Core Viewpoint
- The article discusses the emergence of AI teams that collaborate like human teams in software development and scientific research, highlighting the need for effective evaluation metrics for these multi-agent systems [2][3]

Group 1: Introduction of MultiAgentBench
- MultiAgentBench is introduced as a comprehensive benchmark for evaluating the collaboration and competition capabilities of LLM-based multi-agent systems [4][6]
- It aims to fill the gap left by existing evaluation metrics, which focus primarily on individual agent capabilities rather than the essential aspects of collaboration efficiency and communication quality [3][6]

Group 2: Key Findings and Contributions
- The research finds that the gpt-4o-mini model exhibits the strongest overall task performance among the models tested [8]
- Decentralized, graph-structured collaboration is found to be the most efficient, while cognitive self-evolution planning significantly enhances task completion rates [8][12]
- MultiAgentBench identifies critical moments at which agents begin to exhibit emergent social behaviors, providing insights into achieving AGI-level collaboration [9][12]

Group 3: Evaluation Framework
- The framework includes a collaboration engine, an agent graph to structure relationships, and a cognitive module for personalized information and adaptive strategies [12][15]
- It incorporates diverse interaction strategies and six varied evaluation scenarios, simulating real-world team dynamics [19][20]

Group 4: Performance Metrics
- The evaluation system uses milestone-based KPIs to assess task completion and collaboration quality, including task, communication, and planning scores [27][28] (a toy scoring sketch follows this summary)
- The findings indicate that high collaboration does not always correlate with superior task outcomes, emphasizing the importance of individual agent capabilities [30][32]

Group 5: Organizational Structure and Team Dynamics
- The study highlights that decentralized organizational structures outperform hierarchical ones, which can incur communication costs and inefficiencies [38]
- A "Ringelmann Effect" is observed: increasing the number of agents can lead to diminishing returns in performance, underscoring the need for efficient collaboration mechanisms [40]

Group 6: Emergence of Social Intelligence
- Notable emergent behaviors, such as strategic silence and trust differentiation, are observed in competitive scenarios, indicating a shift from pure logical reasoning to initial social-behavior capabilities in AI agents [43][44]
- The findings suggest that, under the right conditions, AI can learn and exhibit advanced social behaviors, marking a significant step toward more sophisticated artificial intelligence [48]
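As referenced in Group 4, the following toy sketch shows what milestone-based KPI scoring can look like. The weights, field names, and the linear combination are assumptions made for illustration, not MultiAgentBench's actual rubric.

```python
from dataclasses import dataclass

# Illustrative sketch of milestone-based KPI scoring in the spirit of
# MultiAgentBench. Weights and the way the sub-scores combine are
# assumptions, not the benchmark's published formula.

@dataclass
class EpisodeLog:
    milestones_hit: int
    milestones_total: int
    communication_score: float  # 0..1, e.g. rated message relevance
    planning_score: float       # 0..1, e.g. rated plan quality

def evaluate(log: EpisodeLog,
             w_task: float = 0.5,
             w_comm: float = 0.25,
             w_plan: float = 0.25) -> float:
    task_score = log.milestones_hit / log.milestones_total
    return (w_task * task_score
            + w_comm * log.communication_score
            + w_plan * log.planning_score)

if __name__ == "__main__":
    # High collaboration sub-scores need not imply a high task score,
    # matching the article's observation that the two can diverge.
    log = EpisodeLog(milestones_hit=3, milestones_total=5,
                     communication_score=0.9, planning_score=0.8)
    print(f"overall: {evaluate(log):.3f}")
```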
Exploring Multi-Domain Financial Applications: Zhongcai Rongtong (CUFEL) Large Model and Listed-Company Research-Report Agent Released
Sou Hu Cai Jing· 2025-07-06 14:55
Group 1
- The CUFEL model and the CUFEL-A research-report-generation agent were officially launched at the Global Finance Forum hosted by the Central University of Finance and Economics on July 5 [1]
- CUFEL is described not as a single model but as a cluster of models, or an efficient model fine-tuning pipeline, enhancing performance on specific tasks while maintaining general capabilities [3]
- The CUFEL-A agent produces independent, in-depth research reports on A-share listed companies through a four-step process: data aggregation, planning, structuring and reflection, and writing (a skeletal sketch of this pipeline follows below) [5]

Group 2
- The research-report evaluation algorithm is built on three principles (generative, end-to-end, and multi-agent-system reinforcement learning), improving the quality of report writing [5]
- The model was developed by a team of faculty and students from the Central University of Finance and Economics, which is actively collaborating with leading companies in the financial industry to explore applications in smart credit, compliance, and supply-chain finance [5]
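As referenced above, here is a skeletal Python sketch of a four-step report pipeline (data aggregation, planning, structuring and reflection, writing). Every function, and all stub content including the ticker, is hypothetical scaffolding for illustration; none of it is CUFEL-A code.

```python
# Skeletal sketch of the four-step pipeline the article names:
# data aggregation -> planning -> structuring and reflection -> writing.
# Every function here is hypothetical scaffolding, not CUFEL-A code.

def aggregate_data(ticker: str) -> dict:
    # Step 1: pull raw material (stubbed; a real agent would fetch
    # filings, announcements, and market data).
    return {"ticker": ticker, "filings": ["[stub] annual report"]}

def plan_report(data: dict) -> list[str]:
    # Step 2: draft an outline for the report.
    return ["Business overview", "Financials", "Risks", "Valuation"]

def structure_and_reflect(outline: list[str]) -> list[str]:
    # Step 3: reflection pass. A real agent would critique and revise
    # the outline; here we only drop empty sections.
    return [section for section in outline if section.strip()]

def write_report(data: dict, outline: list[str]) -> str:
    # Step 4: expand the validated outline into report text.
    body = "\n".join(f"## {section}\n[stub] analysis of {data['ticker']}"
                     for section in outline)
    return f"# Research report: {data['ticker']}\n{body}"

if __name__ == "__main__":
    data = aggregate_data("600000.SH")  # illustrative A-share ticker
    print(write_report(data, structure_and_reflect(plan_report(data))))
```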