Core Insights
- The article introduces Workforce, a new multi-agent framework, together with the OWL (Optimized Workforce Learning) training method. Workforce reached 69.70% accuracy on the GAIA benchmark, surpassing both open-source and commercial systems, including OpenAI's offerings [1][18].

Background and Challenges
- The rapid development of large language models (LLMs) has exposed the limits of single-agent systems on complex real-world tasks, which has led to the emergence of multi-agent systems (MAS) [7].
- Current MAS transfer poorly across domains: they are often deeply customized for one domain, which limits flexibility and scalability [7][10].

Innovative Breakthroughs
- The Workforce framework uses a "decoupled design" to address cross-domain transfer, decomposing the system into three core components: a domain-agnostic planner, a coordinator agent, and specialized worker nodes [8][12].
- This modular architecture adapts to a new domain by replacing or adding worker nodes while leaving the core planner and coordinator unchanged, which significantly reduces the complexity and cost of system migration [12].

Technical Innovations
- The OWL training method optimizes the planner's capabilities rather than training the entire system. It uses a two-phase training strategy: supervised fine-tuning (SFT) followed by reinforcement learning optimization [15][19].
- This training design has been shown to improve model performance: the Qwen2.5-32B-Instruct model's accuracy on GAIA rose from 36.36% to 52.73% [20].

Experimental Validation
- The Workforce framework demonstrated significant advantages in multi-agent reasoning, achieving a pass@1 accuracy of 69.70% on the GAIA validation set and outperforming the previous best results from both open-source and proprietary frameworks [18][20].
- The performance comparison table shows Workforce achieving higher accuracy than other frameworks across the GAIA difficulty levels [20].

Practical Applications
- The research team identified several challenges in real-world task automation, including differences between information sources, information timeliness, language ambiguity, and network environment limitations [22][26].

Conclusion
- The success of OWL paves the way for building truly general artificial intelligence systems, with Workforce's modular design and cross-domain transfer capabilities offering significant advantages [24][25].
- The framework maintains stable performance across various capability dimensions and features a self-correcting mechanism that improves performance through dynamic strategy adjustments during testing [25].
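
The decoupled design described under Innovative Breakthroughs can be illustrated with a minimal sketch. This is not the Workforce or OWL code: every name here (`Subtask`, `Planner`, `Coordinator`, `run`, and the worker functions) is hypothetical, and the planner uses a fixed rule where the real system would call a language model. The sketch only shows the structural idea that workers can be swapped while the planner and coordinator stay untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    description: str
    capability: str  # hypothetical label the coordinator routes on


class Planner:
    """Domain-agnostic: only splits a task into capability-tagged subtasks."""

    def plan(self, task: str) -> List[Subtask]:
        # Fixed placeholder rule standing in for an LLM-driven decomposition.
        return [
            Subtask(f"find sources for: {task}", "search"),
            Subtask(f"summarize findings for: {task}", "summarize"),
        ]


class Coordinator:
    """Routes each subtask to whichever worker is registered for its capability."""

    def __init__(self) -> None:
        self.workers: Dict[str, Callable[[str], str]] = {}

    def register(self, capability: str, worker: Callable[[str], str]) -> None:
        # Registering under an existing capability replaces the old worker.
        self.workers[capability] = worker

    def dispatch(self, subtask: Subtask) -> str:
        worker = self.workers.get(subtask.capability)
        if worker is None:
            raise LookupError(f"no worker for capability {subtask.capability!r}")
        return worker(subtask.description)


def run(task: str, planner: Planner, coordinator: Coordinator) -> List[str]:
    """Plan once, then dispatch every subtask in order."""
    return [coordinator.dispatch(s) for s in planner.plan(task)]


# Hypothetical domain-specific workers (plain functions for illustration).
def web_search_worker(description: str) -> str:
    return f"[web] {description}"


def paper_search_worker(description: str) -> str:
    return f"[papers] {description}"


def summarizer_worker(description: str) -> str:
    return f"[summary] {description}"


planner = Planner()
coordinator = Coordinator()
coordinator.register("search", web_search_worker)
coordinator.register("summarize", summarizer_worker)
results = run("GAIA question", planner, coordinator)

# Moving to a new domain: swap one worker, leave planner and coordinator as-is.
coordinator.register("search", paper_search_worker)
swapped = run("GAIA question", planner, coordinator)
```

In this sketch, migrating to a new domain touches only the worker registry, which mirrors the article's claim that the planner and coordinator do not need to change.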
Breaking the Boundaries of Multi-Agent Systems: Open-Source Solution OWL Surpasses OpenAI Deep Research, Earns 17k Stars
机器之心 (Jiqizhixin) · 2025-06-17 03:22