Hyper Track | Giants Race: The Battle Over Agent Frameworks as the New Entry Point
Hua Er Jie Jian Wen· 2025-09-04 06:36
Core Viewpoint
- The competition among tech giants like Tencent, Alibaba, and Microsoft in open-source agent frameworks is not merely a technical contest but strategic positioning for future market dominance in the AI era [2][4][18]

Group 1: Company Strategies
- Tencent has launched the Youtu-Agent framework, achieving 71.47% accuracy on the WebWalkerQA benchmark, a new record for open-source models [1]
- Tencent's approach is cautious, focusing on practical applications such as file management and data analysis rather than bold promises about defining new digital entry points [9][10]
- Alibaba's AgentScope 1.0 is more aggressive, aiming to be a comprehensive platform for the entire lifecycle of agent development, reflecting its strategy of building foundational infrastructure [10][12]
- Microsoft has embedded agent capabilities directly into its Office suite and Copilot, leveraging its existing user base to enhance productivity without requiring users to learn a new framework [14][15]

Group 2: Market Dynamics
- The value of intelligent agents as a new digital entry point has yet to be validated in real business scenarios, leading companies to treat open-source frameworks as a low-cost market entry strategy [5][6][21]
- The current competition is a struggle for narrative and positioning rather than immediate commercial success, as most applications remain in pilot stages [21][26]
- Open-sourcing serves as a strategic defense, allowing companies to secure their positions in anticipation of future demand for intelligent agents [21][26]

Group 3: Future Implications
- The race to establish agent frameworks is reminiscent of past technology battles, in which the winner could define interaction rules and control traffic entry points [17][18]
- Open-source frameworks serve as a testing ground for developers, but long-term success will depend on sustained investment and the ability to address industry-specific challenges [23][24]
- The ongoing competition among these tech giants indicates that the battle for dominance in the agent space is far from over, with the current open-source trend merely setting the stage for future developments [26]
Tech Giants Wrestle Over Agent Frameworks: Tencent Announces Open Source, Alibaba Ships Something New the Same Day
Guan Cha Zhe Wang· 2025-09-02 13:57
Core Insights
- Tencent's Youtu Laboratory has officially open-sourced the Youtu-Agent framework, designed for practical applications across scenarios such as file management, data analysis, academic research, and information summarization [1]

Performance Highlights
- Youtu-Agent has demonstrated superior performance on multiple challenging benchmarks, achieving 71.47% accuracy on the WebWalkerQA benchmark with DeepSeek-V3.1, a new state-of-the-art (SOTA) for open-source models [3]
- On the GAIA text subset, Youtu-Agent reached a Pass@1 score of 72.8% with DeepSeek-V3, nearing or surpassing some models that rely on training or paid closed-source frameworks [3]

Framework Features
- Youtu-Agent is built on the openai-agents-python architecture, adheres to a "design simplicity" philosophy, and is compatible with various model APIs and tools such as DeepSeek and gpt-oss [6]
- The framework combines YAML configuration with a "meta-agent" mechanism: users describe their requirements, and agents are automatically generated and run with a single click [6]
- It supports modular and asynchronous design, enabling features such as streaming, tracing, and agent-loop while balancing simplicity and efficiency [6]

Practical Applications
- In local file management, Youtu-Agent can automatically batch-process file recognition, renaming, and archiving [6]
- For data analysis, it can read datasets from Kaggle, then automatically clean, analyze, and output visualized HTML reports [6]
- In academic research, it can summarize key points from papers, retrieve related studies, and generate Markdown notes [6]
- The "Wide Research" feature enables automated thematic searches and structured reviews, facilitating "research automation" [6]

Competitive Landscape
- On the same day, Alibaba's Tongyi Laboratory announced AgentScope 1.0, a new-generation open-source framework focused on multi-agent development, aiming to simplify the development, operation, and management of agents [7]
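The YAML-plus-meta-agent flow described above can be sketched roughly as follows. This is an illustrative toy, not Youtu-Agent's actual API: the config keys, `Agent` class, and tool registry are all hypothetical stand-ins for whatever the framework really wires to model calls.

```python
# Illustrative sketch of a config-driven agent runner in the spirit of
# Youtu-Agent's YAML + "meta-agent" mechanism. All names here are
# hypothetical; consult the actual project for its real API.

from dataclasses import dataclass, field
from typing import Callable

# A parsed agent config, as might come out of a YAML file.
CONFIG = {
    "name": "file_organizer",
    "instructions": "Rename and archive files by their format.",
    "tools": ["list_files", "rename_file"],
}

@dataclass
class Agent:
    name: str
    instructions: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # A real agent loop would call a model API here; this stub just
        # applies each registered tool to the task in order.
        result = task
        for tool in self.tools.values():
            result = tool(result)
        return result

# Tool registry standing in for the framework's built-in toolset.
TOOL_REGISTRY = {
    "list_files": lambda s: f"[listed] {s}",
    "rename_file": lambda s: f"[renamed] {s}",
}

def build_agent(config: dict) -> Agent:
    """Turn a parsed config into a runnable agent (the 'one click' step)."""
    tools = {t: TOOL_REGISTRY[t] for t in config["tools"]}
    return Agent(config["name"], config["instructions"], tools)

agent = build_agent(CONFIG)
print(agent.run("submissions/"))  # [renamed] [listed] submissions/
```

The point of the sketch is the shape of the mechanism: a declarative config names the agent, its instructions, and its tools, and a builder turns that description into something executable without the user writing agent code.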
Tencent Open-Sources Agent Framework Youtu-Agent
Di Yi Cai Jing· 2025-09-02 06:52
Group 1
- Tencent Youtu Lab announced the official open-source release of the agent framework Youtu-Agent [2]
Tencent Open-Sources a New Agent Framework: SOTA Agents from Open-Source Models, No Training or Paid Top-Ups Required
量子位· 2025-09-02 04:17
Core Viewpoint
- Youtu-agent, an open-source agent framework developed by Tencent Youtu Lab, is a key enabler for the practical application of large models, addressing challenges such as high entry barriers and dependency on expensive closed-source APIs [1][4]

Group 1: Performance and Features
- Youtu-agent has demonstrated leading performance on multiple challenging benchmarks, achieving 71.47% accuracy on WebWalkerQA and a 72.8% Pass@1 on the GAIA text subset, showcasing strong research and application potential without reliance on paid tools [4]
- The framework is designed to be open-source and cost-sensitive, fully compatible with accessible, low-cost deployment environments [5]
- Youtu-agent features a flexible architecture built on openai-agents, compatible with various model APIs and toolsets [6]

Group 2: Automation and Usability
- The framework supports automatic agent generation through a YAML configuration and a "meta-agent" dialogue mechanism, letting users generate and run agent configurations with minimal input [8]
- Youtu-agent employs a modular, asynchronous design, supporting streaming, tracing, and agent-loop functionality for efficient debugging and extension [9]
- The framework is not merely theoretical but designed for real-world applications, providing practical tools for various scenarios [10]

Group 3: Use Cases
- Case 1, local file management: Youtu-agent automates renaming and archiving student submissions based on their format, requiring no manual intervention [12][13]
- Case 2, data analysis: the agent reads and processes CSV files, automatically generating structured conclusions and visual reports [14][16]
- Case 3, paper analysis: users can input a PDF paper, and Youtu-agent will extract key content, search for related research, and compile a Markdown report [17][19]
- Case 4, wide research: the agent collects and organizes information on broad topics, generating structured Markdown summaries through collaborating sub-agents [20][22]

Group 4: Design Principles and Automation
- The DITA principles outline four key dimensions for agent design (requirements, input/output, tools, and agent patterns), facilitating structured development [23]
- Youtu-agent emphasizes automated agent generation, significantly reducing customization difficulty and time investment for users [24][25]
- Users can quickly set up and test agents with simple commands, enhancing accessibility for beginners and experienced developers alike [28][30]

Group 5: Getting Started
- The framework is available on GitHub; users can clone the repository and run predefined templates to experience its capabilities [32]
- Users can explore various examples and use the web UI to visualize agent operations [35][42]
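To make the data-analysis use case concrete, here is a toy stand-in for the CSV-to-structured-conclusions step: in Youtu-agent a model-driven agent performs this work, whereas here plain stdlib code merely illustrates the shape of the task (the column names and summary fields are invented for the example).

```python
# A toy stand-in for the "read CSV, output structured conclusions" step
# described in the data-analysis case. Not Youtu-agent code; just an
# illustration of the task shape using the standard library.

import csv
import io
import statistics

RAW = """name,score
alice,90
bob,70
carol,80
"""

def summarize(csv_text: str) -> dict:
    """Parse a CSV and return a small structured summary of it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    scores = [float(r["score"]) for r in rows]
    return {
        "rows": len(rows),
        "mean_score": statistics.mean(scores),
        "max_score": max(scores),
    }

report = summarize(RAW)
print(report)  # {'rows': 3, 'mean_score': 80.0, 'max_score': 90.0}
```

An agent framework adds value over this kind of script by deciding *which* cleaning and analysis steps to run for an unseen dataset, then rendering the result as an HTML report rather than a dict.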
SEAgent: A New Era of GUI Agents That Self-Evolve from Real-World Experience
机器之心· 2025-08-17 04:28
Core Viewpoint
- The development of current computer-using agents (CUAs) relies heavily on expensive human-annotated data, limiting their application in novel or specialized software environments. To overcome this, researchers from Shanghai Jiao Tong University and The Chinese University of Hong Kong proposed SEAgent, a new framework that lets agents learn and evolve autonomously through interaction with their environment, without human intervention [2][4]

Group 1: SEAgent Framework
- SEAgent's core innovations are a closed-loop autonomous-evolution framework, a deeply optimized evaluation model, and an efficient "specialist-to-generalist" integration strategy [2][5]
- SEAgent's autonomous evolution arises from the collaboration of three core components, which together form a sustainable, self-driven learning loop [5]

Group 2: Core Components
- The Curriculum Generator acts as a "mentor," automatically generating progressively challenging exploration tasks based on the agent's current capabilities and maintaining a "software guide" documenting new functionality discovered during exploration [9]
- The Actor-CUA, the agent itself, executes the tasks generated by the Curriculum Generator in the software environment [9]
- The World State Model serves as the "judge," evaluating the agent's performance at each step and providing the critical feedback signals for learning, thus closing the evolution loop [9][10]

Group 3: Evaluation Model
- A precise "judge" is fundamental to autonomous evolution. Existing open-source large vision-language models struggle to evaluate long sequences of agent operations, with accuracy degrading as historical input grows; the more robust World State Model was developed to address this [10]
- The optimized World State Model significantly narrows the performance gap with commercial models like GPT-4o, providing reliable, stable evaluation for the SEAgent framework [10]

Group 4: Specialist-to-Generalist Strategy
- The research explores building a "generalist" model that operates across multiple software environments, finding that training a generalist directly in multi-software settings is less effective than training specialist models in single software environments [13]
- A three-step "specialist-to-generalist" integration strategy is proposed: innovating the evaluation paradigm, high-quality data distillation, and cultivating specialists before transitioning to a generalist model [14][15]

Group 5: Experimental Results
- The final "generalist" agent achieved an overall success rate of 34.5%, surpassing directly trained generalist models (30.6%) and exceeding the combined performance of all specialist models (32.2%), demonstrating the potential of the "specialist first, then generalist" approach [18]
- Rigorous ablation experiments confirm the necessity of the algorithm design: a high-quality World State Model is essential for effective learning, and exploration-based reinforcement learning (GRPO) significantly outperforms mere imitation [20]

Group 6: Author and Research Interests
- The first author, Sun Zeyi, is a joint doctoral student of Shanghai Jiao Tong University and the Shanghai Artificial Intelligence Laboratory, with multiple publications in CVPR, ICCV, and NeurIPS; his research interests include GUI agents, multimodal learning, and reinforcement learning [20]
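The three-component loop above (Curriculum Generator proposing tasks, Actor attempting them, World State Model judging and feeding a learning signal back) can be sketched as a minimal simulation. All class and method names here are illustrative assumptions, and the "update" is a crude stand-in for the RL the paper actually uses (GRPO), not its real training procedure.

```python
# Minimal sketch of SEAgent's mentor -> actor -> judge evolution loop.
# Hypothetical names and a toy skill model; not the paper's actual code.

import random

class CurriculumGenerator:
    """The 'mentor': proposes progressively harder exploration tasks."""
    def __init__(self):
        self.difficulty = 1

    def next_task(self) -> str:
        task = f"explore feature at difficulty {self.difficulty}"
        self.difficulty += 1  # tasks get harder over time
        return task

class Actor:
    """The agent itself: attempts tasks; success rate tracks its skill."""
    def __init__(self):
        self.skill = 0.5

    def attempt(self, task: str) -> bool:
        return random.random() < self.skill

class WorldStateModel:
    """The 'judge': scores each attempt, yielding the learning signal."""
    def judge(self, success: bool) -> float:
        return 1.0 if success else -0.1

random.seed(0)  # fixed seed so the run is reproducible
curriculum, actor, judge = CurriculumGenerator(), Actor(), WorldStateModel()

for step in range(5):
    task = curriculum.next_task()
    reward = judge.judge(actor.attempt(task))
    # Stand-in for an RL update (the paper uses GRPO): nudge skill by reward.
    actor.skill = min(1.0, actor.skill + 0.05 * reward)
    print(f"{task}: reward={reward}, skill={actor.skill:.2f}")
```

What the sketch preserves from the design is the closed loop: the judge's signal is the only supervision, so the quality of the World State Model directly bounds what the actor can learn, which is exactly why the paper invests in a robust evaluation model.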
Tencent AI Lab Open-Sources a Reproducible Deep-Research Agent with Minimal External Dependencies
量子位· 2025-08-06 05:56
Core Insights
- The article discusses the transformative potential of deep-research agents powered by large language models (LLMs) and vision-language models (VLMs) for knowledge discovery and problem-solving [1]
- It highlights the limitations of existing open-source agent frameworks that rely on paid tools, which restrict reproducibility and universality [2]

Group 1: Cognitive Kernel-Pro Framework
- Tencent AI Lab has launched Cognitive Kernel-Pro, a fully open-source, multi-module, hierarchical agent framework that offers a breakthrough solution for developing and training deep-research agents [4]
- Cognitive Kernel-Pro outperforms the open-source free framework SmolAgents on the GAIA benchmark suite, and its 8B model surpasses WebDancer and WebSailor-7B on GAIA-text [5]
- The framework's technical report and code are available on GitHub, promoting community engagement and reproducibility [8]

Group 2: Core Design Features
- The framework uses a two-layer modular architecture: a main agent responsible for task decomposition and multiple sub-agents focused on specific tasks, ensuring modular independence and scalability [11]
- A "Progress State" mechanism provides structured state management, improving efficiency on complex tasks by tracking completed steps and key information [11]
- Standardized task interfaces let the main agent and sub-agents communicate through simple text interfaces, facilitating collaboration and debugging [11]
- Reflection and voting mechanisms optimize task-completion quality, particularly in high-variability tasks like web browsing [11]

Group 3: Innovative Training Methods
- Cognitive Kernel-Pro's training covers web navigation, file processing, code generation, and reasoning, with a focus on high-quality data construction [16][17]
- Training data is enhanced with verifiable query-answer pairs and diverse synthetic queries generated from Persona Hub, improving data quality and robustness [17]
- Existing datasets have been refined to align with agent task formats, ensuring relevance to real-world applications [17]

Group 4: Performance Advantages
- Cognitive Kernel-Pro performs strongly in web information retrieval, file processing, and complex reasoning, closely approaching the capabilities of agent frameworks that depend on paid tools [19][20]
- The framework emphasizes the inherent capabilities of LLMs and VLMs, minimizing external dependencies and achieving truly open-source status [20]
- Comparisons show that Cognitive Kernel-Pro excels in both functionality and open-source accessibility relative to existing frameworks [20][22]

Group 5: Future Directions
- The research team plans to focus future work on distilling reflection capabilities into a unified agent base model [26]
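The two-layer main-agent/sub-agent design with a "Progress State" can be sketched as follows. The interfaces here are hypothetical: in Cognitive Kernel-Pro each role is backed by LLM calls, while this sketch only shows the structural idea of task decomposition, text-only communication, and tracked progress.

```python
# Rough sketch of a two-layer main/sub-agent design with a "Progress
# State", in the spirit of Cognitive Kernel-Pro's described architecture.
# All names and interfaces are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class ProgressState:
    """Structured state: tracks completed steps and key findings."""
    completed: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)

    def record(self, step: str, result: str) -> None:
        self.completed.append(step)
        self.findings[step] = result

def web_subagent(task: str) -> str:
    return f"web result for '{task}'"    # stand-in for a browsing sub-agent

def file_subagent(task: str) -> str:
    return f"file summary for '{task}'"  # stand-in for a file sub-agent

SUBAGENTS = {"web": web_subagent, "file": file_subagent}

def main_agent(goal: str, plan: list[tuple[str, str]]) -> ProgressState:
    """Decompose the goal into sub-tasks and dispatch each to a sub-agent
    over a plain-text interface, recording progress along the way."""
    state = ProgressState()
    for kind, subtask in plan:
        result = SUBAGENTS[kind](subtask)  # text in, text out
        state.record(f"{kind}:{subtask}", result)
    return state

state = main_agent(
    "survey agent frameworks",
    [("web", "find recent releases"), ("file", "summarize report.pdf")],
)
print(state.completed)
```

The text-in/text-out boundary between layers is the point the article emphasizes: because sub-agents expose nothing but a simple text interface, they can be developed, swapped, and debugged independently of the main agent.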
o3-pro Clears Sokoban as Nostalgic Mini-Games Become a New Benchmark for Large Models
量子位· 2025-06-16 04:50
Core Viewpoint
- Classic nostalgic games like Sokoban and Tetris have become benchmarks for evaluating large models, with the o3-pro model recently surpassing previous performance limits in these games [1][2][6]

Group 1: Benchmark Performance
- The o3-pro model completed all levels of Sokoban, a benchmark on which progress had previously stalled at the sixth level [3][8]
- Compared with the previous state-of-the-art (SOTA) model, o3, o3-pro's performance doubled [3][10]
- Tetris is scored as the number of placed blocks plus ten times the number of cleared lines, accumulated until the game ends [13][22]

Group 2: Game Characteristics and Evaluation
- The Lmgame benchmark includes several games, such as 2048, Candy Crush, Super Mario Bros, and Phoenix Wright, each with its own evaluation criteria [18][24]
- 2048 is evaluated by the total value of merged blocks, while Candy Crush measures the total candies eliminated in a fixed number of rounds [24]
- The evaluation methods do not factor in time, focusing instead on game-specific performance metrics [22][24]

Group 3: Model Development and Support
- The project is developed by the Hao AI Lab at UCSD, which is affiliated with the machine learning systems and NLP labs [28]
- The lab has received funding from Google and NVIDIA, with NVIDIA donating a DGX B200 system to support the research [34]
- The benchmark is open-source, so interested parties can download it and test their own models [23]
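The Tetris scoring rule described above (blocks placed plus ten times lines cleared) is simple enough to state directly in code. This helper is a sketch of the rule as reported, not Lmgame's actual implementation, and the example numbers are invented.

```python
# Sketch of the reported Tetris scoring rule:
#   score = blocks placed + 10 * lines cleared, accumulated until game over.
# Not Lmgame's actual code; just the stated rule.

def tetris_score(blocks_placed: int, lines_cleared: int) -> int:
    return blocks_placed + 10 * lines_cleared

# Example: a run that placed 40 blocks and cleared 7 lines.
print(tetris_score(40, 7))  # 110
```

Note how the rule rewards survival (every placed block counts) while weighting line clears heavily, and, consistent with the article, involves no time component at all.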
o3-pro Clears Sokoban as Nostalgic Mini-Games Become a New Benchmark for Large Models
量子位· 2025-06-16 04:49
Core Viewpoint
- Classic nostalgic games like "Sokoban" and "Tetris" have become benchmarks for evaluating large models, with the o3-pro model achieving significant breakthroughs in these games [1][6]

Group 1: Benchmark Performance
- The o3-pro model surpassed previous benchmarks by completing all levels of Sokoban, whereas the best prior model, o3, had only reached the sixth level [2][3]
- In Tetris, the score combines the number of placed blocks with ten times the number of cleared lines, and o3-pro's score doubled o3's [3][13]
- o3-pro's performance is notably time-consuming, taking several minutes per move [17]

Group 2: Game Evaluation Standards
- The Lmgame benchmark includes various games with game-specific metrics, such as total distance moved in Super Mario Bros and total candy cleared in Candy Crush [6][24]
- The evaluation does not factor in time, focusing instead on game-specific performance metrics [22]
- The benchmark is open-source, so others can download it and test their models [23]

Group 3: Development and Support
- The project is developed by the Hao AI Lab at UCSD, with support from Google and NVIDIA [28][34]
- The lab has created multiple open-source projects, with FastVideo the most-starred on GitHub [32]