Multi-Agent Collaboration

Latest Agent Operates Phones and Computers Automatically, Sweeping Open-Source SOTA on 10 Leaderboards | Tongyi Lab
QbitAI · 2025-08-25 23:05
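Contributed by the Mobile-Agent team | QbitAI (WeChat official account: QbitAI)
A new SOTA agent that can operate phones and computers on its own has arrived. Tongyi Lab has released the Mobile-Agent-v3 agent framework, which achieves the best open-source results on multiple core benchmarks for both mobile and desktop.
It can not only answer questions about, describe, and ground elements of user interfaces, but can also complete complex tasks independently from a single instruction, and it can even play different roles seamlessly within a multi-agent framework.
PC + Web demo: Search for Alibaba's stock price in the Edge browser. Then create a new spreadsheet in WPS, filling in the company name in the first column and the stock price in the second column.
PC demo: Create a new blank presentation, then insert the text "阿里巴巴" (Alibaba) as WordArt on the first slide.
It can work on its own, reaching open-source SOTA on 10 mainstream GUI benchmarks including AndroidWorld, OSWorld, and ScreenSpot, and it can also handle basic tasks such as dialogue, question answering, grounding, and interface description.
Web demo: Go to Bilibili, watch Lei Jun's videos, and like the first video.
Mobile demos: "Please search Xiaohongshu for Jinan travel guides, sort by number of saves, and save the first note." "Please look up detailed information on the Daming Lake scenic area in Jinan on Ctrip, including its address and ticket prices."
Automating the operation of phones and computers has become a main battleground for every developer of multimodal large models. ...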
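The article says Mobile-Agent-v3 can finish a task from a single instruction or take on different roles inside a multi-agent framework. The following is a minimal Python sketch of that idea, assuming a hypothetical split into planner, executor, and reflector roles and a toy screen modeled as a list of element labels; the role names and functions are illustrative and are not taken from Mobile-Agent-v3.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Screen:
    """Toy stand-in for a GUI: visible element labels plus an action log."""
    elements: list[str]
    log: list[str] = field(default_factory=list)


def planner(instruction: str) -> list[str]:
    # Hypothetical planner role: split a compound instruction on the word "then".
    return [s.strip() for s in re.split(r"\bthen\b", instruction) if s.strip()]


def executor(step: str, screen: Screen) -> bool:
    # Hypothetical executor role: "ground" the step by matching a visible label,
    # then record a tap on it. Returns False if no label matches.
    for label in screen.elements:
        if label.lower() in step.lower():
            screen.log.append(f"tap:{label}")
            return True
    return False


def reflector(ok: bool, step: str) -> str:
    # Hypothetical reflector role: continue on success, otherwise request a replan.
    return "continue" if ok else f"replan:{step}"


def run(instruction: str, screen: Screen) -> list[str]:
    # A single agent plays all three roles in turn for each sub-goal.
    return [reflector(executor(step, screen), step) for step in planner(instruction)]


screen = Screen(elements=["Search", "Sort", "Save"])
print(run("open Search then Sort by saves then Save the first note", screen))
print(screen.log)
```

In a real multi-agent setup each role would be a separate model call; keeping them as plain functions makes the hand-off between roles easy to see.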
An "Expert Team" Takes the Field: The World's First Full-Platform General-Purpose Agent Is Released
Beijing Daily App (Bei Jing Ri Bao Ke Hu Duan) · 2025-08-19 00:45
Core Insights
- The article reports the launch of GenFlow2.0 by Baidu Wenku and Baidu Wangpan (Baidu Netdisk), billed as the world's first full-platform general-purpose agent, capable of completing multiple complex tasks simultaneously [1][2]
- GenFlow2.0 can orchestrate more than 100 expert agents at once and complete more than five complex tasks in just three minutes; users can intervene at any point, and memory can be tracked throughout the process [1][2]
Group 1
- GenFlow2.0 addresses shortcomings of its predecessor, GenFlow1.0, such as agents being hard to describe, long wait times, poor deliverables, and a lack of editability [1]
- The system autonomously understands user intent and switches between collaboration modes, allowing real-time intervention and modification based on user needs [1][2]
Group 2
- GenFlow2.0 improves personalization by recording and using the user's history, including conversation records and uploaded files, to deliver tailored results [2]
- Multi-agent collaboration is becoming a competitive focus among major tech companies; task allocation, parameter passing, and context management are critical challenges for effective teamwork [2]
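GenFlow2.0 is described as dispatching many expert agents at once while letting the user step in and keeping a traceable memory of the run. A minimal sketch of that orchestration pattern in Python, assuming experts are plain functions run on a thread pool; the expert names, the `intervene` hook, and the memory log are hypothetical and are not Baidu's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical expert agents; in a real system each would call a model or a tool.
EXPERTS: dict[str, Callable[[str], str]] = {
    "search": lambda task: f"search results for {task}",
    "write": lambda task: f"draft about {task}",
    "slides": lambda task: f"slide deck on {task}",
}


def orchestrate(
    subtasks: list[tuple[str, str]],
    memory: list[tuple[str, str, str]],
    intervene: Optional[Callable[[str, str], str]] = None,
) -> list[str]:
    """Run (expert, task) pairs concurrently; return outputs in submission order."""
    with ThreadPoolExecutor(max_workers=len(EXPERTS)) as pool:
        futures = [(name, task, pool.submit(EXPERTS[name], task)) for name, task in subtasks]
        results = []
        for name, task, future in futures:
            output = future.result()
            if intervene is not None:
                # User intervention point: the user may rewrite an expert's output.
                output = intervene(name, output)
            memory.append((name, task, output))  # traceable history of the run
            results.append(output)
    return results


memory: list[tuple[str, str, str]] = []
outputs = orchestrate(
    [("search", "EV market"), ("write", "EV market"), ("slides", "EV market")],
    memory,
    intervene=lambda name, out: out.upper() if name == "write" else out,
)
print(outputs)
print(len(memory))
```

Collecting results in submission order, rather than completion order, keeps the memory log deterministic even though the experts run concurrently.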
The Latest Agent Frameworks: This Is the Only Article You Need
自动驾驶之心 (Autonomous Driving Heart) · 2025-08-18 23:32
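Author | 哈喽WoW君  Editor | 大模型之心Tech
Original link: https://zhuanlan.zhihu.com/p/1939744959143086058
1. Mainstream AI Agent Frameworks
There are many mainstream AI Agent frameworks, each with its own emphasis and suited to different application scenarios. I have collected several mainstream, representative Agent frameworks; the table below first lists the frameworks covered in this article.
| Framework | Description | Use cases |
| --- | --- | --- |
| LangGraph | State-driven, multi-step agents built on LangChain | Complex state machines, approval workflows |
| AutoGen | Multi-agent collaboration, conversational | Research report generation, task decomposition |
| CrewAI | Lightweight "role-playing" multi-agent framework | Content teams, market analysis |
| Smolagents | Hugging Fac ...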
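LangGraph is listed as a state-driven, multi-step agent framework suited to state machines and approval workflows. To show what "state-driven" means in practice, here is a minimal plain-Python sketch of an approval flow as a graph of nodes over shared state; it deliberately does not use LangGraph, and the node names, the 1000 approval threshold, and `run_graph` are hypothetical.

```python
from typing import Callable

# A node reads/updates the shared state and returns the name of the next node.
# This illustrates the state-driven idea only; it is not LangGraph's API.
Node = Callable[[dict], str]


def draft(state: dict) -> str:
    state["draft"] = f"expense request: {state['amount']}"
    return "review"


def review(state: dict) -> str:
    # Conditional routing on the current state (hypothetical threshold).
    return "approve" if state["amount"] <= 1000 else "escalate"


def escalate(state: dict) -> str:
    state["approver"] = "manager"
    return "approve"


def approve(state: dict) -> str:
    state["status"] = "approved"
    return "END"


GRAPH: dict[str, Node] = {"draft": draft, "review": review, "escalate": escalate, "approve": approve}


def run_graph(state: dict, start: str = "draft", max_steps: int = 20) -> dict:
    node, trace = start, []
    for _ in range(max_steps):  # step limit guards against accidental cycles
        if node == "END":
            break
        trace.append(node)
        node = GRAPH[node](state)
    state["trace"] = trace
    return state


print(run_graph({"amount": 500})["trace"])
print(run_graph({"amount": 5000})["trace"])
```

Because every transition is decided by the state, the same graph yields a short path for small requests and an escalation path for large ones, which is the property that makes this style a good fit for approval workflows.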
4 Months, 200,000 Apps Created: The Product Behind It | A Conversation with Baidu Miaoda
QbitAI · 2025-08-09 07:01
Core Viewpoint
- The article highlights the rapid success of Baidu's no-code application-building platform, Miaoda, on which users created 200,000 applications in just four months without writing any code [1][39].
Group 1: No-Code Development
- Miaoda lets users without programming experience develop applications easily, emphasizing that creativity is the most important aspect of application development [11][30].
- The platform integrates tools from the Baidu ecosystem, letting users seamlessly use features such as maps, voice functions, and SMS services [8][19].
- Miaoda uses a dual interaction model, combining natural-language interaction for initial creation with a graphical user interface (GUI) for subsequent modifications, which improves the user experience [15][16].
Group 2: User Engagement and Creativity
- The platform aims to democratize application development, allowing a broader range of people to participate and express their ideas [14][30].
- Users can draw inspiration from others' projects on the platform, fostering a community-driven approach to creativity and application development [28][43].
- The success of applications created on Miaoda shows that innovative ideas often come from non-programmers, highlighting the value of diverse perspectives in application creation [30][34].
Group 3: Future Developments
- Miaoda plans to expand its functionality in phases, starting with tools for individual users and small businesses and eventually moving toward enterprise-level solutions [46][48].
- The platform will introduce features that let users view and adjust the underlying code, adding flexibility while keeping the no-code approach [45].
- Continuous user feedback is integral to Miaoda's development, with plans to strengthen community engagement through events and collaborative initiatives [52].
This Group of Post-95s Wants to Rebuild the Gateway to the Internet for 3 Billion People
Hundun University (混沌学园) · 2025-08-09 04:08
Core Viewpoint
- The article discusses the emergence of Fellou, an "agentic browser" designed to automate tasks that users traditionally perform themselves, boosting productivity and transforming the browsing experience [4][11][31].
Group 1: User Pain Points
- Users are burdened with excessive manual work, such as opening an average of 40 websites and switching between 26 tabs daily, which consumes valuable time and cognitive resources [8][10].
- The traditional browser model requires users to sift through lists of information and perform operations manually, leaving them feeling like "operational slaves" to the browser [9][11].
Group 2: Innovation and Development
- Fellou aims to reinvent the browser by giving it the ability to act autonomously, turning it from a mere information tool into a task executor [21][24].
- Its development draws on fields including RPA (Robotic Process Automation) and multi-agent collaboration to create a user-friendly "browser-level RPA" [17][19].
Group 3: Performance and Efficiency
- Fellou completes complex tasks in an average of 3.7 minutes, which the article says is 3 to 5 times faster than similar AI products and far faster than manual work [24].
- In one highlighted case, a task that previously took three days to complete manually was finished by Fellou in 7 minutes and 49 seconds [24].
Group 4: Competitive Positioning
- Fellou distinguishes itself from traditional browsers by providing structured knowledge summaries instead of mere links, effectively acting as an intelligent research assistant [28].
- Unlike other AI browsers that merely assist users, Fellou automates the entire task-execution process, significantly reducing user effort and raising productivity [29][30].
Group 5: Future Implications
- The article suggests that Fellou's approach could redefine the browser landscape, challenging established players such as Chrome and setting a new standard for user experience [32][33].
- Fellou's success serves as a case study for future entrepreneurs, emphasizing the importance of identifying deep user pain points and rethinking value creation in product development [33].
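The efficiency figures above can be turned into speedup ratios with simple arithmetic. The article does not say whether "three days" means calendar days or working days, so this sketch computes both under labeled assumptions (24-hour days and 8-hour working days); the peer-product range is only what the "3 to 5 times faster" claim implies, not a separately reported figure.

```python
def speedup(manual_seconds: float, agent_seconds: float) -> float:
    """How many times faster the agent is than the manual baseline."""
    return manual_seconds / agent_seconds


agent_seconds = 7 * 60 + 49        # 7 min 49 s, as reported
calendar_days = 3 * 24 * 3600      # assumption: three full 24-hour days
working_days = 3 * 8 * 3600        # assumption: three 8-hour working days

print(agent_seconds)                                      # 469
print(round(speedup(calendar_days, agent_seconds), 1))   # 552.7
print(round(speedup(working_days, agent_seconds), 1))    # 184.2

# "3 to 5 times faster" at an average of 3.7 min implies peers take about 11.1 to 18.5 min.
print(round(3.7 * 3, 1), round(3.7 * 5, 1))               # 11.1 18.5
```

Under either reading, the single highlighted case is a speedup of two to three orders of magnitude, far larger than the 3 to 5 times average gap against other AI products.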
喝点VC | BV Baidu Ventures: Data Governance Is Productivity, and Now Is the Moment for Data Agents
Z Potentials · 2025-07-30 03:37
Core Insights
- The article emphasizes the transformative role of Data Agents in the era of generative AI: they compress the data lifecycle into a rapid "data → insight → action" loop, delivering efficiency gains of over 60% and cost savings in the millions of dollars [3][4][10].
Industry Trends
- Data Agents redefine "data" as any digital asset that can be accessed and used in real time, moving away from traditional static databases [5][7].
- Global data volume is projected to reach 149 ZB in 2024 and exceed 181 ZB in 2025, with approximately 80% of it unstructured and needing to be structured promptly before algorithms can use it [5][7].
- Generative AI is expected to add $2.6 to $4.4 trillion in value annually, with nearly 75% of this value coming from functions that rely heavily on structured data [5][7].
Data Agent Definition and Functionality
- Data Agents are AI entities that automate the entire data lifecycle, planning, executing, and verifying tasks based on natural-language inputs [7][8].
- They are positioned as core infrastructure rather than mere BI tools, directly affecting business KPIs and productivity [7][8].
Efficiency Gains and Market Acceptance
- Early adopters of Data Agents report productivity increases of over 60% and annual savings of millions of dollars [7][8].
- The cost of LLM inference has fallen from $60 per million tokens to $0.06, a 1,000-fold reduction that signals a major technological shift [10][13].
- AI search and query traffic in the U.S. has reached 5.6%, reflecting growing acceptance of natural-language interactions that return structured answers [13][14].
Market Demand and Investment Trends
- Demand for Data Agents has surged: global search interest in "AI agent" rose 900%, and investment in the AI Agent sector tripled to $3.8 billion in 2024 [45][46].
- Major acquisitions by companies such as Databricks and Snowflake indicate a strong focus on data-driven AI platforms [13][14].
Development Stages of Data Agents
- The evolution of Data Agents is expected to occur in three stages:
  1. Human-led with AI empowerment, transforming data interaction and decision-making processes [36][37].
  2. Scenario-driven applications that allow rapid development of customized systems based on existing data [38][40].
  3. Autonomous intelligence, in which Data Agents manage data collection, governance, and analysis, acting as a digital COO [41][42].
Conclusion and Future Outlook
- The current landscape gives Data Agents a unique opportunity to become the default interface for digital work, much as the Office suite did in the 1990s [45][46].
- Integrating Data Agents into business processes is expected to improve organizational efficiency and responsiveness, marking a significant shift in how data is used across industries [48][49].
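To make the cost and data-growth figures concrete, here is a small worked calculation in Python. The monthly token volume is a hypothetical workload chosen for illustration, not a figure from the article; the prices ($60 and $0.06 per million tokens) and data volumes (149 ZB and 181 ZB) are the ones cited above.

```python
def inference_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of processing `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


monthly_tokens = 500_000_000  # hypothetical workload, not a figure from the article
old = inference_cost_usd(monthly_tokens, 60.0)   # earlier price cited: $60 per 1M tokens
new = inference_cost_usd(monthly_tokens, 0.06)   # current price cited: $0.06 per 1M tokens
print(f"${old:,.2f} -> ${new:,.2f} ({old / new:,.0f}x cheaper)")

# Year-over-year growth implied by 149 ZB (2024) -> 181 ZB (2025).
growth = (181 - 149) / 149
print(f"{growth:.1%}")
```

For this workload the monthly bill drops from $30,000 to $30, and the projected data volume grows by about 21.5% in a year; the 1,000-fold ratio holds for any token volume because it depends only on the two prices.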
Multi-Agent Collaboration Is Rising: Is RAG Destined to Be Only a Transitional Solution?
机器之心 (Synced) · 2025-07-19 01:31
Group 1: Core Insights
- AI memory systems are evolving from Retrieval-Augmented Generation (RAG) toward multi-level, dynamically evolving state, enabling agents to retain experience and manage memory dynamically [1][2].
- Various AI memory projects have emerged, moving from short-term responses to long-term interactions and giving agents the capability of "sustained experience" [2][3].
- MemoryOS introduces a hierarchical storage architecture that divides dialogue memory into short-term, medium-term, and long-term layers, using FIFO and segmented paging mechanisms to migrate and update memory dynamically between layers [2][3].
- MemGPT takes an operating-system approach, treating the fixed-length context as "main memory" and using paging to handle large-document analysis and multi-turn conversations [2][3].
- Commercial products such as ChatGPT Memory work on a RAG basis, retrieving user-relevant information through vector indexing to remember user preferences and history [2][3].
Group 2: Challenges Facing AI Memory
- AI memory systems face several challenges, including the limitations of static storage, disorder in multi-modal and multi-agent collaboration, retrieval-expansion conflicts, and weak privacy controls [4][5].
- Hierarchical and state-filtering mechanisms are critically needed, as is the ability to manage enterprise-level multitasking and permissions effectively [4][5].
- These challenges test the flexibility of the technical architecture and also push memory systems to become more intelligent, secure, and efficient [4][5].
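The MemoryOS description (short-, medium-, and long-term layers, with FIFO eviction and segmented paging moving memory between them) can be illustrated with a toy class. This is a minimal sketch under stated assumptions: the capacities, the segment size, keyword matching in place of real retrieval, and the rule "promote a segment after it is recalled `hot_threshold` times" are all hypothetical and are not MemoryOS's actual design.

```python
from collections import deque


class TieredMemory:
    """Toy three-tier dialogue memory (short / medium / long term).

    Illustrative only: capacities, segment size, and the promotion rule
    are assumptions, not MemoryOS's actual design.
    """

    def __init__(self, short_cap: int = 4, segment_size: int = 2, hot_threshold: int = 2):
        self.short: deque[str] = deque()     # FIFO queue of the most recent turns
        self.short_cap = short_cap
        self.pending: list[str] = []         # evicted turns not yet paged
        self.medium: list[list[str]] = []    # "pages": fixed-size segments of turns
        self.segment_size = segment_size
        self.hits: dict[int, int] = {}       # recall count per medium segment
        self.long: list[str] = []            # turns promoted to long-term memory
        self.hot_threshold = hot_threshold

    def add(self, turn: str) -> None:
        self.short.append(turn)
        if len(self.short) > self.short_cap:
            # FIFO eviction: the oldest turn migrates toward medium-term memory.
            self.pending.append(self.short.popleft())
            if len(self.pending) == self.segment_size:
                self.medium.append(self.pending)  # seal a full segment as one page
                self.pending = []

    def recall(self, keyword: str) -> list[str]:
        found = [t for t in self.short if keyword in t]
        for i, segment in enumerate(self.medium):
            if any(keyword in t for t in segment):
                found.extend(segment)
                self.hits[i] = self.hits.get(i, 0) + 1
                # A segment recalled often enough is promoted to long-term memory once.
                if self.hits[i] == self.hot_threshold:
                    self.long.extend(segment)
        return found


m = TieredMemory(short_cap=2, segment_size=2, hot_threshold=2)
for turn in ["user likes tea", "user lives in Jinan", "asked about stocks", "asked about slides"]:
    m.add(turn)
print(list(m.short), m.medium)
m.recall("tea")
m.recall("tea")
print(m.long)
```

Swapping the keyword test for a vector-similarity search would move this toward the RAG-style retrieval described for ChatGPT Memory; the tiering logic would stay the same.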
AI Day Livestream | LangCoop: Autonomous Driving Thinks in a "Human Language" Paradigm for the First Time
自动驾驶之心 (Autonomous Driving Heart) · 2025-07-18 10:32
Core Viewpoint
- The article discusses the potential of multi-agent collaboration in autonomous driving and introduces LangCoop, a new paradigm that uses natural language for communication between agents, significantly reducing bandwidth requirements while maintaining competitive driving performance [3][4].
Group 1: Multi-Agent Collaboration
- Multi-agent collaboration improves information sharing among connected agents, increasing the safety, reliability, and maneuverability of autonomous driving systems [3].
- Current communication methods face limitations such as high bandwidth demands, agent heterogeneity, and information loss [3].
Group 2: LangCoop Innovations
- LangCoop introduces two key innovations for collaborative driving that use natural language as a compact, expressive communication medium [3].
- Experiments in the CARLA simulation environment show that LangCoop reduces communication bandwidth by up to 96% compared with image-based communication, with each message under 2KB [3].
Group 3: Additional Resources
- The article links to the research paper "LangCoop: Collaborative Driving with Language" and to additional resources for further exploration of the topic [4][5].
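The bandwidth argument rests on the fact that a sentence describing a scene is tiny compared with the scene's pixels. The sketch below renders a hypothetical perception summary as one English message and checks it against the 2 KB figure cited above; the field names and message template are assumptions, not LangCoop's actual message format, and the raw-frame size is included only for scale (it is not the paper's baseline).

```python
BUDGET_BYTES = 2 * 1024          # the article reports messages under 2 KB
RAW_FRAME_BYTES = 640 * 480 * 3  # one uncompressed 640x480 RGB frame, for scale only


def to_language_message(sender: str, obs: dict) -> str:
    """Render a hypothetical perception summary as one natural-language message."""
    seen = ", ".join(f"{o['type']} {o['distance_m']} m {o['direction']}" for o in obs["objects"])
    return (
        f"{sender}: I am at {obs['speed_kmh']} km/h heading {obs['heading']}. "
        f"I see {seen}. Intent: {obs['intent']}."
    )


def fits_budget(message: str, budget: int = BUDGET_BYTES) -> bool:
    # Bandwidth is measured on the encoded bytes, not on the character count.
    return len(message.encode("utf-8")) <= budget


obs = {
    "speed_kmh": 30,
    "heading": "north",
    "objects": [
        {"type": "pedestrian", "distance_m": 12, "direction": "ahead-left"},
        {"type": "truck", "distance_m": 40, "direction": "behind"},
    ],
    "intent": "yield, then turn right at the next intersection",
}
msg = to_language_message("Car A", obs)
print(msg)
print(len(msg.encode("utf-8")), fits_budget(msg), RAW_FRAME_BYTES)
```

Measuring UTF-8 bytes rather than characters matters if agents exchange messages in languages with multi-byte characters, such as Chinese.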
Google Intercepts the Windsurf Deal, Positioning Itself in AI Programming
Haitong Securities International · 2025-07-16 04:31
Investment Rating
- The report does not explicitly provide an investment rating for the industry or the specific companies involved.
Core Insights
- AI coding startup Windsurf, which was close to being acquired by OpenAI for $3 billion, instead chose to join Google DeepMind to focus on agentic coding. Google carried out a "soft acquisition" through non-exclusive technology licensing and talent absorption, in a deal valued at approximately $2.4 billion [1][2][8].
- Windsurf's core product, Agent IDE, is designed for multi-agent AI collaboration, highlighting the growing importance of integrated development environments in AI programming [3][9].
- The competitive landscape has shifted: platform risk is rising and independent AI tool providers face survival pressure. Windsurf's experience illustrates the dilemma between staying neutral and aligning with a dominant platform for resource support [4][10][11].
Summary by Sections
Event
- Windsurf was close to being acquired by OpenAI for $3 billion but joined Google DeepMind instead to focus on agentic coding. Google did not acquire equity; it made a soft acquisition through technology licensing and talent absorption [1][2][8].
Commentary
- The OpenAI acquisition failed mainly because of concerns over the IP access rights granted to Microsoft, which made Windsurf's leadership fear losing control of its core technology. The deal collapsed, and Google seized the opportunity [2][8][10].
Product Overview
- Windsurf's flagship product, Agent IDE, facilitates multi-agent AI collaboration, supporting task delegation, shared context, and persistent state management among AI agents [3][14].
Industry Implications
- Windsurf's situation reflects a broader trend in the AI industry: independent toolmakers must choose between staying platform-neutral and aligning with larger ecosystems for better access to resources. This consolidation may accelerate standardization and innovation in AI development [11][12].
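The report credits Agent IDE with task delegation, shared context, and persistent state management among agents. One common way to get all three is a shared "blackboard" that agents read and write and that survives between sessions. The sketch below is a minimal illustration under that assumption; `SharedContext`, its methods, and the JSON file layout are hypothetical and say nothing about how Windsurf actually implements these features.

```python
import json
import tempfile
from pathlib import Path


class SharedContext:
    """Hypothetical blackboard shared by several coding agents, persisted as JSON."""

    def __init__(self, path: Path):
        self.path = path
        if path.exists():
            self.state = json.loads(path.read_text())  # resume persisted state
        else:
            self.state = {"tasks": [], "notes": {}}

    def delegate(self, task: str, agent: str) -> None:
        # A coordinator hands a task to a named agent.
        self.state["tasks"].append({"task": task, "agent": agent, "done": False})

    def complete(self, task: str, note: str) -> None:
        for entry in self.state["tasks"]:
            if entry["task"] == task:
                entry["done"] = True
        self.state["notes"][task] = note  # shared: every agent can read it

    def save(self) -> None:
        self.path.write_text(json.dumps(self.state))


with tempfile.TemporaryDirectory() as tmp:
    ctx_file = Path(tmp) / "context.json"
    ctx = SharedContext(ctx_file)
    ctx.delegate("write unit tests", "tester")
    ctx.delegate("fix parser bug", "coder")
    ctx.complete("fix parser bug", "patched the tokenizer")
    ctx.save()
    restored = SharedContext(ctx_file)  # a later session sees the same state
    done_flags = [t["done"] for t in restored.state["tasks"]]
    shared_note = restored.state["notes"]["fix parser bug"]

print(done_flags, shared_note)
```

A single JSON file keeps the example small; a multi-process setup would need locking or a database so that concurrent agents do not overwrite each other's updates.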
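Into the Heart of a "National Heavyweight": IRCTC 2025 Featured Visit, an In-Depth Open Day at the Qiqihar Second Machine Tool Production Line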
机器人圈 (Robot Circle) · 2025-07-14 13:51
Core Viewpoint
- The article emphasizes the importance of integrating intelligent robotics technology with high-end equipment manufacturing and announces an upcoming event aimed at fostering collaboration among academia, industry, and research institutions [1].
Group 1: Company Overview
- Qiqihar Second Machine Tool (Group) Co., Ltd. is a key enterprise in China's machinery industry. Founded during the "First Five-Year Plan" period, it has grown into a renowned production base for heavy machine tools and forging equipment [2].
- Since its founding, the company has produced more than 60,000 machine tools of various kinds, including more than 1,000 heavy cutting machine tools and forging machines, filling more than 100 gaps in domestic production and supplying critical equipment to foundational industries and national defense [2].
Group 2: Product and Technology Focus
- The company specializes in heavy and super-heavy machine tools, which account for 80% of its product output, and it ranks first in the domestic industry for output of heavy cutting machine tools [3].
- Key products include CNC floor-type milling and boring machines, CNC gantry milling and boring machines, CNC vertical lathes, and large special-purpose CNC machines, showcasing advanced engineering applications of robotics and multi-agent collaboration [3][5].
Group 3: Event Details
- The event on July 24, 2025, will include a visit to a national-level intelligent manufacturing demonstration workshop and the assembly line of the TK6963 ultra-heavy CNC milling and boring machine, which has a lifting capacity of 200 tons [4].
- The technical team will present three major collaboration directions: high-precision robotic operations for heavy machine assembly, multi-modal online inspection systems for large workpieces, and domestic-substitution solutions for core components of high-end equipment [6].
Group 4: Registration Information
- Registration fees vary by participant type: 1,500 yuan for students, 2,800 yuan for ordinary participants, 3,800 yuan for corporate representatives, and 2,100 yuan for members of the "Robot Technology and Application" council [7].
- Participants must register by July 21, 2025. The fee covers meals and conference materials; accommodation and transportation are at participants' own expense [10].