Agent Systems
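Online Reinforcement Learning Lets AI "Learn by Doing": Stanford Team Sends a Small 7B Model's Performance Soaring, Even Past GPT-4o
36Kr· 2025-10-24 12:45
A new framework from Stanford and collaborators uses online reinforcement learning to let an agent system punch above its weight and outperform GPT-4o. AgentFlow is a new paradigm for optimizing agent systems online, and it can continually improve a system's reasoning on complex problems. It is a team of four specialized agents (planner, executor, verifier, and generator) that collaborate through shared memory, and it uses a new method, Flow-GRPO, to optimize the planner agent in real time directly inside the system. With Qwen-2.5-7B-Instruct as the base model, AgentFlow performs strongly on 10 benchmarks: search tasks improve by 14.9%, agentic tasks by 14.0%, math tasks by 14.5%, and science tasks by 4.1%. On several tasks it even surpasses models 50 times its size, including GPT-4o and Llama3.1-405B. [Results table truncated in the source. Recoverable headers: column groups Search Intensive and Math Reasoning; columns Size, Bamboogle, 2Wiki, HotpotQA, Musique, A, S ...]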
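Online Reinforcement Learning Lets AI "Learn by Doing": Stanford Team Sends a Small 7B Model's Performance Soaring, Even Past GPT-4o
QbitAI (量子位)· 2025-10-24 03:53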
Core Insights
- The article discusses the introduction of AgentFlow, a new paradigm in online reinforcement learning that enhances the reasoning capabilities of intelligent systems, outperforming models like GPT-4o and Llama3.1-405B [1][4][23].
Group 1: AgentFlow Overview
- AgentFlow consists of a team of specialized agents including a planner, executor, verifier, and generator, which collaborate through shared memory to optimize decision-making in real-time [1][14][18].
- The Flow-GRPO method allows for on-policy optimization of the planner agent, enabling adaptive decision-making based on environmental changes and feedback from other agents [19][16].
Group 2: Performance Metrics
- AgentFlow, based on the Qwen-2.5-7B-Instruct model, shows significant improvements across various benchmark tests: 14.9% in search tasks, 14.0% in agentic reasoning, 14.5% in math reasoning, and 4.1% in scientific reasoning [3][25][27].
- The model's performance surpasses that of larger models, demonstrating that effective system design and training methods can be more impactful than simply increasing model size [27].
Group 3: Learning Mechanisms
- The article emphasizes the importance of "learning in the flow," indicating that online learning in real interactive environments is crucial for achieving efficient reasoning [28][29].
- AgentFlow's architecture allows for rapid error correction and improved task planning through real-time training, enhancing overall system performance [30][29].
Group 4: Innovations and Findings
- The system autonomously discovers new solution paths, such as combining different search tools to enhance information retrieval, showcasing its ability to adapt and innovate [33].
- AgentFlow maintains performance improvements without significantly increasing the average reasoning steps, indicating efficient handling of complex tasks [35].
Group 5: Future Implications
- The article concludes that AgentFlow presents a novel approach to intelligent agent training, advocating for systems that adapt and learn continuously rather than relying on a single comprehensive model [37][38].
- Despite the distance from research to practical application, the potential for Agentic AI remains significant, suggesting a promising future for intelligent systems [39].
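The planner, executor, verifier, and generator loop summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AgentFlow's actual code: the `Memory` class, the stubbed agent functions, and the three-step stopping rule are all hypothetical. The last function shows the group-relative advantage used by GRPO-style methods; scoring each rollout with a single outcome reward is an assumption here, since the summary does not spell out how Flow-GRPO assigns credit.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Memory:
    """Hypothetical shared memory that every agent reads from and appends to."""
    query: str
    records: list = field(default_factory=list)


def planner(memory):
    """Pick the next sub-goal and tool (stub). In the article this is the trained agent."""
    step = len(memory.records) + 1
    return {"tool": "search", "goal": f"step {step} for: {memory.query}"}


def executor(action):
    """Run the chosen tool (stubbed, no real tool call) and return its raw result."""
    return f"result of {action['tool']} on '{action['goal']}'"


def verifier(memory, max_steps=3):
    """Decide whether to stop. Toy rule (assumption): stop after max_steps records."""
    return len(memory.records) >= max_steps


def generator(memory):
    """Compose the final answer from everything stored in shared memory."""
    return f"answer to '{memory.query}' from {len(memory.records)} steps"


def run_episode(query):
    """One rollout: plan, execute, record, verify, then generate the answer."""
    memory = Memory(query)
    while True:
        action = planner(memory)
        result = executor(action)
        memory.records.append((action, result))
        if verifier(memory):
            return generator(memory), memory


def group_relative_advantages(rewards):
    """GRPO-style advantage: reward minus the group mean, divided by the group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        # All rollouts scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

Calling `run_episode("q")` returns the generated answer together with the filled memory. In an on-policy setup, several such rollouts for the same query would be scored and passed to `group_relative_advantages` before the planner is updated.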
Microsoft Research's Yang Yuqing: The Attention System of Agents | Attention
36Kr· 2025-09-05 03:42
Core Insights
- The article discusses TriangleMix, a structural optimization method for attention mechanisms in large models, which addresses the computational bottleneck during the prefill stage while maintaining performance and accuracy [2][5][10]
- TriangleMix allows for a hierarchical sparse attention architecture that significantly reduces latency and memory consumption, making it suitable for long-context tasks [8][10][36]
Technical Overview
- TriangleMix employs a layered attention strategy, using standard dense attention in the first 16 layers and switching to a triangle-shaped mask in the subsequent layers, which reduces computational complexity from O(N²) to O(N) [5][6]
- The method has been tested on models like Llama-3.1-8B-Instruct, showing a kernel latency reduction from 750ms to 49ms, achieving a speedup of 15.3x and a decrease in time to first token (TTFT) by 12%-32% [10][9]
Performance Metrics
- Experimental results indicate that TriangleMix retains 99.7% of the original performance while applying the triangle attention in the majority of the deep layers [8][10]
- The method demonstrates significant reductions in latency and memory usage with almost no loss in accuracy across various benchmark tasks [10][9]
Broader Implications
- The research emphasizes the importance of viewing attention mechanisms within the larger context of agent systems, training mechanisms, and task structures, rather than as isolated components [12][26]
- The ongoing work at Microsoft Research focuses on optimizing agent-native systems, which aim to enhance the efficiency and effectiveness of AI applications, particularly for users with specific needs [15][67]
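The layer-wise strategy above can be illustrated with a toy mask builder. This is a sketch under assumptions rather than the TriangleMix implementation: the summary only says the later layers use a triangle-shaped mask, so the concrete rule below (keep a few leading "sink" keys, a local window, and full rows for the last few queries) and the parameter names `sink`, `window`, and `last_q` are hypothetical. It shows why the number of kept entries grows linearly with sequence length N instead of quadratically, and it checks the reported speedup arithmetic.

```python
def causal_mask(n):
    """Dense causal mask: query i may attend to every key j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def triangle_mask(n, sink=4, window=8, last_q=8):
    """Hypothetical sparse causal mask (assumed rule, not taken from the article).

    An entry (i, j) is kept when it is causal and at least one holds:
    key j is one of the first `sink` keys, key j lies inside the local
    `window` of query i, or query i is one of the last `last_q` queries.
    """
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i
            keep = j < sink or i - j < window or i >= n - last_q
            row.append(causal and keep)
        mask.append(row)
    return mask


def mask_for_layer(layer, n, dense_layers=16):
    """Dense attention in the first `dense_layers` layers, sparse mask afterwards."""
    return causal_mask(n) if layer < dense_layers else triangle_mask(n)


def kept(mask):
    """Number of query-key pairs that are actually computed."""
    return sum(sum(row) for row in mask)


def speedup(before_ms, after_ms):
    """Ratio of kernel latencies, e.g. speedup(750, 49) for the figures quoted above."""
    return before_ms / after_ms
```

With these toy parameters, doubling N roughly doubles `kept(triangle_mask(N))`, whereas `kept(causal_mask(N))` roughly quadruples. `speedup(750, 49)` is about 15.3, which matches the figure quoted in the summary.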
OpenAI's Female CEO Is Ruthless: With an IQ of 148, GPT-5 Is the Real Money Printer
36Kr· 2025-08-14 03:11
Core Insights
- GPT-5 is positioned as a significant advancement in AI technology, achieving an IQ of 148 and surpassing human genius levels, particularly excelling in mathematics and programming tests [3][5][13]
- OpenAI's focus with GPT-5 is not just on intelligence but on monetization strategies, particularly targeting the vast number of free users to convert them into revenue-generating customers [15][16][17]
Group 1: Performance and Recognition
- GPT-5 has demonstrated exceptional performance in various benchmark tests, including setting new records in mathematics and showing notable improvements in programming tests [5][13]
- The model's capabilities have received recognition from Nvidia, indicating its potential in reasoning and programming applications [13]
Group 2: Monetization Strategy
- OpenAI aims to monetize GPT-5 by leveraging its "router" technology, which can dynamically allocate resources based on user intent and query complexity, thus optimizing operational costs and enhancing performance [20][24][26]
- The router system allows for a significant increase in user engagement, with daily active users of the reasoning model surging sevenfold among free users and nearly 3.5 times among paid users [26]
Group 3: User Engagement and Growth
- ChatGPT's user base has rapidly expanded, now surpassing major platforms like Twitter, Reddit, and WhatsApp, and is approaching the likes of Instagram and Facebook [19]
- The growth in user engagement is attributed to the router's ability to provide tailored responses, enhancing the overall user experience and increasing the likelihood of monetization through indirect payments [17][19]
Group 4: Future Commercialization Potential
- OpenAI's strategic direction includes integrating advertising and affiliate models into the ChatGPT experience, allowing the platform to generate revenue without compromising user experience [34][36]
- The router's capability to assess the commercial value of queries positions ChatGPT to evolve into a "super app," facilitating transactions and generating revenue through commissions on sales [35][51][58]
Zhou Hongyi: No More Short Dramas, They Really Don't Suit My Temperament
Zheng Quan Shi Bao· 2025-08-06 10:05
Group 1
- The core viewpoint of the article is that Zhou Hongyi, the founder of 360, has decided not to produce short dramas anymore, stating that they do not align with his temperament [2][7]
- Zhou Hongyi's first short drama, "Reigniting the Life of a Hidden Hacker," aired at the end of 2024 and sparked significant discussion due to its unique blend of a love story and an AI entrepreneurship narrative [4]
- The short drama features a storyline where a wealthy father's tech company is intertwined with his son's romantic interest in a cleaning lady, who ultimately aids in the development of an AI product [4]
Group 2
- Zhou Hongyi previously clarified that his interest in short dramas was business-related, not personal, after being misinterpreted by the media [5]
- Following the announcement of his short drama, the National Radio and Television Administration required stricter management of "wealthy boss" micro-dramas, leading to public reactions directed at Zhou Hongyi [6]
- At the ISC AI2025 conference, Zhou Hongyi expressed a shift in focus towards collaboration on animated-style short dramas, highlighting advancements in their AI tool, Nano AI, which has recently upgraded to a Level 4 intelligent system [7]
360 Announces Nano AI Upgrade to "Multi-Agent Swarm," Able to Generate a Blockbuster from a Single Sentence
Xin Lang Ke Ji· 2025-08-02 14:17
Core Insights
- 360 Group has officially announced the rebranding of Nano AI to "Multi-Agent Swarm," marking its advancement to L4 level intelligent systems, which enables a shift from "individual operation" to "group collaboration" [1]
- The evolution of intelligent agents has gone through three stages: L1 chat assistants, L2 low-code workflow agents, and L3 autonomous planning agents, with the new L4 level allowing for collaborative task execution among multiple agents [1]
- The new swarm collaboration framework allows over 50,000 L3 reasoning agents to work together to complete complex tasks, such as producing a 10-minute movie, with the system capable of executing over 1,000 steps continuously for 2 hours [1]
Application and Efficiency
- Nano AI has launched over 10 types of multi-agent swarms, covering various scenarios including video production, content creation, industry research, e-commerce, and travel planning [2]
- The platform has developed the first "one-sentence blockbuster" multi-agent swarm, which can complete tasks that previously took at least 2 hours in just 20 minutes, utilizing L1 to L3 agents for scriptwriting, storyboarding, visuals, audio, music, and editing [2]
OpenAI Releases ChatGPT Agent: Some Abilities Surpass Humans, but It Still Trails Humans at Spreadsheets
Di Yi Cai Jing· 2025-07-18 05:13
Core Insights
- OpenAI has launched ChatGPT Agent, which integrates Operator and Deep Research capabilities, allowing it to perform complex multi-step tasks and interact with various tools [1][2][9]
- Despite improvements, ChatGPT Agent scored 45.5% in spreadsheet editing tasks, significantly lower than the human score of 71.3% [6]
Group 1: ChatGPT Agent Features
- ChatGPT Agent can perform tasks such as checking calendars, analyzing competitors, and converting screenshots to editable formats [1]
- The system combines capabilities of visual browsing, text processing, code execution, and API access [2]
Group 2: Performance Metrics
- In various benchmark tests, ChatGPT Agent achieved an accuracy of 41.6% in interdisciplinary expert tests, outperforming other models [3]
- In data science tasks, ChatGPT demonstrated high accuracy with 89.9% in analysis and 85.5% in modeling [3]
Group 3: Future Developments
- OpenAI plans to continue iterating on the Agent, with a focus on releasing GPT-5, which is anticipated to enhance the foundational model's capabilities [9]
- Developers expect the Agent to reach 90% accuracy in complex tool usage by the end of the year, indicating a move towards commercial viability [9]
OpenAI Releases ChatGPT Agent
Di Yi Cai Jing· 2025-07-18 00:10
Core Viewpoint
- OpenAI has launched ChatGPT Agent, which integrates multiple capabilities into a unified intelligent system, combining website interaction, information integration, and deep conversational abilities [1]
Group 1
- ChatGPT Agent features a multi-tool integration capability [1]
- The system merges Operator's website interaction ability, Deep Research's information integration, and ChatGPT's deep dialogue capabilities [1]