机器之心
From ReasoningBank to MetaAgent: Is RL Really Necessary for Agent Self-Evolution?
机器之心· 2025-10-25 02:30
Core Viewpoint
- The article discusses the evolution of intelligent agents, emphasizing the importance of memory systems in enabling self-evolution beyond traditional reinforcement learning (RL) methods. It highlights the exploration of various technical directions, including metacognition and self-diagnosis, to enhance the capabilities of intelligent agents.

Group 1: Memory Systems and Their Evolution
- Recent advancements in artificial intelligence have shifted focus from solely large language models to self-evolving intelligent agents capable of executing complex tasks in dynamic environments [4]
- The development of memory systems aims to transform immediate reasoning into cumulative, transferable long-term experiences, allowing agents to remember not just what to think but how to think [7][8]
- The evolution of memory systems is categorized into three stages: No Memory Agent, Trajectory Memory, and Workflow Memory, each with its limitations regarding knowledge abstraction and adaptability [8][9]

Group 2: ReasoningBank Mechanism
- The ReasoningBank mechanism aims to elevate the abstraction level of agent memory from operational records to generalized reasoning strategies, enhancing knowledge readability and transferability across tasks [10]
- It operates on a self-aware feedback loop that includes memory retrieval, construction, and integration, facilitating a closed-loop learning process without external supervision (see the sketch after this section) [7][10]
- The Memory-aware Test-Time Scaling (MaTTS) mechanism optimizes resource allocation to enhance the quality of comparative signals, leading to improved reasoning strategies and faster adaptive evolution of agents [11][12]

Group 3: Future Directions in Self-Evolution
- While memory system improvements are currently the mainstream approach for enabling self-evolution in AI, researchers are also exploring other technical routes, such as self-recognition and external tool assistance [14]
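The retrieval–construction–integration loop described under Group 2 can be made concrete with a minimal Python sketch. Everything below is an illustrative assumption rather than the paper's released code: `ReasoningBank`, `StrategyMemory`, and `solve_task` are hypothetical names, and `llm` / `embed` stand in for any language-model and embedding backends.

```python
# Minimal sketch of a ReasoningBank-style memory loop (hypothetical names;
# the actual paper's interfaces may differ).
from dataclasses import dataclass, field

@dataclass
class StrategyMemory:
    """A generalized reasoning strategy distilled from past trajectories."""
    description: str          # e.g. "verify filters before submitting a search form"
    embedding: list[float]    # vector used for similarity retrieval

@dataclass
class ReasoningBank:
    memories: list = field(default_factory=list)

    def retrieve(self, task_embedding, k=3):
        # Rank stored strategies by cosine similarity to the current task.
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = sum(x * x for x in a) ** 0.5
            nb = sum(y * y for y in b) ** 0.5
            return dot / (na * nb + 1e-8)
        return sorted(self.memories,
                      key=lambda m: cos(m.embedding, task_embedding),
                      reverse=True)[:k]

    def integrate(self, strategy: StrategyMemory):
        # Closed-loop update: distilled strategies join the bank
        # without any external supervision signal.
        self.memories.append(strategy)

def solve_task(task, bank, llm, embed):
    """One pass of the retrieve -> act -> distill -> integrate loop."""
    hints = bank.retrieve(embed(task), k=3)
    prompt = "Relevant strategies:\n" + "\n".join(m.description for m in hints)
    trajectory = llm(f"{prompt}\n\nTask: {task}\nSolve step by step.")
    # Distill the trajectory into a transferable strategy rather than raw steps.
    lesson = llm(f"Summarize the reusable reasoning strategy from:\n{trajectory}")
    bank.integrate(StrategyMemory(description=lesson, embedding=embed(lesson)))
    return trajectory
```

The point of the distillation step is that what gets stored is a transferable strategy rather than a raw action trace, which is what separates this stage from the Trajectory Memory and Workflow Memory stages mentioned above.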
2025 Google PhD Fellowships Announced: Alumni of Tsinghua, USTC, Nanjing University and Others Among the Recipients
机器之心· 2025-10-25 01:03
Core Insights
- Google announced the recipients of the 2025 PhD Fellowship, aimed at recognizing and supporting outstanding graduate students in computer science and related fields, with a total funding of over $10 million [5]

Group 1: Fellowship Overview
- The Google PhD Fellowship program was established in 2009 to support exceptional research in key foundational sciences [5]
- This year's recipients come from 35 countries and regions across 12 research areas, totaling 255 PhD students [5]

Group 2: Notable Recipients
- In the Algorithms and Optimization category, 14 PhD students were awarded, including two Chinese recipients [7]
- In the Computer Architecture category, two PhD students received awards, one of whom is a Chinese recipient [15][16]
- The Human-Computer Interaction category saw 14 awardees, including two Chinese researchers [19]
- The Machine Learning and ML Foundations category had the highest number of recipients, with 38 awardees, including 10 Chinese students [27]
- The Natural Language Processing category included 18 awardees, with one Chinese recipient [81]
- The Privacy and Security category featured 16 awardees, including six Chinese researchers [86]
- The Quantum Computing category had eight awardees, with two being Chinese [103]

Group 3: Research Focus of Chinese Recipients
- Tony Eight Lin, a PhD student at Taipei Medical University, focuses on drug discovery and molecular simulation [10]
- Yonggang Jiang from the Max Planck Institute in Germany specializes in algorithm design and analysis, particularly in graph algorithms [14]
- Zhewen Pan, a PhD student at the University of Wisconsin-Madison, has received multiple awards for her work in computer architecture [18]
- Qiwei Li from the University of Michigan focuses on critical technology issues related to gender and AI [23]
- Yichuan Zhang, a PhD student at the University of Tokyo, has presented work on human-computer collaboration in time series prediction [26]
- Wei Xiong from the University of Illinois at Urbana-Champaign is researching reinforcement learning applications in large language models [47]
Kuaishou's Klear Team Proposes CE-GPPO: Coordinating Entropy via Gradient Preservation to Tackle Entropy Instability in Reinforcement Learning
机器之心· 2025-10-25 01:03
Core Insights
- The article discusses the development of a new reinforcement learning algorithm called CE-GPPO, which aims to balance exploration and exploitation in training large language models [3][11][21]
- The Klear team from Kuaishou Technology has made significant advancements in AI, particularly in the area of language models, achieving state-of-the-art results in mathematical and coding benchmarks [2][21]

Research Motivation
- The core challenge in optimizing large models for complex reasoning tasks using reinforcement learning is balancing policy entropy, which represents the uncertainty in action selection [6][21]
- Existing methods face instability issues due to entropy collapse and explosion, leading to either a lack of exploration or excessive exploration [6][21]

Algorithm Design
- CE-GPPO introduces a new approach to gradient clipping, allowing for the retention and scaling of gradients from low-probability tokens to maintain a balance between exploration and convergence (see the sketch after this section) [11][15]
- The algorithm employs two adjustable hyperparameters, β₁ and β₂, to control the gradient weights of different token types, facilitating a flexible adjustment between exploration and exploitation [15][24]

Experimental Results
- CE-GPPO was tested on multiple mathematical reasoning benchmarks, showing superior performance compared to other methods, particularly in high-difficulty tasks [20][21]
- The results indicate that larger model sizes benefit more from CE-GPPO, demonstrating its scalability potential [21][24]

Comparison with Other Algorithms
- CE-GPPO outperformed other recent reinforcement learning algorithms like CISPO and GSPO, showcasing its effectiveness in maintaining training stability and performance [35][36]
- The method also demonstrated advantages over traditional entropy regularization techniques, maintaining a stable entropy curve throughout training [37]
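As a rough illustration of how "retaining and scaling the gradients of clipped tokens" could look in code, here is a minimal per-token loss sketch. This is an assumption-laden reading of the description above, not the Klear team's implementation: the masking rule and the way β₁ / β₂ re-weight the unclipped surrogate are illustrative choices.

```python
import torch

def ce_gppo_token_loss(logp_new, logp_old, advantages,
                       eps=0.2, beta1=0.1, beta2=0.1):
    """Hypothetical sketch of a gradient-preserving clipped policy loss.

    logp_old is assumed to be detached (computed under the behavior policy).
    Tokens whose importance ratio leaves the PPO clip interval [1-eps, 1+eps]
    normally receive zero gradient; here their gradients are re-introduced
    with small weights beta1 / beta2.
    """
    ratio = torch.exp(logp_new - logp_old)              # pi_new / pi_old
    surr_unclipped = ratio * advantages
    surr_clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    ppo_loss = -torch.min(surr_unclipped, surr_clipped)

    # The two cases where standard PPO zeroes the gradient entirely.
    below = (ratio < 1 - eps) & (advantages < 0)
    above = (ratio > 1 + eps) & (advantages > 0)

    # Re-attach their gradients through the unclipped surrogate, scaled down.
    preserved = torch.zeros_like(ppo_loss)
    preserved = torch.where(below, -beta1 * surr_unclipped, preserved)
    preserved = torch.where(above, -beta2 * surr_unclipped, preserved)

    return (ppo_loss + preserved).mean()
```

Under standard PPO clipping, the two masked cases are exactly where the min() makes the objective locally constant, so without the extra term those tokens contribute no gradient at all; β₁ and β₂ control how much of that signal flows back in.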
How Can Agent Systems "Learn While Doing"? A Stanford Team Explores a New Paradigm for Online Optimization
机器之心· 2025-10-24 09:12
Core Insights
- The article discusses the limitations of traditional methods for enabling intelligent agents to perform complex reasoning and tool usage, highlighting the need for a more scalable and adaptable approach [2][3][4]
- The proposed AgentFlow framework integrates collaborative reasoning among multiple independent agent modules and introduces the Flow-GRPO algorithm for training, achieving significant performance improvements in various tasks [3][4][15]

Group 1: Traditional Methods and Challenges
- Traditional approaches to training language models for complex task reasoning either involve a single model handling both reasoning and tool usage or rely on static prompt-driven systems [11][14]
- The first method struggles with stability and scalability in long-chain reasoning and dynamic environments, while the second lacks learning and adaptation capabilities [3][14]
- The research team aimed to enable agent systems to learn and evolve through interaction, addressing the limitations of existing methods [14][15]

Group 2: AgentFlow Framework
- AgentFlow is a modular, tool-integrated intelligent agent system designed to overcome scalability and generalization limitations of current methods [15][27]
- It features a planner that adapts in real-time during agent interactions, allowing for adaptive reasoning and robust tool-calling [15][19]
- The framework demonstrates significant improvements in long-term planning, tool efficiency, and dynamic reasoning depth across various domains [4][15]

Group 3: Flow-GRPO Algorithm
- Flow-GRPO addresses the challenge of multi-turn credit assignment in reinforcement learning by broadcasting outcome rewards to each step, transforming complex multi-turn problems into manageable single-turn updates (see the sketch after this section) [19][20]
- This method alleviates sparse reward issues and enhances training efficiency, providing a foundation for stable learning in complex reasoning tasks [20][27]

Group 4: Experimental Results
- AgentFlow was evaluated across ten benchmark tests, outperforming existing leading methods, including large proprietary models like GPT-4o [22][27]
- Notable performance improvements include a 14.9% increase in knowledge retrieval, 14.0% in agentic reasoning, 14.5% in mathematical reasoning, and 4.1% in scientific reasoning [24][27]
- The 7B parameter AgentFlow model surpassed the performance of 200B parameter models, demonstrating that effective system design can be more impactful than merely increasing model size [27][30]

Group 5: Learning and Adaptation
- The research indicates that online learning in real interaction environments is crucial for achieving efficient reasoning, as offline supervised training led to significant performance drops [27][30]
- The system autonomously discovered new tool usage patterns, enhancing its ability to gather information through combined tool strategies [30][33]
- AgentFlow's performance improves with increased reasoning steps without excessively extending average reasoning time, indicating effective task handling [33][35]

Group 6: Conclusion and Future Potential
- AgentFlow presents a novel approach to intelligent agent training, emphasizing continuous learning and adaptation over a single comprehensive model [36][37]
- The work highlights the potential and imaginative possibilities within the field of agentic AI, despite the distance from research exploration to practical application [37]
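The reward-broadcasting idea behind Flow-GRPO (Group 3 above) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the released training code: `flow_grpo_advantages` is a hypothetical helper, and the group-relative normalization follows the GRPO convention described in the summary.

```python
import numpy as np

def flow_grpo_advantages(outcome_rewards, turns_per_traj):
    """Hypothetical sketch of Flow-GRPO-style credit assignment.

    outcome_rewards: one scalar reward per sampled trajectory
                     (e.g. 1.0 if the final answer is correct, else 0.0).
    turns_per_traj:  number of planner/tool turns in each trajectory.

    The single outcome reward is broadcast to every turn, and advantages are
    normalized across the sampled group, turning multi-turn credit assignment
    into per-turn updates that all share one signal.
    """
    r = np.asarray(outcome_rewards, dtype=np.float64)
    adv = (r - r.mean()) / (r.std() + 1e-8)      # group-relative advantage
    # Broadcast each trajectory's advantage to all of its turns.
    return [np.full(n, a) for n, a in zip(turns_per_traj, adv)]

# Example: 4 sampled trajectories, only the first two reach a correct answer.
per_turn = flow_grpo_advantages([1.0, 1.0, 0.0, 0.0], [3, 5, 2, 4])
```

Because every turn inherits the trajectory-level signal, no turn is left with a zero reward, which is how the sparse-reward problem mentioned above is alleviated.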
Seeing Far, Thinking Clearly: 机器之心's 2025 Annual AI Rankings Officially Launch
机器之心· 2025-10-24 09:12
Core Insights
- The article emphasizes the ongoing advancements in artificial intelligence (AI) as of 2025, highlighting the rapid iteration of large models and their transformative impact on various applications [2][3]
- It notes that Chinese AI models are not only catching up to but also surpassing international standards, particularly in the open-source ecosystem [4][5]

AI Development and Trends
- The year 2025 has seen significant breakthroughs in large models, with new models and training methods emerging almost daily, enhancing capabilities in understanding, generation, and reasoning [3][4]
- The advancements in AI are leading to new application forms, such as automated code generation and multi-step task completion in intelligent agents [4]

Rankings and Evaluations
- The article presents a curated list of top companies and models in the AI sector for 2025, focusing on those with strong technical capabilities and innovative research [6][7]
- The "Top 10 Companies with Strong Technical Strength" are recognized for their long-term commitment to AI research and their leading technological reserves [7]
- The "Top 20 AI Leading Companies" are acknowledged for their comprehensive operational capabilities and competitive advantages in AI technology development and application [8]
- The "Top 20 Best Large Models" highlights representative and powerful foundational models in the domestic market [9]
- The "Top 20 Best Large Model Products" focuses on valuable new products and applications based on large models that demonstrate the technology's value [10]
- The "Top 10 Leading Companies in Embodied Intelligence" recognizes companies with systematic technological layouts and continuous innovation in this emerging field [11][12]
- The "Top 10 Leading Companies in ScienceAI" identifies firms that integrate AI with other scientific disciplines to drive industry development [13]
The Aftermath of Meta's Layoffs: Tian Yuandong Cast Aside, While Yao Shunyu and Others Rush to Recruit the Displaced
机器之心· 2025-10-24 06:26
Core Insights
- Meta has laid off approximately 600 positions in its AI department, affecting teams such as FAIR and AI products, with significant implications for the company's internal structure and strategy [1][6][8]

Group 1: Layoff Details
- The layoffs included the team led by Tian Yuandong, which has raised questions about the reasons behind the cuts, including performance issues related to the Llama 3 and Llama 4 models [4][6]
- Employees affected by the layoffs will receive 16 weeks of severance pay, plus additional compensation based on their tenure, with Tian Yuandong reportedly receiving eight months' salary [6][7]

Group 2: Internal Dynamics
- The layoffs reflect a chaotic internal research structure at Meta, where competition for resources between research teams and product-oriented teams has been a long-standing issue [6][18]
- The restructuring is seen as a move to strengthen Alexandr Wang's position within Meta's AI strategy, as the company aims to streamline its operations [6][8]

Group 3: Financial Context
- Meta had previously raised its total expenditure forecast for 2025 to between $114 billion and $118 billion, indicating a significant increase in AI-related spending expected to continue into 2026 [7]

Group 4: Industry Impact
- The layoffs at Meta have sparked a talent acquisition race among tech companies, with many firms actively seeking to recruit displaced employees [12][16]
- The situation highlights the competitive landscape in the AI sector, where companies are vying for top talent amid rapid advancements and changes in strategy [18][19]
Seedream 4.0 vs. Nano Banana and GPT-4o? EdiVal-Agent Puts an End to the Image-Editing Evaluation Problem
机器之心· 2025-10-24 06:26
Core Insights
- The article discusses the emergence of EdiVal-Agent, an automated, fine-grained evaluation framework for multi-turn image editing, which is becoming crucial for assessing multimodal models' understanding, generation, and reasoning capabilities [2][7]

Evaluation Methods
- Current mainstream evaluation methods fall into two categories:
  1. Reference-based evaluations rely on paired reference images, which have limited coverage and may inherit biases from older models [6]
  2. VLM-based evaluations use visual language models to score based on prompts, but they struggle with spatial understanding, detail sensitivity, and aesthetic judgment, leading to unreliable quality assessments [6]

EdiVal-Agent Overview
- EdiVal-Agent is an object-centric automated evaluation agent that can recognize each object in an image, understand editing semantics, and dynamically track changes during multi-turn editing [8][17]

Workflow of EdiVal-Agent
1. **Object Recognition**: EdiVal-Agent first identifies all visible objects in an image and generates structured descriptions, creating an object pool for subsequent instruction generation and evaluation [17]
2. **Instruction Generation**: It automatically generates multi-turn editing instructions covering nine editing types and six semantic categories, allowing for dynamic maintenance of object pools [18][19]
3. **Automated Evaluation**: EdiVal-Agent evaluates model performance from three dimensions: instruction following, content consistency, and visual quality, with a final composite score (EdiVal-O) derived from the geometric average of the first two metrics (see the sketch after this section) [20][22]

Performance Metrics
- EdiVal-IF measures how accurately models follow instructions, while EdiVal-CC assesses the consistency of unedited content. EdiVal-VQ, which evaluates visual quality, is not included in the final score due to its subjective nature [25][28]

Human Agreement Study
- EdiVal-Agent's evaluation results show an average agreement rate of 81.3% with human judgments, significantly outperforming traditional methods [31][32]

Model Comparison
- EdiVal-Agent compared 13 representative models, revealing that Seedream 4.0 excels in instruction following, while Nano Banana balances speed and quality effectively. GPT-Image-1 ranks third due to its focus on aesthetics at the expense of consistency [36][37]
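The composite scoring described above (geometric average of EdiVal-IF and EdiVal-CC, with EdiVal-VQ reported separately) can be written as a small helper. The function name and the assumed [0, 1] score range are illustrative, not part of the published benchmark code.

```python
from math import sqrt

def edival_overall(if_score: float, cc_score: float) -> float:
    """Hypothetical helper: combine instruction following (EdiVal-IF) and
    content consistency (EdiVal-CC) into the composite EdiVal-O score via
    a geometric average. Visual quality (EdiVal-VQ) is reported separately
    and excluded from the composite. Scores are assumed to lie in [0, 1]."""
    return sqrt(if_score * cc_score)

# Example: strong instruction following but weaker consistency.
print(edival_overall(0.90, 0.64))  # -> 0.758..., penalizing the weaker axis
```

A geometric average rewards models that are balanced on both axes: a model that follows instructions well but destroys unedited content cannot compensate for the weak axis the way it could under an arithmetic mean.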
Going All In on "Text Intelligence": The Next Frontier of Multimodal Research
机器之心· 2025-10-24 06:26
Core Insights
- The article discusses the increasing reliance on AI for medical diagnosis, particularly in cases where traditional methods have failed to provide answers, highlighting the potential of AI models like GPT-5 in understanding complex medical information [2][4]
- The concept of "multimodal text intelligence" is introduced as a critical area of research, aiming to enhance AI's ability to comprehend and integrate various forms of information, such as text, images, and reports, into a cohesive understanding [4][5]

Multimodal Text Intelligence
- Multimodal text intelligence focuses on enabling AI to achieve a comprehensive understanding of information across different formats, moving beyond mere text recognition to a deeper semantic comprehension [7][11]
- The current limitations of AI in fully interpreting complex documents, such as PDFs, are emphasized, with estimates suggesting that there are around 10 billion such documents that AI struggles to analyze effectively [7][8]
- The forum discussed various challenges in achieving this understanding, including the need for advanced techniques in perception, cognition, and decision-making [11][12]

Perception and Recognition
- The perception layer aims to enable AI to accurately identify and understand various elements within documents, such as text, images, and tables, while recognizing their spatial and semantic relationships [12][13]
- Challenges in this area include dealing with unclear text, complex layouts, and diverse languages, which can hinder recognition accuracy [13][15]
- Several advancements in intelligent document processing were presented, showcasing a comprehensive technical system that addresses these challenges [15][19]

Cognition and Reasoning
- The cognitive layer's goal is to allow AI to think and reason about the multimodal information it perceives, moving from a language-based reasoning approach to a more visual and integrated thought process [41][42]
- Techniques such as multimodal reasoning chains are being developed to enhance AI's ability to engage in dynamic and interpretable reasoning processes [42][44]
- Research indicates that effective transmission of "visual thoughts" is crucial for enabling deeper reasoning capabilities in AI models [45]

Decision-Making and Action
- The article highlights the importance of transitioning AI from passive understanding to active decision-making and action based on its reasoning [48][49]
- Examples of early implementations of this capability include AI systems that can autonomously assess image quality and make adjustments without user intervention [48]
- The exploration of decision-making capabilities in AI is still in its infancy, with significant work needed to develop more complex actions [49]

Path to AGI
- The article posits that multimodal text intelligence could be a realistic pathway toward achieving Artificial General Intelligence (AGI), as it encompasses a comprehensive approach from perception to cognition and action [50][52]
- Current AI technologies often focus on isolated capabilities, but the integration of multimodal text intelligence is seen as essential for creating a complete feedback loop in AI systems [52]
More Max Than Qwen3-Max? Quark Is First to Adopt the Latest Closed-Source Model
机器之心· 2025-10-24 04:32
Core Insights
- The article discusses the launch of Quark's new dialogue assistant, which integrates AI search and conversation capabilities, utilizing the latest closed-source Qwen model, promising superior performance compared to previous models [2][4][26]

Group 1: Product Features
- The Quark dialogue assistant employs the latest Qwen closed-source model, which is said to be an advancement over Qwen3-Max, a model that previously ranked among the top three globally [4][29]
- The assistant demonstrates strong reasoning and long-text comprehension abilities, allowing it to provide quick and accurate responses to user inquiries, especially in complex and multi-turn dialogues [6][26]
- Quark's extensive experience in search and tools, combined with its proprietary knowledge base, enables the assistant to perform real-time multi-channel searches, enhancing the accuracy and reliability of the information provided [6][32]

Group 2: Performance Evaluation
- The assistant has been tested in various scenarios, showcasing its ability to efficiently find resources, analyze news events, and understand cultural references, indicating its robust comprehension and analytical skills [10][13][14]
- It can generate creative content, such as poetry and reviews, demonstrating its writing capabilities and understanding of stylistic nuances [16][18]
- The assistant's logical reasoning skills were evaluated through image-based problem-solving, where it successfully identified patterns and provided correct answers [20]

Group 3: Technical Architecture
- Quark's dialogue assistant is built on a dual-driven strategy of "model + system," which enhances the accuracy and credibility of its responses through real-time information retrieval and source verification (a rough sketch of this pattern follows after this section) [32]
- The assistant is supported by specialized vertical knowledge bases in fields like healthcare, education, law, and finance, which improve its performance in specific applications [33][34]
- The Qwen model's pre-training data volume reaches 36 trillion tokens, with over a trillion parameters, showcasing its advanced capabilities in various domains, including mathematical reasoning and complex instruction understanding [29][30]

Group 4: Strategic Implications
- The launch of the Quark dialogue assistant aligns with Alibaba's strategy of making AI applications more accessible and interactive for users, redefining how information is retrieved and tasks are managed [35][36]
- This integration of search, Q&A, and task processing within a single interface aims to streamline user experience, eliminating the need for multiple applications [35]
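As a generic illustration of the "model + system" pattern described under Group 3 (real-time retrieval plus source verification over vertical knowledge bases), the sketch below shows one way such a pipeline could be wired. All names (`web_search`, `domain_kb_lookup`, `llm_answer`, `answer_with_sources`) are hypothetical stand-ins and not Quark's actual interfaces.

```python
# Illustrative sketch only: combining real-time retrieval with source
# attribution before the model answers. Not Quark's implementation.
from typing import Callable

def answer_with_sources(query: str,
                        web_search: Callable[[str], list],
                        domain_kb_lookup: Callable[[str], list],
                        llm_answer: Callable[[str], str]) -> str:
    # 1. Real-time multi-channel retrieval: general web plus vertical
    #    knowledge bases (healthcare, education, law, finance, ...).
    evidence = web_search(query) + domain_kb_lookup(query)
    # 2. Keep only items carrying a verifiable source URL, for credibility.
    cited = [e for e in evidence if e.get("url")]
    context = "\n".join(f"[{i + 1}] {e['snippet']} ({e['url']})"
                        for i, e in enumerate(cited))
    # 3. Ask the model to answer strictly from the cited evidence.
    prompt = ("Answer the question using only the numbered sources below, "
              f"and cite them inline.\n\nSources:\n{context}\n\nQuestion: {query}")
    return llm_answer(prompt)
```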
NeurIPS 2025 Spotlight | AceSearcher: A Small but Strong Model That Truly Unites Retrieval and Reasoning
机器之心· 2025-10-24 04:32
How can a modestly sized open-source large model, when confronted with problems that demand multi-step retrieval and complex logical integration, still behave like a "calm researcher": decompose first, verify next, synthesize afterward, and finally deliver a verifiable conclusion?

Recently, a research team from Emory University, Georgia Tech, Rutgers University, the University at Albany (SUNY), and UT Southwestern Medical Center released AceSearcher, a cooperative self-play framework in which the same language model serves at inference time as both the problem Decomposer and the answer Solver. Built on a two-stage training recipe (SFT → RFT), it ties the full capability chain of decomposing problems, gathering evidence, and integrating answers into a single thread. More importantly, this is not simply "yet another new model" but a better framework: it brings public reasoning datasets into a retrieval-augmented training pipeline, so the model genuinely learns how to combine reasoning with retrieval, yielding clear gains on complex retrieval tasks.

Across three categories of reasoning-intensive tasks and ten datasets, it achieves an average EM gain of +7.6%; on document-level financial reasoning, the 32B version matches the 685B DeepSeek-V3 while using less than 5% of the parameters.

Authors: Ran Xu, Yuchen Zhuang, Zihan Dong, Jonathan Wang, Yue ...
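A minimal sketch of the decompose-then-solve inference loop described above, with a single model playing both roles. The function names and prompts are illustrative assumptions, not the released AceSearcher interface, and the training stages (SFT → RFT) are omitted entirely.

```python
def ace_search_answer(question, llm, retrieve, max_subqs=4):
    """Hypothetical sketch: the same LLM acts as Decomposer and Solver.

    llm(prompt) -> str        : the shared language model
    retrieve(query) -> list   : a retriever returning evidence passages
    """
    # Decomposer role: break the question into retrievable sub-questions.
    plan = llm(f"Decompose into at most {max_subqs} sub-questions, "
               f"one per line:\n{question}")
    sub_questions = [s.strip() for s in plan.splitlines() if s.strip()][:max_subqs]

    # Solver role: answer each sub-question against retrieved evidence.
    notes = []
    for sq in sub_questions:
        passages = retrieve(sq)
        ans = llm("Evidence:\n" + "\n".join(passages) +
                  f"\n\nAnswer concisely: {sq}")
        notes.append(f"Q: {sq}\nA: {ans}")

    # Final synthesis: integrate the verified sub-answers into one conclusion.
    return llm("Using the notes below, give a final, verifiable answer to:\n"
               f"{question}\n\nNotes:\n" + "\n\n".join(notes))
```

Because the Decomposer and Solver share one set of weights, improvements learned in one role transfer directly to the other, which is the core of the cooperative self-play framing.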