Reinforcement Learning (RL)
90% Eaten by Large Models: The Dilemma of AI Agents
投中网· 2025-07-25 08:33
Core Viewpoint
- The article discusses the challenges faced by general-purpose AI agents, particularly in the context of market competition and user engagement, suggesting that many agents may be overshadowed by large models and specialized agents [4][6][12].

Group 1: Market Dynamics
- General-purpose agents like Manus and Genspark are experiencing declining revenue and user engagement, indicating a lack of compelling applications that drive user loyalty and payment [6][20][23].
- Manus reported an annual recurring revenue (ARR) of $9.36 million in May, while Genspark reached $36 million ARR within 45 days of launch, showcasing the initial market potential [20].
- However, both products have since seen significant drops in monthly recurring revenue (MRR) and user traffic, with Manus experiencing a 50% decline in MRR to $2.54 million in June [22][23].

Group 2: Competitive Landscape
- General-purpose agents are struggling to compete with specialized agents tailored for specific tasks, leading to a loss of market share [15][17].
- The high subscription costs of general-purpose agents, combined with the increasing capabilities of foundation models, make them less attractive to users who can access similar functionality at lower cost [12][28].
- Companies like Alibaba and ByteDance are focusing on developing their own agent platforms while promoting developer ecosystems, indicating a strategic shift towards strengthening their competitive edge [26][29].

Group 3: User Experience and Application
- General-purpose agents have not yet identified "killer" applications that would encourage users to pay for their services, often focusing on tasks like PPT creation and report writing, which do not sufficiently engage users [24][32].
- The lack of integration with internal knowledge bases and business processes limits the effectiveness of general-purpose agents in enterprise settings, where accuracy and cost control are paramount [15][16].
- Current agents often struggle with complex tasks due to their reliance on multi-step execution, leading to inconsistent output quality, which further erodes user trust and engagement [33][34].

Group 4: Technological Innovations
- Some developers are exploring innovations like reinforcement learning (RL) to enhance agent capabilities, aiming to transition agents from simple tools to more autonomous and adaptable systems [36][40].
- Advances in model architecture, such as the introduction of linear attention mechanisms, are being leveraged to improve agents' performance in handling large volumes of text (see the sketch below) [35][36].
- The potential for RL to significantly improve agent performance is highlighted, with recent tests showing substantial improvements in task-handling capability [38][40].
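The linear-attention point above is the one directly code-shaped idea in this entry. As a rough illustration only, not any specific vendor's architecture, the sketch below contrasts standard softmax attention, whose cost grows quadratically with sequence length, against a kernelized linear variant that never materializes the full attention matrix, which is why such designs help agents ingest large volumes of text.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n x n) score matrix, O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized variant: apply a positive feature map to Q and K, then
    # reassociate the matmuls so no (n x n) matrix is ever formed, O(n * d^2).
    phi = lambda x: np.maximum(x, 0.0) + 1.0  # simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # (d x d) running summary of keys and values
    Z = Qp @ Kp.sum(axis=0)      # per-query normalizer, shape (n,)
    return (Qp @ KV) / (Z[:, None] + eps)

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```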
90% Eaten by Large Models: The Dilemma of AI Agents
36Ke· 2025-07-18 10:48
Core Viewpoint
- The general-agent market is facing significant challenges, with companies like Manus experiencing declines in user engagement and revenue, indicating a lack of compelling use cases that drive sustained user loyalty and payment [2][9][11].

Group 1: Market Dynamics
- Manus has relocated its headquarters to Singapore, laid off 80 employees, and abandoned its domestic version, reflecting a strategic shift rather than an operational failure [2].
- The general-agent market is being eroded by the overflow of model capabilities and by competition from specialized agents, driving down revenue and user activity for general agents like Manus and Genspark [2][8].
- Monthly recurring revenue (MRR) is dropping across general agents, with Manus reporting a decline of more than 50% in June [11].

Group 2: Product Performance
- General agents have struggled to find killer applications that can attract and retain users, and are often used only for basic tasks like creating presentations or reports [2][9][11].
- The performance of general agents is hindered by their inability to match the precision of specialized agents in enterprise settings, leading to dissatisfaction among users [7][8].
- Manus's points-based pricing model is seen as a barrier to adoption compared with cheaper and more efficient model APIs [6][11].

Group 3: Technological Challenges
- The rapid advancement of large models has made them increasingly agent-like, allowing users to work with the models directly instead of relying on general agents [4][8].
- General agents often struggle with complex tasks because of their step-by-step execution process, which can compound errors and produce inconsistent output quality [16][19].
- Innovations in reinforcement learning (RL) are being explored to enhance agent capabilities, potentially evolving agents from simple tools into more autonomous and adaptable systems [17][22].

Group 4: Competitive Landscape
- The competitive landscape is shifting, with larger companies leveraging their resources to develop and promote their own agent products while offering free services to attract users [12][13].
- The domestic market for general agents is becoming increasingly competitive, with major players like Baidu and ByteDance offering free testing and services, making it difficult for smaller companies to compete [12][13].
- A focus on deep-research capabilities and multi-modal functionality is becoming a common strategy among agent developers to strengthen their offerings [12][15].
Chain-of-Thought Pioneer Jason Wei's Latest Essay: Which Domains Will Large Models Conquer? | Jinqiu Select
锦秋集· 2025-07-16 07:58
Core Viewpoint
- The rapid evolution of large models is transforming their capabilities into product functionality, making it crucial for entrepreneurs to stay informed about advances in model technology [1][2].

Group 1: Characteristics of Tasks AI Can Solve
- Tasks that AI can quickly tackle share five characteristics: objective truth, rapid verification, scalable verification, low noise, and continuous reward [2][10].
- The concept of "verification asymmetry" indicates that some tasks are much easier to verify than to solve, and it is becoming a key idea in AI [3][8].

Group 2: Examples of Verification Asymmetry
- Verifying a solution can be significantly easier than producing it, as in Sudoku or checking website functionality (see the sketch below) [4][6].
- For some tasks verification is nearly symmetric with solving, while for others verification may take longer than solving, highlighting the complexity of the verification landscape [6][7].

Group 3: Importance of Verification
- The "verifier's law" states that the ease of training AI to solve a task correlates with the task's verifiability, suggesting that tasks that are both solvable and easily verifiable will be addressed by AI [8][9].
- The learning potential of neural networks is maximized when tasks meet the verification characteristics above, leading to faster iteration and advances in the digital realm [12].

Group 4: Case Study - AlphaEvolve
- Google's AlphaEvolve exemplifies the effective use of verification asymmetry, allowing ruthless optimization of problems that satisfy the verifier's-law characteristics [13].
- AlphaEvolve focuses on solving specific problems rather than generalizing to unseen ones, a departure from traditional machine learning approaches [13].

Group 5: Future Implications
- Understanding verification asymmetry suggests a future where measurable tasks are solved more efficiently, leading to a jagged edge of intelligence where AI excels at verifiable tasks [14][15].
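The Sudoku example makes verification asymmetry concrete. A minimal sketch, assuming a 9x9 grid encoded as a list of lists of ints (0 for empty): verifying a completed grid is one cheap linear scan, while solving one from scratch requires backtracking search that can blow up exponentially.

```python
from itertools import product

def verify_sudoku(grid):
    """Verification is cheap: one pass over rows, columns, and 3x3 boxes."""
    units = [grid[r] for r in range(9)]                               # rows
    units += [[grid[r][c] for r in range(9)] for c in range(9)]       # columns
    units += [[grid[br + r][bc + c] for r in range(3) for c in range(3)]
              for br, bc in product((0, 3, 6), repeat=2)]             # boxes
    return all(sorted(u) == list(range(1, 10)) for u in units)

def solve_sudoku(grid):
    """Solving is expensive: naive backtracking over empty cells."""
    for r, c in product(range(9), repeat=2):
        if grid[r][c] == 0:
            for v in range(1, 10):
                grid[r][c] = v
                # Only recurse on locally consistent partial grids.
                if _consistent(grid, r, c) and solve_sudoku(grid):
                    return True
            grid[r][c] = 0  # undo and backtrack
            return False
    return verify_sudoku(grid)

def _consistent(grid, r, c):
    v, br, bc = grid[r][c], 3 * (r // 3), 3 * (c // 3)
    row = [grid[r][j] for j in range(9) if j != c]
    col = [grid[i][c] for i in range(9) if i != r]
    box = [grid[br + i][bc + j] for i in range(3) for j in range(3)
           if (br + i, bc + j) != (r, c)]
    return v not in row + col + box
```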
Breaking | Chain-of-Thought Pioneer Jason Wei Reportedly Joining Meta; 机器之心 Exclusive Confirmation: His Slack Account Is Gone
机器之心· 2025-07-16 02:22
Core Viewpoint
- Meta continues to recruit top talent from OpenAI, with notable researchers Jason Wei and Hyung Won Chung reportedly leaving OpenAI to join Meta [1][2][4].

Group 1: Talent Acquisition
- Jason Wei and Hyung Won Chung, both prominent researchers at OpenAI, are confirmed to be leaving for Meta, with their Slack accounts already deactivated [2][4].
- Jason Wei is recognized as a key author of the Chain of Thought (CoT) concept, which has significantly influenced the AI large-model field [4][6].
- Hyung Won Chung has been a core contributor to OpenAI's projects, including the o1 model, and has a strong background in large language models [4][29].

Group 2: Contributions and Impact
- Jason Wei led early efforts in instruction tuning and contributed to research on the emergent capabilities of large models, with over 77,000 citations on Google Scholar [21][16].
- Hyung Won Chung played a critical role in the development of major projects like PaLM and BLOOM during his time at Google, and later contributed to the o1 series models at OpenAI [26][40].
- Both researchers have been influential in advancing the capabilities of AI systems, particularly in reasoning and information retrieval [38][40].

Group 3: Community Reaction
- Following news of the potential move to Meta, the online community has expressed excitement and congratulations toward Jason Wei, indicating strong interest in the career transition [10][9].
Stanford Graduates Building Agents with RL: Chinese Founding Team Raises a $12 Million Seed Round
机器之心· 2025-07-09 00:50
Core Viewpoint
- Pokee AI has officially launched its public beta, marking a significant milestone in its development journey and attracting attention from investors and the industry [1][8].

Group 1: Company Development
- Pokee.ai was founded in October 2022, focusing on developing an interactive, personalized, and efficient AI Agent [4][9].
- The company recently completed a $12 million seed round led by Point72 Ventures, indicating strong investor interest [8].
- The pace of development has been rapid, with the product moving from concept validation to public beta in just over seven months [9].

Group 2: Technology and Approach
- Unlike mainstream AI Agents built primarily on Large Language Models (LLMs), Pokee.ai is centered on Reinforcement Learning (RL), with the LLM serving as a user-interface layer (see the toy sketch below) [5][17].
- This architecture allows a more dynamic decision-making process, in which the RL model can draw on a far broader action space than traditional LLM-driven agents [17].
- The ultimate goal is an AI Agent that operates without extensive human configuration, so users simply provide prompts for task completion [14][15].

Group 3: Market Perception and Challenges
- Initially, many investors were skeptical of the RL-based approach, viewing it as unrealistic; perceptions have shifted as the technology gains traction [7][11].
- Aligning user intent with AI responses remains a significant challenge, since users may not always articulate their needs clearly, complicating the AI's ability to deliver accurate results [18][20].
- The industry is still in the early stages of developing effective AI Agents, with many foundational steps yet to be completed [21].

Group 4: Team and Operations
- The core team has grown from four to seven members, with plans for further growth, but the company aims to maintain a lean structure to enhance efficiency [26][27].
- The company operates entirely online, leveraging remote-work practices now common in the tech industry, allowing for flexibility and high productivity [30].
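The article does not disclose Pokee's internals, so the following is only a toy sketch of the division of labor it describes: an RL policy choosing actions over a tool space, with an LLM layer merely parsing user intent. All class names and the reward signal here are hypothetical.

```python
import random

class IntentParser:
    """Stand-in for the LLM interface layer: turns a prompt into a task spec."""
    def parse(self, prompt: str) -> dict:
        return {"goal": prompt, "context": {}}

class ToolPolicy:
    """Stand-in for the RL core: picks the next tool from the action space."""
    def __init__(self, tools):
        self.tools = tools
        self.q = {t: 0.0 for t in tools}  # learned action values

    def act(self, state) -> str:
        # Epsilon-greedy over tools; a real policy would condition on state.
        if random.random() < 0.1:
            return random.choice(self.tools)
        return max(self.q, key=self.q.get)

    def update(self, tool, reward, lr=0.1):
        self.q[tool] += lr * (reward - self.q[tool])

def run_agent(prompt, policy, parser, max_steps=5):
    task = parser.parse(prompt)
    for _ in range(max_steps):
        tool = policy.act(task)
        reward = 1.0 if tool == "search" else 0.0  # toy environment signal
        policy.update(tool, reward)
    return task

policy = ToolPolicy(["search", "calendar", "email", "spreadsheet"])
run_agent("plan my offsite", policy, IntentParser())
print(policy.q)
```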
Can Drilling Math Problems Actually Harm Large Models? CMU Evaluates 20+ Models and Identifies a Training Trap
量子位· 2025-07-07 06:13
Core Viewpoint
- The article examines the relationship between the mathematical reasoning capabilities of large language models (LLMs) and their ability to transfer these skills to other tasks, finding that models trained with reinforcement learning (RL) transfer better than those trained with supervised fine-tuning (SFT) [4][11].

Group 1: Mathematical Reasoning and Transferability
- Only models trained with RL can effectively transfer mathematical reasoning skills to other tasks, while SFT models show limited or no transfer [4][11].
- A Transferability Index (TI) is introduced to quantify the extent to which improvements in mathematical reasoning carry over to other reasoning and non-reasoning tasks (see the sketch below) [8][9].
- TI greater than 0 indicates positive transfer to other tasks; TI less than 0 indicates negative transfer [9].

Group 2: Experimental Findings
- The study evaluated over 20 models across mathematical reasoning, other reasoning tasks (such as medical reasoning), and non-reasoning tasks (such as common-sense dialogue) [7].
- Models fine-tuned with RL consistently achieve higher transferability metrics across reasoning and non-reasoning tasks, while SFT models often experience negative transfer on non-reasoning tasks [11].

Group 3: Model Representation and Performance
- PCA analysis reveals that RL fine-tuned models exhibit minimal shifts in representation space, indicating they retain previously learned knowledge while improving in specific domains [15].
- RL models show lower KL divergence on reasoning and non-reasoning tasks than SFT models, suggesting more stable and precise representation updates [16][18].
- The findings suggest that RL is crucial for achieving transferable reasoning capabilities in LLMs, marking another victory for reinforcement learning in this context [19].
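The summary fixes only the sign convention of TI. One plausible formalization, assuming TI is the relative performance change on a target task after math-focused fine-tuning (a hypothetical formula for illustration; the paper's exact definition may differ):

```python
def transferability_index(acc_before: float, acc_after: float) -> float:
    """Relative gain on a non-math task after math-focused training.
    Positive => positive transfer; negative => negative transfer.
    (Hypothetical formalization; the paper may define TI differently.)"""
    return (acc_after - acc_before) / acc_before

# Illustrative accuracies, not the paper's numbers:
# RL-tuned model: modest positive transfer to a medical-reasoning task.
print(transferability_index(0.62, 0.66))   # ~ +0.065
# SFT-tuned model: negative transfer to a common-sense dialogue task.
print(transferability_index(0.70, 0.64))   # ~ -0.086
```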
What Is Really at the Core of Image Goal Navigation?
具身智能之心· 2025-07-04 12:07
Research Background and Core Issues
- Image goal navigation requires two key capabilities: core navigation skills, and computing direction information by comparing the current visual observation with the target image [2]
- The research asks whether this task can be solved efficiently through end-to-end training of complete agents with reinforcement learning (RL) [2]

Core Research Content and Methods
- The study explores various architectural designs and their impact on task performance, emphasizing implicit correspondence computation between images [3][4]
- Key architectures compared include Late Fusion, ChannelCat, SpaceToDepth + ChannelCat, and Cross-attention (see the sketch below) [4]

Main Findings
- Early patch-level fusion methods (ChannelCat, Cross-attention) are more critical than late fusion (Late Fusion) for supporting implicit correspondence computation [8]
- Performance varies significantly across simulator settings, particularly the "Sliding" setting [8][10]

Performance Metrics
- Success rate (SR) and Success weighted by Path Length (SPL) are used to evaluate the models [7]
- For example, with Sliding=True, ChannelCat (ResNet9) achieved an SR of 83.6%, while Late Fusion reached only 13.8% [8]

Transferability of Abilities
- Some learned capabilities transfer to more realistic environments, especially when the perception module's weights are carried over [10]
- Training with Sliding=True and then fine-tuning in a Sliding=False environment improved SR from 31.7% to 38.5% [10]

Relationship Between Navigation and Relative Pose Estimation
- Navigation performance correlates with relative pose estimation accuracy, underscoring the importance of direction-information extraction in image goal navigation [12]

Conclusion
- Architectural designs that support early local fusion (Cross-attention, ChannelCat) are crucial for implicit correspondence computation [15]
- The simulator's Sliding setting significantly affects performance, but transferring perception-module weights helps retain some capabilities in real-world scenarios [15]
- Navigation performance is related to relative pose estimation ability, confirming the central role of direction-information extraction in image goal navigation [15]
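Since the late-versus-early fusion distinction is architectural, a short sketch helps. Below is a minimal PyTorch rendering of two of the variants compared: Late Fusion encodes observation and goal separately and merges pooled feature vectors, while ChannelCat stacks the two images along the channel axis so the very first convolution can compare them patch by patch. Layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

def small_cnn(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 5, stride=2), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class LateFusion(nn.Module):
    """Encode each image alone; correspondence must survive global pooling."""
    def __init__(self):
        super().__init__()
        self.enc = small_cnn(3)
        self.head = nn.Linear(128, 4)  # 4 discrete navigation actions

    def forward(self, obs, goal):
        return self.head(torch.cat([self.enc(obs), self.enc(goal)], dim=-1))

class ChannelCat(nn.Module):
    """Stack images channel-wise so early convs compare them patch by patch."""
    def __init__(self):
        super().__init__()
        self.enc = small_cnn(6)
        self.head = nn.Linear(64, 4)

    def forward(self, obs, goal):
        return self.head(self.enc(torch.cat([obs, goal], dim=1)))

obs = torch.randn(2, 3, 128, 128)
goal = torch.randn(2, 3, 128, 128)
print(LateFusion()(obs, goal).shape, ChannelCat()(obs, goal).shape)
```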
ToMAP: Giving Large Models 'Mind-Reading' to Build Smarter AI Persuaders
机器之心· 2025-06-24 14:07
Core Viewpoint
- The article introduces ToMAP, a new persuasion model that integrates Theory of Mind (ToM) mechanisms to enhance the persuasive capabilities of AI, addressing current large language models' limited ability to understand opponents' perspectives and adapt strategies accordingly [4][19].

Summary by Sections

Introduction to Persuasion
- Persuasion is a complex communication process that influences beliefs, attitudes, and behaviors, and serves as a test for advanced large language models [2].

Limitations of Current Models
- Top-tier large models can generate coherent persuasive text but lack mental perception, which hinders their ability to persuade effectively [3][4].

ToMAP Model Overview
- ToMAP introduces two key mental modules, the Refutation Predictor and the Attitude Predictor, enabling the AI to anticipate opposing viewpoints and assess the opponent's attitude dynamically (see the skeleton below) [9][19].

Refutation Predictor
- The Refutation Predictor simulates human-like anticipation of counterarguments, allowing the model to address concerns proactively; it can surface common objections such as "cooking is troublesome" or "the taste is bad" in discussions about vegetarian recipes [9][10].

Attitude Predictor
- The Attitude Predictor evaluates the opponent's stance toward counterarguments, from firmly opposed through neutral to persuaded, using dialogue history and arguments to assess attitude dynamically [9][11].

Training Methodology
- ToMAP is trained with reinforcement learning (RL) over many dialogues, rewarded by a "persuasiveness score" that measures attitude change before and after interaction [11][19].

Experimental Results
- Tested across multiple datasets, ToMAP significantly outperforms baseline models and even larger models like GPT-4o, demonstrating its effectiveness despite having fewer parameters [14][20].

Performance Insights
- ToMAP keeps repetition low while increasing output diversity, indicating effective use of the mental modules; it also shows greater depth of thought than baselines, favoring rational strategies over emotional appeals [15][16].

Long-term Persuasiveness
- Unlike baselines whose effectiveness plateaus or declines over extended dialogues, ToMAP continues to improve its persuasiveness, showcasing its adaptability and diverse argumentation [17][20].

Conclusion
- ToMAP represents a significant advance in AI persuasion frameworks, integrating social-cognition features that allow a more human-like understanding of opponents' cognitive structures and attitudes [20][21].
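ToMAP's two mental modules and its RL reward are named but not specified in the article, so the skeleton below is purely illustrative (all class and field names hypothetical): predict likely refutations, score the opponent's attitude, generate an argument conditioned on both, and reward the attitude shift.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    claim: str
    history: list = field(default_factory=list)

class RefutationPredictor:
    """Anticipates likely counterarguments to the persuader's claim."""
    def predict(self, state: DialogueState) -> list:
        return ["cooking is troublesome", "the taste is bad"]  # canned examples

class AttitudePredictor:
    """Scores the opponent's stance in [-1, 1] from the dialogue so far."""
    def predict(self, state: DialogueState, refutation: str) -> float:
        return -0.5 + 0.2 * len(state.history)  # toy monotone stand-in

def persuasion_step(state, refuter, attitude_model, generate):
    refutations = refuter.predict(state)
    before = min(attitude_model.predict(state, r) for r in refutations)
    reply = generate(state, refutations)   # argument addressing the objections
    state.history.append(reply)
    after = min(attitude_model.predict(state, r) for r in refutations)
    return after - before                  # "persuasiveness score" reward

state = DialogueState(claim="try vegetarian recipes")
reward = persuasion_step(
    state, RefutationPredictor(), AttitudePredictor(),
    generate=lambda s, rs: f"Addressing: {', '.join(rs)}",
)
print(reward)
```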
Search-Agent RAG Falling Short in Practice? UIUC Open-Sources s3: Only 2.4k Samples, Fast Training, Strong Results
机器之心· 2025-06-17 00:10
Core Insights
- The article discusses the emergence of Agentic RAG (Retrieval-Augmented Generation) as a key method for large language models to access external knowledge, highlighting the limitations of current reinforcement-learning (RL) training methods in achieving stable performance [1][8].

Group 1: Development of RAG Systems
- The evolution of RAG systems falls into three stages: Classic RAG, Pre-RL-Zero Active RAG, and the RL-Zero stage, each introducing new methodologies to enhance retrieval and generation [7][8].
- RL-based methods, while promising, face challenges such as optimization goals misaligned with actual downstream tasks and the coupling of retrieval and generation, which complicates performance evaluation [9][12].

Group 2: Limitations of Current RL Methods
- Current RL methods like Search-R1 and DeepRetrieval rely on Exact Match (EM) as the reward metric, which is strict and insensitive to semantic variation and can lead to suboptimal training outcomes [9][10].
- Coupling retrieval and generation during training obscures where improvements come from, making it difficult to discern whether gains are due to better search or enhanced language generation [11][12].
- Existing evaluation metrics fail to isolate the contribution of search quality to overall performance, creating bottlenecks in assessment, training, and generalization [14].

Group 3: Introduction of the s3 Framework
- The s3 framework, proposed by UIUC and Amazon, improves training efficiency and effectiveness by decoupling search from generation, optimizing only the searcher with a new reward function called Gain Beyond RAG (GBR) (see the sketch below) [1][17].
- s3 is highly efficient, requiring only 2.4k training samples and a total training time of just 114 minutes while outperforming larger baseline models [21][22][25].

Group 4: Experimental Results
- On general QA tasks, s3 outperformed both Search-R1 and DeepRetrieval across multiple datasets, showcasing strong generalization [23][25].
- On medical QA tasks, s3 exhibited remarkable cross-domain performance, indicating robustness and adaptability to different datasets and contexts [26][27].

Group 5: Design and Optimization Insights
- s3's design starts retrieval from the original query, which helps maintain focus and improves search outcomes [31].
- Its document-selection mechanism significantly reduces token consumption, enhancing efficiency and minimizing noise in the generation process [31][30].
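The distinctive design choice in s3 is its reward. As described, Gain Beyond RAG compares downstream answer quality when a frozen generator reads the trained searcher's documents versus the documents naive RAG would have retrieved. A hedged sketch, with the retrievers, generator, and scoring function as placeholders rather than the s3 codebase's actual API:

```python
def gain_beyond_rag(question, answer_key, searcher, naive_retriever,
                    generator, score):
    """GBR reward: generator quality with learned search minus naive RAG.
    All callables are placeholders, not the s3 repo's actual interface."""
    learned_docs = searcher(question)
    naive_docs = naive_retriever(question)
    gain = score(generator(question, learned_docs), answer_key)
    base = score(generator(question, naive_docs), answer_key)
    return gain - base  # positive only when search itself added value

# Toy usage with stub components:
reward = gain_beyond_rag(
    question="Who proposed the transformer architecture?",
    answer_key="Vaswani et al.",
    searcher=lambda q: ["Attention Is All You Need (Vaswani et al., 2017)"],
    naive_retriever=lambda q: ["an unrelated passage"],
    generator=lambda q, docs: "Vaswani et al." if "Vaswani" in docs[0] else "unknown",
    score=lambda pred, gold: float(gold in pred),
)
print(reward)  # 1.0: the learned searcher beat the naive baseline
```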
Demystifying How LLMs 'Think': Reasoning as 'Gradient Descent', a Meta-Learning Framework That Deconstructs Training and Offers New Ideas for Optimization
量子位· 2025-06-10 04:05
Core Insights
- The article introduces the Reasoning as Meta-Learning (RaML) framework, which aims to reveal how large language models (LLMs) "think" by drawing parallels between reasoning and gradient-descent optimization [1][2]
- RaML posits that the reasoning trajectory an LLM generates during problem-solving acts as a form of implicit parameter update, leading to improved model performance [2][4]

Group 1: RaML Framework and Mechanism
- RaML's core insight is that the reasoning trajectory in LLMs resembles a "pseudo-gradient descent" process, where each reasoning step adjusts the model's internal state toward a better solution [2]
- The framework decomposes LLM training into two levels: "inner-loop optimization" for specific tasks and "outer-loop optimization" for learning strategies across multiple tasks (see the sketch below) [8][9]
- Longer reasoning trajectories typically lead to better optimization outcomes, akin to more iterations of a traditional optimization algorithm [14]

Group 2: Empirical Validation and Performance
- The QwQ-32B model's reasoning on the AIME24 dataset demonstrated that confidence in correct answers increases as the reasoning trajectory is decoded, supporting the idea of parameter updates through reasoning [3][4]
- Comparing supervised fine-tuning (SFT) and reinforcement-learning (RL) models showed that SFT models outperform RL models on mathematical benchmarks, highlighting the benefits of guided learning [10][12]

Group 3: Reflection Tokens and Optimization
- "Reflection" tokens in reasoning trajectories help the model reassess its outputs and improve performance by escaping local optima [15][17]
- Contrasting "thinking" and "non-thinking" modes indicates that forced early termination of reasoning leads to suboptimal solutions, similar to premature stopping in gradient descent [18][20]

Group 4: Generalization and Meta-Learning
- LLMs trained on specific reasoning tasks can generalize to unseen tasks, leveraging universal features learned from various problems [21][23]
- The RaML framework provides practical strategies for enhancing training performance, such as increasing the number of reasoning trajectories per problem, akin to expanding the support set in meta-learning [25]

Group 5: Future Directions and Efficiency
- The article suggests exploring methods to extract shorter, equivalent optimization trajectories from longer reasoning paths to reduce decoding overhead while maintaining performance [27][30]
- Initial experiments show that summarizing long reasoning trajectories can yield comparable results at significantly reduced computational cost, indicating a potential area for future research [30][31]

Conclusion
- The RaML framework offers a novel perspective on understanding LLM reasoning and training, revealing the intricate connections between reasoning, meta-learning, and gradient descent [32]
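RaML's analogy, with each reasoning step acting as one implicit update and training across problems as an outer loop, maps onto the standard two-level meta-learning skeleton. The numeric toy below is pure illustration, not the paper's algorithm: it uses a first-order Reptile-style outer update and shows that longer "reasoning trajectories" (more inner steps) land closer to each task's solution.

```python
import numpy as np

def inner_loop(theta, task, steps, lr=0.1):
    """'Reasoning': pseudo-gradient steps toward one task's solution.
    More steps (a longer trajectory) usually means a better optimum."""
    x, target = theta.copy(), task
    for _ in range(steps):
        x -= lr * 2 * (x - target)   # gradient of ||x - target||^2
    return x

def outer_loop(tasks, meta_steps=200, meta_lr=0.05, inner_steps=8):
    """'Training': learn an initialization that reasons well on all tasks."""
    rng = np.random.default_rng(0)
    theta = rng.standard_normal(2)
    for _ in range(meta_steps):
        task = tasks[rng.integers(len(tasks))]
        adapted = inner_loop(theta, task, inner_steps)
        theta -= meta_lr * (theta - adapted)  # first-order (Reptile-style) update
    return theta

tasks = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([-2.0, 0.5])]
theta = outer_loop(tasks)
# Longer "reasoning trajectories" land closer to each task's solution:
for steps in (1, 4, 16):
    err = np.mean([np.linalg.norm(inner_loop(theta, t, steps) - t) for t in tasks])
    print(steps, round(float(err), 3))
```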