Reinforcement Learning (RL)
Everyone says RL + VLA is the future? Here is a roundup of related work
具身智能之心· 2025-08-01 00:03
Core Viewpoint
- The integration of Vision-Language-Action (VLA) models with Reinforcement Learning (RL) is a promising new paradigm that draws on both trial-and-error interaction with the environment and pre-collected suboptimal data to improve performance [2].

Group 1: Offline RL Training without Environment
- "MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models" discusses scalability in RL applications [3].
- "Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions" focuses on offline RL techniques [3].

Group 2: Online RL Training with Environment
- Online RL training improves VLA models through trial-and-error interaction with the environment in real time [4].
- "ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning" explores this concept [5].
- "GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot" presents a generalist approach to robotic models [5].

Group 3: Simulator-Based Approaches
- Various projects aim to improve VLA models using simulation environments, such as "OctoNav: Towards Generalist Embodied Navigation" [6].
- "TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization" focuses on optimizing VLA models through trajectory-based methods [6].
- "VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning" emphasizes scalable RL for robotic manipulation [6].

Group 4: Real-World Applications
- The deployment phase of RL training is crucial for testing VLA models in real-world scenarios [8].
- "Dynamism v1 (DYNA-1) Model: A Breakthrough in Performance and Production-Ready Embodied AI" highlights advancements in embodied AI [9].
- "ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy" discusses fine-tuning methods for VLA models [9].

Group 5: RL Alignment Training
- "GRAPE: Generalizing Robot Policy via Preference Alignment" addresses the alignment of robot policies with user preferences [11].
- "SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning" focuses on safety in VLA model training [12].
From "showing off" to "getting work done", wheels beat two legs: Goldman Sachs sums up the latest humanoid robot trends at WAIC
硬AI· 2025-07-28 15:03
Core Insights
- The 2025 WAIC indicates a shift towards practical commercialization in the robotics industry, with wheeled robots becoming mainstream because they are easier to deploy and more cost-effective [2][3][4].
- Despite advances in mobility, fine manipulation remains a significant challenge that keeps robots from fully replacing human labor [8][9].
- Robot costs are falling, but a clear turning point has not yet been reached; more definitive industry signals are needed for sustainable stock performance [3][11].

Group 1: Commercialization Trends
- The WAIC showcased over 60 robotic products, a significant increase from 25 static prototypes last year, indicating a move towards real-world applications in various sectors [2][4].
- The design trend is shifting towards wheeled bases (AGV-style) rather than complex bipedal designs, prioritizing immediate commercial viability over technical sophistication [4][5].
- The event has grown in scale and participation, with a 35% increase in venue size and a 60% rise in exhibitors, reflecting heightened investment and government support in the robotics sector [4].

Group 2: Application Scenarios
- Robots are increasingly tailored for specific applications, moving away from a one-size-fits-all approach [6].
- In industrial settings, robots are being developed for tasks such as power line inspection and quality control in harsh environments [6].
- In consumer and service industries, robots are being designed for practical tasks such as making ice cream, organizing rooms, and providing retail assistance [6][7].

Group 3: Technical Challenges
- Despite improvements in autonomous navigation and dynamic movement, the precision of robotic manipulation remains a critical limitation [9].
- Demonstrations at WAIC revealed frequent operational failures, with task completion times lagging significantly behind human performance [9].
- The integration of vision-language-action (VLA) models and reinforcement learning (RL) is seen as essential for enhancing robotic performance and achieving commercial success [9].

Group 4: Cost and Data Considerations
- The introduction of entry-level robots priced at 40,000 RMB is a notable development, yet mainstream robots still cost between 400,000 and 500,000 RMB [11].
- High-quality real-world interaction data is crucial for training models, but data collection is expensive, leading companies to adopt mixed strategies for data sourcing [11].
From "showing off" to "getting work done", wheels beat two legs: Goldman Sachs sums up the latest humanoid robot trends at WAIC
Hua Er Jie Jian Wen· 2025-07-28 10:02
Core Insights
- The 2025 World Artificial Intelligence Conference (WAIC) signals a shift towards practical commercialization in the humanoid robotics sector, moving away from mere technology showcases to robots that can perform real tasks [1][2].
- Wheeled-chassis robots, similar to Automated Guided Vehicles (AGVs), are becoming mainstream, indicating a focus on immediate commercial viability rather than complex bipedal designs [2][3].
- The number of showcased robotic products has increased significantly, with over 60 products presented compared to 25 static prototypes last year, demonstrating substantial progress in product development [2][4].

Industry Trends
- The WAIC has expanded in scale, with a 35% increase in venue size to 70,000 square meters and a 60% rise in the number of exhibitors to 800, reflecting stronger industry investment and government support [2].
- Working prototypes are increasingly converging on wheeled designs, which offer advantages in stability, cost, and energy consumption and are easier to deploy in flat environments such as factories and warehouses [2][3].

Application Scenarios
- Robots are now being designed for specific applications rather than as generic solutions, with a focus on addressing particular industry challenges [4].
- Robots are being introduced in manufacturing, logistics, retail, and healthcare, with specific examples designed for tasks such as power inspection and patient interaction [6].

Technical Challenges
- Despite advances in autonomous navigation and dynamic movement, the precision of robotic manipulation remains a significant challenge, with task success rates and operating speeds still lagging behind human capabilities [5].
- The integration of vision-language-action (VLA) models and reinforcement learning (RL) is seen as crucial for improving the general capabilities and task performance of robots [5].

Cost and Data Considerations
- The cost of entry-level robots has fallen, with a new model priced at 40,000 RMB, but the overall cost reduction in the industry is not yet significant: full-sized robots are still priced between 400,000 and 500,000 RMB [6].
- High-quality real-world interaction data is essential for training VLA models, but collecting it is expensive, leading companies to adopt mixed strategies that combine real and synthetic data for model training [6].
90% swallowed by large models: the AI Agent predicament
投中网· 2025-07-25 08:33
Core Viewpoint
- The article discusses the challenges facing general-purpose AI agents, particularly market competition and user engagement, and suggests that many agents may be overshadowed by large models and specialized agents [4][6][12].

Group 1: Market Dynamics
- General-purpose agents such as Manus and Genspark are experiencing declining revenue and user engagement, indicating a lack of compelling applications that drive user loyalty and payment [6][20][23].
- Manus reported an annual recurring revenue (ARR) of $9.36 million in May, while Genspark reached $36 million ARR within 45 days of launch, showcasing the initial market potential [20].
- However, both products have seen significant drops in monthly recurring revenue (MRR) and user traffic, with Manus experiencing a 50% decline in MRR to $2.54 million in June [22][23].

Group 2: Competitive Landscape
- General-purpose agents are struggling to compete with specialized agents tailored for specific tasks, leading to a loss of market share [15][17].
- The high subscription costs of general-purpose agents, combined with the increasing capabilities of foundation models, make them less attractive to users who can access similar functionality at lower cost [12][28].
- Companies such as Alibaba and ByteDance are developing their own agent platforms while promoting developer ecosystems, indicating a strategic shift towards strengthening their competitive edge [26][29].

Group 3: User Experience and Application
- General-purpose agents have not yet identified "killer" applications that would encourage users to pay for their services; they often focus on tasks such as PPT creation and report writing, which do not sufficiently engage users [24][32].
- The lack of integration with internal knowledge bases and business processes limits the effectiveness of general-purpose agents in enterprise settings, where accuracy and cost control are paramount [15][16].
- Current agents often struggle with complex tasks because they rely on multiple steps, leading to inconsistent output quality, which further diminishes user trust and engagement [33][34].

Group 4: Technological Innovations
- Some developers are exploring innovations such as reinforcement learning (RL) to enhance the capabilities of agents, aiming to move from simple tools to more autonomous and adaptable systems [36][40].
- Advances in model architecture, such as linear attention mechanisms, are being used to improve the performance of agents when handling large volumes of text [35][36].
- The potential for RL to significantly improve agent performance is highlighted, with recent tests showing substantial improvements in task handling capabilities [38][40].
90% swallowed by large models: the AI Agent predicament
36Kr · 2025-07-18 10:48
Core Viewpoint
- The general agent market is facing significant challenges, with companies such as Manus experiencing declines in user engagement and revenue, indicating a lack of compelling use cases that drive sustained user loyalty and payment [2][9][11].

Group 1: Market Dynamics
- Manus has relocated its headquarters to Singapore, laid off 80 employees, and abandoned its domestic version, reflecting a strategic shift rather than a failure in operations [2].
- The general agent market is being eroded by the overflow of model capabilities and by competition from specialized agents, leading to a decline in revenue and user activity for general agents such as Manus and Genspark [2][8].
- Monthly recurring revenue (MRR) for general agents is dropping, with Manus reporting a decline of more than 50% in June [11].

Group 2: Product Performance
- General agents have struggled to find killer applications that can attract and retain users, and are often used for basic tasks such as creating presentations or reports [2][9][11].
- General agents cannot match the precision of specialized agents in enterprise settings, which leads to user dissatisfaction [7][8].
- Manus's pricing model, which relies on a points-based system, is seen as a barrier to adoption compared with cheaper and more efficient model APIs [6][11].

Group 3: Technological Challenges
- The rapid advancement of large models has made them increasingly agent-like, allowing users to use these models directly instead of relying on general agents [4][8].
- General agents often struggle with complex tasks because they rely on step-by-step execution, which can lead to errors and inconsistent output quality [16][19].
- Innovations in reinforcement learning (RL) are being explored to enhance the capabilities of agents, potentially allowing them to evolve from simple tools into more autonomous and adaptable systems [17][22].

Group 4: Competitive Landscape
- The competitive landscape is shifting, with larger companies using their resources to develop and promote their own agent products while also providing free services to attract users [12][13].
- The domestic market for general agents is becoming increasingly competitive, with major players such as Baidu and ByteDance offering free testing and services, making it difficult for smaller companies to compete [12][13].
- A focus on deep research capabilities and multi-modal functionality is becoming a common strategy among agent developers [12][15].
Latest essay from chain-of-thought pioneer Jason Wei: which domains will large models conquer? | Jinqiu Select
锦秋集· 2025-07-16 07:58
Core Viewpoint
- The rapid evolution of large models is turning model capabilities into product functionality, making it crucial for entrepreneurs to stay informed about advances in model technology [1][2].

Group 1: Characteristics of Tasks AI Can Solve
- Tasks that AI can quickly tackle share five characteristics: objective truth, rapid verification, scalable verification, low noise, and continuous reward [2][10].
- The concept of "verification asymmetry" indicates that some tasks are much easier to verify than to solve, which is becoming a key idea in AI [3][8].

Group 2: Examples of Verification Asymmetry
- Verifying a solution can be significantly easier than producing it, as with Sudoku or website functionality checks [4][6].
- Some tasks have nearly symmetrical verification, while others may take longer to verify than to solve, highlighting the complexity of verification [6][7].

Group 3: Importance of Verification
- The "verifier's law" states that the ease of training AI to solve a task correlates with the task's verifiability, suggesting that tasks that are both solvable and easily verifiable will be solved by AI [8][9].
- The learning potential of neural networks is maximized when tasks meet the verification characteristics listed above, leading to faster iteration and progress in the digital realm [12].

Group 4: Case Study - AlphaEvolve
- Google's AlphaEvolve exemplifies the effective use of verification asymmetry, allowing ruthless optimization of problems that meet the verifier's law characteristics [13].
- AlphaEvolve focuses on solving specific problems rather than generalizing to unseen problems, which is a departure from traditional machine learning approaches [13].

Group 5: Future Implications
- Understanding verification asymmetry suggests a future in which measurable tasks are solved more efficiently, leading to a jagged edge of intelligence where AI excels at verifiable tasks [14][15].
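To make the verification asymmetry concrete, the Sudoku example mentioned in the summary can be sketched in a few lines of Python. This is an illustration only: the function names `is_valid_sudoku_solution` and `example_grid` are hypothetical and do not come from the article. Checking a completed grid needs nothing more than a handful of set comparisons, whereas producing a solution from a partially filled grid generally requires search.

```python
def is_valid_sudoku_solution(grid):
    """Return True if `grid` (9 rows of 9 ints) is a completed, valid Sudoku.

    Hypothetical helper for illustration: verification is a fixed number
    of cheap set comparisons, regardless of how hard the puzzle was to solve.
    """
    digits = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    return all(group == digits for group in rows + cols + boxes)


def example_grid():
    """Build one valid completed grid from a simple shifting pattern."""
    return [
        [(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
        for r in range(9)
    ]


if __name__ == "__main__":
    grid = example_grid()
    print(is_valid_sudoku_solution(grid))

    # Swapping two cells in a row keeps the row valid but breaks two columns.
    broken = [row[:] for row in grid]
    broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
    print(is_valid_sudoku_solution(broken))
```

A verifier like this is fast, objective, and scalable, which are the properties the summary lists for tasks that AI can be trained on efficiently.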
Breaking | Chain-of-thought pioneer Jason Wei reportedly joining Meta; 机器之心 exclusively confirms: his Slack account is gone
机器之心· 2025-07-16 02:22
Core Viewpoint
- Meta continues to recruit top talent from OpenAI, with notable researchers Jason Wei and Hyung Won Chung reportedly leaving OpenAI to join Meta [1][2][4].

Group 1: Talent Acquisition
- Jason Wei and Hyung Won Chung, both prominent researchers at OpenAI, are confirmed to be leaving for Meta, with their Slack accounts already deactivated [2][4].
- Jason Wei is recognized as a key author of the Chain of Thought (CoT) concept, which has significantly influenced the field of large AI models [4][6].
- Hyung Won Chung has been a core contributor to OpenAI's projects, including the o1 model, and has a strong background in large language models [4][29].

Group 2: Contributions and Impact
- Jason Wei's work includes leading early efforts in instruction tuning and contributing to research on the emergent capabilities of large models, with over 77,000 citations on Google Scholar [21][16].
- Hyung Won Chung played a critical role in major projects such as PaLM and BLOOM during his time at Google, and later contributed to the o1 series models at OpenAI [26][40].
- Both researchers have been influential in advancing the capabilities of AI systems, particularly in reasoning and information retrieval [38][40].

Group 3: Community Reaction
- Following the news of their potential move to Meta, the online community has expressed excitement and congratulated Jason Wei, indicating strong interest in their career transition [10][9].
Stanford graduates building agents with RL: Chinese founding team raises $12 million seed round
机器之心· 2025-07-09 00:50
Core Viewpoint
- Pokee AI has officially launched its public testing version, marking a significant milestone in its development and attracting attention from investors and the industry [1][8].

Group 1: Company Development
- Pokee.ai was founded in October 2022, focusing on developing an interactive, personalized, and efficient AI Agent [4][9].
- The company has recently completed a $12 million seed round led by Point72 Ventures, indicating strong investor interest [8].
- Development has been rapid, with the product moving from concept validation to public testing in just over seven months [9].

Group 2: Technology and Approach
- Unlike mainstream AI Agents, which are built primarily on large language models (LLMs), Pokee.ai is centered on reinforcement learning (RL), with the LLM serving as a user interface layer [5][17].
- The architecture allows a more dynamic decision-making process, in which RL models can use a broader action space than traditional LLMs [17].
- The ultimate goal is an AI Agent that operates without extensive human configuration, so that users simply provide prompts to get tasks completed [14][15].

Group 3: Market Perception and Challenges
- Initially, many investors were skeptical of the RL-based approach and viewed it as unrealistic; perceptions have shifted as the technology gains traction [7][11].
- Aligning AI responses with user intent is a significant challenge, because users do not always articulate their needs clearly, which complicates the AI's ability to deliver accurate results [18][20].
- The industry is still in the early stages of developing effective AI Agents, with many foundational steps yet to be completed [21].

Group 4: Team and Operations
- The core team has expanded from four to seven members, with plans for further growth, but the company aims to stay lean to remain efficient [26][27].
- The company operates entirely online, using the remote work practices that have become common in the tech industry, which allows flexibility and high productivity [30].
Is drilling large models on math problems actually harmful? CMU evaluates 20+ models and flags a training pitfall
量子位· 2025-07-07 06:13
Core Viewpoint
- The article discusses the relationship between the mathematical reasoning capabilities of large language models (LLMs) and their ability to transfer these skills to other tasks, highlighting that models trained with reinforcement learning (RL) transfer better than those trained with supervised fine-tuning (SFT) [4][11].

Group 1: Mathematical Reasoning and Transferability
- Research indicates that only models trained with RL can effectively transfer mathematical reasoning skills to other tasks, while SFT models show limited or no transfer [4][11].
- A Transferability Index (TI) is introduced to quantify the extent to which improvements in mathematical reasoning carry over to other reasoning and non-reasoning tasks [8][9].
- A TI greater than 0 indicates positive transfer to other tasks; a TI less than 0 indicates negative transfer [9].

Group 2: Experimental Findings
- The study evaluated over 20 models across various tasks, including mathematical reasoning, other reasoning tasks (such as medical reasoning), and non-reasoning tasks (such as common-sense dialogue) [7].
- Models fine-tuned with RL consistently achieve higher transferability metrics across reasoning and non-reasoning tasks, while SFT models often show negative transfer on non-reasoning tasks [11].

Group 3: Model Representation and Performance
- PCA analysis reveals that RL fine-tuned models exhibit minimal shifts in representation space, indicating that they retain previously learned knowledge while improving performance in specific domains [15].
- RL models show lower KL divergence on reasoning and non-reasoning tasks than SFT models, suggesting more stable and precise representation updates [16][18].
- The findings suggest that RL is crucial for achieving transferable reasoning capabilities in LLMs, marking another victory for reinforcement learning in this context [19].
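The summary gives only the sign convention for the Transferability Index, not its formula. As a rough sketch of how such an index could be computed, the Python below assumes TI is the relative gain on a target task group divided by the relative gain on math, scaled by 100. That formula, the function names, and the sample scores are all assumptions made for illustration, not values or definitions taken from the study.

```python
def relative_gain(base_score, tuned_score):
    """Relative change of the fine-tuned model over its base model."""
    if base_score <= 0:
        raise ValueError("base_score must be positive")
    return (tuned_score - base_score) / base_score


def transferability_index(base_math, tuned_math, base_other, tuned_other):
    """Assumed form of TI: gain on other tasks per unit of gain on math.

    TI > 0 means the math improvement came with an improvement elsewhere
    (positive transfer); TI < 0 means performance elsewhere dropped
    (negative transfer). This matches the sign convention in the summary,
    but the exact formula here is an assumption.
    """
    math_gain = relative_gain(base_math, tuned_math)
    if math_gain <= 0:
        raise ValueError("TI is only meaningful when math performance improved")
    return 100.0 * relative_gain(base_other, tuned_other) / math_gain


if __name__ == "__main__":
    # Hypothetical scores, invented purely to show the sign behaviour.
    ti_up = transferability_index(40.0, 60.0, 50.0, 55.0)
    ti_down = transferability_index(40.0, 60.0, 50.0, 45.0)
    print("other-task score rose:", ti_up > 0)
    print("other-task score fell:", ti_down < 0)
```

Normalizing by the math gain keeps models with different amounts of math improvement comparable, which is one plausible reason to define the index as a ratio rather than a raw difference.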
What is really at the core of image-goal navigation?
具身智能之心· 2025-07-04 12:07
Research Background and Core Issues
- Image goal navigation requires two key capabilities: core navigation skills, and computing direction information by comparing the visual observation with the target image [2].
- The research asks whether this task can be solved efficiently by training a complete agent end to end with reinforcement learning (RL) [2].

Core Research Content and Methods
- The study explores various architectural designs and their impact on task performance, emphasizing implicit correspondence computation between images [3][4].
- Key architectures discussed include Late Fusion, ChannelCat, SpaceToDepth + ChannelCat, and Cross-attention [4].

Main Findings
- Early patch-level fusion methods (such as ChannelCat and Cross-attention) matter more than late fusion methods (Late Fusion) for supporting implicit correspondence computation [8].
- The performance of different architectures varies significantly under different simulator settings, particularly the "Sliding" setting [8][10].

Performance Metrics
- Success rate (SR) and Success weighted by Path Length (SPL) are used to evaluate the models [7].
- For example, with Sliding=True, ChannelCat (ResNet9) achieved an SR of 83.6%, while Late Fusion reached only 13.8% [8].

Transferability of Abilities
- Some learned capabilities transfer to more realistic environments, especially when the weights of the perception module are included [10].
- Training with Sliding=True and then fine-tuning in a Sliding=False environment improved SR from 31.7% to 38.5% [10].

Relationship Between Navigation and Relative Pose Estimation
- Navigation performance correlates with relative pose estimation accuracy, indicating the importance of direction information extraction in image goal navigation [12].

Conclusion
- Architectural designs that support early local fusion (such as Cross-attention and ChannelCat) are crucial for implicit correspondence computation [15].
- The simulator's Sliding setting significantly affects performance, but transferring perception module weights can help retain some capabilities in real-world scenarios [15].
- Navigation performance is related to relative pose estimation ability, confirming the core role of direction information extraction in image goal navigation [15].
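The two evaluation metrics named in the summary can be sketched as follows. The summary does not spell out the formulas, so this assumes the definitions commonly used in embodied navigation: SR is the fraction of successful episodes, and SPL (success weighted by path length) averages, over all episodes, the success flag times the ratio of the shortest path length to the longer of the agent's path and the shortest path. The function names and the episode data are hypothetical.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent reached the goal."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(1 for s in successes if s) / len(successes)


def spl(successes, shortest_lengths, agent_lengths):
    """Success weighted by Path Length, under the assumed standard definition.

    A successful episode contributes shortest / max(agent, shortest);
    a failed episode contributes 0. Path lengths are assumed positive.
    """
    if not successes:
        raise ValueError("need at least one episode")
    if not (len(successes) == len(shortest_lengths) == len(agent_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if success:
            total += shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # Three made-up episodes: an optimal success, a failure, and a
    # success that took twice the shortest path.
    successes = [True, False, True]
    shortest = [10.0, 5.0, 4.0]
    taken = [10.0, 7.0, 8.0]
    print("SR :", success_rate(successes))
    print("SPL:", spl(successes, shortest, taken))
```

Because SPL penalizes detours, it can never exceed SR, so a large gap between the two suggests that an agent succeeds but takes inefficient routes.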