Reinforcement Learning
The Era of Interaction Scaling Arrives: Chuangzhi, Fudan, and ByteDance Release AgentGym-RL, Backed by Ascend, Opening a New Paradigm for Agent Training
机器之心· 2025-09-11 04:53
Core Insights
- The article emphasizes the transition of artificial intelligence from a "data-intensive" to an "experience-intensive" era, where true intelligence is derived from active exploration and experience accumulation in real environments [10][11][50].
- The introduction of the AgentGym-RL framework represents a significant advancement in training autonomous LLM agents for multi-turn decision-making, addressing the limitations of existing models that rely on single-turn tasks and lack diverse interaction mechanisms [12][50].

Group 1: Framework and Methodology
- AgentGym-RL is the first end-to-end framework for LLM agents that does not require supervised fine-tuning, supports interactive multi-turn training, and has been validated in various real-world scenarios [3][15].
- The framework integrates multiple environments and rich trajectory data, simplifying complex environment configurations into modular operations and thus facilitating effective experience-driven learning [13][19].
- The ScalingInter-RL method introduces a progressive interaction-round expansion strategy, allowing agents to gradually adapt to environments and optimize their interaction patterns while balancing exploration and exploitation (a minimal sketch of such a schedule follows this summary) [4][23][25].

Group 2: Performance and Results
- The research team achieved remarkable results with a 7B-parameter model, which demonstrated complex task-handling skills such as understanding task objectives and planning multi-step operations after extensive interaction training [5][29].
- In various testing environments, the model not only surpassed open-source models with over 100B parameters but also matched the performance of top commercial models such as OpenAI o3 and Google Gemini 2.5 Pro [5][29].
- The ScalingInter-RL model achieved an overall accuracy of 26.00% on web navigation tasks, significantly outperforming GPT-4o's 16.00% and matching the performance of DeepSeek-R1-0528 and Gemini-2.5-Pro [29][30].

Group 3: Future Directions
- Future research will focus on upgrading general capabilities to enable agents to make efficient decisions in new environments and with unknown tools [51].
- The team aims to expand into more complex scenarios that closely resemble the physical world, such as robotic operations and real-world planning [52].
- The team also intends to explore multi-agent collaborative training to unlock more complex group decision-making capabilities [52].
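The progressive horizon expansion behind ScalingInter-RL can be made concrete with a minimal sketch. The stage boundaries, turn caps, and the `env`/`policy` interfaces below are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of a progressive interaction-horizon schedule in the spirit of
# ScalingInter-RL: early training limits the agent to short interactions, and the
# cap is raised in stages so later training explores longer multi-turn episodes.
# All names and numbers here are illustrative assumptions.

HORIZON_SCHEDULE = [(0, 4), (1000, 8), (3000, 16)]  # (training step, max interaction turns)

def current_max_turns(step):
    """Return the interaction-turn cap in effect at the given training step."""
    cap = HORIZON_SCHEDULE[0][1]
    for boundary, turns in HORIZON_SCHEDULE:
        if step >= boundary:
            cap = turns
    return cap

def rollout(env, policy, max_turns):
    """Collect one multi-turn trajectory, truncated at the current horizon cap."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_turns):
        action = policy.act(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory

def train(env, policy, total_steps=5000):
    """Training loop: the allowed episode length grows as training progresses."""
    for step in range(total_steps):
        traj = rollout(env, policy, current_max_turns(step))
        policy.update(traj)  # e.g. a policy-gradient update over the collected turns
```

In this sketch the agent is confined to short interactions early on and is allowed progressively longer episodes as training advances, which is one simple way to trade off exploitation of short behaviors against exploration of longer ones.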
Turing Award Winner Richard Sutton: AI Is Entering the "Experience Era," with Greater Potential Than Ever Before
Bei Ke Cai Jing· 2025-09-11 04:47
Core Insights
- Richard Sutton, the 2024 Turing Award winner, emphasized that the human data dividend is nearing its limit and that artificial intelligence is entering an "experience era" centered on continual learning, with the potential to exceed previous capabilities [1][2]

Group 1: AI and Learning
- Sutton stated that most current machine learning aims to transfer existing human knowledge to static AI, which lacks autonomous learning capabilities. He believes we are reaching the limits of human data and that existing methods cannot generate new knowledge, making continual learning essential for intelligence [2]
- He defined "experience" as the interaction of observation, action, and reward, which is crucial for an intelligent agent's ability to predict and control its input signals; experience is the core of all intelligence (a minimal illustration of this loop follows below) [2]

Group 2: Collaboration and Future Predictions
- Addressing fears about AI causing bias, unemployment, or even human extinction, Sutton argued that such fears are exaggerated and often fueled by those who profit from them. He highlighted that economic systems function best when individuals have different goals and abilities, just as decentralized collaboration among intelligent agents can lead to win-win outcomes [3]
- Sutton proposed four predictive principles for the future of AI:
  1. There is no consensus on how the world should operate, and no single view can dominate [3]
  2. Humanity will truly understand intelligence and create it through technology [3]
  3. Current human intelligence will soon be surpassed by superintelligent AI or enhanced humans [3]
  4. Power and resources will flow to the most intelligent agents [3]

Group 3: Historical Context and Future Outlook
- Sutton categorized the history of the universe into four eras: the particle era, the star era, the replicator era, and the design era. He believes humanity's uniqueness lies in pushing design to its limits, which is the goal pursued through AI today [4]
- He described AI as the inevitable next step in the evolution of the universe, urging society to embrace it with courage, pride, and a spirit of adventure [4]

Group 4: Event Overview
- The 2025 Inclusion Bund Conference, themed "Reshaping Innovative Growth," took place in Shanghai from September 10 to 13, featuring a main forum, over 40 open insight forums, global theme days, innovation stages, a technology exhibition, and various networking opportunities [4]
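Sutton's definition of experience maps onto the standard agent-environment loop of reinforcement learning. The toy sketch below only illustrates that observation-action-reward cycle; the `Environment` and `Agent` interfaces are assumptions for illustration, not any specific system discussed in the talk.

```python
# Toy illustration of "experience" as the stream of observation, action, and
# reward exchanged between an agent and the world. Both interfaces are assumed.

class Environment:
    def reset(self): ...                 # returns the initial observation
    def step(self, action): ...          # returns (observation, reward, done)

class Agent:
    def act(self, observation): ...      # choose an action from the current observation
    def learn(self, observation, action, reward): ...  # update from one step of experience

def run_episode(env: Environment, agent: Agent, max_steps: int = 100) -> None:
    """One episode of experience: the agent acts, observes, and is rewarded."""
    observation = env.reset()
    for _ in range(max_steps):
        action = agent.act(observation)
        next_observation, reward, done = env.step(action)
        agent.learn(observation, action, reward)  # learning from interaction, not from a static dataset
        observation = next_observation
        if done:
            break
```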
Turing Award Winner Richard Sutton's 2025 Inclusion Bund Conference Keynote: Experience Is the Core and Foundation of All Intelligence
Yang Guang Wang· 2025-09-11 04:06
Core Insights
- The 2025 Inclusion Bund Conference opened in Shanghai, featuring a keynote speech by Richard Sutton, the 2024 Turing Award winner and a pioneer of reinforcement learning [1]

Group 1: Machine Learning and AI
- Sutton emphasized that current machine learning primarily focuses on transferring existing human knowledge to static, non-autonomous AI, and is reaching the limits of human data [2]
- He introduced the concept of the "experience era," advocating for new data sources generated through direct interaction between intelligent agents and the world [2]
- Sutton defined "experience" as the interplay of observation, action, and reward, asserting that knowledge is derived from experience, which is fundamental to intelligence [2]

Group 2: Future of AI
- Sutton proposed four predictive principles regarding the future of AI:
  1. There is no consensus on how the world operates, and no single perspective can dominate [3]
  2. Humanity will truly understand intelligence and create it through technology [3]
  3. Current human intelligence will soon be surpassed by superintelligent AI or enhanced humans [3]
  4. Power and resources will gravitate toward the most intelligent agents [3]
- He categorized the history of the universe into four eras: particle, star, replicator, and design, asserting that humanity's unique ability to push design to its limits is crucial in the pursuit of AI [3]

Group 3: Embracing AI
- Sutton stated that artificial intelligence is the inevitable next step in the evolution of the universe and should be embraced with courage, pride, and a spirit of adventure [4]
AI Strides into the "Experience Era"
Hua Er Jie Jian Wen· 2025-09-11 03:50
Group 1
- The AI industry is transitioning into an "experience era," where continual learning is essential for intelligence, moving beyond the limitations of human data [2]
- Richard Sutton emphasizes that knowledge is derived from experience, which involves observation, action, and reward, and that the intelligence of an agent depends on its ability to predict and control its input signals [2]
- Two technologies, continual learning and meta-learning, are necessary to unlock the full potential of AI in this new experience era [2]

Group 2
- Concerns about AI leading to bias, unemployment, or even human extinction are exaggerated and fueled by certain organizations and individuals who profit from such fears [3]
- Sutton argues that decentralized collaboration among agents with different goals can lead to mutual benefits, highlighting human cooperation as a unique strength [3]
- He presents four predictive principles regarding the future of AI, including the lack of consensus on how the world should operate and the potential for superintelligent AI to surpass human intelligence [3]

Group 3
- Sutton categorizes the history of the universe into four eras: particle, star, replicator, and design, asserting that humanity's unique ability to push design to its limits is crucial in the current pursuit of AI [4]
- He believes that AI is an inevitable next step in the evolution of the universe, advocating a courageous and adventurous approach to its development [5]
"Father of Reinforcement Learning" Richard Sutton: The Human Data Dividend Is Nearing Its Limit, and AI Is Entering an "Experience Era" Centered on Continual Learning
Zheng Quan Shi Bao· 2025-09-11 03:50
Core Insights
- Richard Sutton, the 2024 Turing Award winner, emphasizes that the human data dividend is nearing its limit and that artificial intelligence is entering an "experience era" centered on continual learning, with the potential to exceed previous capabilities [1][2]

Group 1: Experience Era
- Sutton defines "experience" as the signals of observation, action, and reward exchanged between agents and the world, asserting that knowledge derives from experience and that the intelligence of an agent depends on its ability to predict and control its input signals [2]
- The transition to the experience era is driven by reinforcement learning, but fully unlocking its potential requires two currently immature technologies, continual learning and meta-learning [2]

Group 2: Collaboration and AI
- Addressing concerns about AI leading to bias, unemployment, or even human extinction, Sutton argues that fears surrounding artificial intelligence are exaggerated, and that decentralized collaboration among different agents can lead to mutually beneficial outcomes [2]
- He highlights that humanity's greatest strength lies in collaboration, which has been the foundation of economic, market, and governmental successes [2]

Group 3: Future of AI
- Sutton posits that the replacement of human roles by AI is inevitable, with humans acting as catalysts and pioneers of the "design era," which he categorizes as the fourth era in the evolution of the universe, following the particle, star, and replicator eras [2][3]
- He encourages embracing the evolution of artificial intelligence with courage, pride, and a spirit of adventure [3]
Latest from Westlake University! ARFM: Combining the Strengths of VLA Imitation Learning and Reinforcement Learning
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article discusses the limitations of current vision-language-action (VLA) models in complex tasks and introduces the Adaptive Reinforcement Flow Matching (ARFM) method, which enhances their performance by integrating reinforcement learning (RL) capabilities with the advantages of flow matching [1][2][4].

Summary by Sections

Current Status of VLA Models
- VLA models based on flow matching have shown excellent performance in general robotic manipulation tasks, as validated by large-scale pre-trained systems like RT-1 and PaLM-E, but they struggle with action precision in complex downstream tasks due to their reliance on imitation learning [4][5].

Existing Solutions and Limitations
- Previous attempts to fine-tune VLA models with offline RL methods, such as ReinboT, have been limited in effectiveness because they guide action prediction only indirectly, highlighting the need for more effective offline RL fine-tuning methods [4][5].

Main Contributions
- The ARFM method is introduced as a novel offline RL post-training approach specifically designed for VLA flow models, addressing the challenge of extracting useful signal from data of mixed quality and improving the efficiency of offline RL fine-tuning [6][7].

Methodological Innovation
- ARFM incorporates an adaptive scaling factor into the loss function to balance the RL advantage signal while controlling gradient variance, leading to improved generalization, robustness against disturbances, and few-shot learning capabilities (a rough sketch of such a weighted objective follows below) [6][8].

Experimental Validation
- Extensive experiments on the LIBERO simulation benchmark and the UR5 robotic-arm platform demonstrate that ARFM outperforms existing methods in several respects, including generalization ability, robustness to dynamic disturbances, and few-shot learning efficiency [6][8][29].

Core Algorithm Design
- The ARFM framework is built around an energy-weighted loss that integrates RL signals and an adaptive mechanism that ensures training stability, effectively overcoming the limitations of traditional imitation learning and existing offline RL fine-tuning methods [8][11].

Experimental Setup
- The experiments used the LIBERO benchmark platform, which includes four core task suites, as well as real-world scenarios with the UR5 robotic arm, covering various manipulation tasks under different conditions [29][30].

Key Experimental Results
- ARFM demonstrated superior performance in multi-task learning, robustness to action perturbations, few-shot learning efficiency, and continual learning compared to baseline models, confirming its practical value in real-world robotic applications [32][35][38].

Conclusion
- The ARFM method effectively balances the retention of RL advantage signals against the control of flow-loss gradient variance, leading to improved performance of VLA flow models across various tasks and conditions and showcasing its applicability in real-world scenarios [49][47].
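To make the idea of weighting a flow-matching objective by offline-RL advantages more concrete, here is a rough sketch in PyTorch. The exponential weighting, normalization, and clipping below stand in for ARFM's adaptive scaling factor and are assumptions for illustration, not the paper's actual formulation.

```python
import torch

def weighted_flow_matching_loss(pred_velocity, target_velocity, advantages,
                                beta=1.0, max_weight=10.0):
    """
    Sketch of an energy/advantage-weighted flow-matching loss.

    pred_velocity, target_velocity: (batch, action_dim) tensors from the flow model.
    advantages: (batch,) offline-RL advantage estimates for each transition.
    beta: temperature controlling how strongly high-advantage samples are up-weighted.
    max_weight: clip on the per-sample weight, a stand-in for the adaptive variance
                control described above (the real mechanism may differ).
    """
    per_sample_fm = ((pred_velocity - target_velocity) ** 2).mean(dim=-1)  # plain flow-matching error
    weights = torch.exp(beta * advantages).clamp(max=max_weight)           # energy-style weighting
    weights = weights / weights.mean()                                     # normalize to stabilize scale
    return (weights.detach() * per_sample_fm).mean()
```

The clip and normalization are one simple way to keep the weighted gradients from being dominated by a few high-advantage samples, which is the variance-control role attributed to the adaptive scaling factor in the summary above.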
A Technical Roadmap of Embodied Intelligence, Drawn from Nearly 1,000 Papers!
自动驾驶之心· 2025-09-07 23:34
Core Insights
- The article discusses the evolution and challenges of embodied intelligence, emphasizing the need for a comprehensive understanding of its development, the issues it faces, and its future directions [4][5].

Group 1: Robotic Manipulation
- The survey on robotic manipulation highlights the transition from mechanical programming to embodied intelligence, focusing on the evolution from simple grippers to dexterous multi-fingered hands [6][7].
- Key challenges in dexterous manipulation include data collection methods such as simulation, human demonstration, and teleoperation, as well as skill-learning frameworks like imitation learning and reinforcement learning [6][7].

Group 2: Navigation and Manipulation
- The discussion of robotic navigation emphasizes the high costs and data difficulties of real-world training, proposing Sim-to-Real transfer as a critical solution [8][13].
- The evolution of navigation techniques is outlined as a transition from explicit memory to implicit memory, while manipulation methods have expanded from reinforcement learning to imitation learning and diffusion policies [13][14].

Group 3: Multimodal Large Models
- The exploration of embodied multimodal large models (EMLMs) indicates their potential to bridge the gap between perception, cognition, and action, driven by advances in large-model technology [15][17].
- Identified challenges include difficult cross-modal alignment, high computational resource demands, and weak domain generalization [17].

Group 4: Embodied AI Simulators
- The analysis of embodied AI simulators highlights their role in improving the realism and interactivity of training environments, with a focus on 3D simulators and their applications in visual exploration and navigation [18][22].
- Key challenges for simulators include achieving high fidelity, scalability, and effective interaction capabilities [22].

Group 5: Reinforcement Learning
- The survey of reinforcement learning in vision outlines its application to multimodal large language models and the challenges posed by high-dimensional visual inputs and complex reward design [24][27].
- Core research directions include optimizing visual generation and enhancing cross-modal consistency through reinforcement learning [27].

Group 6: Teleoperation and Data Collection
- The discussion of humanoid-robot teleoperation highlights the integration of human cognition with robotic capabilities, particularly in hazardous environments [28][30].
- Key components of teleoperation systems include human state measurement, motion retargeting, and multimodal feedback mechanisms [30].

Group 7: Vision-Language-Action Models
- The comprehensive review of vision-language-action (VLA) models outlines their evolution and applications across fields including humanoid robotics and autonomous driving [31][34].
- Challenges for VLA models include real-time control, multimodal action representation, and system scalability [34].
Can Underwhelming Diffusion Multi-Modal Trajectory Outputs Fill the Role of VLA in Autonomous Driving?
自动驾驶之心· 2025-09-07 23:34
Core Viewpoint
- The article discusses the evolution and current state of autonomous driving paradigms, focusing on the transition from end-to-end systems to Vision-Language-Action (VLA) frameworks and the challenges of producing effective multi-modal trajectory outputs [2][3][11].

Group 1: End-to-End Systems
- End-to-end autonomous driving networks map raw sensor inputs directly to control commands, eliminating traditional processing stages and maximizing information retention [4].
- Iterative engineering practice involves clustering bad cases and retraining models, but updates often introduce new issues of their own [8].
- Tesla's "daily update model" offers a solution by continuously evolving the model through the integration of bad cases into training samples [9].

Group 2: Emergence of Dual Systems
- The introduction of large language models (LLMs) has led to the rapid adoption of the "end-to-end + VLM" dual-system approach, which improves generalization in zero-shot and few-shot scenarios [11].
- Early VLMs focused on recognizing specific semantics; the EMMA architecture incorporates reasoning to assist in vehicle control [12].

Group 3: VLA and Diffusion Framework
- The VLA framework outputs driving commands that a diffusion decoder then turns into safe, smooth vehicle trajectories [16].
- Current challenges of the VLA + diffusion architecture include subpar multi-modal trajectory outputs, the "brain split" between the VLA and diffusion components, and the quality of single-modal trajectories [18][19].
- The alignment of language and action (LA alignment) remains a critical challenge, as the practical value of language models in autonomous driving is still uncertain [19].

Group 4: Future Directions
- Future work should focus on scalable system solutions that leverage data advantages and strengthen foundational models through reinforcement learning [20][22].
- The "generate + score" paradigm has proven effective in other domains, and the next step is to optimize trajectory quality through self-reflection mechanisms (a minimal sketch of generate-and-score selection follows below) [22].
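The "generate + score" paradigm mentioned above can be sketched as sampling several candidate trajectories from the diffusion decoder and keeping the one a scoring model prefers. The `diffusion_decoder` and `scorer` interfaces below are assumptions for illustration only, not any production autonomous-driving stack.

```python
# Minimal sketch of the "generate + score" paradigm for multi-modal trajectory
# output: sample several candidate trajectories, score each (e.g. for safety,
# comfort, and progress), and return the best one.

def generate_and_score(diffusion_decoder, scorer, driving_command, scene_features,
                       num_candidates=8):
    candidates = [
        diffusion_decoder.sample(driving_command, scene_features)  # one trajectory per diffusion sample
        for _ in range(num_candidates)
    ]
    scored = [(scorer.evaluate(traj, scene_features), traj) for traj in candidates]
    best_score, best_trajectory = max(scored, key=lambda pair: pair[0])
    return best_trajectory, best_score
```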
A Tsinghua-Affiliated Former Tencent Robotics X Core Member Starts a Company; the Industry's First Dexterous Hand That Can "Use a Smartphone with One Hand" Arrives | Emerging New Projects
36Ke· 2025-09-06 23:56
Core Insights
- SourceRise Intelligent Robotics has launched its first product, Apex Hand, which emphasizes balanced performance across multiple key indicators and is the first dexterous hand capable of operating a smartphone with one hand [1][3]
- The company was founded by Yang Sicong, who has a strong academic background and experience in robotics, and its team has extensive research and patent experience in dexterous hands and tactile perception [1][3]

Company Overview
- SourceRise Intelligent Robotics was founded by Yang Sicong, who graduated from Beijing University of Aeronautics and Astronautics and Tsinghua University and previously worked at Tencent Robotics X [1]
- The co-founder and CTO, Li Wangwei, holds a PhD from the National University of Singapore, and the founding team has published nearly 50 top-tier papers and holds over 100 patents in the field [1]

Product and Business
- Apex Hand features six core capabilities:
  - Degrees of freedom: 21, covering the range needed for human-hand tasks
  - Dynamic performance: response and acceleration close to human levels
  - Load capacity: approximately 2.5 kg of fingertip force and a vertical lifting limit of about 30 kg
  - Robustness: withstands certain impacts and operates stably under unknown conditions
  - Precision: accuracy of ≤0.1 mm, with near-zero backlash transmission
  - Tactile perception: achieved through self-developed electronic skin for sensing the physical world [3][4]
- Apex Hand can perform various tasks, including pulse checking, mouse operation, and stable grasping of smooth objects in confined spaces [3][4]

Technical Advantages
- The company's main technical advantage lies in its full-stack development experience in both dexterous hands and tactile sensors, ensuring load capacity and safety in environmental interactions [3]
- SourceRise has pioneered a brain-like tactile processing technology with ultra-high temporal and spatial resolution, featuring sub-millisecond communication latency and supporting simultaneous transmission from thousands of tactile points [4]

Market Potential
- The market for dexterous hands is projected to exceed $5 billion by 2030, indicating significant growth potential [16]
- The company believes the industry's current technological level still leaves considerable room for improvement, making this a favorable time to start a company in the dexterous-hand sector [16]