Reinforcement Learning
Turing Award Winner Richard Sutton at the 2025 Bund Conference: Experience Is the Core and Foundation of All Intelligence
Yang Guang Wang· 2025-09-11 04:06
Core Insights
- The 2025 Inclusion Bund Conference opened in Shanghai, featuring a keynote speech by Richard Sutton, the 2024 Turing Award winner and a pioneer in reinforcement learning [1]

Group 1: Machine Learning and AI
- Sutton emphasized that current machine learning primarily focuses on transferring existing human knowledge to static, non-autonomous AI, and is reaching the limits of human data [2]
- He introduced the concept of the "experience era," advocating for new data sources generated through direct interaction between intelligent agents and the world [2]
- Sutton defined "experience" as the interplay of observation, action, and reward, asserting that knowledge is derived from experience, which is fundamental to intelligence [2]

Group 2: Future of AI
- Sutton proposed four predictive principles regarding the future of AI [3]:
  1. There is no consensus on how the world operates, and no single perspective can dominate
  2. Humanity will truly understand intelligence and create it through technology
  3. Current human intelligence will soon be surpassed by superintelligent AI or enhanced humans
  4. Power and resources will gravitate toward the most intelligent agents
- He categorized the history of the universe into four eras: particle, star, replicator, and design, asserting that humanity's unique ability to push design to its limits is crucial in the pursuit of AI [3]

Group 3: Embracing AI
- Sutton stated that artificial intelligence is the inevitable next step in the evolution of the universe and should be embraced with courage, pride, and a spirit of adventure [4]
AI Strides into the "Experience Era"
Hua Er Jie Jian Wen· 2025-09-11 03:50
Group 1
- The AI industry is transitioning into an "experience era," in which continuous learning is essential for intelligence, moving beyond the limitations of human data [2]
- Richard Sutton emphasizes that knowledge is derived from experience, which involves observation, action, and reward, and that an agent's intelligence depends on its ability to predict and control its input signals [2]
- Two technologies, continual learning and meta-learning, are necessary to unlock the full potential of AI in this new experience era [2]

Group 2
- Concerns about AI leading to bias, unemployment, or even human extinction are exaggerated and fueled by certain organizations and individuals profiting from such fears [3]
- Sutton argues that decentralized collaboration among agents with different goals can lead to mutual benefits, highlighting human cooperation as a unique strength [3]
- He presents four predictive principles regarding the future of AI, including the lack of consensus on how the world should operate and the potential for superintelligent AI to surpass human intelligence [3]

Group 3
- Sutton categorizes the history of the universe into four eras: particle, star, replicator, and design, asserting that humanity's unique ability to push design to its limits is crucial in the current pursuit of AI [4]
- He believes AI is an inevitable next step in the evolution of the universe, advocating a courageous and adventurous approach to its development [5]
"Father of Reinforcement Learning" Richard Sutton: The Human Data Dividend Is Nearing Its Limit, and AI Is Entering an "Experience Era" Centered on Continual Learning
Zheng Quan Shi Bao· 2025-09-11 03:50
Core Insights
- Richard Sutton, the 2024 Turing Award winner, emphasizes that the human data dividend is nearing its limit and that artificial intelligence is entering an "experience era" centered on continual learning, with the potential to exceed previous capabilities [1][2]

Group 1: Experience Era
- Sutton defines "experience" as the signals of observation, action, and reward exchanged between agents and the world, asserting that knowledge derives from experience and that an agent's intelligence depends on its ability to predict and control its input signals [2]
- The transition to the experience era is driven by reinforcement learning, but fully unlocking its potential requires two currently immature technologies: continual learning and meta-learning [2]

Group 2: Collaboration and AI
- Addressing concerns about AI leading to bias, unemployment, or even human extinction, Sutton argues that fears surrounding artificial intelligence are exaggerated and that decentralized collaboration among different agents can lead to mutually beneficial outcomes [2]
- He highlights that humanity's greatest strength lies in collaboration, which has been the foundation of economic, market, and governmental successes [2]

Group 3: Future of AI
- Sutton posits that the replacement of human roles by AI is inevitable, with humans acting as catalysts and pioneers of the "design era," which he categorizes as the fourth era in the evolution of the universe, following the particle, star, and replicator eras [2][3]
- He encourages embracing the evolution of artificial intelligence with courage, pride, and a spirit of adventure [3]
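Sutton's notion of experience as signals of observation, action, and reward flowing between agent and world can be pictured with a minimal reinforcement-learning loop. The sketch below is illustrative only, not code from the talk; the two-armed bandit environment and the epsilon-greedy agent are assumptions chosen for brevity.

```python
import random

class BanditEnv:
    """Toy two-armed bandit: the 'world' that emits observations and rewards."""
    def __init__(self):
        self.probs = [0.3, 0.8]  # hidden reward probability per arm

    def step(self, action):
        reward = 1.0 if random.random() < self.probs[action] else 0.0
        observation = None  # bandits are stateless; kept for the o/a/r interface
        return observation, reward

class EpsilonGreedyAgent:
    """Learns from experience alone: estimates each arm's reward by averaging."""
    def __init__(self, n_actions=2, epsilon=0.1):
        self.values = [0.0] * n_actions  # predicted reward per action
        self.counts = [0] * n_actions
        self.epsilon = epsilon

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        self.counts[action] += 1
        # incremental average: the prediction improves with each new experience
        self.values[action] += (reward - self.values[action]) / self.counts[action]

random.seed(0)
env, agent = BanditEnv(), EpsilonGreedyAgent()
for _ in range(2000):
    a = agent.act()
    _, r = env.step(a)
    agent.learn(a, r)
print(agent.values)  # estimates approach the hidden probabilities
```

The loop is the whole point: the agent's only access to the world is the stream of actions out and rewards in, and its "intelligence" here is exactly its ability to predict that input signal.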
New from Westlake University! ARFM: Combining the Strengths of VLA Imitation Learning and Reinforcement Learning
具身智能之心· 2025-09-11 02:07
Core Viewpoint
- The article discusses the limitations of current vision-language-action (VLA) models in complex tasks and introduces the Adaptive Reinforcement Flow Matching (ARFM) method, which enhances their performance by integrating reinforcement learning (RL) capabilities with the advantages of flow matching [1][2][4]

Summary by Sections

Current Status of VLA Models
- VLA models based on flow matching have shown excellent performance in general robotic manipulation tasks, validated by large-scale pre-trained systems such as RT-1 and PaLM-E, but they struggle with action precision in complex downstream tasks due to their reliance on imitation learning [4][5]

Existing Solutions and Limitations
- Previous attempts to fine-tune VLA models with offline RL methods, such as ReinboT, have been limited in effectiveness because they guide action prediction only indirectly, highlighting the need for more effective offline RL fine-tuning methods [4][5]

Main Contributions
- ARFM is introduced as a novel offline RL post-training approach designed specifically for VLA flow models, addressing the challenge of extracting useful signal from data of mixed quality and improving the efficiency of offline RL fine-tuning [6][7]

Methodological Innovation
- ARFM incorporates an adaptive scaling factor in the loss function to balance RL advantages against gradient variance, yielding improved generalization, robustness to disturbances, and few-shot learning capabilities [6][8]

Experimental Validation
- Extensive experiments on the LIBERO simulation benchmark and the UR5 robotic-arm platform demonstrate that ARFM outperforms existing methods in generalization ability, robustness to dynamic disturbances, and few-shot learning efficiency [6][8][29]
Core Algorithm Design
- The ARFM framework is built around an energy-weighted loss to integrate RL signals and an adaptive mechanism to ensure training stability, effectively overcoming the limitations of traditional imitation learning and existing offline RL fine-tuning methods [8][11]

Experimental Setup
- The experiments used the LIBERO benchmark platform, which includes four core task suites, as well as real-world scenarios with the UR5 robotic arm, covering a variety of manipulation tasks under different conditions [29][30]

Key Experimental Results
- ARFM demonstrated superior performance in multi-task learning, robustness to action perturbations, few-shot learning efficiency, and continual learning compared with baseline models, confirming its practical value in real-world robotic applications [32][35][38]

Conclusion
- ARFM effectively balances the retention of RL advantage signals against the control of flow-loss gradient variance, improving the performance of VLA flow models across tasks and conditions and showcasing its applicability to real-world scenarios [47][49]
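The energy-weighted loss idea can be sketched generically. The snippet below is not ARFM's actual formulation (the paper's adaptive scaling factor is more involved); it only illustrates the family of techniques involved: weighting a flow-matching regression loss by exponentiated advantages, while clipping and normalising the weights to keep gradient variance bounded. All names and constants are assumptions.

```python
import numpy as np

def energy_weighted_fm_loss(pred_velocity, target_velocity, advantages,
                            scale=1.0, max_weight=10.0):
    """Sketch of an advantage-weighted flow-matching loss.

    Each sample's regression error is weighted by exp(scale * advantage),
    so trajectories an offline-RL critic prefers contribute more. Weights
    are clipped and normalised to keep gradient variance bounded, standing
    in for ARFM's adaptive scaling mechanism.
    """
    per_sample = np.mean((pred_velocity - target_velocity) ** 2, axis=-1)
    weights = np.exp(scale * advantages)
    weights = np.clip(weights, 0.0, max_weight)
    weights = weights / weights.mean()  # normalise so the mean weight is 1
    return float(np.mean(weights * per_sample))

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8))       # predicted velocity field per sample
target = rng.normal(size=(4, 8))     # flow-matching regression target
adv = np.array([2.0, 0.5, -0.5, -2.0])  # high-advantage samples dominate
loss = energy_weighted_fm_loss(pred, target, adv)
```

With `scale=0` this reduces to the plain (unweighted) flow-matching loss, which makes the role of the scaling factor easy to see: it interpolates between pure imitation and aggressive advantage weighting.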
Tracing the Technical Development of Embodied Intelligence Across Nearly 1,000 Papers!
自动驾驶之心· 2025-09-07 23:34
Edited by 具身智能之心

Whenever embodied intelligence comes up, plenty of papers claim "breakthroughs" and "innovations," yet few lay out the complete technical roadmap: how did the field develop, what problems has it run into, and where is it heading?

How does robotic manipulation let a robot arm precisely "imitate" humans? How does multimodal fusion place an agent "in the scene"? How does reinforcement learning drive autonomous system evolution? How do teleoperation and data collection break spatial constraints? These key topics in embodied intelligence deserve a careful review. Today we present several of the field's richest survey papers and unpack the development logic of each direction.

Robotic Manipulation
Reference paper: The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey
Paper link: https://arxiv.org/abs/2507.11840
Author affiliation: 浙 ...
Can Diffusion's Underwhelming Multi-Modal Trajectory Output Handle the VLA Role in Autonomous Driving?
自动驾驶之心· 2025-09-07 23:34
Core Viewpoint
- The article discusses the evolution and current state of autonomous-driving paradigms, focusing on the transition from end-to-end systems to Vision-Language-Action (VLA) frameworks and the challenges of achieving effective multi-modal trajectory outputs [2][3][11]

Group 1: End-to-End Systems
- End-to-end autonomous-driving networks map raw sensor inputs directly to control commands, eliminating traditional processing stages and maximizing information retention [4]
- Iterative engineering practice involves clustering bad cases and retraining models, but updates often introduce new issues of their own [8]
- Tesla's "daily update model" offers a solution by continuously evolving the model through the integration of bad cases into the training set [9]

Group 2: Emergence of Dual Systems
- The introduction of large language models (LLMs) has led to rapid adoption of the "end-to-end + VLM" dual-system approach, which improves generalization in zero-shot and few-shot scenarios [11]
- Early VLMs focused on recognizing specific semantics; the EMMA architecture adds reasoning to assist vehicle control [12]

Group 3: VLA and Diffusion Framework
- The VLA framework outputs driving commands that a diffusion decoder then processes to generate safe and smooth vehicle trajectories [16]
- Current challenges in the VLA + diffusion architecture include subpar multi-modal trajectory outputs, the "brain split" between the VLA and diffusion components, and the quality of single-modal trajectories [18][19]
- The alignment of language and action (LA alignment) remains a critical challenge, as the practical value of language models in autonomous driving is still uncertain [19]

Group 4: Future Directions
- Future work should focus on scalable system solutions that leverage data advantages and strengthen foundational models through reinforcement learning [20][22]
- The "generate + score" paradigm has proven effective in other domains; the next step is to optimize trajectory quality through self-reflection mechanisms [22]
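The "generate + score" paradigm can be sketched as sampling several candidate trajectories and keeping the one that a scoring function prefers. The generator below is a toy stand-in for a diffusion decoder, and the collision/smoothness scorer is an invented heuristic; both are illustrative assumptions, not any production planner.

```python
import numpy as np

def generate_candidates(n_modes=5, horizon=10, seed=0):
    """Stand-in for a diffusion decoder: emit several candidate trajectories,
    each a sequence of (x, y) waypoints. A real system would sample these
    from a learned multi-modal distribution."""
    rng = np.random.default_rng(seed)
    base = np.cumsum(np.ones((horizon, 2)), axis=0)  # straight diagonal path
    return [base + rng.normal(scale=s, size=(horizon, 2))
            for s in np.linspace(0.0, 2.0, n_modes)]

def score(traj, obstacle=np.array([5.0, 5.0]), safe_dist=1.0):
    """Toy scorer: penalise closeness to an obstacle and penalise jerkiness."""
    clearance = np.linalg.norm(traj - obstacle, axis=1).min()
    collision_penalty = max(0.0, safe_dist - clearance) * 100.0
    smoothness_penalty = np.abs(np.diff(traj, n=2, axis=0)).sum()
    return -(collision_penalty + smoothness_penalty)  # higher is better

candidates = generate_candidates()
best = max(candidates, key=score)  # "generate" many, "score" each, keep the best
```

The appeal of the split is that the scorer can encode hard constraints (collision, comfort) that are awkward to bake into the generator itself, which is why the article flags it as a promising direction for trajectory quality.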
Tsinghua-Affiliated Former Tencent Robotics X Core Members Found a Startup: The Industry's First Dexterous Hand That Can "Use a Phone One-Handed" Is Here | Emerging New Projects
36Kr· 2025-09-06 23:56
Core Insights
- SourceRise Intelligent Robotics has launched its first product, Apex Hand, which emphasizes balanced performance across multiple key indicators and is the first dexterous hand capable of operating a smartphone with one hand [1][3]
- The company was founded by Yang Sicong, who has a strong academic background and robotics experience, and its team has extensive research and patent experience in dexterous hands and tactile perception [1][3]

Company Overview
- SourceRise Intelligent Robotics was founded by Yang Sicong, a graduate of Beijing University of Aeronautics and Astronautics and Tsinghua University with experience at Tencent Robotics X [1]
- Co-founder and CTO Li Wangwei holds a PhD from the National University of Singapore, and the founding team has published nearly 50 top-tier papers and holds over 100 patents in the field [1]

Product and Business
- Apex Hand features six core capabilities [3][4]:
  - Degrees of freedom: 21, covering the range needed for human-hand tasks
  - Dynamic performance: response and acceleration close to human levels
  - Load capacity: approximately 2.5 kg of fingertip force and a vertical lifting limit of about 30 kg
  - Robustness: withstands certain impacts and operates stably under unknown conditions
  - Precision: accuracy of ≤0.1 mm with near-zero-backlash transmission
  - Tactile perception: self-developed electronic skin for sensing the physical world
- Apex Hand can perform a variety of tasks, including pulse checking, mouse operation, and stable grasping of smooth objects in confined spaces [3][4]

Technical Advantages
- The company's main technical advantage lies in full-stack development experience across both dexterous hands and tactile sensors, ensuring load capacity and safety in environmental interactions [3]
- SourceRise has pioneered a brain-like tactile processing technology with ultra-high temporal and spatial resolution, featuring sub-millisecond communication latency and supporting simultaneous transmission from thousands of tactile points [4]

Market Potential
- The market for dexterous hands is projected to exceed $5 billion by 2030, indicating significant growth potential [16]
- The company believes the industry's current technological level still leaves considerable room for improvement, making this a favorable time for entrepreneurship in the dexterous-hand sector [16]
Science Robotics Blockbuster: In Just 2 Hours, Robots' Flexible-Assembly Skills Approach Top Human Levels
机器人大讲堂· 2025-09-06 11:43
Core Insights
- The article discusses the challenges of robotic manipulation and introduces a new system, HIL-SERL, that significantly improves the efficiency and effectiveness of robot training in real-world scenarios [1][2]

Challenges of Traditional Methods
- Traditional robotic control methods require extensive engineering design or imitation learning, often lack adaptability, and struggle in new environments [1]
- These methods fail to reach human-level proficiency and speed, leading to inefficiencies in real-world applications [1]

HIL-SERL System Overview
- The HIL-SERL system, developed by a research team at UC Berkeley, allows robots to learn complex tasks with only 1 to 2.5 hours of real-world training, achieving near-perfect success rates and surpassing human execution speeds [2][3]
- The system combines human guidance with autonomous exploration, creating an efficient and safe learning loop [3]

System Architecture
- HIL-SERL consists of three core components: an executor process, a learner process, and a replay buffer integrated within the learner [4]
- It employs off-policy reinforcement learning to optimize behavior strategies from historical data, allowing robots to learn from human demonstrations and to assess how much different actions contribute toward achieving the goal [4]

Performance in Multi-Task Scenarios
- The system was tested on challenging tasks such as precision assembly, dual-arm coordination, and dynamic manipulation, demonstrating its versatility [5][8]
- In precision assembly tasks, robots achieved sub-millimeter accuracy, while in dual-arm coordination tasks they managed complex operations requiring synchronized movements [8]

Results and Adaptability
- After 1 to 2.5 hours of training, robots achieved nearly 100% success rates and executed tasks 1.8 times faster than traditional imitation-learning methods, whose average success rate was 49.7% [9]
- The robots exhibited remarkable adaptability, successfully adjusting to unexpected situations such as misalignments or disturbances and learning from real-time feedback [12]

Learning Mechanism
- HIL-SERL's adaptability stems from its ability to evolve different control strategies based on task requirements, allowing real-time adjustments and corrections [13][16]
- For high-precision tasks the system employs a closed-loop reactive strategy, while for dynamic tasks it uses an open-loop predictive strategy, reflecting high confidence in executing planned actions [13]

Conclusion
- The research highlights HIL-SERL's potential to overcome traditional reinforcement-learning limitations, enabling efficient learning of complex skills in real-world environments [14]
- This advance opens new avenues for industrial applications, particularly in flexible manufacturing sectors requiring small-batch production [14]
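The executor/learner/replay-buffer split described above can be sketched in a single process. The toy 1-D world, the tabular Q-learning update, and the human-override hook below are generic off-policy RL stand-ins (assumptions for illustration), not the HIL-SERL implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Shared buffer: the executor appends transitions, the learner samples them."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)
    def add(self, transition):
        self.buf.append(transition)
    def sample(self, k):
        return random.sample(list(self.buf), min(k, len(self.buf)))

ACTIONS = (-1, 1)

def executor_episode(q, buf, epsilon=0.2, human_action=None):
    """Executor: roll out in a toy 1-D world (goal: reach state 0). A human
    teleoperator can override the first action, which is how corrections
    enter the replay buffer in a human-in-the-loop setup."""
    s = random.choice([-3, -2, 2, 3])
    for step in range(10):
        if step == 0 and human_action is not None:
            a = human_action
        elif random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: q.get((s, x), 0.0))
        s2 = max(-5, min(5, s + a))
        r = 1.0 if s2 == 0 else 0.0
        buf.add((s, a, r, s2))
        s = s2
        if r == 1.0:
            break

def learner_update(q, batch, lr=0.2, gamma=0.9):
    """Learner: off-policy Q-learning update on replayed transitions."""
    for s, a, r, s2 in batch:
        target = r + gamma * max(q.get((s2, b), 0.0) for b in ACTIONS)
        q[(s, a)] = q.get((s, a), 0.0) + lr * (target - q.get((s, a), 0.0))

random.seed(0)
q, buf = {}, ReplayBuffer()
for episode in range(300):
    executor_episode(q, buf)          # executor process: collect experience
    learner_update(q, buf.sample(64)) # learner process: replay and update
executor_episode(q, buf, human_action=1)  # a human correction entering the buffer
```

Off-policy learning is what makes the split work: the learner can reuse every stored transition, human-generated or autonomous, rather than only data from the current policy.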
To Pull Ahead as the "Version" Changes, What Kind of "Environment" Do Agents Need?
机器之心· 2025-09-06 07:00
Core Viewpoint
- The article discusses the recent transformation of AI startup you.com from a search engine into an AI infrastructure company following a $100 million Series C funding round. The shift aligns with a "product-driven infrastructure" strategy and reflects a broader trend of commercializing Agentic AI beyond laboratory settings [1]

Group 1: Agent Environment and Its Evolution
- The focus of artificial intelligence is shifting from content creation to goal-driven, autonomous Agentic AI, driven by rapid advances in the field [4]
- AI agents are expected to become the new interface for human-computer interaction, allowing users to issue commands in natural language without writing code [5]
- Companies such as Cursor, Bolt, and Mercor have achieved significant revenue growth by leveraging distinctive intelligent-agent products [6]

Group 2: Development of Agent Environment
- A suitable "Agent Environment" is crucial for modern agentic applications, balancing the freedom to execute code against security and isolation [7]
- Companies such as E2B and Modal Labs provide secure, isolated cloud environments (sandboxes) for running AI-generated code [7]
- The concept of the Agent Environment traces back to reinforcement learning, where the environment serves as a simulated space for training agents through trial and error [8]

Group 3: Real-World Application and Safety
- As LLM-based agents advance, the requirements on their environments are evolving from training spaces to operational zones, necessitating safe access to real-world tools [9]
- Different types of agents require distinct environments: physical environments for robots, digital environments for virtual assistants [10]
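The sandbox idea behind such agent environments can be illustrated with a toy runner that executes generated code in a separate interpreter process under a timeout. Real services such as E2B or Modal add filesystem, network, and syscall isolation on top; the helper below is only a sketch of the concept, not their APIs.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> dict:
    """Toy 'agent environment': run generated code in a separate interpreter
    process with a wall-clock timeout. Production sandboxes add far stronger
    isolation; this only demonstrates the execute-and-contain pattern."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site paths
            capture_output=True, text=True, timeout=timeout,
        )
        return {"ok": proc.returncode == 0,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}

result = run_untrusted("print(2 + 2)")
```

The timeout is the minimal safety property: an agent's generated code that loops forever is killed rather than stalling the host, which is the same contract, writ small, that cloud sandboxes offer.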