Reinforcement Learning
The Starting Point of End-to-End VLA: A Chat About Large Language Models and CLIP
自动驾驶之心· 2025-08-19 07:20
Core Viewpoint
- The article discusses the development and significance of end-to-end (E2E) algorithms in autonomous driving, emphasizing how the integration of advanced technologies such as large language models (LLMs), diffusion models, and reinforcement learning (RL) enhances the capabilities of autonomous systems [21][31].

Summary by Sections

Section 1: Overview of End-to-End Autonomous Driving
- The first chapter provides a comprehensive overview of the evolution of end-to-end algorithms, explaining the transition from modular approaches to end-to-end solutions and discussing the advantages and challenges of each paradigm [40].

Section 2: Background Knowledge
- The second chapter covers the technical stack around end-to-end systems, detailing the roles of LLMs, diffusion models, and reinforcement learning, which are crucial for understanding the future job market in this field [41][42].

Section 3: Two-Stage End-to-End Systems
- The third chapter examines two-stage end-to-end systems, exploring their emergence, advantages, and disadvantages, and reviews notable works in the field such as PLUTO and CarPlanner [42][43].

Section 4: One-Stage End-to-End and VLA
- The fourth chapter highlights one-stage end-to-end systems, covering subfields from perception-based methods to the latest advancements in VLA (Vision-Language-Action) models, which are pivotal to the ultimate goals of autonomous driving [44][50].

Section 5: Practical Application and RLHF Fine-Tuning
- The fifth chapter centers on a major project on RLHF (Reinforcement Learning from Human Feedback) fine-tuning, offering practical experience in building pre-training and reinforcement-learning modules that carry over to VLA-related algorithms [52]; a minimal sketch of one such update step follows this summary.

Course Structure and Learning Outcomes
- The course aims to equip participants with a solid understanding of end-to-end autonomous-driving technologies, covering essential frameworks and methodologies and preparing them for roles in the industry [56][57].
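The RLHF module described in Section 5 pairs a pre-trained policy with a learned reward signal. Below is a minimal, illustrative PyTorch sketch of one such update (a REINFORCE step with a KL penalty against a frozen reference); the module names, shapes, and the 0.1 KL coefficient are placeholder assumptions, not the course's actual code.

```python
# Minimal, illustrative RLHF-style update: REINFORCE with a KL penalty.
# `policy` and `ref_policy` stand in for a trajectory decoder, and
# `reward_model` for a learned scorer; all are hypothetical placeholders.
import torch

vocab, hidden = 128, 64
policy = torch.nn.Linear(hidden, vocab)          # trainable policy head
ref_policy = torch.nn.Linear(hidden, vocab)      # frozen pre-trained reference
ref_policy.load_state_dict(policy.state_dict())
for p in ref_policy.parameters():
    p.requires_grad_(False)
reward_model = torch.nn.Linear(hidden, 1)        # learned reward, used frozen
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

states = torch.randn(8, hidden)                  # batch of encoder features
dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()                          # sampled next tokens/actions

with torch.no_grad():
    rewards = reward_model(states).squeeze(-1)   # scalar reward per sample
    ref_logp = torch.distributions.Categorical(
        logits=ref_policy(states)).log_prob(actions)

logp = dist.log_prob(actions)
kl = logp - ref_logp                             # per-sample KL estimate
advantage = rewards - rewards.mean()             # simple mean baseline
loss = -(advantage.detach() * logp).mean() + 0.1 * kl.mean()

opt.zero_grad()
loss.backward()
opt.step()
```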
Reinforcement Learning and VLA / Flow Matching / Robot Control Algorithms, Viewed from Method Paradigms and Application Scenarios
具身智能之心· 2025-08-19 01:54
Core Viewpoint
- The article discusses recent advances in reinforcement learning (RL) and its applications in robotics, focusing on VLA (Vision-Language-Action) models and diffusion policies and highlighting their potential to handle complex tasks that traditional RL struggles with [2][4][35].

Method Paradigms
- Traditional RL and imitation learning combined with Sim2Real techniques are foundational approaches in robotics [3].
- VLA models differ fundamentally from traditional RL by using the training-data distribution to describe task processes and goals, which allows them to execute more complex tasks [4][35].
- Diffusion Policy uses diffusion models to generate continuous action sequences and has demonstrated stronger complex-task execution than traditional RL methods [4][5]; a toy sketch of the idea follows this summary.

Application Scenarios
- The article categorizes applications into two main types: basic motion control for humanoid and quadruped robots, and complex, long-horizon manipulation tasks [22][23].
- Basic motion control relies mainly on RL and Sim2Real, and current implementations still fall short of the fluid motion of humans or animals [22].
- For complex tasks, architectures typically pair a pre-trained Vision Transformer (ViT) encoder with a large language model (LLM) and use diffusion or flow matching for action output [23][25].

Challenges and Future Directions
- Key challenges include the need for better simulation environments, effective domain randomization, and the integration of external goal conditions [35].
- The article emphasizes the role of human intention in task definition and the limitation that current models cannot learn complex tasks without extensive human demonstration data [35][40].
- Future advances may involve multi-modal prediction of task goals and possibly brain-machine interfaces to enhance human-robot interaction [35].
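As a companion to the Diffusion Policy point above, here is a toy PyTorch sketch of the core training step: a network learns to predict the noise added to demonstration action sequences, conditioned on an observation. The network size, horizon, and linear beta schedule are illustrative assumptions, not the published architecture.

```python
# Toy sketch of the Diffusion Policy idea: train a noise-prediction network
# to denoise action sequences conditioned on an observation (DDPM objective).
import torch

T, horizon, act_dim, obs_dim = 50, 8, 2, 16
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

eps_net = torch.nn.Sequential(                   # predicts the added noise
    torch.nn.Linear(horizon * act_dim + obs_dim + 1, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, horizon * act_dim),
)
opt = torch.optim.Adam(eps_net.parameters(), lr=1e-3)

def train_step(obs, actions):                    # actions: (B, horizon*act_dim)
    t = torch.randint(0, T, (obs.shape[0],))     # random diffusion timestep
    ab = alphas_bar[t].unsqueeze(-1)
    noise = torch.randn_like(actions)
    noisy = ab.sqrt() * actions + (1 - ab).sqrt() * noise
    inp = torch.cat([noisy, obs, t.unsqueeze(-1).float() / T], dim=-1)
    loss = ((eps_net(inp) - noise) ** 2).mean()  # predict the injected noise
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

obs = torch.randn(32, obs_dim)                   # stand-in ViT/LLM features
actions = torch.randn(32, horizon * act_dim)     # stand-in demonstration data
print(train_step(obs, actions))
```

At inference time the same network would be applied iteratively, denoising a random vector into an executable action sequence conditioned on the current observation.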
4o-mini's Chinese Team Lead Has Also Departed, and This Time It's Not Zuck's Fault
量子位· 2025-08-19 01:17
Core Viewpoint
- OpenAI's former key researcher Kevin Lu has left to join Thinking Machines Lab, a new AI startup co-founded by former OpenAI CTO Mira Murati, which has reached a valuation of $12 billion [3][19].

Group 1: Kevin Lu's Background and Contributions
- Kevin Lu has a strong background in reinforcement learning and small-model development, having previously worked at Hudson River Trading, Meta, and OpenAI [5][6].
- At OpenAI he led development of 4o-mini, a multimodal reasoning small model that supports text and image input and targets complex tasks at higher speed and lower cost [7][9].
- His most-cited paper, "Decision Transformer: Reinforcement Learning via Sequence Modeling," has been cited 2,254 times and presents a framework that treats reinforcement learning as conditional sequence modeling [10][11]; a sketch of that input layout follows this summary.

Group 2: Thinking Machines Lab
- Thinking Machines Lab has attracted several former core researchers from OpenAI, including John Schulman and Barrett Zoph, and recently closed a record-breaking $2 billion seed round [4][17].
- The startup has not yet publicly disclosed any results, which has generated significant anticipation within the AI community [21].
- Despite competitive offers from other tech giants, team members have chosen to stay, indicating strong confidence in the startup's potential [20].
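To make the Decision Transformer idea concrete, the sketch below shows the input layout the paper describes: each timestep contributes a (return-to-go, state, action) token triple, and a causal transformer is trained to predict actions from this sequence. The embedding modules and dimensions are assumptions chosen for the example, not the paper's exact configuration.

```python
# Illustrative sketch of the Decision Transformer input layout: trajectories
# become interleaved (return-to-go, state, action) tokens for a causal model.
import torch

state_dim, act_dim, embed = 4, 2, 32
embed_rtg = torch.nn.Linear(1, embed)
embed_state = torch.nn.Linear(state_dim, embed)
embed_action = torch.nn.Linear(act_dim, embed)

def returns_to_go(rewards):
    # Suffix sums of rewards: the return still to be collected at each step.
    return torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])

T = 5
states = torch.randn(T, state_dim)               # toy trajectory
actions = torch.randn(T, act_dim)
rewards = torch.randn(T)

rtg = returns_to_go(rewards).unsqueeze(-1)       # (T, 1)
tokens = torch.stack(                            # (T, 3, embed)
    [embed_rtg(rtg), embed_state(states), embed_action(actions)], dim=1
).reshape(T * 3, embed)                          # interleaved R, s, a sequence
# `tokens` would feed a causal transformer; an action head reads the
# state-token positions to predict each next action autoregressively.
print(tokens.shape)                              # torch.Size([15, 32])
```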
Nobel Laureate on the "AGI Litmus Test": AI That Invents Its Own Games and Teaches Them to Each Other
36Ke· 2025-08-19 00:00
Core Insights
- In this interview, Demis Hassabis, CEO of Google DeepMind, discusses the evolution of AI technology and its future trends, focusing on the development of artificial general intelligence (AGI) and the significance of world models such as Genie 3 [2][3].

Group 1: Genie 3 and World Models
- Genie 3 is the product of multiple research branches at DeepMind, aimed at building a "world model" that helps AI understand the physical world, including physical structure, material properties, fluid dynamics, and biological behavior [3].
- AI development has transitioned from specialized intelligence to more general models, with understanding of the physical world as a foundation for AGI [3][4].
- Genie 3 can generate consistent virtual environments, preserving the state of a scene when users return to it, which demonstrates its grasp of how the world works [4].

Group 2: Game Arena and AGI Evaluation
- Google DeepMind has partnered with Kaggle to launch Game Arena, a testing platform designed to gauge progress toward AGI by having models play a variety of games [6].
- Game Arena provides a clean testing environment with objective performance metrics and can automatically raise game difficulty as AI capabilities improve [9].
- The platform aims for a comprehensive assessment of AI's general capabilities across multiple domains, with the eventual goal of AI systems inventing new games and teaching them to one another [9][10].

Group 3: Challenges in AGI Development
- Current AI systems perform inconsistently, capable in some areas while failing at simpler tasks, which remains a significant barrier to AGI [7].
- More challenging and diverse benchmarks are needed, covering understanding of the physical world, intuitive physics, and safety properties [8].
- Hassabis emphasizes the importance of understanding human goals and translating them into useful reward functions for optimization in AGI systems [10].

Group 4: Future Directions in AI
- The evolution of thinking models such as Deep Think represents a crucial direction for AI, focusing on reasoning, planning, and optimization through iterative processes [12].
- The transition from weight-only models to complete systems is highlighted: modern AI can integrate tool use, planning, and reasoning to deliver more complex functionality [13].
Li Jianzhong: Research and Reflections on Human-Machine Interaction and the Agent Ecosystem in the AI Era
AI科技大本营· 2025-08-18 09:50
Core Insights
- The article discusses the transformative impact of large models on the AI industry, emphasizing the shift from isolated applications to a more integrated human-machine interaction model termed "accompanying interaction" [1][5][60].

Group 1: Paradigm Shifts in AI
- The transition from training-centric models to reasoning models has significantly enhanced AI's capabilities, particularly through reinforcement learning, which lets AI generate synthetic data and innovate beyond human knowledge [9][11][13].
- The introduction of "agentic models" marks a shift in which AI evolves from merely offering suggestions to actively performing tasks for users [16][18].

Group 2: Application Development Transformation
- "Vibe coding" has emerged as a new programming paradigm, enabling non-professionals to create software in natural language, in contrast with traditional programming methods [19][22].
- The concept of "malleable software" suggests that future software will let users customize and personalize applications extensively, leading to a more democratized software-development landscape [24][26].

Group 3: Human-Machine Interaction Evolution
- The future of human-machine interaction is predicted to be dominated by natural-language interfaces, moving away from traditional graphical user interfaces (GUIs) [36][41].
- The article posits that the interaction paradigm will evolve so that AI agents seamlessly orchestrate services, eliminating the need for users to switch between isolated applications [45][48].

Group 4: Intelligent Agent Ecosystem
- The development of intelligent agents is characterized by enhanced capabilities in planning, tool use, collaboration, memory, and action, which together redefine the internet from an "information network" into an "action network" [66][68].
- Protocols such as MCP (Model Context Protocol) and A2A (Agent to Agent) facilitate interaction between agents and traditional software, strengthening the overall ecosystem [70]; a schematic tool-call loop is sketched after this summary.
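To illustrate the agent-tool interaction pattern behind protocols like MCP, here is a schematic Python loop: the model emits a structured tool call, a dispatcher executes it, and the result is returned for the next reasoning step. The JSON message format and the `search_flights` tool are hypothetical illustrations and do not reflect the actual MCP or A2A wire formats.

```python
# Schematic agent-tool dispatch loop: model output -> tool call -> result.
import json

def search_flights(origin: str, dest: str) -> list:
    # Hypothetical tool; a real agent would call an external service here.
    return [{"flight": "XX123", "from": origin, "to": dest}]

TOOLS = {"search_flights": search_flights}       # registry of callable tools

def dispatch(tool_call_json: str) -> str:
    # Parse the model's structured call, run the named tool, return the result.
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# The model would produce this JSON; it is hard-coded here for illustration.
model_output = json.dumps(
    {"name": "search_flights",
     "arguments": {"origin": "PEK", "dest": "SHA"}})
print(dispatch(model_output))
```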
Intelligent Driving May Surpass Human Driving: Buick's High-End NEV Zhijing L7 Is the First to Carry Momenta's R6 Flywheel Large Model
Feng Huang Wang· 2025-08-18 08:34
According to the companies, the reinforcement-learning-based Momenta R6 Flywheel large model can explore new driving behaviors in a simulated environment; the system draws lessons from its own successes and failures and improves rapidly on its own, giving its driving the chance to match, or even far surpass, human drivers in safety and reassurance.

Feng Huang Wang Tech, August 18 - SAIC General Motors and Momenta have signed a strategic cooperation agreement for deep collaboration in assisted driving. The Buick Zhijing L7, the first intelligent luxury sedan under Buick's high-end new-energy sub-brand "Zhijing" (至境), will be equipped with the reinforcement-learning-based Momenta R6 Flywheel large model. ...
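The "learn from its own successes and failures in simulation" loop described above is, at its core, trial-and-error reinforcement learning. The toy below illustrates that loop with tabular Q-learning in a five-cell lane-keeping environment; the environment and all constants are inventions for the example and bear no relation to Momenta's actual R6 system.

```python
# Toy trial-and-error loop: tabular Q-learning in a 1-D lane-keeping world.
import random

POSITIONS, ACTIONS = 5, (-1, 0, 1)               # lane offsets; steer left/keep/right
Q = {(s, a): 0.0 for s in range(POSITIONS) for a in ACTIONS}

def step(pos, action):
    # Move within the lane grid; reward staying at the center cell.
    pos = max(0, min(POSITIONS - 1, pos + action))
    reward = 1.0 if pos == POSITIONS // 2 else -abs(pos - POSITIONS // 2)
    return pos, reward

for episode in range(500):
    pos = random.randrange(POSITIONS)
    for _ in range(20):
        # Epsilon-greedy exploration: mostly exploit, sometimes try new moves.
        a = (random.choice(ACTIONS) if random.random() < 0.1
             else max(ACTIONS, key=lambda x: Q[(pos, x)]))
        nxt, r = step(pos, a)
        best_next = max(Q[(nxt, x)] for x in ACTIONS)
        Q[(pos, a)] += 0.1 * (r + 0.9 * best_next - Q[(pos, a)])  # TD update
        pos = nxt

# After training, the greedy policy steers back toward the lane center.
print([max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(POSITIONS)])
```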
Recruiting: Paper Mentoring in VLA / Reinforcement Learning / VLN!
具身智能之心· 2025-08-18 06:00
Core Viewpoint
- The article announces one-on-one mentoring for papers on embodied intelligence, specifically in the areas of VLA, reinforcement learning, and Sim2Real, targeting conferences such as CVPR, ICCV, ECCV, ICLR, CoRL, ICML, and ICRA [1].

Group 1
- The mentoring is aimed at students interested in submitting to major conferences in the field of embodied intelligence [1].
- There are currently three available slots for the mentoring sessions [1].
- The mentors are active researchers in embodied intelligence with innovative ideas [1].

Group 2
- Interested individuals can inquire further by adding a specific WeChat contact or by scanning a QR code for consultation [2].
VLA+RL or Pure RL? The Development Path of Reinforcement Learning, as Seen from 200+ Papers
具身智能之心· 2025-08-18 00:07
Core Insights
- The article provides a comprehensive analysis of the intersection of reinforcement learning (RL) and visual intelligence, focusing on the evolution of strategies and key research themes in visual reinforcement learning [5][17][25].

Group 1: Key Themes in Visual Reinforcement Learning
- The article categorizes over 200 representative studies into four main pillars: multimodal large language models, visual generation, unified model frameworks, and vision-language-action models [5][17].
- Each pillar is examined for algorithm design, reward engineering, and benchmark progress, highlighting trends and open challenges in the field [5][17][25].

Group 2: Reinforcement Learning Techniques
- Various reinforcement learning techniques are discussed, including Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), which are used to enhance stability and efficiency in training [15][16]; a minimal sketch of GRPO's group-relative advantage follows this summary.
- The article emphasizes the importance of reward models, such as those based on human feedback and verifiable rewards, in guiding the training of visual reinforcement learning agents [10][12][21].

Group 3: Applications in Visual and Video Reasoning
- The article outlines applications of reinforcement learning in visual reasoning tasks, including 2D and 3D perception, image reasoning, and video reasoning, showcasing how these methods improve task performance [18][19][20].
- Specific studies are highlighted that utilize reinforcement learning to enhance capabilities in complex visual tasks, such as object detection and spatial reasoning [18][19][20].

Group 4: Evaluation Metrics and Benchmarks
- The article discusses the need for new evaluation metrics tailored to large-model visual reinforcement learning, combining traditional metrics with preference-based assessments [31][35].
- It provides an overview of benchmarks that support training and evaluation in the visual domain, emphasizing the role of human preference data in shaping reward models [40][41].

Group 5: Future Directions and Challenges
- The article identifies key challenges in visual reinforcement learning, such as balancing depth and efficiency in reasoning processes, and suggests future research directions to address these issues [43][44].
- It highlights the importance of developing adaptive strategies and hierarchical reinforcement learning approaches to improve the performance of vision-language-action agents [43][44].
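As referenced in Group 2, GRPO replaces a learned critic with group-relative reward normalization: several responses are sampled per prompt, scored by a reward model, and standardized within the group. Here is a minimal sketch of that advantage computation, with illustrative shapes:

```python
# Group-relative advantages as used in GRPO: normalize rewards within each
# group of sampled responses so no learned value function (critic) is needed.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar scores per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)        # group-normalized advantages

rewards = torch.tensor([[0.1, 0.9, 0.5, 0.5],    # group of 4 samples, prompt 1
                        [1.0, 1.0, 0.0, 0.0]])   # group of 4 samples, prompt 2
adv = grpo_advantages(rewards)
# These advantages weight the token log-probabilities in a clipped,
# PPO-style objective, in place of a critic's value estimates.
print(adv)
```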
The First Robot "Olympics" Concludes: Unitree Sweeps the Track Golds While 75% of Teams Fail to Finish the Obstacle Race
第一财经· 2025-08-17 14:58
Core Viewpoint
- The first World Humanoid Robot Games showcased advancements in humanoid robotics, highlighting both the industry's achievements and its remaining challenges [3][11].

Group 1: Competition Results
- Unitree (Yushu) won gold medals in multiple events, including the 1500m and 100m races, demonstrating significant performance capabilities [3][5].
- The Tiangong Ultra robot, running autonomous navigation, secured gold in the 100m race, aiming to change the perception of robots as mere toys [3][5].
- The MagicBot Z1 improved its average speed by 1 meter per second through enhanced reinforcement-learning techniques, showcasing the potential for rapid advances in robot performance [5].

Group 2: Challenges in the Industry
- The 100m obstacle race saw a 75% did-not-finish rate among competitors, pointing to significant challenges in algorithm robustness and motion coordination in the humanoid-robotics sector [6][8].
- Many robots struggled with environmental adaptability; one robot's inability to pick up bottles of a different brand highlighted limitations in perception and generalization [11].

Group 3: Autonomous Functionality
- In material-handling and hotel-cleaning scenarios, only a few teams achieved full autonomy, with most relying on traditional programmed control [10][11].
- The competition underscored the need for breakthroughs in algorithms and adaptive learning before robots can move from demonstration-level capabilities to practical applications [11].
Songyan Power's "Little Rascal" Team Wins the Standing Long Jump; Jiang Zheyuan: We Optimized the Robot's Jump Algorithm
Bei Ke Cai Jing· 2025-08-17 06:41
Group 1 - The 2025 World Humanoid Robot Sports Competition announced the results, with Songyan Power's "Little Rascal" team winning the long jump event with a score of 1.25 meters, followed by Yushu Technology at 1.20 meters and Lingyi Technology at 1.13 meters [1] - Songyan Power's founder and chairman, Jiang Zheyuan, explained that the company prepared multiple strategies and sent two teams to compete, utilizing different robots (N2 and K1) and algorithms to enhance performance [4] - The technical challenges in robot long jump include hardware requirements for explosive power and algorithm optimization, with the company employing reinforcement learning to fine-tune the robot's performance [4] Group 2 - Songyan Power's robots do not have a height advantage, but the company plans to release a full-sized humanoid robot product by the end of this year [4]