Reinforcement Learning
Trajectory Planning Based on Deep Reinforcement Learning
自动驾驶之心· 2025-08-28 23:32
Core Viewpoint - The article discusses the advancements and potential of reinforcement learning (RL) in the field of autonomous driving, highlighting its evolution and comparing it with other learning paradigms such as supervised learning and imitation learning [4][7][8].

Summary by Sections

Background - The article notes the recent industry focus on new technological paradigms like VLA and reinforcement learning, emphasizing the growing interest in RL following significant milestones in AI such as AlphaZero and ChatGPT [4].

Supervised Learning - In autonomous driving, perception tasks like object detection are framed as supervised learning tasks, where a model is trained to map inputs to outputs using labeled data [5].

Imitation Learning - Imitation learning trains models to replicate actions based on observed behaviors, akin to how a child learns from adults; it is a primary learning objective in end-to-end autonomous driving [6].

Reinforcement Learning - Reinforcement learning differs from imitation learning by learning through interaction with the environment, using feedback from task outcomes to optimize the model; it is particularly relevant for sequential decision-making tasks in autonomous driving [7].

Inverse Reinforcement Learning - Inverse reinforcement learning addresses the challenge of defining reward functions for complex tasks by learning a reward model from user feedback, which can then guide the main model's training [8].

Basic Concepts of Reinforcement Learning - Key concepts include policies, rewards, and value functions, which are essential for understanding how RL operates in autonomous driving contexts [14][15][16].

Markov Decision Process - The article explains the Markov decision process as a framework for modeling sequential tasks, applicable to various autonomous driving scenarios [10].

Common Algorithms - Foundational algorithms are discussed, including dynamic programming, Monte Carlo methods, and temporal-difference learning [26][30] (a minimal sketch follows this summary).

Policy Optimization - The article differentiates on-policy from off-policy algorithms, highlighting their respective trade-offs in training stability and data utilization [27][28].

Advanced Reinforcement Learning Techniques - Techniques such as DQN, TRPO, and PPO are introduced, showcasing their roles in improving training stability and efficiency in reinforcement learning applications [41][55].

Application in Autonomous Driving - The article emphasizes the importance of reward design and closed-loop training in autonomous driving, where the vehicle's actions influence the environment, necessitating sophisticated modeling techniques [60][61].

Conclusion - The rapid development of reinforcement learning algorithms and their application in autonomous driving is underscored, and readers are encouraged to engage with the technology hands-on [62].
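To make the algorithm survey concrete, here is a minimal sketch of tabular Q-learning, an off-policy temporal-difference method of the kind the article covers. The gym-style `env` interface (integer states, `reset()`/`step()`) and all hyperparameters are illustrative assumptions, not code from the article.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning: an off-policy temporal-difference method.
    Assumes a gym-style env with integer states and reset()/step()."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy (exploration vs. exploitation)
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r, done, _ = env.step(a)
            # TD update: move Q(s, a) toward the bootstrapped greedy target
            target = r + gamma * (0.0 if done else Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

Because the update bootstraps from the greedy value max Q(s', a') while actions are chosen epsilon-greedily, the behavior and target policies differ, which is exactly the on-policy/off-policy distinction the section draws.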
Li Auto Executives Interpret Q2 Earnings: Strengthening Product Competitiveness Through a Deep Refresh of Assisted Driving
Xin Lang Ke Ji· 2025-08-28 14:46
Core Insights
- The management of Li Auto discussed strategies to address the decline in sales of the L series and emphasized enhancing product competitiveness through intelligent driving features [1]
- The company is set to upgrade its entire range of extended-range AD Max models with the VLA intelligent driving system, which has shown significant improvements in driving performance [2]
- Li Auto's pure electric vehicle lineup is expected to grow with the introduction of the i6 model, which targets younger consumers and is expected to contribute substantially to sales [3]

Sales Strategy
- The company aims to achieve its overall sales targets by focusing on intelligent features and regional marketing strategies tailored to local market conditions [1][3]
- The sales system has been reorganized so that headquarters directly manages 23 regions, allowing localized policy implementation [3]

Product Development
- The VLA intelligent driving system is undergoing rapid iteration supported by a simulation environment, strengthening the company's competitive edge in autonomous driving technology [2]
- The i8 model has received positive feedback, and the company plans to ramp up production to deliver 8,000 to 10,000 units by the end of September [3]

Marketing and Channel Optimization
- The marketing strategy emphasizes "regionalization," with different selling points for northern and southern markets [3]
- The company is optimizing its store locations and store types to improve customer acquisition and conversion rates, particularly in first- to third-tier cities [4]
- Expansion into lower-tier cities will proceed through a lightweight store model, increasing brand visibility and tapping underdeveloped markets [4][5]
The Embodied Intelligence Heart Technology Exchange Group Has Been Established!
具身智能之心· 2025-08-28 08:36
Group 1
- The newly established Embodied Intelligence Heart Technology Exchange Group focuses on advanced technologies including VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1]
- Interested individuals can add the assistant's WeChat AIDriver005 to join the community [2]
- To expedite the group entry process, applicants are advised to include a note with their institution/school, name, and research direction [3]
Boston Dynamics' Robot Dog Stuns with a Side Flip! It Can Flip Even on Roller Skates
量子位· 2025-08-28 06:46
Core Viewpoint - Boston Dynamics' Spot robot has demonstrated advanced capabilities, including performing flips, which serve as a rigorous test of its hardware and algorithms and help ensure reliability in real-world operations [18][20][21].

Group 1: Robot Capabilities
- Spot can perform various tasks beyond acrobatics, such as climbing stairs, surveying, and opening doors, showcasing its practical applications [10][12][14][16]
- The ability to perform flips is not just for show; it indicates the robustness of Spot's hardware and software systems [20][21]

Group 2: Training and Development
- Spot's training involves reinforcement learning in simulated environments before real-world testing, allowing iterative improvements in stability and performance [22] (a sketch of this idea follows the summary)
- The robot's design includes 12 degrees of freedom, and it is equipped with five pairs of stereo cameras, enhancing its operational capabilities [22]

Group 3: Historical Context and Popularity
- Spot has been well known since its introduction in 2016, gaining fame through various performances, including dancing to popular songs [27][30]
- The acquisition of Boston Dynamics by Hyundai in 2020 has positioned the company for further growth and innovation in robotics [31]
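The sim-before-real recipe described above typically relies on domain randomization: physics parameters are perturbed every episode so the learned policy tolerates the gap between simulator and hardware. The sketch below illustrates the general idea only; the `sim` interface, method names, and parameter ranges are hypothetical and are not Boston Dynamics' API.

```python
import random

def randomize_dynamics(sim):
    """Domain randomization: resample physics parameters each training episode
    so a policy learned in simulation tolerates real-world variation.
    The `sim` object and its setters are illustrative placeholders."""
    sim.set_friction(random.uniform(0.4, 1.2))        # ground contact friction
    sim.set_payload_mass(random.uniform(0.0, 5.0))    # extra torso mass, kg
    sim.set_motor_strength(random.uniform(0.8, 1.1))  # actuator gain multiplier
    sim.set_latency(random.uniform(0.0, 0.03))        # sensing-to-actuation delay, s
```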
Embodied Intelligence Heart Is Recruiting B-End and C-End Training Instructors
具身智能之心· 2025-08-28 01:20
Group 1
- The article announces the recruitment of instructors for embodied intelligence training, covering both B-end (business) and C-end (consumer) training services, with compensation above industry standards [1]
- The training covers advanced topics including VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, sim2real, multimodal large models, simulation, motion control, and target navigation [2]
- B-end training is aimed at enterprises, universities, and research institutions, while C-end training focuses on students and job seekers; responsibilities include curriculum design and material preparation [3]

Group 2
- Candidates must hold a doctoral degree or be currently enrolled in a doctoral program, with preference given to those who have published two papers in A-level or Q1 journals/conferences or who have two years of industry experience [3]
- Interested individuals can add the specified WeChat contact for further inquiries [4]
Stanford Proposes the RTR Framework, Using a Robotic Arm to Assist Real-World Training of Humanoid Robots
具身智能之心· 2025-08-28 01:20
Core Insights - The article discusses the emerging focus on humanoid robot motion control as a key application area for reinforcement learning (RL), emphasizing the "Sim-to-Real" paradigm and the challenges of transferring learned behaviors from simulation to real-world environments [1][2].

Group 1: Current Challenges and Innovations
- Current methods primarily use domain randomization to train general control models across diverse simulated environments, aiming for zero-shot transfer to real-world dynamics [1][2]
- Recent efforts have begun to explore fine-tuning models with limited real-world data after simulation pre-training, with notable contributions from institutions like NVIDIA and CMU [2]
- The inherent instability of humanoid robots poses significant risks during real-world training, making direct reinforcement learning in real environments a longstanding challenge [2]

Group 2: Proposed Solutions
- The article introduces an approach inspired by human learning, in which a "teacher" robotic arm guides a "student" humanoid robot through online reinforcement learning [3][5]
- The teacher arm serves multiple roles: providing safety, assisting with resets after failures, collecting training data, and structuring learning through curriculum learning [5][7]

Group 3: RTR System Overview
- The proposed system, Robot-Trains-Robot (RTR), highlights the importance of physical assistance from the teacher robot for effective real-world learning [7][9]
- To address the high cost of real-world data collection, a novel RL algorithm optimizes a low-dimensional latent variable tied to environmental dynamics, significantly improving sample efficiency [7][9] (see the sketch after this summary)

Group 4: Methodology and Experimental Validation
- The RTR system comprises hardware and algorithmic components, with a UR5 robotic arm as the teacher and a ToddlerBot humanoid as the student [9][10]
- The Sim-to-Real process is divided into three stages: training adaptable policies in simulation, optimizing a general latent variable, and performing online fine-tuning in the real world [10][12]
- Experiments on tasks such as walking and swinging demonstrate the system's effectiveness, with significant improvements in learning efficiency and performance over traditional methods [14][18]

Group 5: Future Implications
- The RTR framework not only addresses current limitations in humanoid robot training but also introduces a physical-assistance paradigm that could extend to larger humanoid robots and other complex robotic systems [16][19]
- The findings suggest that integrating teacher robots makes learning more efficient and stable, which is crucial for advancing real-world applications of humanoid robotics [16][17]
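One way to picture the latent-variable idea: keep the simulation-trained policy frozen and search only a low-dimensional dynamics latent z against real-world episode returns, which requires far fewer rollouts than fine-tuning the full network. The sketch below uses a simple evolution strategy as the search procedure; the `policy(obs, z)` and gym-style `env` interfaces, and the choice of evolution strategies itself, are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def finetune_latent(policy, env, z_init, iters=20, pop=8, sigma=0.1, seed=0):
    """Fine-tune only a low-dimensional dynamics latent z with a simple
    evolution strategy; the simulation-trained policy weights stay frozen.
    policy(obs, z) -> action and a gym-style env are assumed interfaces."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z_init, dtype=np.float64).copy()

    def rollout(z_try, horizon=500):
        obs, total = env.reset(), 0.0
        for _ in range(horizon):
            obs, reward, done, _ = env.step(policy(obs, z_try))
            total += reward
            if done:
                break
        return total

    for _ in range(iters):
        noise = rng.standard_normal((pop, z.size))            # perturbation directions
        returns = np.array([rollout(z + sigma * n) for n in noise])
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        z += sigma * (adv @ noise) / pop                      # ES ascent step on z
    return z
```

Because only a handful of numbers are optimized, each update needs just a few real rollouts, matching the sample-efficiency motivation the article describes.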
Breaking the Bottleneck to Teach RAG to Think: USTC, BAAI, and Others Release the Reasoning Retrieval Framework BGE-Reasoner
机器之心· 2025-08-27 08:36
Core Viewpoint - The article discusses BGE-Reasoner, an innovative end-to-end solution for reasoning-intensive information retrieval (IR) developed by a collaborative team from several Chinese institutions. It addresses a critical bottleneck in the development of RAG and AI agents, significantly enhancing their performance on complex reasoning tasks [2][3].

Group 1: BGE-Reasoner Overview
- BGE-Reasoner achieved a score of 45.2 on the BRIGHT benchmark, surpassing previous records and demonstrating its effectiveness on reasoning-intensive retrieval tasks [2][12]
- The model represents a significant milestone in the BGE series, providing a new paradigm for tackling reasoning-intensive retrieval [3]

Group 2: Technical Innovations
- A replicable framework of three modular components - Rewriter, Embedder, and Reranker - was proposed to handle complex queries efficiently [3]
- The research team demonstrated the feasibility of synthesizing high-quality, multi-domain reasoning training data with large models, addressing the critical problem of data scarcity in this field [4]
- Reinforcement learning was successfully applied to Reranker training, enhancing the model's reasoning and generalization on challenging samples [5]

Group 3: Performance Comparison
- BGE-Reasoner outperformed submissions from major institutions such as Ant Group, Baidu, and ByteDance, leading the BRIGHT leaderboard by a margin of 3.6 points [12][14]
- The embedding model, BGE-Reasoner-Embed, also outperformed other leading baseline models, confirming the effectiveness of the synthesized training data [12][22]

Group 4: System Workflow
- BGE-Reasoner follows a classic three-module pipeline: the original query is rewritten, candidates are retrieved by the Embedder, and final results are ranked by the Reranker [19][24] (a sketch of this pipeline follows the summary)
- The query understanding module uses synthesized data to generate reasoning paths, significantly improving the model's query understanding and rewriting capabilities [21]
- The embedding model and the Reranker are fine-tuned on high-quality synthetic training data, enhancing their performance on reasoning-intensive retrieval tasks [22][24]

Group 5: Future Directions
- The research team aims to continue advancing vector models and retrieval-augmentation technologies, collaborating with more research institutions and industry partners to promote the development of retrieval and artificial intelligence [25]
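The three-module workflow reduces to a few lines of glue code. In this sketch, `rewriter`, `embedder`, and `reranker` are stand-in callables for the three BGE-Reasoner components; the function signatures, cosine-similarity retrieval, and top-k sizes are assumptions for illustration, not the project's published API.

```python
import numpy as np

def retrieve(query, corpus, corpus_emb, rewriter, embedder, reranker,
             k=100, top_n=10):
    """Reasoning-retrieval pipeline: rewrite -> dense retrieve -> rerank.
    corpus_emb is an (N, d) matrix of pre-normalized passage embeddings."""
    # 1) Rewriter: expand the raw query with an explicit reasoning path.
    rewritten = rewriter(query)

    # 2) Embedder: dense retrieval of top-k candidates by cosine similarity.
    q = embedder(rewritten)
    q = q / np.linalg.norm(q)
    candidates = np.argsort(-(corpus_emb @ q))[:k]

    # 3) Reranker: score each (query, passage) pair and keep the best top_n.
    pair_scores = np.array([reranker(rewritten, corpus[i]) for i in candidates])
    order = np.argsort(-pair_scores)[:top_n]
    return [corpus[candidates[i]] for i in order]
```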
A 10,000-Citation Reinforcement Learning Heavyweight Leaves Meta, Quoting Zuckerberg's Own Words as His Farewell
36Kr· 2025-08-27 06:48
Core Viewpoint - The departure of Rishabh Agarwal from Meta has raised concerns about employee retention and morale within the company, especially as he was a key figure in reinforcement learning and had made significant contributions during his tenure [1][3][15].

Group 1: Rishabh Agarwal's Background and Contributions
- Agarwal has a strong academic and professional background in reinforcement learning, with over 10,000 citations of his work and an h-index of 34 [5][6]
- He was involved in the development of significant models such as Gemini 1.5 and Gemma 2 during his time at Google, before moving to Meta [3][11]
- His paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice" won a NeurIPS Outstanding Paper Award in 2021, highlighting his expertise in the field [11][13]

Group 2: Implications of His Departure
- Agarwal's exit is seen as part of a broader trend of experienced employees leaving Meta, which may be linked to internal conflicts over compensation disparities between new hires and long-term staff [15][17]
- The departure of Agarwal and other senior employees could affect Meta's research capabilities and pace of innovation in artificial intelligence [1][15]
- There is speculation that Agarwal may pursue entrepreneurial ventures, indicating a potential shift in the competitive landscape of AI research [14]

Group 3: Company Culture and Employee Morale
- The recruitment drive at Meta has reportedly created friction among employees, with some researchers threatening to resign [17]
- The situation reflects the challenge Meta faces in attracting new talent while retaining its existing workforce [17]
Seven Years in the Making: Li Hang's "Machine Learning Methods (2nd Edition)" Is Released with New Reinforcement Learning Coverage, Plus 20 Copies to Give Away
机器之心· 2025-08-27 03:18
Core Viewpoint - The article discusses the release of the second edition of "Machine Learning Methods" by Li Hang, which expands beyond traditional machine learning to cover deep learning and reinforcement learning, addressing the growing interest in these areas within the AI community [4][5][22].

Summary by Sections

Overview of the Book
- The new edition includes significant updates and additions, particularly on reinforcement learning, which has been gaining attention in AI applications [4][5]
- The book is structured into four parts - supervised learning, unsupervised learning, deep learning, and reinforcement learning - providing a comprehensive framework for readers [5][22]

Supervised Learning
- The first part covers key supervised learning methods: linear regression, the perceptron, support vector machines, maximum entropy models, logistic regression, boosting, hidden Markov models, and conditional random fields [7]

Unsupervised Learning
- The second part covers unsupervised learning techniques, including clustering, singular value decomposition, principal component analysis, Markov chain Monte Carlo methods, the EM algorithm, latent semantic analysis, and latent Dirichlet allocation [8]

Deep Learning
- The third part introduces major deep learning methods: feedforward neural networks, convolutional neural networks, recurrent neural networks, Transformers, diffusion models, and generative adversarial networks [9]

Reinforcement Learning
- The fourth part details reinforcement learning methods, including Markov decision processes, multi-armed bandit problems, proximal policy optimization, and deep Q-networks [10]
- The book aims to provide a systematic introduction to reinforcement learning, which previous textbooks have covered less thoroughly [4][10]

Learning Approach
- Each chapter presents one or two machine learning methods, explaining models, strategies, and algorithms clearly, supported by mathematical derivations to deepen understanding [12][19]
- The book is designed for university students and professionals, assuming a background in calculus, linear algebra, probability and statistics, and computer science [22]

Author Background
- Li Hang, the author, is a recognized expert in natural language processing, information retrieval, machine learning, and data mining [24]
Teaching Robots Hand in Hand: Stanford's RTR Framework Uses a Robotic Arm to Assist Real-World Training of Humanoid Robots
机器之心· 2025-08-27 00:46
Core Viewpoint - The application of reinforcement learning (RL) algorithms to humanoid robot motion control is emerging as a key research area, with a focus on the "Sim-to-Real" paradigm, which trains general control models in diverse simulated environments so they can adapt to the real world [2][3].

Group 1: Current Challenges and Innovations
- Existing methods primarily use domain randomization to train models in simulation, achieving impressive results on various tasks but often sacrificing performance in specific real-world environments [2][3]
- Recent efforts have begun to explore fine-tuning models with limited real-world data after simulation pre-training, with notable contributions from institutions like NVIDIA and CMU [3]
- Conducting RL training directly in real environments has been a significant barrier due to the instability of humanoid robots, where minor errors can damage hardware [3]

Group 2: Proposed Solution - the RTR System
- The RTR (Robot-Trains-Robot) system introduces a novel approach in which a "teacher" robotic arm guides a "student" humanoid robot through online reinforcement learning, inspired by how parents teach infants to walk [4][6]
- The teacher arm plays multiple roles: providing safety support, assisting with resets after failures, collecting valuable training data, and setting a curriculum to improve learning efficiency [5][6] (an illustrative schedule follows this summary)

Group 3: Hardware and Algorithm Design
- The hardware setup pairs a teacher and a student robot: the teacher is a UR5 robotic arm equipped with force-torque sensors, and the student is based on the open-source ToddlerBot [8][9]
- The algorithm follows a three-stage Sim-to-Real process: training adaptable strategies in simulation, optimizing a general initial latent variable, and performing online fine-tuning in the real world with minimal data [9][11]

Group 4: Experimental Validation
- Experiments on tasks like walking and swinging showed that the teacher's flexible assistance significantly improves learning outcomes compared with fixed supports [15][19]
- The latent-variable fine-tuning method outperformed traditional methods in data efficiency and final performance, doubling walking-policy speed with just 20 minutes of real-world training [15][18]

Group 5: Future Prospects
- The RTR framework addresses current challenges in deploying humanoid robots and introduces a physical-assistance paradigm for real-world learning, with potential applications to larger humanoid robots and other complex robotic systems [17]
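The teacher arm's curriculum role can be pictured as an adaptive assistance schedule: support force is withdrawn as the student improves and restored when it struggles. The sketch below is purely illustrative; the force bounds, step size, and return threshold are invented for the example and are not values from the paper.

```python
def update_support_force(f_current, episode_return, target_return,
                         f_min=0.0, f_max=30.0, step=1.0):
    """Adaptive curriculum for the teacher arm's vertical support force (N):
    ease off when the student beats its target return, add support otherwise.
    All thresholds and gains are illustrative, not values from the paper."""
    if episode_return >= target_return:
        f_next = f_current - step  # student succeeding: reduce assistance
    else:
        f_next = f_current + step  # student struggling: add assistance
    return min(max(f_next, f_min), f_max)
```

Each episode the schedule nudges the commanded force one step, keeping the student training near the edge of what it can do unaided.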