自动驾驶之心
Why is reinforcement learning research not recommended for graduate students?
自动驾驶之心· 2025-07-21 11:18
Original link: https://www.zhihu.com/question/1900927726795334198

Foreword: I haven't answered academic questions here in a long time, but half of the grant proposals I've reviewed recently are related to reinforcement learning, so Zhihu keeps recommending RL content to me... Let me briefly share my view on reinforcement learning.

If you are only going as far as a master's degree, even at Tsinghua or Peking University, the most important fundamental RL skill is calling libraries: knowing which package to call and when. After that comes how to combine components and how to shrink the solution space; for most algorithms, a basic procedural understanding is enough.

If you are doing a PhD, I suggest changing directions. In my view, carving fine details into today's reinforcement learning is a waste of time and of your life. Of course, if your goal is to publish a pile of papers and land a teaching position, that works, but you may go a very long time without producing truly good work, and making a living doesn't hinge on that anyway.

My overall feeling about reinforcement learning is that it is ancient and primitive; it feels as if I were still holding a ...
SceneDiffuser++: City-Scale Traffic Simulation Based on a Generative World Model (CVPR'25)
自动驾驶之心· 2025-07-21 11:18
Core Viewpoint
- The article discusses the development of SceneDiffuser++, a generative world model that enables city-scale traffic simulation, addressing the unique challenges of trip-level simulation compared to event-level simulation [1][2].

Group 1: Introduction and Background
- The primary goal of traffic simulation is to supplement limited real-world driving data with extensive synthetic simulation mileage to support the testing and validation of autonomous driving systems [1].
- An ideal generative simulation city (CitySim) should seamlessly simulate a complete journey from point A to point B, managing dynamic elements such as vehicles, pedestrians, and traffic lights [1].

Group 2: Technical Integration
- Achieving CitySim requires the integration of multiple technologies, including scene generation, agent behavior modeling, occlusion reasoning, dynamic scene generation, and environmental simulation [2].
- SceneDiffuser++ is the first end-to-end generative world model that consolidates these requirements through a single loss function, enabling complete simulation from A to B [2].

Group 3: Core Challenges and Innovations
- Trip-level simulation faces three unique challenges compared to event-level simulation: dynamic agent management, occlusion reasoning, and environmental dynamics [3].
- SceneDiffuser++ introduces innovations such as multi-tensor diffusion, soft-clipping strategies, and unified generative modeling to address these challenges [4][5].

Group 4: Methodology and Model Details
- SceneDiffuser++ represents scenes as scene tensors, allowing the model to handle dynamic changes in heterogeneous elements like agents and traffic lights simultaneously [7].
- The model employs a diffusion process for training and inference, relying on loss masking for effective feature learning and on soft clipping to stabilize sparse tensor generation [8][9] (a schematic sketch follows this summary).

Group 5: Performance Evaluation
- Experiments based on the WOMD-XLMap dataset demonstrate that SceneDiffuser++ outperforms previous models in all metrics, achieving lower Jensen-Shannon divergence values for agent generation and removal [12].
- The model maintains agent dynamics and traffic-light realism over a 60-second simulation, in contrast with previous models that exhibited stagnation [15].

Group 6: Conclusion and Significance
- The core contributions of SceneDiffuser++ include the introduction of the CitySim concept, the design of a unified generative framework, and the resolution of stability issues in dynamic scene generation through sparse tensor learning and soft clipping [19].
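The summary names loss masking and soft clipping without giving formulas. Below is a rough Python sketch of how a masked denoising loss over a sparse scene tensor might be stabilized; the tanh-based clipping form, the shapes, and the function names are all assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def soft_clip(x: torch.Tensor, limit: float = 1.0) -> torch.Tensor:
    # Smoothly squash values into (-limit, limit). Unlike a hard clamp,
    # the gradient stays non-zero for out-of-range entries.
    return limit * torch.tanh(x / limit)

def masked_diffusion_loss(pred_x0, target_x0, valid_mask):
    # pred_x0, target_x0: [B, S, F] scene tensors; most slots are empty.
    # valid_mask: [B, S], 1.0 where a slot actually holds an agent/light.
    err = F.mse_loss(soft_clip(pred_x0), target_x0, reduction="none").mean(-1)
    return (err * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
```

Unlike a hard clamp, whose gradient is zero outside the clipping range, the tanh form keeps a training signal flowing even when the model overshoots on mostly empty slots.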
70K? Is end-to-end VLA really this hot now!?
自动驾驶之心· 2025-07-21 11:18
Core Viewpoint
- End-to-end (E2E) autonomous driving is currently the core algorithm for mass production in intelligent driving, with significant advancements in VLA (Vision-Language-Action) and VLM (Vision-Language Model) systems leading to high demand for related positions in the industry [2][4].

Summary by Sections

Section 1: Background Knowledge
- The course aims to provide a comprehensive understanding of end-to-end autonomous driving, including its historical development and the transition from modular to end-to-end approaches [21].
- Key technical stacks such as VLA, diffusion models, and reinforcement learning are essential for understanding the current landscape of autonomous driving technology [22].

Section 2: Job Market Insights
- Positions related to VLA/VLM algorithms offer lucrative salaries: candidates with 3-5 years of experience earn between 40K and 70K monthly, and top talents in the field can earn up to 1 million annually [10].
- The demand for VLA-related roles is increasing, indicating a shift in the industry towards advanced model architectures [9].

Section 3: Course Structure
- The course is structured into five chapters, covering topics from basic concepts of end-to-end algorithms to advanced applications in VLA and reinforcement learning [19][30].
- Practical components are included to bridge the gap between theory and application, ensuring participants can implement learned concepts in real-world scenarios [18].

Section 4: Technical Innovations
- Various approaches within end-to-end frameworks are explored, including two-stage and one-stage methods, with notable models like PLUTO and UniAD leading the way [4][23].
- The introduction of diffusion models has markedly improved trajectory prediction, allowing for better adaptability in uncertain driving environments [24].

Section 5: Learning Outcomes
- Participants are expected to reach a level of proficiency equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering key technologies and frameworks [32].
- The course emphasizes the importance of understanding BEV perception, multimodal models, and reinforcement learning to stay competitive in the evolving job market [32].
Just started grad school, and the advisor wants me to build an autonomous driving car from scratch...
自动驾驶之心· 2025-07-21 11:18
Core Viewpoint
- The article introduces the "Black Warrior 001," a lightweight autonomous driving solution designed for educational and research purposes, highlighting its features and applications in various academic settings [3][6].

Group 1: Product Overview
- The "Black Warrior 001" is a comprehensive autonomous driving platform that supports perception, localization, fusion, navigation, and planning, built on an Ackermann chassis [3].
- The product is currently offered at a promotional price of 33,999 yuan, with a 1,000-yuan deposit that offsets 2,000 yuan of the total price [2].

Group 2: Performance and Testing
- The system has been tested in indoor, outdoor, and parking environments, demonstrating its perception, localization, fusion, and navigation-planning capabilities [4].
- Specific tests include outdoor park driving, indoor 2D and 3D laser mapping, and slope testing, showcasing its versatility for both undergraduate and graduate research projects [6][9][10].

Group 3: Hardware Specifications
- Key hardware components include:
  - 3D LiDAR: Mid 360
  - 2D LiDAR: from Lidar Technology
  - Depth camera: from Orbbec, equipped with an IMU
  - Main control chip: Nvidia Orin NX 16G
  - Chassis system: Ackermann chassis [10][12]
- The vehicle weighs 30 kg, carries a 50 W battery good for over 4 hours of runtime, and reaches a maximum speed of 2 m/s [12].

Group 4: Software and Functionality
- The software framework includes ROS, C++, and Python, providing a one-click startup feature and a development environment for users [14].
- The system supports 2D and 3D SLAM, vehicle navigation, and obstacle avoidance, making it suitable for a range of educational and research applications [15] (a minimal illustrative node follows this summary).

Group 5: After-Sales and Support
- The company offers one year of after-sales support for non-human damage, with free repairs for damage caused by operational errors or code modifications during the warranty period [37].
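The article is a product overview and includes no code; below is a minimal rospy sketch of the simplest behavior in the advertised stack, a LiDAR-triggered stop. The /scan and /cmd_vel topic names and all thresholds are common ROS conventions assumed for illustration, not confirmed details of the Black Warrior 001 software.

```python
#!/usr/bin/env python
# Minimal rospy sketch: stop the Ackermann base when the 2D LiDAR sees an
# obstacle closer than a threshold. Topic names and thresholds are common
# ROS defaults, not confirmed specifics of this product.
import rospy
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist

STOP_DIST = 0.5  # m, hypothetical safety threshold
CRUISE = 0.8     # m/s, well under the 2 m/s maximum quoted above

def on_scan(scan, pub):
    valid = [r for r in scan.ranges if r > scan.range_min]
    if not valid:
        return
    cmd = Twist()
    cmd.linear.x = 0.0 if min(valid) < STOP_DIST else CRUISE
    pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("obstacle_stop")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/scan", LaserScan, on_scan, callback_args=pub)
    rospy.spin()
```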
Still haven't picked a research direction? Others are already deep into VLA......
自动驾驶之心· 2025-07-21 05:18
Core Viewpoint
- The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to the exploration of Vision-Language-Action (VLA) models, which present new opportunities for innovation and research in the field [1][2].

Group 1: VLA Research Topics
- The VLA model aims to create an end-to-end autonomous driving system that maps raw sensor inputs directly to driving control commands, moving away from traditional modular architectures [2] (a schematic interface sketch follows this summary).
- The evolution of autonomous driving technology can be categorized into three phases: traditional modular architecture, purely visual end-to-end systems, and the emergence of VLA models [2][3].
- VLA models enhance interpretability and reliability by allowing the system to explain its decision-making process in natural language, thus improving human trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key theoretical knowledge in VLA and develop practical skills in model design and implementation [6][7].
- It combines online group research, paper guidance, and maintenance periods into a structured learning experience to ensure comprehensive understanding and application [6][8].
- Participants will gain insights into classic and cutting-edge papers, coding practices, and effective writing and submission strategies for academic papers [6][12].

Group 3: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and autonomous driving algorithms [5][9].
- Basic requirements include familiarity with Python and PyTorch, as well as access to high-performance computing resources [13][14].
- The course emphasizes academic integrity and provides a structured environment for learning and research [14][19].

Group 4: Course Highlights
- The program features a "2+1" teaching model, with experienced instructors providing comprehensive support throughout the learning process [14].
- It is designed to uphold high academic standards and facilitate significant project outcomes, including a draft paper and a project completion certificate [14][20].
- The course also includes a feedback mechanism to optimize the learning experience based on individual progress [14].
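To make the "raw sensor inputs to driving control commands" mapping concrete, here is a schematic PyTorch module showing the input/output contract a VLA driving model typically has: a camera image plus a tokenized instruction in, future waypoints out. Every layer, dimension, and name is a placeholder assumption, not the course's or any paper's reference implementation.

```python
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Schematic VLA contract: image + instruction tokens -> waypoints.
    All sizes and layers are illustrative placeholders."""
    def __init__(self, vocab=32000, d=256, horizon=8):
        super().__init__()
        self.horizon = horizon
        self.vision = nn.Sequential(              # stand-in image backbone
            nn.Conv2d(3, d, kernel_size=16, stride=16), nn.Flatten(2))
        self.text = nn.Embedding(vocab, d)
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=8, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d, horizon * 2)     # (x, y) per future step

    def forward(self, image, tokens):
        v = self.vision(image).transpose(1, 2)    # [B, patches, d]
        t = self.text(tokens)                     # [B, len, d]
        h = self.fuse(torch.cat([v, t], dim=1))   # joint vision-language fusion
        return self.head(h.mean(dim=1)).view(-1, self.horizon, 2)
```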
Autonomous Driving Paper Express | World Models, End-to-End, VLM/VLA, Reinforcement Learning, and More
自动驾驶之心· 2025-07-21 04:14
Core Insights
- The article covers recent advances in autonomous driving research, particularly the Orbis model from the University of Freiburg, which significantly improves long-horizon prediction in driving world models [1][2].

Group 1: Orbis Model Contributions
- The Orbis model addresses shortcomings of contemporary driving world models in long-horizon generation, particularly in complex maneuvers like turns, and introduces a trajectory-distribution-based evaluation metric to quantify these issues [2].
- It employs a hybrid discrete-continuous tokenizer that allows fair comparisons between discrete and continuous prediction methods, demonstrating that continuous modeling (based on flow matching) outperforms discrete modeling (based on masked generation) in long-horizon predictions [2].
- The model achieves state-of-the-art (SOTA) performance with only 469 million parameters and 280 hours of monocular video data, excelling in complex driving scenarios such as turns and urban traffic [2].

Group 2: Experimental Results
- The Orbis model achieved a Fréchet Video Distance (FVD) of 132.25 on the nuPlan dataset for 6-second rollouts, significantly lower than models like Cosmos (291.80) and Vista (323.37), indicating superior long-horizon video prediction quality [6][7].
- In turn scenarios, Orbis also led with an FVD of 231.88, compared to 316.99 for Cosmos and 413.61 for Vista, showcasing its effectiveness under challenging driving conditions [6][7].

Group 3: LaViPlan Framework
- The LaViPlan framework, developed by ETRI, uses reinforcement learning with verifiable rewards to address the misalignment between visual, language, and action components in autonomous driving, achieving a 19.91% reduction in Average Displacement Error (ADE) for easy scenarios and 14.67% for hard scenarios on the ROADWork dataset [12][14] (ADE is defined in the snippet after this summary).
- It highlights the transition from linguistic fidelity to functional accuracy in trajectory outputs, revealing a trade-off between semantic similarity and task-specific reasoning [14].

Group 4: World-Model-Based Scene Generation
- The University of Macau introduced a world-model-driven scene generation framework that enhances dynamic graph convolution networks, achieving 83.2% Average Precision (AP) and a 3.99-second mean Time to Anticipate (mTTA) on the DAD dataset, marking significant improvements [23][24].
- The framework combines scene generation with adaptive temporal reasoning to create high-resolution driving scenarios, addressing data scarcity and modeling limitations [24].

Group 5: ReAL-AD Framework
- The ReAL-AD framework, proposed by Shanghai University of Science and Technology and the Chinese University of Hong Kong, integrates a three-layer human cognitive decision-making model into end-to-end autonomous driving, improving planning accuracy by 33% and reducing collision rates by 32% [33][34].
- Its three core modules enhance situational awareness and structured reasoning, leading to significant improvements in trajectory-planning accuracy and safety [34].
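ADE, cited in the LaViPlan results, is a standard trajectory metric: the mean L2 distance between predicted and ground-truth waypoints over the prediction horizon (FDE, its companion, takes only the final step). A minimal NumPy version:

```python
import numpy as np

def ade_fde(pred, gt):
    # pred, gt: [T, 2] arrays of (x, y) waypoints over T future steps.
    # ADE averages the per-step L2 error; FDE is the final-step error.
    dists = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1)
    return dists.mean(), dists[-1]
```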
Landed a fall-recruitment offer at a small company, and I'm content...
自动驾驶之心· 2025-07-20 12:47
Core Viewpoint
- The article discusses advancements in AI technology, particularly in autonomous driving and embodied intelligence, highlighting the saturation of the autonomous driving industry and the challenges faced by job seekers in this field [2].

Group 1: Industry Developments
- The autonomous driving sector has seen significant breakthroughs, with L2 to L4 functionalities entering mass production, alongside advancements in humanoid and quadrupedal robots [2].
- The industry has a clear demand for technology and talent, as evidenced by the experiences shared by job seekers [2].

Group 2: Job-Seeking Platform
- A new platform called the AutoRobo Knowledge Community has been launched to assist job seekers in autonomous driving, embodied intelligence, and robotics, currently hosting nearly 1,000 members [2][3].
- The community includes members from companies such as Horizon Robotics, Li Auto, Huawei, and Xiaomi, as well as students preparing for upcoming recruitment seasons [2].

Group 3: Resources and Support
- The platform provides a wealth of resources, including interview questions, industry reports, salary negotiation tips, and resume optimization services [3][4].
- Compiled interview questions cover autonomous driving and embodied intelligence across various technical aspects and practical skills [9][10][11].

Group 4: Industry Reports
- The community offers access to numerous industry reports that help members understand the current state, development trends, and market opportunities within the autonomous driving and robotics sectors [15][19].
- Reports include insights on trajectory prediction, occupancy perception, and the overall landscape of the embodied intelligence industry [14][19].
I've interviewed many end-to-end candidates, and a lot of them still can't get it straight...
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- End-to-end autonomous driving is a key algorithm for intelligent driving mass production, with significant salary potential for related positions, and it has evolved into various technical directions since the introduction of UniAD [2][4].

Group 1: Technical Directions
- End-to-end autonomous driving can be categorized into one-stage and two-stage approaches, with various subfields emerging under each category [2][4].
- The core advantage of end-to-end systems is direct modeling from sensor input to vehicle planning/control, avoiding the error accumulation seen in modular methods [2].
- Notable algorithms include PLUTO for two-stage end-to-end, UniAD for perception-based one-stage, OccWorld for world-model-based one-stage, and DiffusionDrive for diffusion-model-based one-stage [4].

Group 2: Industry Trends
- The demand for VLA/VLM algorithm experts is increasing, with salaries for positions requiring 3-5 years of experience ranging from 40K to 70K [9].
- The industry is witnessing a shift towards large-model algorithms, with companies focusing on VLA as the next generation of autonomous driving solutions [8][9].

Group 3: Course Offerings
- A new course titled "End-to-End and VLA Autonomous Driving" is being offered to help individuals understand the complexities of end-to-end algorithms and their applications [15][28].
- The course covers background knowledge, two-stage end-to-end, one-stage end-to-end, and practical applications of reinforcement learning [20][22][24].
- It aims to provide a comprehensive understanding of the end-to-end framework, including key technologies like BEV perception, multimodal large models, and diffusion models [31].
SpatialTrackerV2: An Open-Source, Feed-Forward, Scalable 3D Point Tracking Method
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- The article discusses the development of SpatialTrackerV2, a state-of-the-art method for 3D point tracking from monocular video, which integrates video depth, camera ego-motion, and object motion into a fully differentiable process for scalable joint training [7][37].

Group 1: Current Issues in 3D Point Tracking
- 3D point tracking aims to recover long-term 3D trajectories of arbitrary points from monocular video, showing strong potential in applications such as robotics and video generation [4].
- Existing solutions rely heavily on low- and mid-level visual models, leading to high computational costs and limited scalability due to the need for real 3D trajectories as supervision [6][10].

Group 2: Proposed Solution - SpatialTrackerV2
- SpatialTrackerV2 decomposes 3D point tracking into three independent components: video depth, camera ego-motion, and object motion, integrating them into a fully differentiable framework [7] (the underlying lifting geometry is sketched after this summary).
- The architecture includes a front end for video depth estimation and camera pose initialization, and a back end for joint motion optimization, using a novel SyncFormer module to model correlations between 2D and 3D features [7][30].

Group 3: Performance Evaluation
- The method achieved new state-of-the-art results on the TAPVid-3D benchmark, with scores of 21.2 AJ and 31.0 APD3D, improvements of 61.8% and 50.5% over the previous best [9].
- SpatialTrackerV2 demonstrated superior performance in video depth and camera-pose consistency estimation, outperforming existing methods like MegaSAM with approximately 50x faster inference [9].

Group 4: Training and Optimization Process
- Training uses consistency constraints between static and dynamic points for 3D tracking, allowing effective optimization even with limited depth information [8][19].
- The model employs a bundle-optimization approach to iteratively refine depth and camera-pose estimates, incorporating several loss functions to ensure accuracy [24][26].

Group 5: Conclusion
- SpatialTrackerV2 represents a significant advance in 3D point tracking, providing a robust foundation for motion understanding in real-world scenarios and pushing toward "physical intelligence" through the exploration of large-scale visual data [37].
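The decomposition into video depth plus camera ego-motion rests on standard pinhole geometry: a tracked pixel with known depth and camera pose lifts to a world-frame 3D point. The sketch below shows only that textbook step; the paper's learned modules (e.g. SyncFormer) are not reproduced here, and the array layout is an assumption.

```python
import numpy as np

def lift_track_to_world(uv, depth, K, cam_to_world):
    # uv:           [T, 2] pixel coordinates of one tracked point per frame
    # depth:        [T]    metric depth of that point per frame
    # K:            [3, 3] camera intrinsics
    # cam_to_world: [T, 4, 4] per-frame camera-to-world poses
    # Returns [T, 3] world-frame positions (standard pinhole geometry).
    ones = np.ones((uv.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([uv, ones]).T).T  # unit-depth rays
    pts_cam = rays * depth[:, None]                        # scale by depth
    pts_h = np.hstack([pts_cam, ones])                     # homogeneous coords
    return np.einsum("tij,tj->ti", cam_to_world, pts_h)[:, :3]
```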
Latest from CUHK! ReAL-AD: Toward End-to-End Autonomous Driving with Human-Like Reasoning, 30% Trajectory Performance Improvement (ICCV'25)
自动驾驶之心· 2025-07-20 08:36
Core Insights
- The article introduces ReAL-AD, a reasoning-enhanced learning framework for end-to-end autonomous driving that aligns decision-making with human cognitive models [2][8][40].

Group 1: Framework Overview
- ReAL-AD integrates a three-layer human cognitive model (driving strategy, driving decision, and driving operation) into the decision-making process of autonomous driving [2][8].
- The framework includes three main components (a schematic wiring sketch follows this summary):
  1. A Strategic Reasoning Injector, which formulates high-level driving strategies from complex traffic insights generated by vision-language models (VLMs) [8][20].
  2. A Tactical Reasoning Integrator, which refines driving intentions into interpretable driving choices [8][20].
  3. A Hierarchical Trajectory Decoder, which translates driving decisions into precise control actions for smooth, human-like trajectory execution [8][20].

Group 2: Performance Evaluation
- Extensive evaluations on the NuScenes and Bench2Drive datasets show that ReAL-AD improves planning accuracy and safety by over 30% compared to baseline methods [9][34].
- The method reduces L2 error by 33% and collision rates by 32%, indicating significant gains in trajectory accuracy and driving safety [9][34].

Group 3: Comparison with Existing Methods
- Existing end-to-end autonomous driving methods often rely on fixed, sparse trajectory supervision, which limits their ability to replicate the structured cognitive reasoning of human drivers [3][10].
- ReAL-AD addresses these limitations by embedding structured multi-stage reasoning into the decision-making hierarchy, enhancing generalization in diverse real-world scenarios [5][10].

Group 4: Experimental Results
- The framework outperforms other state-of-the-art methods, achieving the lowest average L2 error of 0.48 meters and a collision rate of 0.15% on the NuScenes dataset [34].
- In closed-loop evaluations, ReAL-AD significantly improves driving scores and success rates, demonstrating its effectiveness in real-world applications [34].
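As a purely schematic illustration of the three-stage hierarchy described above (strategy, then decision, then trajectory), here is a toy PyTorch wiring. The stage names follow the summary, but every interface, shape, and layer choice is invented for illustration and does not reflect the paper's implementation.

```python
import torch
import torch.nn as nn

class ReALADSketch(nn.Module):
    """Toy wiring of the three stages named in the summary:
    strategy -> decision -> trajectory. All shapes are assumptions."""
    def __init__(self, d=128, horizon=6):
        super().__init__()
        self.horizon = horizon
        self.strategic = nn.Linear(d, d)       # Strategic Reasoning Injector stand-in
        self.tactical = nn.Linear(2 * d, d)    # Tactical Reasoning Integrator stand-in
        self.decoder = nn.GRU(d, d, batch_first=True)  # Hierarchical Trajectory Decoder stand-in
        self.to_xy = nn.Linear(d, 2)

    def forward(self, scene_feat, vlm_feat):
        # scene_feat, vlm_feat: [B, d] pooled scene and VLM features.
        strategy = torch.relu(self.strategic(vlm_feat))              # high-level strategy
        decision = torch.relu(self.tactical(
            torch.cat([strategy, scene_feat], dim=-1)))              # interpretable choice
        steps = decision.unsqueeze(1).expand(
            -1, self.horizon, -1).contiguous()                       # unroll over horizon
        h, _ = self.decoder(steps)
        return self.to_xy(h)                                         # [B, horizon, 2] waypoints
```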