自动驾驶之心
XPeng's latest! NavigScene: Beyond-Visual-Range Autonomous Driving VLA via Global Navigation (ACM MM'25)
自动驾驶之心· 2025-07-14 11:30
Core Insights
- The article discusses the development of NavigScene, a novel dataset aimed at bridging the gap between local perception and global navigation in autonomous driving systems, enhancing their reasoning and planning capabilities [2][12][14].

Group 1: Overview of NavigScene
- NavigScene is designed to integrate local sensor data with global navigation context, addressing the limitations of existing autonomous driving models that rely primarily on immediate visual information [5][9].
- The dataset includes two subsets, NavigScene-nuScenes and NavigScene-NAVSIM, which provide paired data to facilitate comprehensive scene understanding and decision-making [9][14].

Group 2: Methodologies
- Three complementary paradigms are proposed to leverage NavigScene:
  1. Navigation-guided reasoning (NSFT) enhances visual-language models by incorporating navigation context [10][19].
  2. Navigation-guided preference optimization (NPO) improves generalization to new scenarios through reinforcement learning [24][26].
  3. The navigation-guided vision-language-action (NVLA) model integrates navigation guidance with traditional driving models for better performance [27][28].

Group 3: Experimental Results
- Experiments demonstrate that integrating global navigation knowledge significantly improves the performance of autonomous driving systems on perception, prediction, and planning tasks [12][34][39].
- Models trained with NavigScene outperform baseline models across metrics including BLEU-4, METEOR, and CIDEr, showcasing enhanced reasoning capabilities [32][34].

Group 4: Practical Implications
- Integrating NavigScene allows autonomous systems to make more informed decisions in complex driving environments, improving safety and reliability [12][42].
- The findings highlight the importance of incorporating beyond-visual-range (BVR) knowledge for effective navigation and planning in autonomous driving applications [8][12].
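The reasoning metrics named above (BLEU-4 in particular) score n-gram overlap between a model's generated description and a reference. As a hedged illustration of what such a score measures, here is a textbook-style BLEU sketch with clipped n-gram precision and a brevity penalty; it is not the paper's evaluation code, and real benchmarks typically use library implementations with smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Plain BLEU-style score: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return bp * geo_mean

# A perfect match scores 1.0; a disjoint answer scores 0.0.
print(bleu("turn left at the next intersection",
           "turn left at the next intersection"))
```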
ICCV'25! Baidu's U-ViLAR: Multi-Task SOTA in Visual Localization, Painlessly Compatible with End-to-End Frameworks
自动驾驶之心· 2025-07-14 11:30
Click the card below to follow the "自动驾驶之心" official account. Tap here to get learning roadmaps for nearly 15 autonomous driving directions. Today 自动驾驶之心 shares Baidu's latest work: U-ViLAR, an uncertainty-aware visual localization framework based on differentiable association and registration! If you have related work to share, please contact us at the end of the article. For course study and technical discussion groups, you are also welcome to add the assistant on WeChat at AIDriver004. >> For cutting-edge autonomous driving information → the 自动驾驶之心 Knowledge Planet

Introduction

In urban environments, buildings, tunnels, and other obstructions severely degrade GNSS (Global Navigation Satellite System) signals, making GNSS-dependent localization unreliable. Visual localization is therefore especially critical in these scenarios.

Traditional methods rely on feature matching between images and 3D maps, but they are sensitive to viewpoint and illumination changes, and building large-scale 3D maps is expensive. Mainstream end-to-end neural localization methods tend to trade off between initial search range and localization accuracy, and their accuracy is determined by the coupled errors of visual perception and map matching.

As a key component of a human decision system or an end-to-end autonomous driving planner, a localization module needs to support maps in different formats, achieve precise localization across different ranges, and output decoupled perception uncertainty and localization uncertainty, which can be used to refine localization accuracy and reduce error propagation into the decision system. U-ViLAR, through perception ...
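Why does decoupling uncertainties matter downstream? One generic answer: if each localization source reports its own confidence, a consumer can weight sources by that confidence instead of trusting them equally. The sketch below is standard inverse-variance fusion of two 1-D estimates, purely illustrative of that principle; it is not U-ViLAR's actual formulation, and all names are ours:

```python
def fuse(estimates):
    """Inverse-variance weighted fusion of independent 1-D estimates.

    Each estimate is (value, variance); lower variance means higher
    confidence, so it receives a larger weight. Returns the fused value
    and the fused variance (which is smaller than either input's).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total

# Perception says x = 10.0 m but is uncertain (var 4.0);
# map matching says x = 10.4 m confidently (var 0.25).
pos, var = fuse([(10.0, 4.0), (10.4, 0.25)])
print(pos, var)  # fused estimate sits close to the confident source
```

Because the map-matching estimate carries far more weight, the fused position lands near 10.4 m, which is exactly the behavior decoupled uncertainty outputs enable.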
Beyond VLA: A Roundup of Embodied + VA Work
自动驾驶之心· 2025-07-14 10:36
Core Insights
- The article focuses on advancements in embodied intelligence and robotic manipulation, highlighting research projects and methodologies aimed at improving robot learning and performance in real-world tasks [2][3][4].

Group 1: 2025 Research Highlights
- Numerous projects are slated for 2025, including "Steering Your Diffusion Policy with Latent Space Reinforcement Learning" and "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation," which aim to enhance robotic capabilities in manipulation and interaction [2].
- The "BEHAVIOR Robot Suite" aims to streamline real-world whole-body manipulation for everyday household activities, indicating a focus on practical applications of robotic technology [2].
- "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations" emphasizes the potential for robots to learn complex tasks from minimal demonstrations, showcasing advances in imitation learning [2].

Group 2: Methodological Innovations
- "Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning" aims to improve the adaptability of robots across different environments [2].
- "Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion" highlights the focus on enhancing dexterity in robotic hands, crucial for complex manipulation tasks [4].
- "Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation" indicates a trend toward training robots on synthetic data, which can significantly reduce the need for real-world data collection [7].

Group 3: Future Directions
- The research agenda for 2024 and beyond includes projects like "Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching," suggesting a shift toward advanced data representations for improved learning outcomes [9].
- "Zero-Shot Framework from Image Generation World Model to Robotic Manipulation" points to a future where robots generalize from visual data without task-specific training, enhancing their versatility [9].
- The emphasis on "Human-to-Robot Data Augmentation for Robot Pre-training from Videos" reflects growing interest in leveraging human demonstrations to improve robotic learning efficiency [7].
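The "trajectory autoregressive modeling" framing mentioned above reduces to a simple control pattern: each new action is predicted conditioned on the actions emitted so far. A minimal sketch of that rollout loop, with an entirely toy policy (none of these names come from the cited papers):

```python
def rollout(policy, start, horizon):
    """Autoregressive trajectory generation: the policy sees the whole
    trajectory so far and appends the next waypoint, one step at a time."""
    traj = [start]
    for _ in range(horizon):
        traj.append(policy(traj))
    return traj

# Toy 1-D policy: step half the remaining distance toward a fixed goal at 1.0.
policy = lambda traj: traj[-1] + 0.5 * (1.0 - traj[-1])
print(rollout(policy, 0.0, 3))  # [0.0, 0.5, 0.75, 0.875]
```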
From BEV to End-to-End: The Core of the Autonomous Driving Data Loop
自动驾驶之心· 2025-07-14 10:36
Core Viewpoint
- The article emphasizes the importance of high-quality datasets for autonomous driving, highlighting the need for efficient, low-cost ways to obtain them through advanced 4D labeling techniques [1][2].

Group 1: Importance of 4D Labeling
- Demand for automated 4D labeling is increasing with the growing complexity of autonomous driving scenarios, which require precise tracking of dynamic and static elements [1][3].
- Automated labeling algorithms are crucial for generating high-precision ground truth: they can optimize results over the full temporal sequence without being limited by onboard vehicle compute [1][2].

Group 2: Challenges in Automated Labeling
- Key challenges in 4D automated labeling include maintaining high spatio-temporal consistency, fusing complex multi-modal data, and ensuring model generalization across diverse driving conditions [2][3].
- The industry faces significant pain points such as sensor calibration, occlusion handling, and the need for high-quality automated labeling results [2][3].

Group 3: Course Offerings
- The article introduces a course titled "Autonomous Driving 4D Labeling Employment Class," which aims to lower the barrier to entry and support advanced learning [2][4].
- The course covers the entire 4D automated labeling pipeline, including dynamic and static object labeling, occupancy labeling, and end-to-end labeling methodologies [2][4].

Group 4: Course Structure
- The course is organized into chapters, each covering a different aspect of 4D automated labeling, such as dynamic object detection, SLAM reconstruction, and static element labeling [3][4][5].
- Practical exercises in each chapter reinforce understanding and application of the concepts taught [4][5].

Group 5: Target Audience
- The course is designed for anyone looking to deepen their knowledge of the autonomous driving data loop, including researchers, students, and professionals transitioning into the field [18][19].
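The claim that offline labeling "can optimize results over the full temporal sequence" has a concrete meaning: an offline pass may use future frames as well as past ones, which a causal onboard system cannot. A toy sketch of that idea, a centered moving average over a recorded track (illustrative only, not a production labeling pipeline):

```python
def offline_smooth(track, k=1):
    """Centered moving average over a full recorded track.

    For each frame i, average the window [i-k, i+k], clipped at the
    sequence ends. Using frames *after* i is only possible offline,
    which is one reason offline ground truth beats real-time output.
    """
    n = len(track)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        window = track[lo:hi]
        out.append(sum(window) / len(window))
    return out

# Noisy per-frame x-positions of a single tracked object:
print(offline_smooth([0.0, 1.2, 1.9, 3.1, 4.0]))
```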
Still on the Fence About Large Models? Others Have Already Published Their First Top-Conference Paper!
自动驾驶之心· 2025-07-14 06:20
Core Viewpoint
- The article discusses the evolving landscape of large models in autonomous driving, highlighting lightweight solutions, hardware adaptation, knowledge distillation, and advanced reasoning paradigms such as CoT and VLA plus reinforcement learning as key areas for future development [1][2].

Group 1: Course Introduction
- The course explores cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [2].
- It addresses the core challenges of model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms [3].

Group 2: Problems Addressed by the Course
- The course provides a systematic understanding of large-model knowledge, helping students build a coherent theoretical framework [3].
- It helps students combine theory with practical coding skills, enabling them to reproduce research papers and develop new models [3].
- It offers guidance on writing and submitting academic papers, addressing common challenges students face [3].

Group 3: Enrollment Information
- Enrollment is limited to 6-8 students per session [4].
- The course targets individuals with a background in deep learning or machine learning, familiarity with Python, and a passion for research [6].

Group 4: Course Outcomes
- Participants gain insight into classic and cutting-edge papers in the field, deepening their understanding of key algorithms and principles [9].
- The course includes a structured approach to writing and revising academic papers, culminating in a completed draft [9].

Group 5: Course Structure
- The course spans 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period [9].
- Topics include model pruning, quantization, and advanced reasoning techniques, with a focus on practical applications [19].
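Of the optimization topics listed, pruning is the easiest to make concrete. The sketch below shows unstructured magnitude pruning on a flat weight list: zero out the smallest-magnitude fraction of weights. It is a toy illustration of the idea, not any specific framework's API:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest
    absolute value; the rest are kept unchanged. (Ties at the threshold
    may prune slightly more than the requested fraction.)"""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Prune half of six weights: the three smallest magnitudes become zero.
print(magnitude_prune([0.9, -0.05, 0.4, -0.8, 0.01, 0.3], sparsity=0.5))
```

In practice frameworks apply the same rule per tensor and often fine-tune afterward to recover accuracy, but the selection criterion is exactly this.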
Recommended PNC and End-to-End Openings (Generous Compensation)
自动驾驶之心· 2025-07-14 06:20
Group 1
- The article discusses job opportunities in the autonomous driving sector, specifically positions in end-to-end and traditional control algorithms at a leading self-driving supplier [1].
- Openings include Autonomous Driving Control Algorithm Engineer / PNC Expert at 40k-100k/month and End-to-End / VLA Engineer at 30k-80k/month [2][4].
- The article outlines the responsibilities and requirements for each role, emphasizing advanced degrees, proficiency in C++ and Python, and familiarity with control algorithms and machine learning techniques [5][10].

Group 2
- The article introduces the AutoRobo Knowledge Planet, a community for job seekers in autonomous driving and embodied intelligence, currently hosting nearly 1,000 members from various companies [11].
- Member resources include interview questions, industry reports, salary negotiation tips, and job referrals [13][14].
- The community also provides insights into the autonomous driving industry, including trends, market opportunities, and research reports on embodied intelligence [23][24].
Horizon Robotics and Didi Chuxing Officially Kick Off 2026 Campus Recruitment!
自动驾驶之心· 2025-07-13 13:18
Core Viewpoint
- The article highlights ongoing recruitment in the autonomous driving sector, indicating strong demand for technical roles, particularly in perception, control, and algorithm development, as companies prepare for the hiring season in late July and early August [2].

Recruitment Opportunities
- Numerous companies, including Horizon Robotics, Didi, and Yuanrong Qixing, have opened recruitment for the class of 2026, with positions across hardware, software, and algorithm development [3][4].
- Roles include hardware development engineers, perception engineers, middleware software engineers, planning and control algorithm engineers, and safety algorithm engineers, with multiple openings in major cities such as Beijing, Shanghai, and Guangzhou [3][4].

Community and Resources
- The AutoRobo Knowledge Circle serves as a community for job seekers in autonomous driving and embodied intelligence, providing interview questions, experience sharing, industry reports, and resume optimization services [8][9].
- The community has nearly 1,000 members, including professionals from leading companies, facilitating networking and knowledge exchange [8].

Interview Preparation
- The article emphasizes thorough interview preparation, suggesting candidates highlight their strengths on their resumes and practice extensively to avoid missed opportunities [2].
- A collection of 100 interview questions on autonomous driving and embodied intelligence is available within the community [12][13].

Industry Insights
- Various industry reports are available within the community, covering development trends and market opportunities in embodied intelligence, as well as specific reports on humanoid robots and their production [18].
- Accounts of both successful and unsuccessful interviews are shared, providing valuable lessons for candidates navigating the job market [20].
We Interviewed Many End-to-End Candidates and Found That Many Still Don't Get It...
自动驾驶之心· 2025-07-13 13:18
Core Viewpoint
- End-to-end autonomous driving is a key algorithm family for mass-produced intelligent driving, with significant salary potential for related positions, and it has branched into various technical directions since the introduction of UniAD [2].

Group 1: Overview of End-to-End Autonomous Driving
- End-to-end autonomous driving can be categorized into one-stage and two-stage approaches; the core advantage is direct modeling from sensor input to vehicle planning/control, avoiding the error accumulation seen in modular pipelines [2].
- The emergence of BEV perception bridged gaps between modules, leading to a significant technological leap [2].
- Academic and industrial attention to end-to-end technology has raised the question of whether UniAD is the ultimate solution, with many algorithms still in active development [2].

Group 2: Challenges in Learning
- The rapid development of end-to-end technology has made earlier solutions inadequate, requiring knowledge of multimodal large models, BEV perception, reinforcement learning, vision transformers, and diffusion models [4].
- Beginners often struggle with fragmented knowledge and an overwhelming number of papers, making it hard to extract frameworks and track industry trends [4].

Group 3: Course Features
- The newly developed course on end-to-end and VLA autonomous driving addresses these learning challenges with a structured path to mastering the core technologies [5].
- The course emphasizes just-in-time learning, helping students quickly grasp key concepts and expand into specific areas [5].
- It aims to build research capability, enabling students to categorize papers and extract their innovations [6].

Group 4: Course Outline
- The course includes chapters on the introduction to end-to-end algorithms, background knowledge, two-stage end-to-end methods, one-stage end-to-end methods, and practical applications [11][12][13].
- Key topics include the evolution of end-to-end methods, the significance of BEV perception, and the latest advances in VLA [9][14].

Group 5: Target Audience and Expected Outcomes
- The course is designed for individuals aiming to enter the autonomous driving industry, providing a comprehensive understanding of end-to-end technologies [19].
- Upon completion, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering the main methodologies and key technologies [22].
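The one-stage vs. two-stage distinction above comes down to whether a hand-designed intermediate representation sits between perception and planning. The toy sketch below makes the error-accumulation point concrete: the modular path loses information at the module boundary, while the direct path does not. Every function here is illustrative, not a real driving stack:

```python
def perceive(sensor):
    # Toy "perception": quantize the raw reading to a coarse scene value,
    # deliberately discarding information at the module interface.
    return round(sensor, 1)

def plan(scene):
    # Toy "planning": maps the scene value to a control value.
    return scene * 2.0

def two_stage(sensor):
    # Modular pipeline: planning only sees perception's coarse output.
    return plan(perceive(sensor))

def one_stage(sensor):
    # End-to-end: a single mapping from sensor to control, so no
    # hand-designed interface can drop information in between.
    return sensor * 2.0

x = 0.1234
print(two_stage(x), one_stage(x))  # the quantized interface shifts the output
```

In real systems the "interface" is a detection or BEV representation rather than rounding, but the structural argument (interfaces can lose task-relevant detail) is the same.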
Samsung's Latest MoSE: An MoE Built for Autonomous Driving Corner Cases, Straight to SOTA!
自动驾驶之心· 2025-07-13 13:18
Today 自动驾驶之心 shares the latest work from Samsung R&D Institute China and the DS AI Center: MoSE, a skill-by-skill mixture-of-experts learning framework for autonomous driving, setting a new SOTA on hard-case scenarios. If you have related work to share, please contact us at the end of the article.

Paper authors | LU XU et al. Editor | 自动驾驶之心

Preface & the author's take

Recent research shows that large language models (LLMs) and vision-language models (VLMs) trained on web-scale data can improve the generalization and interpretability of end-to-end autonomous driving systems. Specifically, by dynamically routing inputs to specialized subsets of parameters, mixture-of-experts (MoE) techniques let a general-purpose LLM or VLM achieve significant performance gains while remaining computationally efficient.

However, general MoE models typically require large amounts of training data and complex optimization. In this work, inspired by how human drivers learn, we propose a skill-oriented MoE approach called MoSE, which mimics the learning and reasoning process of human drivers, skill by skill, ...
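The routing mechanism described above ("dynamically routing inputs to specialized subsets of parameters") is the core of any MoE layer: a gate scores the experts, only the top-k run, and their outputs are mixed by normalized gate weight. A minimal sketch of that pattern, with a hand-set gate standing in for MoSE's learned, skill-oriented router:

```python
def moe_forward(x, experts, gate_scores, k=1):
    """Sparse MoE forward pass: select the top-k scoring experts,
    renormalize their gate scores, and return the weighted mixture.
    Only k experts execute, which is how MoE grows total parameters
    while keeping per-input compute roughly constant."""
    top = sorted(range(len(experts)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    norm = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / norm * experts[i](x) for i in top)

# Two toy "skill" experts (labels illustrative): lane keeping vs. turns.
experts = [lambda x: x + 1.0, lambda x: x * 10.0]
print(moe_forward(2.0, experts, gate_scores=[0.2, 0.8], k=1))  # 20.0
```

With k=1 only the highest-scoring expert fires; raising k blends more experts at proportionally higher compute cost.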
Why Is the Industry So Obsessed with Reinforcement Learning?
自动驾驶之心· 2025-07-13 13:18
Core Viewpoint
- The article discusses a significant research paper exploring the effectiveness of reinforcement learning (RL) versus supervised fine-tuning (SFT) for training AI models, focusing on generalization and the transferability of knowledge across tasks [1][5][14].

Group 1: Training Methods
- There are two primary ways to train AI models: imitation (SFT) and exploration (RL) [2][3].
- Imitation learning trains models to replicate data, while exploration lets models discover solutions on their own, assuming they have a better-than-random chance of solving the problem [3][6].

Group 2: Generalization and Transferability
- The core of the research is generalization: SFT may hinder the ability to adapt known knowledge to unknown domains, while RL promotes better transferability [5][7].
- A Transferability Index (TI) was introduced to measure skill transfer across tasks; RL-trained models showed positive transfer on various reasoning tasks, while SFT models often exhibited negative transfer on non-reasoning tasks [7][8].

Group 3: Experimental Findings
- Rigorous experiments comparing RL and SFT models found that RL models improved performance in unrelated fields, while SFT models declined in non-mathematical areas despite performing well on mathematical tasks [10][14].
- RL models maintained a more stable internal knowledge structure, allowing them to adapt to new domains without losing foundational knowledge [10][14].

Group 4: Implications for AI Development
- The findings suggest that while imitation learning has been the preferred method, reinforcement learning offers a promising route to intelligent systems that generalize knowledge across fields [14][15].
- The research emphasizes that true intelligence in AI involves applying learned concepts to new situations, akin to human learning [14][15].
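The article names a Transferability Index but does not give its formula. To make "positive vs. negative transfer" concrete, here is a stand-in definition of our own, the signed relative change in out-of-domain performance after fine-tuning; this is an assumption for illustration, not the paper's actual TI:

```python
def transfer_index(before, after):
    """Stand-in transferability measure: signed relative change in
    accuracy on an *unrelated* task after fine-tuning on a source task.
    Positive = the fine-tuning helped elsewhere (positive transfer);
    negative = it hurt elsewhere (negative transfer)."""
    return (after - before) / before

# RL-style outcome: unrelated-task accuracy rises 0.50 -> 0.55.
# SFT-style outcome: unrelated-task accuracy drops 0.50 -> 0.40.
print(transfer_index(0.50, 0.55), transfer_index(0.50, 0.40))
```

Under this stand-in, the reported findings correspond to RL models scoring above zero and SFT models scoring below zero on non-reasoning tasks.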