自动驾驶之心
How does Xpeng implement its beyond-visual-range autonomous-driving VLA?
自动驾驶之心· 2025-08-25 23:34
Core Viewpoint
- The article discusses the development of NavigScene, a novel dataset and methodology by Xiaopeng Motors and the University of Central Florida, aimed at bridging the gap between local perception and global navigation in autonomous driving systems, enhancing their reasoning and planning capabilities in complex environments [3][9][10].

Group 1: Overview of NavigScene
- NavigScene is designed to integrate local sensor data with global navigation context, addressing the limitations of existing autonomous driving systems that primarily rely on immediate visual information [3][5].
- The dataset includes two subsets: NavigScene-nuScenes and NavigScene-NAVSIM, which provide paired data of multi-view sensor inputs and corresponding natural language navigation instructions [9][14].

Group 2: Methodologies
- Three complementary methodologies are proposed to utilize NavigScene:
  1. Navigation-guided reasoning (NSFT) enhances visual-language models by incorporating navigation context [10][20].
  2. Navigation-guided preference optimization (NPO) improves the generalization of visual-language models in new navigation scenarios [24][26].
  3. The navigation-guided vision-language-action (NVLA) model integrates navigation guidance with traditional driving models for better performance in perception, prediction, and planning tasks [27][29].

Group 3: Experimental Results
- Experiments demonstrate that integrating NavigScene significantly improves the performance of visual-language models in various driving-related tasks, including reasoning and planning [31][35].
- The results indicate that the combination of NSFT and NPO leads to notable enhancements in the models' ability to handle complex driving scenarios, reducing collision rates and improving trajectory accuracy [43][47].
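The summary does not spell out NPO's training objective. Preference-optimization methods for language models are commonly DPO-style, so as a purely hypothetical sketch of what such a loss looks like (the function name, arguments, and the DPO formulation itself are assumptions, not taken from the NavigScene paper):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one (chosen, rejected) answer pair.

    logp_* are summed token log-probabilities under the policy model;
    ref_logp_* come from a frozen reference model. The loss shrinks as the
    policy prefers the chosen answer more strongly than the reference does.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written stably with log1p
    return math.log1p(math.exp(-margin))
```

In a navigation-guided setting, the "chosen" answer would be the response consistent with the navigation instruction, but that pairing scheme is the article's contribution, not shown here.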
Master's at a bottom-tier C9 school, bachelor's from a "double non" university, and feeling a bit lost right now...
自动驾驶之心· 2025-08-25 23:34
Core Viewpoint
- The article emphasizes the importance of choosing a promising direction in the field of autonomous driving and robotics, highlighting the need for continuous learning and adaptation to industry trends [1][2].

Group 1: Industry Trends and Opportunities
- The autonomous driving industry is still vibrant and offers numerous opportunities despite concerns about job saturation in traditional control systems [2][3].
- The community "Autonomous Driving Heart" aims to create a comprehensive platform for knowledge sharing, technical discussions, and job opportunities in the autonomous driving sector, with a target of reaching nearly 10,000 members in two years [2][3][19].
- The community provides access to over 40 technical routes and invites industry experts to answer questions, facilitating knowledge transfer and networking [3][19].

Group 2: Learning and Development Resources
- The community offers a variety of resources, including video content, learning paths, and practical problem-solving discussions, to help both beginners and advanced learners in the field of autonomous driving [2][3][19].
- A detailed compilation of over 60 datasets related to autonomous driving is available, covering various aspects such as perception and trajectory prediction [29].
- The community has organized numerous live sessions with industry leaders, providing insights into the latest technologies and methodologies in autonomous driving [55].

Group 3: Job Opportunities and Networking
- The community has established a job referral mechanism with multiple autonomous driving companies, facilitating direct connections between job seekers and potential employers [10][18].
- Regular job postings and sharing of internship opportunities are part of the community's offerings, helping members stay informed about the latest openings in the industry [26][18].
- Members can freely ask questions regarding career choices and research directions, receiving guidance from experienced professionals in the field [58][59].
Looking at the evolution of autonomous driving technology through Li Auto's VLA...
自动驾驶之心· 2025-08-25 11:29
Core Insights
- The article discusses the advancements in the Li Auto VLA driver model, highlighting its improved capabilities in understanding semantics, reasoning, and trajectory planning, which are essential for autonomous driving [1][3].
- The VLA model has evolved from VLM+E2E, integrating various cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [3].

Summary by Sections

VLA Model Capabilities
- The VLA model enhances semantic understanding through multimodal input, excels in reasoning with a chain-of-thought approach, and closely mimics human driving intuition via trajectory planning [1].
- It possesses four core abilities: spatial understanding, reasoning, communication and memory, and behavioral capabilities [1].

Research and Development Focus
- The academic community is increasingly shifting towards large models and VLA, while traditional perception and planning tasks are still being optimized in the industry [3].
- There is a growing interest in VLA, with many students seeking guidance on research papers related to this area, indicating a significant opportunity for academic contributions [3].

Course Structure and Offerings
- A structured course is being offered to help students systematically grasp key theoretical knowledge and develop practical skills in VLA research [5][12].
- The course spans 12 weeks of online group research followed by 2 weeks of paper guidance, culminating in a maintenance period for ongoing support [13][33].

Enrollment and Requirements
- The program is limited to 6-8 participants per session, targeting individuals with a background in deep learning and basic knowledge of autonomous driving algorithms [11][20].
- Participants are expected to have a foundational understanding of Python and PyTorch, with access to high-performance computing resources recommended [20].

Learning Outcomes
- Students will gain insights into classic and cutting-edge research papers, develop coding skills, and receive guidance on writing and submitting academic papers [19][33].
- The course aims to produce a draft of a research paper as a tangible outcome of the learning experience [19][33].
Course officially concluded! Dynamic/static, OCC, and end-to-end auto-labeling, all covered in one place
自动驾驶之心· 2025-08-25 03:15
Core Viewpoint
- The article emphasizes the increasing investment in automatic labeling by autonomous driving companies, highlighting the challenges and complexities involved in 4D automatic labeling, which integrates 3D spatial data with temporal dimensions [1][2].

Group 1: Challenges in Automatic Labeling
- The main difficulties in 4D automatic labeling include high requirements for temporal consistency, complex multi-modal data fusion, challenges in generalizing to dynamic scenes, conflicts between labeling efficiency and cost, and high demands for scene generalization in mass production [2][3].

Group 2: Course Overview
- The course offers a comprehensive tutorial on the entire process of 4D automatic labeling, covering core algorithms and practical applications, aimed at enhancing algorithmic capabilities through real-world examples [2][3][4].
- Key topics include dynamic obstacle detection, SLAM reconstruction principles, static element labeling based on reconstruction graphs, and the mainstream paradigms of end-to-end labeling [3][4][5][6].

Group 3: Detailed Course Structure
- Chapter 1 introduces the basics of 4D automatic labeling, its applications, required data, and algorithms involved, focusing on system time-space synchronization and sensor calibration [4].
- Chapter 2 delves into the process of dynamic obstacle labeling, covering offline 3D target detection algorithms and practical solutions to common engineering challenges [6].
- Chapter 3 focuses on laser and visual SLAM reconstruction, discussing its importance and the basic modules of reconstruction algorithms [7].
- Chapter 4 addresses the automation of static element labeling, emphasizing the need for accurate detection and tracking [9].
- Chapter 5 centers on the OCC labeling of general obstacles, detailing the input-output requirements and the processes for generating ground truth [10].
- Chapter 6 is dedicated to end-to-end ground truth generation, integrating the various elements into a cohesive process [12].
- Chapter 7 discusses the data closed loop, sharing insights on industry pain points and interview preparation for relevant positions [14].

Group 4: Target Audience and Course Benefits
- The course is designed for researchers, students, and professionals looking to deepen their understanding of 4D automatic labeling and enhance their algorithm development capabilities [19][23].
- Participants will gain practical skills in 4D automatic labeling, including knowledge of cutting-edge algorithms and the ability to solve real-world problems [19].
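The course's actual OCC ground-truth pipeline is not described in the summary. As a toy illustration of the basic idea behind occupancy labeling (grid extent, voxel size, and the single-point occupancy criterion are all illustrative assumptions), one can mark each voxel that contains at least one lidar point:

```python
def occupancy_grid(points, voxel=0.5, origin=(-10.0, -10.0, -2.0), shape=(40, 40, 12)):
    """Return the set of (i, j, k) voxel indices containing at least one point.

    points: iterable of (x, y, z) lidar returns in the ego frame.
    Voxels outside the grid extent are ignored. Real pipelines would also
    aggregate across frames and distinguish free from unobserved space.
    """
    occ = set()
    for x, y, z in points:
        i = int((x - origin[0]) // voxel)
        j = int((y - origin[1]) // voxel)
        k = int((z - origin[2]) // voxel)
        if 0 <= i < shape[0] and 0 <= j < shape[1] and 0 <= k < shape[2]:
            occ.add((i, j, k))
    return occ
```

Production OCC labeling additionally handles temporal aggregation, dynamic-object removal, and semantic classes per voxel, which this sketch omits.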
A leading Tier 1's controlling investment by a central-SOE automaker is now settled~
自动驾驶之心· 2025-08-24 23:32
Core Viewpoint
- The article discusses the strategic investment in, and takeover of control of, a leading autonomous driving algorithm provider, referred to as Company Z, by a central state-owned enterprise, indicating a significant shift in the competitive landscape of the autonomous driving industry.

Group 1: Investment and Control
- Company Z has confirmed that it will receive strategic investment from, and be controlled by, a central state-owned enterprise, with approval from the relevant departments pending official announcement [4].
- The investment signifies that Company Z will officially join the "national team," gaining access to a broader customer base and substantial financial resources [5].

Group 2: Competitive Landscape
- Company Z's entry into the Horizon ecosystem is expected to be a major benefit for Horizon, as Z is recognized for its strong engineering capabilities on mid-to-low computing power chip platforms [6].
- The competition is intensifying, with Company Z emerging as a formidable competitor to existing algorithm providers, particularly impacting the IPO prospects of another key player, Company QZ [6].

Group 3: Industry Trends
- The autonomous driving sector is evolving into a large-scale industrial operation, requiring significant resources in terms of personnel, data, and technology, moving away from small entrepreneurial teams [6].
- Collaborations between major players and algorithm providers are becoming essential for competitiveness, as seen with various partnerships in the industry [6].
From scratch! A learning roadmap for end-to-end autonomous driving and VLA~
自动驾驶之心· 2025-08-24 23:32
Core Viewpoint
- The article emphasizes the importance of understanding end-to-end (E2E) algorithms and vision-language-action (VLA) models in the context of autonomous driving, highlighting the rapid development and complexity of the technology stack involved [2][32].

Summary by Sections

Introduction to End-to-End and VLA
- The article discusses the evolution of large language models over the past five years, indicating a significant technological advancement in the field [2].

Technical Foundations
- The Transformer architecture is introduced as a fundamental component for understanding large models, with a focus on attention mechanisms and multi-head attention [8][12].
- Tokenization methods such as BPE (Byte Pair Encoding) and positional encoding are explained as essential for processing sequences in models [13][9].

Course Overview
- A new course titled "End-to-End and VLA Autonomous Driving" is launched, aimed at providing a comprehensive understanding of the technology stack and practical applications in autonomous driving [21][33].
- The course is structured into five chapters, covering topics from basic E2E algorithms to advanced VLA methods, including practical assignments [36][48].

Key Learning Objectives
- The course aims to equip participants with the ability to classify research papers, extract innovative points, and develop their own research frameworks [34].
- Emphasis is placed on the integration of theory and practice, ensuring that learners can apply their knowledge effectively [35].

Industry Demand and Career Opportunities
- The demand for VLA/VLM algorithm experts is highlighted, with salary ranges between 40K and 70K for positions requiring 3-5 years of experience [29].
- The course is positioned as a pathway for individuals looking to transition into roles focused on autonomous driving algorithms, particularly in the context of emerging technologies [28].
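As a concrete illustration of the positional-encoding idea the course covers, here is a minimal pure-Python sketch of the sinusoidal scheme from "Attention Is All You Need" (a generic textbook formulation, not material taken from the course itself):

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding.

    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    Each position gets a unique pattern, and relative offsets correspond to
    linear transforms of these vectors, which helps attention use order.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

The encoding is simply added element-wise to the token embeddings before the first attention layer, since attention itself is permutation-invariant.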
What are the entry points for moving from autonomous driving to embodied intelligence?
自动驾驶之心· 2025-08-24 23:32
Core Viewpoint
- The article discusses the transition from autonomous driving to embodied intelligence, highlighting the similarities and differences in algorithms and tasks between the two fields [1].

Group 1: Algorithm and Task Comparison
- Embodied intelligence largely carries over the algorithms used in robotics and autonomous driving, such as training and fine-tuning methods, as well as large models [1].
- There are notable differences in specific tasks, including data collection methods and the emphasis on execution hardware and structure [1].

Group 2: Community and Learning Resources
- A full-stack learning community named "Embodied Intelligence Heart" has been established to share knowledge related to algorithms, data collection, and hardware solutions in the field of embodied intelligence [1].
- Key areas of focus within the community include VLA, VLN, Diffusion Policy, reinforcement learning, robotic arm grasping, pose estimation, robot simulation, multimodal large models, chip deployment, sim2real, and robot hardware structure [1].
Surpassing a host of SOTA methods! Huawei's MoVieDrive: the latest world model for multi-modal surround-view scene generation in autonomous driving~
自动驾驶之心· 2025-08-24 23:32
Today 自动驾驶之心 shares the latest work from Huawei Noah's Ark Lab and the University of Toronto: MoVieDrive, a new algorithm for multi-modal surround-view scene generation in autonomous driving that surpasses CogVideoX and a range of other SOTA methods.

Paper authors | Guile Wu et al.
Editor | 自动驾驶之心

Foreword & the author's take

In recent years, video generation has shown clear strengths for urban scene synthesis in autonomous driving. Existing driving-video generation methods, however, focus mainly on RGB video and lack the ability to generate multi-modal video. Yet multi-modal data, such as depth maps and semantic maps, are essential for holistic urban scene understanding in autonomous driving. Although multiple separate models could be used to generate the different modalities, this complicates model deployment and forgoes the complementary cues available when the modalities are generated jointly. To address this, the paper proposes a new multi-modal surround-view video generation method for autonomous driving. Specifically, the authors construct a unified diffusion T... composed of modality-shared components and modality-specific components ...
The last remaining senior executive on a new-force automaker's intelligent driving team has recently resigned
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint
- The departure of key personnel from a leading new-force car company's intelligent driving team may significantly impact its research and development progress, team stability, and sales momentum in the second half of the year [1][2][3].

Group 1: Company Developments
- The intelligent driving team of the new-force car company has experienced significant turnover, with a reported attrition rate exceeding 50% in some teams this year [1].
- The company has initiated full-scale non-compete agreements to retain talent, even requiring recent graduates to sign such agreements [1].
- The departure of the R&D head, who was a core member of the team, raises concerns about the company's ability to achieve its ambitious goals for 2024 [2].

Group 2: Industry Trends
- The movement of core intelligent driving talent across the industry may present new opportunities for technological advancement [3].
- The intelligent driving landscape is evolving, with a trend towards convergence of technology routes driven by competitive pricing strategies [3].
- The departure of key figures from various intelligent driving teams, including those at Xiaopeng and NIO, indicates a broader industry shift and a new cycle of renewal within intelligent driving teams [3].

Group 3: Strategic Implications
- The company is expected to launch a new paradigm of intelligent driving, which could significantly influence the sales of new models [2].
- The loss of three high-level executives responsible for critical aspects of intelligent driving may disrupt the company's overall R&D timeline and stability [2].
7DGS makes a splash: igniting the dynamic world in a second! Photorealistic real-time rendering goes "all seven dimensions" for the first time
自动驾驶之心· 2025-08-23 16:03
Core Insights
- The article introduces 7D Gaussian Splatting (7DGS), a novel framework for real-time rendering of dynamic scenes that unifies spatial, temporal, and angular dimensions into a single 7D Gaussian representation [2][44].
- The method addresses the challenges of modeling complex visual effects related to perspective, time dynamics, and spatial geometry, which are crucial for applications in virtual reality, augmented reality, and digital twins [3][44].

Technical Contributions
- 7DGS models scene elements as 7D Gaussians, capturing the interdependencies between geometry, dynamics, and appearance, allowing for accurate modeling of phenomena like moving specular highlights and anisotropic reflections [3][10].
- The framework includes an efficient conditional slicing mechanism that projects the high-dimensional Gaussian representation into a format compatible with existing real-time rendering pipelines, ensuring both efficiency and fidelity [10][38].
- Experimental results demonstrate that 7DGS outperforms previous methods, achieving a peak signal-to-noise ratio (PSNR) improvement of up to 7.36 dB while maintaining rendering speeds exceeding 400 frames per second (FPS) [10][44].

Methodology
- The 7D Gaussian representation is defined to encode spatial, temporal, and directional attributes, allowing for comprehensive modeling of complex dependencies across these dimensions [18][19].
- The article details a conditional slicing mechanism that enables efficient integration of temporal dynamics and perspective effects into traditional 3D rendering workflows [23][31].
- An adaptive Gaussian refinement technique is introduced to dynamically update Gaussian parameters, enhancing the representation of complex dynamic behaviors such as non-rigid deformations [32][36].

Experimental Evaluation
- The framework was evaluated across multiple datasets, including heart scans and dynamic cloud simulations, with metrics such as PSNR, structural similarity index (SSIM), and rendering speed reported [39][41].
- Results indicate that 7DGS achieves superior image quality and efficiency compared to existing techniques, reinforcing its potential for advancing dynamic scene rendering in the industry [44].
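The conditional slicing described above reduces a 7D Gaussian to a renderable 3D one by conditioning on time and view direction. The paper's exact formulation is not reproduced here, but such a mechanism builds on the standard rule for conditioning a multivariate Gaussian; a generic sketch for the one-dimensional case (function name and list-based representation are illustrative):

```python
def condition_on_scalar(mu, cov, j, value):
    """Condition an n-D Gaussian on dimension j taking a fixed value.

    mu: length-n mean; cov: n x n covariance (lists of lists).
    Returns (mu', cov') over the remaining n-1 dimensions:
        mu'_i   = mu_i + cov[i][j] / cov[j][j] * (value - mu[j])
        cov'_ik = cov[i][k] - cov[i][j] * cov[k][j] / cov[j][j]
    Applying this once per conditioned dimension (one for time, three for
    view direction) would take a 7D Gaussian down to a 3D spatial one.
    """
    keep = [i for i in range(len(mu)) if i != j]
    shift = value - mu[j]
    mu_c = [mu[i] + cov[i][j] / cov[j][j] * shift for i in keep]
    cov_c = [[cov[i][k] - cov[i][j] * cov[k][j] / cov[j][j] for k in keep]
             for i in keep]
    return mu_c, cov_c
```

Because the conditioned mean shifts with the observed time and view direction while the conditioned covariance shrinks, the sliced 3D Gaussian can move and sharpen as the camera or clock changes, which is consistent with the dynamic effects the article describes.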