自动驾驶之心
Just started grad school, and my advisor wants me to hand-build an autonomous driving cart...
自动驾驶之心· 2025-07-21 11:18
Core Viewpoint
- The article introduces the "Black Warrior 001," a lightweight autonomous driving platform designed for education and research, highlighting its features and applications across academic settings [3][6].

Group 1: Product Overview
- The "Black Warrior 001" is a comprehensive autonomous driving platform supporting perception, localization, fusion, navigation, and planning, built on an Ackermann chassis [3].
- The product is currently offered at a promotional price of 33,999 yuan; a 1,000-yuan deposit offsets 2,000 yuan of the total price [2].

Group 2: Performance and Testing
- The system has been tested indoors, outdoors, and in parking scenarios, demonstrating its perception, localization, fusion, and navigation-planning capabilities [4].
- Specific tests include outdoor park driving, indoor 2D and 3D laser mapping, and slope testing, showcasing its versatility for both undergraduate and graduate research projects [6][9][10].

Group 3: Hardware Specifications
- Key hardware components:
  - 3D LiDAR: Mid 360
  - 2D LiDAR: from Lidar Technology
  - Depth camera: from Orbbec, with built-in IMU
  - Main control chip: Nvidia Orin NX 16G
  - Chassis system: Ackermann chassis [10][12]
- The vehicle weighs 30 kg, has a battery power of 50W, runs for over 4 hours, and reaches a top speed of 2 m/s [12].

Group 4: Software and Functionality
- The software framework is built on ROS, C++, and Python, with a one-click startup feature and a ready-made development environment [14].
- The system supports 2D and 3D SLAM, vehicle navigation, and obstacle avoidance, making it suitable for a range of educational and research applications [15].

Group 5: After-Sales and Support
- The company offers one year of after-sales support for non-human damage; during the warranty period, damage caused by operational errors or code modifications is repaired free of charge [37].
Still haven't picked a research direction? Others are already grinding away at VLA...
自动驾驶之心· 2025-07-21 05:18
Core Viewpoint
- The article emphasizes the shift in academic research from traditional perception and planning tasks in autonomous driving to Vision-Language-Action (VLA) models, which open new opportunities for innovation and research in the field [1][2].

Group 1: VLA Research Topics
- The VLA model aims for an end-to-end autonomous driving system that maps raw sensor inputs directly to driving control commands, moving away from traditional modular architectures [2].
- The evolution of autonomous driving technology falls into three phases: traditional modular architectures, purely visual end-to-end systems, and the emergence of VLA models [2][3].
- VLA models enhance interpretability and reliability by letting the system explain its decision-making in natural language, improving human trust [3].

Group 2: Course Objectives and Structure
- The course aims to help participants systematically master key VLA theory and develop practical skills in model design and implementation [6][7].
- It offers a structured learning experience combining online group research, paper guidance, and maintenance periods to ensure thorough understanding and application [6][8].
- Participants gain insight into classic and cutting-edge papers, coding practice, and effective writing and submission strategies for academic papers [6][12].

Group 3: Enrollment and Requirements
- Each session is limited to 6-8 participants and targets individuals with a foundational understanding of deep learning and autonomous driving algorithms [5][9].
- Basic requirements include familiarity with Python and PyTorch and access to high-performance computing resources [13][14].
- The course emphasizes academic integrity and provides a structured environment for learning and research [14][19].

Group 4: Course Highlights
- The program features a "2+1" teaching model with experienced instructors providing comprehensive support throughout the learning process [14].
- It is designed to uphold high academic standards and deliver substantial project outcomes, including a draft paper and a project completion certificate [14][20].
- A feedback mechanism adapts the learning experience to individual progress [14].
Autonomous Driving Paper Express | World Models, End-to-End, VLM/VLA, Reinforcement Learning, and More
自动驾驶之心· 2025-07-21 04:14
Core Insights
- The article surveys recent advances in autonomous driving research, focusing first on the Orbis model developed at the University of Freiburg, which significantly improves long-horizon prediction in driving world models [1][2].

Group 1: Orbis Model Contributions
- Orbis addresses shortcomings of contemporary driving world models in long-horizon generation, particularly for complex maneuvers such as turns, and introduces a trajectory-distribution-based evaluation metric to quantify these issues [2].
- It employs a hybrid discrete-continuous tokenizer that enables fair comparison between discrete and continuous prediction methods, showing that continuous modeling (based on flow matching) outperforms discrete modeling (based on masked generation) for long-horizon prediction [2].
- The model achieves state-of-the-art (SOTA) performance with only 469 million parameters and 280 hours of monocular video data, excelling in complex driving scenarios such as turns and urban traffic [2].

Group 2: Experimental Results
- Orbis achieved a Fréchet Video Distance (FVD) of 132.25 on the nuPlan dataset for 6-second rollouts, far lower than Cosmos (291.80) and Vista (323.37), indicating markedly better long-horizon video prediction quality [6][7].
- In turn scenarios Orbis again led, with an FVD of 231.88 versus 316.99 for Cosmos and 413.61 for Vista, showing its effectiveness in challenging driving conditions [6][7].

Group 3: LaViPlan Framework
- The LaViPlan framework, developed by ETRI, uses reinforcement learning with verifiable rewards to address the misalignment between the visual, language, and action components of autonomous driving, achieving a 19.91% reduction in Average Displacement Error (ADE) on easy scenarios and 14.67% on hard scenarios of the ROADWork dataset [12][14].
- It emphasizes the transition from linguistic fidelity to functional accuracy in trajectory outputs, revealing a trade-off between semantic similarity and task-specific reasoning [14].

Group 4: World-Model-Based Scene Generation
- The University of Macau introduced a world-model-driven scene-generation framework that enhances dynamic graph convolution networks, achieving 83.2% Average Precision (AP) and a 3.99-second mean Time to Anticipate (mTTA) on the DAD dataset, a significant improvement [23][24].
- The framework combines scene generation with adaptive temporal reasoning to create high-resolution driving scenarios, addressing data scarcity and modeling limitations [24].

Group 5: ReAL-AD Framework
- The ReAL-AD framework, proposed by ShanghaiTech University and the Chinese University of Hong Kong, integrates a three-layer human cognitive decision-making model into end-to-end autonomous driving, improving planning accuracy by 33% and reducing collision rates by 32% [33][34].
- Its three core modules enhance situational awareness and structured reasoning, yielding significant gains in trajectory-planning accuracy and safety [34].
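FVD figures like those reported for Orbis are Fréchet distances between Gaussians fitted to video features (for FVD proper, embeddings from a pretrained I3D network). As a minimal sketch of the distance computation itself, with random feature arrays standing in for real video embeddings (the function name and inputs are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: (n_samples, dim) arrays of embeddings, e.g. video
    features from a pretrained network (hypothetical stand-ins here).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; may pick up a tiny
    # imaginary component from numerical error, which we discard.
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature distributions give a distance near zero; shifting one set's mean raises the score, which is why lower FVD indicates generated video statistics closer to real ones.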
Landed a fall-recruitment offer at a small company, and I'm content...
自动驾驶之心· 2025-07-20 12:47
Core Viewpoint
- The article discusses advances in AI technology, particularly autonomous driving and embodied intelligence, highlighting the saturation of the autonomous driving industry and the challenges faced by job seekers in the field [2].

Group 1: Industry Developments
- The autonomous driving sector has seen significant breakthroughs, with L2 to L4 functionality entering mass production alongside advances in humanoid and quadrupedal robots [2].
- The industry has a clear demand for technology and talent, as evidenced by the experiences shared by job seekers [2].

Group 2: Job-Seeking Platform
- A new platform, the AutoRobo Knowledge Community, has been launched to assist job seekers in autonomous driving, embodied intelligence, and robotics; it currently hosts nearly 1,000 members [2][3].
- Members come from companies such as Horizon Robotics, Li Auto, Huawei, and Xiaomi, along with students preparing for upcoming recruitment seasons [2].

Group 3: Resources and Support
- The platform provides a wealth of resources, including interview questions, industry reports, salary-negotiation tips, and resume-optimization services [3][4].
- Compiled interview questions cover a range of technical topics and practical skills in autonomous driving and embodied intelligence [9][10][11].

Group 4: Industry Reports
- The community offers access to numerous industry reports covering the current state, development trends, and market opportunities of the autonomous driving and robotics sectors [15][19].
- Reports include insights on trajectory prediction, occupancy perception, and the overall landscape of the embodied intelligence industry [14][19].
After interviewing many end-to-end candidates, a lot of them still can't get it straight...
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- End-to-end autonomous driving is a key algorithm family for mass-produced intelligent driving, with significant salary potential for related positions; since the introduction of UniAD it has branched into various technical directions [2][4].

Group 1: Technical Directions
- End-to-end autonomous driving can be categorized into one-stage and two-stage approaches, with various subfields emerging under each [2][4].
- The core advantage of end-to-end systems is direct modeling from sensor input to vehicle planning/control, avoiding the error accumulation seen in modular pipelines [2].
- Notable algorithms include PLUTO (two-stage), UniAD (perception-based one-stage), OccWorld (world-model-based one-stage), and DiffusionDrive (diffusion-model-based one-stage) [4].

Group 2: Industry Trends
- Demand for VLA/VLM algorithm experts is rising, with salaries for positions requiring 3-5 years of experience ranging from 40K to 70K [9].
- The industry is shifting toward large-model algorithms, with companies positioning VLA as the next generation of autonomous driving solutions [8][9].

Group 3: Course Offerings
- A new course, "End-to-End and VLA Autonomous Driving," is being offered to help learners understand the complexities of end-to-end algorithms and their applications [15][28].
- The course covers background knowledge, two-stage end-to-end, one-stage end-to-end, and practical applications of reinforcement learning [20][22][24].
- It aims to provide a comprehensive understanding of the end-to-end framework, including key technologies such as BEV perception, multi-modal large models, and diffusion models [31].
SpatialTrackerV2: An Open-Source, Feed-Forward, Scalable 3D Point Tracking Method
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- The article presents SpatialTrackerV2, a state-of-the-art method for 3D point tracking from monocular video that integrates video depth, camera ego-motion, and object motion into a fully differentiable pipeline for scalable joint training [7][37].

Group 1: Current Issues in 3D Point Tracking
- 3D point tracking aims to recover long-term 3D trajectories of arbitrary points from monocular video, with strong potential in applications such as robotics and video generation [4].
- Existing solutions rely heavily on low/mid-level visual models, incurring high computational cost and limited scalability because they require real 3D trajectories as supervision [6][10].

Group 2: Proposed Solution - SpatialTrackerV2
- SpatialTrackerV2 decomposes 3D point tracking into three independent components (video depth, camera ego-motion, and object motion) and integrates them into a fully differentiable framework [7].
- The architecture pairs a front end for video depth estimation and camera pose initialization with a back end for joint motion optimization, using a novel SyncFormer module to model correlations between 2D and 3D features [7][30].

Group 3: Performance Evaluation
- The method sets new state-of-the-art results on the TAPVid-3D benchmark, scoring 21.2 AJ and 31.0 APD3D, improvements of 61.8% and 50.5% over the previous best [9].
- It also leads in video depth and camera pose consistency estimation, outperforming existing methods such as MegaSAM while running roughly 50 times faster at inference [9].

Group 4: Training and Optimization Process
- Training uses consistency constraints between static and dynamic points for 3D tracking, enabling effective optimization even with limited depth information [8][19].
- The model applies a bundle optimization approach to iteratively refine depth and camera pose estimates, incorporating multiple loss functions to ensure accuracy [24][26].

Group 5: Conclusion
- SpatialTrackerV2 represents a significant advance in 3D point tracking, providing a robust foundation for motion understanding in real-world scenarios and pushing toward "physical intelligence" through large-scale visual data [37].
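The decomposition into video depth plus camera ego-motion can be made concrete: given per-frame metric depth at a tracked pixel, camera intrinsics, and camera-to-world poses, a 2D track lifts to a world-frame 3D trajectory. A simplified sketch of that geometry (not the paper's implementation; function and variable names are illustrative):

```python
import numpy as np

def backproject_track(uv, depth, K, T_wc):
    """Lift a 2D pixel track to world-frame 3D points.

    uv:    (T, 2) pixel coordinates of one tracked point over T frames
    depth: (T,) metric depth at each pixel
    K:     (3, 3) camera intrinsics
    T_wc:  (T, 4, 4) camera-to-world pose per frame
    """
    K_inv = np.linalg.inv(K)
    pts_world = []
    for (u, v), z, pose in zip(uv, depth, T_wc):
        # Unproject pixel to a camera-frame 3D point at the given depth.
        p_cam = z * (K_inv @ np.array([u, v, 1.0]))
        # Transform into the world frame via the homogeneous pose.
        p_hom = np.append(p_cam, 1.0)
        pts_world.append((pose @ p_hom)[:3])
    return np.stack(pts_world)
```

Because each step here is differentiable, errors in the recovered 3D trajectory can in principle propagate gradients back into both the depth and the pose estimates, which is the property the paper's joint training exploits.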
Latest from CUHK! ReAL-AD: Toward Human-Like Reasoning for End-to-End Autonomous Driving, with a 30% Trajectory Performance Boost (ICCV'25)
自动驾驶之心· 2025-07-20 08:36
Core Insights
- The article introduces ReAL-AD, a reasoning-enhanced learning framework for end-to-end autonomous driving that aims to align decision-making processes with human cognitive models [2][8][40].

Group 1: Framework Overview
- ReAL-AD integrates a three-layer human cognitive model (driving strategy, driving decision, and driving operation) into the decision-making process of autonomous driving [2][8].
- The framework includes three main components:
  1. Strategic Reasoning Injector, which formulates high-level driving strategies from complex traffic insights generated by vision-language models (VLMs) [8][20].
  2. Tactical Reasoning Integrator, which refines driving intentions into interpretable driving choices [8][20].
  3. Hierarchical Trajectory Decoder, which translates driving decisions into precise control actions for smooth, human-like trajectory execution [8][20].

Group 2: Performance Evaluation
- Extensive evaluations on the NuScenes and Bench2Drive datasets show that ReAL-AD improves planning accuracy and safety by over 30% compared to baseline methods [9][34].
- The method reduces L2 error by 33% and collision rate by 32%, indicating significant gains in trajectory accuracy and driving safety [9][34].

Group 3: Comparison with Existing Methods
- Existing end-to-end methods often rely on fixed, sparse trajectory supervision, which limits their ability to replicate the structured cognitive reasoning of human drivers [3][10].
- ReAL-AD addresses these limitations by embedding structured multi-stage reasoning into the decision-making hierarchy, enhancing generalization across diverse real-world scenarios [5][10].

Group 4: Experimental Results
- The framework outperforms other state-of-the-art methods, achieving the lowest average L2 error (0.48 m) and a 0.15% collision rate on the NuScenes dataset [34].
- In closed-loop evaluations, integrating ReAL-AD significantly improves driving scores and success rates, demonstrating its effectiveness in real-world applications [34].
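The open-loop planning metrics quoted above are simple to compute: L2 error averages the Euclidean distance between predicted and ground-truth waypoints over the planning horizon, and collision rate is the fraction of evaluated frames whose planned trajectory overlaps an obstacle. A minimal sketch with hypothetical inputs (not the paper's evaluation code):

```python
import numpy as np

def l2_planning_error(pred, gt):
    """Mean L2 distance between predicted and ground-truth waypoints.

    pred, gt: (T, 2) arrays of (x, y) positions over the planning horizon.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def collision_rate(collided_flags):
    """Fraction of evaluated frames flagged as colliding.

    collided_flags: iterable of 0/1 (or bool) per frame, where the flag
    comes from an upstream overlap check against obstacle boxes.
    """
    return float(np.mean(np.asarray(collided_flags, dtype=float)))
```

So "0.48 m average L2" means the planned waypoints deviate from the human trajectory by about half a meter on average, and "0.15% collision rate" means roughly 1.5 colliding frames per thousand evaluated.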
Large-Model Interview Experience - Kuaishou Kuai Star
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- The article discusses advances and opportunities in autonomous driving, emphasizing the importance of multi-modal large models and their applications across the industry [5][6].

Group 1: Interview Insights
- Interviews for multi-modal large-model positions involve detailed discussion of candidates' research papers, focusing on methodology and results [4][5].
- Candidates are expected to know current multi-modal large models and their paradigms, including specific models such as BLIP-2 and Qwen-VL [5].
- Technical questions cover topics such as learnable queries, KV cache, and the training and fine-tuning of large models [5][6].

Group 2: Community and Resources
- The article highlights a community of nearly 4,000 members, spanning over 300 companies and research institutions in the autonomous driving sector, as a platform for knowledge exchange [7].
- It mentions a comprehensive learning path covering more than 30 areas of autonomous driving technology, from perception to planning and control [7].
- The community offers resources on technical solutions and industry dynamics, aiming to support newcomers to the field [7].
What Exactly Is the Action in VLA? On Diffusion: From Image Generation to End-to-End Trajectory Planning
自动驾驶之心· 2025-07-19 10:19
Core Viewpoint
- The article explains the principles and applications of diffusion models in autonomous driving, highlighting their advantages over generative adversarial networks (GANs) and detailing specific industry use cases.

Group 1: Diffusion Model Principles
- Diffusion models are denoising-centric generative models that learn and simulate data distributions through a forward diffusion process and a reverse generation process [2][4].
- The forward process gradually adds noise to the initial data distribution, while the reverse process removes noise to recover the original data [5][6].
- Both the noising and denoising processes are typically described as Markov chains over successive noise levels [8].

Group 2: Comparison with Generative Adversarial Networks
- Both diffusion models and GANs involve noise addition and removal, but their core mechanisms differ: diffusion models rely on probabilistic modeling, while GANs use adversarial training between a generator and a discriminator [20][27].
- Diffusion models generally train more stably and produce higher-quality samples, especially at high resolution, whereas GANs can suffer from mode collapse and require training multiple networks [27][28].

Group 3: Applications in Autonomous Driving
- Diffusion models are applied across autonomous driving, including synthetic data generation, scene prediction, perception enhancement, and path planning [29].
- They can generate realistic driving-scene data to address data scarcity and high annotation costs, particularly for rare scenarios such as extreme weather [30][31].
- In scene prediction, they forecast dynamic changes in driving environments and generate plausible behaviors of traffic participants [33].
- For perception tasks, they enhance data quality by denoising bird's-eye-view (BEV) images and improving sensor-data consistency [34][35].
- In path planning, they support multimodal path generation, enhancing safety and adaptability in complex driving conditions [36].

Group 4: Notable Industry Implementations
- Companies such as Haomo Technology and Horizon Robotics are developing advanced diffusion-based algorithms for real-world applications, achieving state-of-the-art performance in various driving scenarios [47][48].
- Integrating diffusion models with large language models (LLMs) and other technologies is expected to drive further innovation in the autonomous driving sector [46].
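The forward process described above has a convenient closed form: with a variance schedule beta_t and alpha_bar_t as the cumulative product of (1 - beta_s), a noisy sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, so any timestep can be sampled in one shot rather than by stepping the Markov chain. A minimal NumPy sketch (the linear schedule values are illustrative DDPM-style defaults, not from the article):

```python
import numpy as np

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear variance schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (no chain stepping)."""
    noise = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    # Signal is scaled down and Gaussian noise scaled up as t grows.
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise, noise
```

At t near 0 the sample is almost pure data; at t near T it is almost pure noise. The reverse process trains a network to predict `noise` from `x_t`, which is then used to denoise step by step at generation time.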
Roundup | Hardcore UAV Results from Gao Fei's Team at Zhejiang University in H1 2025
自动驾驶之心· 2025-07-19 10:19
The following article is from 深蓝AI (author: 深蓝学院), a learning platform focused on artificial intelligence, robotics, and autonomous driving. Author | 深蓝学院 Source | 深蓝AI. This article is shared for academic purposes only; in case of infringement, contact us for removal.

FIRI: Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space

As a distinguished scholar in robotics and UAVs, Prof. Gao Fei has remained at the forefront of research.

Gao Fei: recipient of the National Science Fund for Excellent Young Scholars; tenured associate professor, researcher, and doctoral supervisor at the College of Control Science and Engineering, Zhejiang University. Main research areas: aerial robots, trajectory planning, autonomous navigation, swarm coordination, and localization and perception. In recent years, he has published over 70 papers as first or corresponding author in leading robotics journals and conferences; his honors include the IEEE T-RO 2020 Best Paper Honorable Mention, the Frontiers of Science Award at the 2024 International Congress of Basic Science (ICBS), and an IEEE ICRA 2024 Best Paper nomination; he was listed among Elsevier's top 2% of scientists worldwide for 2023/24. ...