Which Technical Directions Is Autonomous Driving Focusing on Now, and How Should You Get Started?
自动驾驶之心· 2025-08-14 23:33
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving, aiming to bridge communication between enterprises and academic institutions while providing resources and support for individuals interested in the field [1][12]

Group 1: Community and Resources
- The community has organized over 40 technical routes, offering resources for both beginners and advanced researchers in autonomous driving [1][13]
- Members include individuals from renowned universities and leading companies in the autonomous driving sector, fostering a collaborative environment for knowledge sharing [13][21]
- The community provides a complete entry-level technical stack and roadmap for newcomers, as well as valuable industry frameworks and project proposals for those already engaged in research [7][9]

Group 2: Learning and Development
- The community offers a variety of learning routes, including perception, simulation, and planning control, to facilitate quick onboarding for newcomers and further development for those already familiar with the field [13][31]
- Numerous open-source projects and datasets are available, covering areas such as 3D object detection, BEV perception, and world models, which are essential for practical applications in autonomous driving [27][29][35]

Group 3: Job Opportunities and Networking
- The community actively shares job postings and career opportunities, helping members connect with potential employers in the autonomous driving industry [11][18]
- Members can engage in discussions about career choices and research directions, receiving guidance from experienced professionals in the field [77][80]

Group 4: Technical Discussions and Innovations
- The community hosts discussions on cutting-edge topics such as end-to-end driving, multi-modal models, and the integration of various technologies in autonomous systems [20][39][42]
- Regular live sessions with industry leaders allow members to gain insights into the latest advancements and practical applications in autonomous driving [76][80]
A 10,000-Character Deep Dive into the DeepSeek MoE Architecture!
自动驾驶之心· 2025-08-14 23:33
Core Viewpoint
- The article provides a comprehensive overview of the Mixture of Experts (MoE) architecture, focusing on the evolution and implementation of DeepSeek's MoE models (V1, V2, V3) and their optimizations for token distribution and load balancing [2][21][36]

Group 1: MoE Architecture Overview
- MoE, or Mixture of Experts, is a model architecture that utilizes multiple expert networks to enhance performance, particularly in sparse settings suitable for cloud computing [2][3]
- Interest in the MoE architecture surged with the release of Mistral AI's Mixtral model, which highlighted the potential of sparse architectures in AI [2][3]
- The Switch Transformer model introduced a routing mechanism that allows tokens to select the top-K experts, optimizing the processing of diverse knowledge [6][10]

Group 2: DeepSeek V1 Innovations
- DeepSeek V1 addresses two main issues in existing MoE practices: knowledge mixing and redundancy, which hinder expert specialization [22][24]
- The model introduces fine-grained expert division and shared experts to enhance specialization and reduce redundancy, allowing for more efficient knowledge capture [25][26]
- The architecture includes a load balancing mechanism to ensure even distribution of tokens across experts, mitigating training inefficiencies [32]

Group 3: DeepSeek V2 Enhancements
- DeepSeek V2 builds on V1's design, implementing three key optimizations focused on load balancing [36]
- The model limits the number of devices used for routing experts to reduce communication overhead, enhancing efficiency during training and inference [37]
- A new communication load balancing loss function is introduced to ensure equitable token distribution across devices, further optimizing performance [38]

Group 4: DeepSeek V3 Developments
- DeepSeek V3 changes the MoE layer computation, replacing the softmax function with a sigmoid function to improve computational efficiency [44]
- The model eliminates auxiliary load balancing losses, instead using a learnable bias term to control routing, which enhances load balancing during training [46]
- A sequence-level auxiliary loss is added to prevent extreme imbalances within individual sequences, ensuring a more stable training process [49]
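The routing mechanics summarized above translate directly into code. Below is a minimal sketch of a top-K token router, assuming a standard PyTorch setup; it illustrates the ideas described in the article rather than DeepSeek's actual implementation, and all dimensions and names are chosen for the example. The `score_fn` switch contrasts the softmax scoring used through V2 with the V3-style sigmoid scoring plus a learnable bias that steers expert selection in place of an auxiliary balancing loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal top-K token router in the style described above.

    score_fn="softmax" mirrors V1/V2-style routing; "sigmoid" mirrors the
    V3-style change, with a learnable per-expert bias that nudges load
    balancing without an auxiliary loss."""
    def __init__(self, d_model: int, n_experts: int, k: int, score_fn: str = "softmax"):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.bias = nn.Parameter(torch.zeros(n_experts))  # V3-style routing bias
        self.k = k
        self.score_fn = score_fn

    def forward(self, x: torch.Tensor):
        # x: (n_tokens, d_model) -> per-token affinity to each expert
        logits = self.gate(x)
        if self.score_fn == "softmax":
            scores = F.softmax(logits, dim=-1)
        else:  # "sigmoid": each expert scored independently
            scores = torch.sigmoid(logits)
        # Select experts using bias-adjusted scores; the bias only steers
        # selection, while the gating weights themselves stay bias-free.
        _, idx = torch.topk(scores + self.bias, self.k, dim=-1)
        weights = torch.gather(scores, -1, idx)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        return idx, weights  # which experts each token visits, and with what weight

# Usage: route 16 tokens of width 64 across 8 experts, top-2
router = TopKRouter(d_model=64, n_experts=8, k=2, score_fn="sigmoid")
idx, w = router(torch.randn(16, 64))
print(idx.shape, w.shape)  # torch.Size([16, 2]) torch.Size([16, 2])
```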
Is GRPO Not Optimal After All? EvaDrive: A New RL Algorithm, APO, Brings Human-Like End-to-End Driving a Step Closer (National University of Singapore)
自动驾驶之心· 2025-08-14 23:33
Paper authors | Siwen Jiao et al.    Editor | 自动驾驶之心

There has been a wave of end-to-end work lately. Today 自动驾驶之心 shares EvaDrive, the latest work from teams at the National University of Singapore, Tsinghua University, and Xiaomi: a new reinforcement learning algorithm, APO, setting new state-of-the-art results in both open-loop and closed-loop evaluation.

To address the problems this work identifies, EvaDrive introduces a new multi-objective reinforcement learning framework that establishes genuine closed-loop co-evolution between trajectory generation and evaluation through adversarial optimization. EvaDrive formulates trajectory planning as a multi-round adversarial game. In this game, a hierarchical generator continuously proposes candidate paths, combining autoregressive intent modeling to capture temporal causality with diffusion-based refinement for spatial flexibility. A trainable multi-objective critic then rigorously evaluates these proposals, explicitly preserving diverse preference structures rather than compressing them ...
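The preview cuts off, but the propose-and-critique loop it describes can be pictured schematically. The toy sketch below is a hypothetical illustration of a multi-round game between a generator and a multi-objective critic, with Pareto selection standing in for "preserving diverse preference structures"; none of the function names, objectives, or heuristics come from the EvaDrive paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose(parents: np.ndarray, n: int) -> np.ndarray:
    """Generator stand-in: perturb surviving trajectories into new candidates.
    Each trajectory is a (T, 2) array of waypoints."""
    picks = parents[rng.integers(len(parents), size=n)]
    return picks + rng.normal(scale=0.1, size=picks.shape)

def critic(trajs: np.ndarray) -> np.ndarray:
    """Multi-objective critic stand-in: per-trajectory scores on two toy
    objectives (comfort = low-curvature proxy, progress = distance covered)."""
    comfort = -np.abs(np.diff(trajs, n=2, axis=1)).sum(axis=(1, 2))
    progress = np.linalg.norm(trajs[:, -1] - trajs[:, 0], axis=1)
    return np.stack([comfort, progress], axis=1)  # (n, 2), higher is better

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Keep candidates not dominated on every objective, preserving diversity
    instead of collapsing preferences into a single scalar."""
    keep = []
    for i, s in enumerate(scores):
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep)

# Multi-round propose -> evaluate -> select loop
T = 10
pool = rng.normal(size=(8, T, 2)).cumsum(axis=1)  # 8 random-walk seed trajectories
for round_ in range(5):
    pool = np.concatenate([pool, propose(pool, 16)])
    pool = pool[pareto_front(critic(pool))]
print(f"{len(pool)} Pareto-diverse trajectories survive after 5 rounds")
```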
Peking University's Latest ReconDreamer-RL: A Reinforcement Learning Framework Built on Diffusion-Based Scene Reconstruction, Cutting the Collision Rate 5x!
自动驾驶之心· 2025-08-14 11:12
Core Insights
- The article discusses the challenges and advances in end-to-end autonomous driving models, focusing on closed-loop simulation reinforcement learning, which improves robustness and adaptability through interaction with diverse environments [1]

Group 1: Research Background and Core Challenges
- Closed-loop reinforcement learning is gaining attention because it lets models interact with their environment, improving robustness and adaptability compared to imitation learning [1]
- Two main challenges are identified: insufficient realism in simulation environments and uneven training data distribution, which limit model generalization [5][6]

Group 2: Core Framework: ReconDreamer-RL
- The ReconDreamer-RL framework integrates video diffusion priors with scene reconstruction and consists of three core components that optimize the driving policy in two phases: imitation learning followed by reinforcement learning [3]

Group 3: Components of ReconDreamer-RL
- **ReconSimulator**: A high-fidelity simulation environment that combines appearance modeling and physics modeling to reduce the sim2real gap. It uses 3D Gaussian splatting for scene reconstruction and DriveRestorer for video artifact correction [4][7]
- **Dynamic Adversary Agent (DAA)**: Generates extreme scenarios by controlling surrounding vehicles' trajectories to create complex interactions such as sudden lane changes and hard braking [8]
- **Cousin Trajectory Generator (CTG)**: Improves trajectory diversity by generating varied trajectories through trajectory extension and interpolation, addressing the bias toward simple linear motion in training data; see the sketch after this list [10][12]

Group 4: Experimental Validation: Performance and Advantages
- The framework substantially reduces collision rates, achieving 0.077 versus 0.386 for imitation learning methods and 0.238 for reinforcement learning methods, roughly a fivefold reduction [16]
- In extreme scenarios, the framework's collision rate drops to 0.053, a 404.5% improvement over traditional methods [18]
- Ablation studies confirm the effectiveness of each component; removing ReconSimulator raises the collision rate from 0.077 to 0.238, underscoring the need for a realistic simulation environment [20][22]

Group 5: Rendering Efficiency
- ReconSimulator renders at 125 FPS, far surpassing methods such as EmerNeRF at 0.21 FPS, and thus meets the real-time interaction requirements of reinforcement learning [21]
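The trajectory extension and interpolation behind the Cousin Trajectory Generator can be pictured with a toy example. The sketch below is a hypothetical illustration, assuming trajectories are 2D waypoint arrays; the helper name, offsets, and blending scheme are invented for the example and are not taken from the paper.

```python
import numpy as np

def cousin_trajectories(traj: np.ndarray, lateral_offsets, n_blend: int = 3) -> list:
    """Generate 'cousin' variants of a recorded (T, 2) trajectory: shift it
    laterally, then interpolate between the original and the shifted copies
    so training data covers more than near-linear ego motion."""
    # Unit normals to the direction of travel at each waypoint
    d = np.gradient(traj, axis=0)
    normals = np.stack([-d[:, 1], d[:, 0]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-9

    cousins = []
    for off in lateral_offsets:
        shifted = traj + off * normals              # trajectory extension
        for a in np.linspace(0.0, 1.0, n_blend):    # trajectory interpolation
            cousins.append((1 - a) * traj + a * shifted)
    return cousins

# Usage: a straight 20-waypoint trajectory, cousins offset by +/- 1.5 m
t = np.linspace(0, 1, 20)
traj = np.stack([t * 30.0, np.zeros_like(t)], axis=1)
variants = cousin_trajectories(traj, lateral_offsets=[-1.5, 1.5])
print(len(variants), variants[0].shape)  # 6 (20, 2)
```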
The Second Session of the Autonomous Driving VLA Paper Guidance Class Is Here, and Spots Are Limited...
自动驾驶之心· 2025-08-14 06:49
Core Insights
- The article discusses the advances of the Li Auto VLA driver model, highlighting its improved capabilities in semantic understanding, reasoning, and trajectory planning, all crucial for autonomous driving [1][3][5]

Group 1: VLA Model Capabilities
- The VLA model demonstrates enhanced semantic understanding through multimodal input, improved reasoning via chains of thought, and trajectory planning that comes closer to human driving intuition [1]
- Four core abilities of the VLA model are showcased: spatial understanding, reasoning, communication and memory, and behavioral capability [1][3]

Group 2: Research and Development Trends
- The VLA model evolved from VLM+E2E, integrating cutting-edge techniques such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5]
- While industry continues to optimize traditional perception and planning tasks, academia is increasingly shifting toward large models and VLA, leaving many subfields open for exploration [5]

Group 3: VLA Research Guidance Program
- A second session of the VLA research paper guidance program is being launched, aimed at helping participants systematically grasp key theoretical knowledge and develop their own research ideas [6][31]
- The program comprises a structured curriculum of 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period for paper development [14][31]

Group 4: Course Structure and Requirements
- The course is capped at 8 participants and targets master's and doctoral students working on VLA and autonomous driving, as well as AI professionals seeking to strengthen their algorithmic background [12][13]
- Participants are expected to have a foundational understanding of deep learning, basic Python programming skills, and familiarity with PyTorch [19][20]

Group 5: Course Outcomes
- Participants will study classic and cutting-edge papers, coding implementations, and methodologies for selecting research topics, running experiments, and writing papers [14][31]
- The program aims to produce a research paper draft, strengthening participants' academic profiles for further study or employment [14][31]
NIO Is Hiring Large-Model / End-to-End Algorithm Engineers!
自动驾驶之心· 2025-08-14 03:36
Core Viewpoint
- The article emphasizes job opportunities and resources in the fields of autonomous driving and embodied intelligence, highlighting a community platform for job seekers in these sectors

Group 1: Job Description and Requirements
- The position involves designing and developing end-to-end algorithms for intelligent assisted driving, including BEV perception, Lidar perception, occupancy networks, and multi-modal large models [1]
- Candidates with experience in deep learning, object detection, and reinforcement learning algorithms are preferred, along with a background in computer science or electronics [2]
- Proficiency in the PyTorch deep learning framework and good communication skills are essential [2]

Group 2: Community and Resources
- The AutoRobo knowledge community has nearly 1,000 members, including professionals from various companies in the autonomous driving and robotics sectors [4]
- The community provides resources such as interview questions, industry reports, salary negotiation tips, and internal job postings for various positions [5][6]
- A compilation of 100 interview questions on autonomous driving and embodied intelligence is available to members [9]

Group 3: Industry Reports and Insights
- The community offers in-depth industry reports to help members understand the current state and future prospects of the autonomous driving and embodied intelligence sectors [15]
- Reports cover topics including development trends and market opportunities within the embodied intelligence industry [15]

Group 4: Interview Experiences and Tips
- The community shares both successful and unsuccessful interview experiences so members can learn from past mistakes and sharpen their interview skills [17]
- Guidance on salary negotiation and common HR questions is also provided to assist job seekers [19][21]
A Handheld 3D Scanner! Exceptional Value for Money, with Online Real-Time Point Cloud Reconstruction
自动驾驶之心· 2025-08-13 23:33
Core Viewpoint
- The GeoScan S1 is presented as the most cost-effective 3D laser scanner in China, designed for applications such as campus and indoor scene reconstruction and featuring a lightweight design and user-friendly operation [1][7]

Group 1: Product Features
- The GeoScan S1 delivers centimeter-level precision in real-time 3D scene reconstruction using a multi-modal sensor fusion algorithm [1]
- It generates point clouds at 200,000 points per second, with a maximum measurement range of 70 meters and 360° coverage, and supports large scenes of over 200,000 square meters [1][27]
- The device runs a built-in Ubuntu system with various onboard sensors, allowing flexible power supply and integration with other equipment [3][10]

Group 2: User Experience
- The scanner is designed for ease of use: scanning starts with a single button and results can be exported without complex setup [5]
- It maps large areas efficiently and accurately while maintaining model precision [5][25]
- It supports real-time modeling and high-quality colorized point cloud generation through advanced multi-sensor SLAM algorithms [25][32]

Group 3: Market Positioning
- The GeoScan S1 is marketed as having the best price-performance ratio in the industry, with the basic version starting at 19,800 yuan [7][56]
- The product is available in multiple versions, including a depth camera version and 3DGS online and offline versions, catering to diverse customer needs [56]
- The company emphasizes its track record and project validation through collaborations with academic institutions, enhancing its credibility in the market [7]

Group 4: Application Scenarios
- The GeoScan S1 suits a variety of environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mines, demonstrating its versatility in 3D mapping [36][45]
- It supports cross-platform integration, making it compatible with drones, unmanned vehicles, and robots for automated operations [42]

Group 5: Technical Specifications
- The device has a compact 14.2 cm × 9.5 cm × 45 cm form factor and weighs 1.3 kg without the battery [20]
- It accepts a 13.8-24 V power input and carries an 88.8 Wh battery, providing roughly 3 to 4 hours of runtime [20]
- It exports data in multiple formats, including PCD, LAS, and PLY, ensuring compatibility with different software [20]
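Since the scanner exports standard formats such as PCD, a finished scan can be inspected with common point cloud tooling. Below is a minimal sketch using the Open3D library; the file name is hypothetical, and the voxel size and normal-estimation radius are arbitrary starting points rather than vendor recommendations.

```python
import open3d as o3d

# Load a scan exported from the scanner in PCD format (file name is illustrative)
pcd = o3d.io.read_point_cloud("geoscan_s1_export.pcd")
print(pcd)  # e.g. "PointCloud with N points."

# Downsample for faster viewing; 5 cm voxels are a reasonable starting point
down = pcd.voxel_down_sample(voxel_size=0.05)

# Estimate normals so shading reveals surface structure in the viewer
down.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30)
)
o3d.visualization.draw_geometries([down])
```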
A Few Notes on NVIDIA's Entry into Autonomous Driving
自动驾驶之心· 2025-08-13 23:33
Core Viewpoint
- The article traces the evolution of the partnership between Tesla and NVIDIA in the autonomous driving sector, highlighting the challenges and innovations that shaped their collaboration

Group 1: Tesla's Journey in Autonomous Driving
- In September 2013, Tesla officially entered the autonomous driving arena, emphasizing internal development rather than reliance on external technologies [5]
- Lacking a suitable self-developed autonomous driving chip, Tesla initially partnered with Mobileye, enhancing Mobileye's technology with unique innovations such as Fleet Learning [9][12]
- Tensions arose as Tesla sought to develop its own algorithms, leading Mobileye to demand that Tesla halt its internal vision efforts [12][13]

Group 2: NVIDIA's Strategic Shift
- In 2012, NVIDIA CEO Jensen Huang recognized the potential of autonomous driving in electric vehicles, prompting a focus on deep learning and computer vision [15]
- By November 2013, Huang was highlighting the importance of digital computing in modern vehicles, signaling a shift toward automation in the automotive industry [17]
- In January 2015, NVIDIA launched the DRIVE brand and introduced the DRIVE PX platform, which provided significant computational power for autonomous driving applications [18]

Group 3: The Partnership's Development
- After a serious accident in May 2016, Mobileye ended its partnership with Tesla, prompting Tesla to choose NVIDIA as its new technology partner [19][20]
- In October 2016, Tesla announced that all of its production models would ship with hardware capable of full self-driving, built on NVIDIA's DRIVE PX 2 platform [20]
- By early 2017, Tesla had publicly announced plans to develop its own chips, signaling a shift in strategy, while NVIDIA continued to expand its automotive partnerships [25][26]

Group 4: Technological Advancements
- In 2018, NVIDIA introduced the DRIVE Xavier platform, which improved computational performance while reducing power consumption [28]
- Tesla's HW3, launched in April 2019 and described by Musk as the most advanced computer designed specifically for autonomous driving, marked the end of NVIDIA's direct involvement in Tesla's autonomous driving hardware [30][32]
Classes Officially Begin! End-to-End and VLA Autonomous Driving Small-Group Course, Discount Ends Today
自动驾驶之心· 2025-08-13 23:33
Core Viewpoint
- The article presents VLA (Vision-Language-Action) as a new milestone on the road to mass-produced autonomous driving, highlighting the progressive development from E2E (End-to-End) to VLA and the growing interest among professionals in moving into this field [1][11]

Course Overview
- The course, "End-to-End and VLA Autonomous Driving Small Class," aims to provide in-depth knowledge of E2E and VLA algorithms, addressing the challenges faced by those looking to transition into this area [1][12]
- The curriculum covers the full span of autonomous driving technology, from foundational knowledge through advanced models to practical applications [5][15]

Course Structure
- **Chapter 1**: Introduction to end-to-end algorithms, covering their historical development and the transition from modular pipelines to end-to-end approaches, including the advantages and challenges of each paradigm [17]
- **Chapter 2**: Background on the E2E technology stack, focusing on key areas such as VLA, diffusion models, and reinforcement learning, all important for future job interviews [18]
- **Chapter 3**: Two-stage end-to-end methods, discussing notable algorithms and their advantages relative to one-stage methods [18]
- **Chapter 4**: In-depth analysis of one-stage end-to-end methods, covering subfields such as perception-based and world-model-based approaches and culminating in the latest VLA techniques [19]
- **Chapter 5**: A practical assignment on RLHF (Reinforcement Learning from Human Feedback) fine-tuning, providing hands-on experience with pre-training and reinforcement learning modules [21]

Target Audience and Learning Outcomes
- The course targets those with a foundational understanding of autonomous driving and related technologies, such as transformer models and reinforcement learning [28]
- Upon completion, participants are expected to reach a level comparable to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering the main methodologies and able to apply what they learned to real-world projects [28]
Comprehensively Surpassing DiffusionDrive! USTC's GMF-Drive: The World's First Mamba-Based End-to-End SOTA Approach
自动驾驶之心· 2025-08-13 23:33
Core Viewpoint
- The article discusses GMF-Drive, a framework from the University of Science and Technology of China that addresses the limitations of existing multi-modal fusion architectures in end-to-end autonomous driving by integrating gated Mamba fusion with a spatially-aware BEV representation [2][7]

Summary by Sections

End-to-End Autonomous Driving
- End-to-end autonomous driving has gained recognition as a viable solution: it maps raw sensor inputs directly to driving actions, minimizing reliance on intermediate representations and the information loss they entail [2]
- Recent models such as DiffusionDrive and GoalFlow have demonstrated strong capabilities in generating diverse, high-quality driving trajectories [2][8]

Multi-Modal Fusion Challenges
- A key bottleneck in current systems is the multi-modal fusion architecture, which struggles to effectively integrate heterogeneous inputs from different sensors [3]
- Existing methods, largely in the TransFuser style, often yield limited performance gains, amounting to simple feature concatenation rather than structured information integration [5]

GMF-Drive Framework
- GMF-Drive consists of three modules: a data preprocessing module that enriches geometric information, a perception module built on a spatially-aware state space model (SSM), and a trajectory planning module employing a truncated diffusion strategy [7][13]
- The framework aims to retain critical 3D geometric features while improving computational efficiency over transformer-based methods [11][16]

Experimental Results
- GMF-Drive achieved a PDMS score of 88.9 on the NAVSIM dataset, outperforming the previous best model, DiffusionDrive, by 0.8 points [32]
- The framework showed significant improvements on key metrics, including a 1.1-point increase in the driving area compliance score (DAC) and a top score of 83.3 in ego vehicle progression (EP) [32][34]

Component Analysis
- Ablation experiments assessing the contributions of individual components confirm that the combination of geometric representations and the GM-Fusion architecture is crucial for optimal performance [39][40]
- The GM-Fusion module, comprising gated channel attention, BEV-SSM, and hierarchical deformable cross-attention, significantly strengthens the model's ability to process multi-modal data [22][44]

Conclusion
- GMF-Drive is a novel end-to-end autonomous driving framework that effectively combines geometry-enhanced pillar representations with a spatially-aware fusion model, outperforming existing transformer-based architectures [51]
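The gated channel attention idea in GM-Fusion can be pictured with a small sketch. The module below is a hypothetical PyTorch illustration, not the paper's implementation: a per-channel gate computed from pooled statistics of both modalities softly selects between camera and LiDAR BEV features, rather than simply concatenating them.

```python
import torch
import torch.nn as nn

class GatedChannelFusion(nn.Module):
    """Toy gated channel-attention fusion for two BEV feature maps.

    A per-channel gate, computed from globally pooled statistics of both
    modalities, decides how much each modality contributes per channel --
    structured integration rather than plain concatenation."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, cam: torch.Tensor, lidar: torch.Tensor) -> torch.Tensor:
        # cam, lidar: (B, C, H, W) BEV feature maps on the same grid
        stats = torch.cat([cam.mean(dim=(2, 3)), lidar.mean(dim=(2, 3))], dim=1)
        g = self.gate(stats).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return g * cam + (1.0 - g) * lidar  # per-channel soft selection

# Usage: fuse 64-channel camera and LiDAR BEV maps on a 128x128 grid
fuse = GatedChannelFusion(channels=64)
out = fuse(torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128))
print(out.shape)  # torch.Size([2, 64, 128, 128])
```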