自动驾驶之心 (Autonomous Driving Heart)

Autonomous Driving Paper Express | Multimodal Large Models, Motion Planning, Scene Understanding, and More
自动驾驶之心· 2025-07-13 08:10
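MCAM: A Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding
This work from Chongqing University and the National University of Defense Technology, accepted to ICCV 2025, proposes the MCAM model, which uses a DSDAG causal graph to model the dynamic evolution of ego-vehicle states. On the BDD-X dataset it raises BLEU-4 to 35.7% on the driving-behavior description task and to 9.1% on the reasoning task, significantly outperforming baselines such as DriveGPT4.
Paper title: MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding
Paper link: https://arxiv.org/abs/2507.06072
Code: https://github.com/SixCorePeach/MCAM
Main contributions:
1. Proposes the Driving State Directed Acyclic Graph (DSDAG) to model dynamic driving interactions and state transitions, providing a structured theoretical foundation for the Causal Analysis Module (CAM).
2. Proposes the Multimodal Causal Analysis Model (MCAM), the first for ego-vehicle-level driving video understanding ...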
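The BLEU-4 figures reported for MCAM are n-gram overlap scores between generated and reference text. As a rough illustration of what the metric measures, here is a minimal sketch of sentence-level BLEU-4, assuming a single reference, whitespace tokenization, uniform n-gram weights, and no smoothing. It is not the paper's evaluation script, and the function names and example sentences are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference (assumption: no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: a candidate n-gram counts at most as often as it
        # appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)


score = bleu4(
    "the car slows down because the light is red",
    "the car slows down because the light turns red",
)
print(f"BLEU-4: {score:.3f}")
```

Published results are normally corpus-level scores computed with an established toolkit, so numbers from a sketch like this are not directly comparable to the 35.7% and 9.1% quoted above.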
A 4,000-Member "Whampoa Academy" for Autonomous Driving, Dedicated to Technical Sharing and Job-Hunting Exchange
自动驾驶之心· 2025-07-12 14:43
Core Viewpoint - The smart driving industry is growing rapidly: companies are investing heavily in research and talent acquisition, which points to a robust job market and opportunities for new entrants [2][3].
Group 1: Industry Trends
- The smart driving sector continues to attract substantial R&D funding, with companies offering competitive salaries to attract talent [2].
- Technology iteration cycles in autonomous driving are getting noticeably shorter, with a focus on advanced technologies such as vision-language-action (VLA) models and end-to-end systems [7][11].
Group 2: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" aims to build a comprehensive knowledge-sharing community focused on academic and engineering challenges in the autonomous driving industry [3][11].
- The community has established a structured learning path covering perception, planning, control, and other aspects of autonomous driving technology [13][15].
Group 3: Educational Offerings
- The community offers video courses, hardware tutorials, and live sessions with industry experts, aimed at both newcomers and experienced professionals [3][15].
- Dedicated job-preparation modules, including resume sharing and interview experiences, help members navigate the job market [5][12].
Group 4: Technical Focus Areas
- Key technical focus areas include visual language models, world models, and end-to-end autonomous driving systems, with ongoing discussion of their integration and application in real-world scenarios [11][36].
- The community stresses keeping up with the latest algorithms and models, such as diffusion models and generative techniques, for future developments in autonomous driving [16][36].
One Intelligent-Driving Company's Hard-to-Describe Financing Round...
自动驾驶之心· 2025-07-12 12:00
Core Viewpoint - The article discusses an unusual financing arrangement between an autonomous driving company and a leading automotive manufacturer, highlighting the challenges and competitive landscape of the autonomous driving industry.
Group 1: Financing Strategy
- An autonomous driving company has struggled to secure funding because its valuation is close to that of top autonomous driving firms while its production projects are limited [3][4].
- The company approached a leading automotive manufacturer for investment. The manufacturer agreed on the condition that the funds be reinvested into one of its own struggling parts subsidiaries [4].
- This maneuver lets the manufacturer present the money flowing into its subsidiary as external funding, which helps its public relations while giving the subsidiary the capital it needs [4].
Group 2: Industry Competition
- The autonomous driving market is highly competitive: companies that excel in algorithms and production capability win both projects and funding, while those lacking in these areas struggle to obtain either [5].
- The article argues that, for this company, improving algorithm performance and production delivery matters more than complex investment maneuvers with major clients [5].
Interviewing for a VLM Role and Getting Thoroughly Crushed...
自动驾驶之心· 2025-07-12 12:00
Core Viewpoint - The article discusses the advancements and applications of large models in autonomous driving, focusing on the integration of multi-modal large models in the industry and their potential for future development [2][4][17].
Group 1: Interview Insights
- The interview for a position at Li Auto involved extensive discussion of large models, from foundational concepts to practical applications in autonomous driving [2][4].
- The interviewer emphasized private dataset construction and data collection methods, stressing that data remains the core of business models [4][6].
Group 2: Course Overview
- A course on multi-modal large models is introduced, progressing from general multi-modal models to fine-tuning techniques and ending with end-to-end autonomous driving applications [5][9][11].
- The course chapters cover an introduction to multi-modal large models, foundational modules, general models, fine-tuning techniques, and specific applications in autonomous driving [9][11][17].
Group 3: Technical Focus
- The article outlines the technical aspects of multi-modal large models, including architecture, training paradigms, and fine-tuning techniques such as Adapter and LoRA [11][15].
- It highlights the application of these models in autonomous driving, referencing algorithms like DriveVLM, which is pivotal for Li Auto's end-to-end driving solutions [17][19].
Group 4: Career Development
- The course also addresses career opportunities in the field, discussing potential employers, job directions, and the skills required for success in the industry [19][26].
- It emphasizes a solid foundation in deep learning and model deployment, along with practical coding skills [27].
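To make the LoRA fine-tuning technique mentioned above concrete, here is a minimal sketch of the low-rank update it relies on: the pretrained weight `W` stays frozen and only two small matrices `A` and `B` are trained, giving `y = W x + (alpha / r) * B (A x)`. This is a pure-Python illustration with hypothetical helper names and toy numbers, not the course's material or any library's API.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Forward pass of a linear layer with a LoRA update.

    W: frozen d x k weight, A: trainable r x k, B: trainable d x r.
    """
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank path: B (A x)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d, k, r):
    """Trainable parameters: full fine-tuning vs. LoRA of rank r."""
    return d * k, r * (d + k)


# Toy example (invented numbers): 2 x 2 identity weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # r x k
B = [[1.0], [0.0]]      # d x r
y = lora_forward(W, A, B, [1.0, 2.0], alpha=2.0, r=1)
print(y)
print(lora_param_count(4096, 4096, 8))
```

In the standard formulation `B` is initialized to zero, so the adapted layer starts out identical to the pretrained one. The parameter count shows why the method is popular: for a 4096 x 4096 layer at rank 8, the adapter trains 65,536 values instead of about 16.8 million.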
Resource Roundup | VLM, World Models, End-to-End
自动驾驶之心· 2025-07-12 12:00
Core Insights - The article discusses the advancements and applications of visual language models (VLMs) and large language models (LLMs) in autonomous driving and intelligent transportation systems [1][2].
Summary by Sections
Overview of Visual Language Models
- Visual language models are becoming increasingly important in autonomous driving, enabling better understanding and interaction between visual data and language [4][10].
Recent Research and Developments
- Several recent papers presented at conferences like CVPR and NeurIPS focus on improving VLM performance through techniques such as behavior alignment, efficient pre-training, and enhanced compositionality [5][7][10].
Applications in Autonomous Driving
- The integration of LLMs and VLMs is expected to enhance autonomous driving tasks including object detection, scene understanding, and planning [10][13].
World Models in Autonomous Driving
- World models are being developed to improve the representation and prediction of driving scenarios, with innovations like DrivingGPT and DriveDreamer enhancing scene understanding and video generation capabilities [10][13].
Knowledge Distillation and Transfer Learning
- Techniques such as knowledge distillation and transfer learning are being explored to optimize the performance of vision-language models in multi-task settings [8][9].
Community and Collaboration
- A growing community of researchers and companies is focusing on autonomous driving technologies, with numerous resources and collaborative platforms available for knowledge sharing and innovation [17][19].
I Just Started My Master's and My Advisor Wants Me to Build Various AI Agent Frameworks: Which Direction Should I Focus On?
自动驾驶之心· 2025-07-12 12:00
Core Viewpoint - The article discusses the current state and future directions of LLM (Large Language Model) agents, emphasizing the need for multi-modal integration and the challenges faced in various application areas, particularly gaming and simulation [1][14].
Group 1: Types of LLM Agents
- The first type is game-theoretic or MALLM agents, primarily derived from MARL (Multi-Agent Reinforcement Learning) methods and focused on matrix games and environments like Overcooked [2].
- The second type is game-oriented agents, further divided into text-based environments and traditional games like chess and poker, where understanding game mechanics is essential [4][5].
- The third type involves embodied intelligence, particularly in robotics, which requires substantial real-world application rather than pure simulation [5].
Group 2: Challenges in Development
- Key challenges include building effective simulators, ensuring personalized and intelligent responses from models, and managing interactions among potentially millions of agents [8].
- The lack of front-end rendering in some projects is noted as a disadvantage, since compelling demos are crucial for attracting attention and investment [9].
- The most commercially viable agents are those used in customer service and retrieval-augmented generation (RAG) applications, which are currently in high demand [9].
Group 3: Specific Applications
- Minecraft is highlighted as a competitive area with three main approaches: pure reinforcement learning, pure LLM, and a combination of both. The article cautions against entering this saturated market without significant confidence [11][12][13].
- The article concludes that the initial opportunities in the agent field have largely been exhausted, and future efforts must be planned strategically around existing strengths and commercial support [14].
Horizon Robotics and DiDi Officially Open Campus Recruitment for the Class of 2026!
自动驾驶之心· 2025-07-12 06:51
Core Viewpoint - The self-driving industry is seeing a surge in recruitment for the 2026 graduate cohort, with companies such as Horizon Robotics, Didi, and Yuanrong Qixing opening positions, indicating robust demand for roles in perception, control, end-to-end systems, and large models [1][2].
Group 1: Recruitment Trends
- Many companies are adding positions related to embodied intelligence, reflecting the growing integration of self-driving technology and embodied concepts [1].
- Open positions include hardware development engineers, perception post-processing engineers, middleware software engineers, planning and control algorithm engineers, and more, with multiple openings across various cities [2].
- Recruitment is expected to ramp up, with technical and HR interviews scheduled for late July and early August [1].
Group 2: Community and Resources
- The AutoRobo Knowledge Circle is a community for job seekers in self-driving, embodied intelligence, and robotics, with nearly 1,000 members from various companies [6].
- The community provides interview questions, industry reports, salary negotiation tips, and resume optimization services [6][7].
- Members can access both successful and unsuccessful interview experiences, which can help refine their job application strategies [18][19].
Group 3: Job Opportunities
- The 2026 internship positions include roles such as C++ development intern and PyTorch framework development intern, indicating a focus on software development skills [8].
- The community shares job openings in algorithm, development, and product roles, keeping members informed about the latest opportunities [7].
Group 4: Industry Insights
- The community also compiles industry reports to help members understand the current state and future prospects of the self-driving and embodied intelligence sectors [16].
- Topics covered in the reports include market opportunities, technological trends, and the development of humanoid robots [16].
Everyone Is Fighting over End-to-End Talent While Overlooking the Most Basic Capability...
自动驾驶之心· 2025-07-12 06:36
Core Viewpoint - The article emphasizes the importance of high-quality automatic 4D data annotation in developing autonomous driving systems, arguing that model algorithms are crucial for initial development but not sufficient for advanced capabilities [3][4].
Group 1: Industry Trends
- A new player in the autonomous driving sector has rapidly advanced its intelligent driving capabilities, surpassing competitors like XPeng within six months and triggering a talent war for engineers in the industry [2].
- The industry consensus is that the future of intelligent driving relies on vast amounts of automatically annotated data, making high-quality 4D data annotation a critical component for mass production [3][4].
Group 2: Challenges in Data Annotation
- The main challenges in automatic 4D annotation are strict requirements for spatiotemporal consistency, complex multi-modal data fusion, difficulty generalizing to dynamic scenes, and the tension between annotation efficiency and cost [8][9].
- Automating dynamic object annotation involves several steps, including offline 3D detection, tracking, post-processing optimization, and sensor occlusion optimization [5][6].
Group 3: Educational Initiatives
- The article introduces a course that lowers the barrier to entering automatic 4D annotation, covering the entire process and its core algorithms and providing practical exercises [9][24].
- The course targets researchers, students, and professionals looking to move into the data closed-loop field, and requires a foundational understanding of deep learning and autonomous driving perception algorithms [25].
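The dynamic-object annotation steps named above (offline detection, tracking, post-processing) can be sketched roughly as follows. The source only lists the steps, so the concrete choices here are stand-in assumptions rather than the course's algorithms: detections are hypothetical dicts with a 2D `center` and a `length`, tracking is greedy nearest-center association, and post-processing replaces each box length with the per-track median. Sensor occlusion optimization is left out.

```python
import math
from statistics import median


def associate(frames, max_dist=2.0):
    """Link per-frame detections into tracks by greedy nearest-center matching.

    frames: list of frames, each a list of detections (assumed format).
    Returns a list of tracks, each a list of (frame_index, detection).
    """
    tracks = []
    for t, dets in enumerate(frames):
        unmatched = list(dets)
        for track in tracks:
            last_t, last = track[-1]
            # Only extend tracks that were seen in the previous frame.
            if last_t != t - 1 or not unmatched:
                continue
            best = min(
                unmatched,
                key=lambda d: math.dist(d["center"], last["center"]),
            )
            if math.dist(best["center"], last["center"]) <= max_dist:
                track.append((t, best))
                unmatched.remove(best)
        # Detections that matched no track start new tracks.
        tracks.extend([[(t, d)] for d in unmatched])
    return tracks


def smooth_sizes(track):
    """Offline post-processing: use the whole track to fix a consistent length."""
    med = median(d["length"] for _, d in track)
    return [(t, {**d, "length": med}) for t, d in track]


# Invented detections for two objects over three frames.
frames = [
    [{"center": (0.0, 0.0), "length": 4.0}, {"center": (10.0, 0.0), "length": 2.0}],
    [{"center": (1.0, 0.0), "length": 4.4}, {"center": (10.0, 0.5), "length": 2.0}],
    [{"center": (2.0, 0.0), "length": 4.2}],
]
tracks = associate(frames)
smoothed = [smooth_sizes(tr) for tr in tracks]
print(len(tracks), [d["length"] for _, d in smoothed[0]])
```

The reason offline annotation can outperform an onboard detector is visible even in this toy: post-processing sees the entire track, past and future frames alike, so a rigid object's size can be made consistent across time.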
A 4,000-Member "Whampoa Academy" for Autonomous Driving, Dedicated to Technical Sharing and Job-Hunting Exchange
自动驾驶之心· 2025-07-12 05:41
Core Insights
- The autonomous driving industry is changing significantly: many professionals are moving to related fields like embodied intelligence, while others stay committed to the sector because of strong funding and high salaries for new graduates [2][6].
- The article emphasizes networking and community engagement for knowledge acquisition and job preparation in the autonomous driving field [3][4].
Group 1: Industry Trends
- The autonomous driving sector continues to attract substantial investment, with companies willing to offer competitive salaries to attract talent [2].
- The technology iteration cycle in autonomous driving is getting shorter, indicating rapid advancement and a focus on cutting-edge technologies such as visual large language models (VLM) and end-to-end systems [8][12].
Group 2: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" is highlighted as a leading community for professionals and students in the field, offering video courses, technical discussions, and job opportunities [4][14].
- The community provides a structured learning path covering perception, planning, machine learning, and other aspects of autonomous driving technology [19][21].
Group 3: Technical Focus Areas
- Key technical areas identified for 2025 include VLM, end-to-end systems, and world models, which are crucial for the future evolution of autonomous driving technology [8][43].
- The community emphasizes integrating advanced algorithms and models, such as diffusion models and 3D generative simulations, to enhance autonomous driving capabilities [15][22].
Urgent Hiring at 自动驾驶之心! Recruiting Business Partners for 2025, with Plenty of Openings
自动驾驶之心· 2025-07-12 05:41
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, research guidance, and hardware development [2][5].
- The main areas of expertise sought include large models, multimodal models, diffusion models, SLAM, 3D object detection, and closed-loop simulation [3].
- Candidates with a master's degree or higher from universities ranked in the QS top 200 are preferred, especially those with significant conference publications [4].
Group 2
- Benefits for partners include shared job-seeking resources, PhD recommendations, and overseas study opportunities, along with substantial cash incentives [5].
- There are opportunities for collaboration on entrepreneurial projects [5].
- Interested parties are encouraged to get in touch via WeChat for further inquiries [6].