Led by Industry Veterans! Fully Understand World Models for Autonomous Driving...
自动驾驶之心· 2025-12-11 03:35
Core Viewpoint
- The article introduces a new course titled "World Models and Autonomous Driving Small Class," focusing on advanced algorithms in autonomous driving, including general world models, video generation, and OCC generation [1][3].

Course Overview
- The course is developed in collaboration with industry leaders and follows the success of a previous course on end-to-end and VLA autonomous driving [1].
- The course aims to enhance understanding and practical skills in world models, targeting individuals interested in the autonomous driving industry [11].

Course Structure
- **Chapter 1: Introduction to World Models** - Discusses the relationship between world models and end-to-end autonomous driving, including historical development and current applications; covers various types of world models, such as pure simulation, simulation + planning, and generation of sensor inputs and perception results [6].
- **Chapter 2: Background Knowledge of World Models** - Focuses on foundational knowledge, including scene representation, Transformers, and BEV perception [6][12]; highlights key technical terms frequently encountered in job interviews related to world models [7].
- **Chapter 3: General World Model Exploration** - Examines popular models such as Marble from Fei-Fei Li's team, DeepMind's Genie 3, and Meta's JEPA, along with recent discussions of VLA + world-model algorithms [7].
- **Chapter 4: Video-Generation-Based World Models** - Concentrates on video generation algorithms, starting with Wayve's GAIA-1 & GAIA-2 and extending to recent works such as UniScene and OpenDWM [8].
- **Chapter 5: OCC-Based World Models** - Focuses on OCC generation methods, discussing three major papers and a practical project that extends to vehicle trajectory planning [9].
- **Chapter 6: World Model Job Specialization** - Provides insights into industrial applications of world models, addressing pain points and interview preparation for relevant positions [10].

Learning Outcomes
- The course aims to equip participants with skills equivalent to one year of experience as a world-model autonomous-driving algorithm engineer [14].
- Participants will gain a comprehensive understanding of world-model technologies, including video generation and OCC generation methods, and will be able to apply their knowledge in practical projects [14].
The Remaining Paper Window for Autonomous-Driving World Models Is Closing Fast......
自动驾驶之心· 2025-12-11 00:05
Core Insights
- The article highlights the recent surge in research papers on world models in autonomous driving, indicating a trend toward localized breakthroughs and verifiable improvements in the field [1].
- It emphasizes the importance of refining submissions to top conferences, suggesting that the final 10% of polishing can significantly affect a paper's overall quality and acceptance [2].
- The platform "Autonomous Driving Heart" is presented as a leading AI technology media outlet in China, with a strong focus on autonomous driving and related interdisciplinary fields [3].

Summary by Sections

Research Trends
- Numerous recent works in autonomous driving, such as MindDrive and SparseWorld-TC, reflect a focus on world models, which are expected to dominate upcoming conferences [1].
- The article suggests that the main themes for the end of this year and the first half of next year will likely revolve around world models, indicating a strategic direction for researchers [1].

Guidance and Support
- The platform offers personalized guidance for students, helping them navigate the complexities of research and the paper-submission process [7][13].
- It claims a high success rate, with a 96% acceptance rate for students who have received guidance over the past three years [5].

Faculty and Resources
- The platform boasts over 300 dedicated instructors from top global universities, ensuring high-quality mentorship for students [5].
- The instructors have extensive experience publishing at top-tier conferences and journals, providing students with valuable insights and support [5].

Services Offered
- Services include personalized paper guidance, real-time interaction with mentors, and comprehensive support throughout the research process [13].
- Students may also receive recommendations from prestigious institutions and direct job placements at leading tech companies [19].
Former NIO Intelligent Driving Executive Joins a New Company
自动驾驶之心· 2025-12-11 00:05
The following article is from 蚀刻AiTech, by the 蚀刻 team.

蚀刻AiTech: a ten-year veteran of intelligent driving who has worked at four companies, on chips and on mass production, writing about fresh industry news and hoping to record the key moments of AI's development.

Original link: 蚀刻独家 | 前蔚来智驾高管加盟新公司

This leading unmanned-delivery company is currently on a fast track. In October this year it announced the completion of its Series D round, setting the record for the largest single private financing in China's autonomous driving sector. The company has cumulatively delivered more than 10,000 vehicles, covers nearly 300 cities nationwide, and is pushing hard into instant logistics and the broader urban-distribution market while accelerating map-free autonomous driving and end-to-end large models. These are precisely the core areas the executive was responsible for at NIO, so his arrival can be seen as a key talent reserve for the company's next stage of challenges.

A former NIO intelligent-driving executive has officially joined a leading unmanned-delivery autonomous driving company in a senior role. Before joining NIO, the executive served as a senior ... at Momenta ...
New from SJTU! End-to-End & VLA Survey: A Unified Perspective Under a Generalized Paradigm
自动驾驶之心· 2025-12-11 00:05
Core Viewpoint
- The article discusses the evolution of autonomous driving technology, emphasizing the need for a unified perspective across paradigms, including end-to-end (E2E), VLM-centric, and hybrid approaches, to improve understanding and performance in complex driving scenarios [2][4][14].

Group 1: Introduction and Background
- Traditional modular approaches in autonomous driving have led to information loss and error accumulation due to task fragmentation, prompting a shift toward data-driven end-to-end architectures [5][10].
- The article introduces a comprehensive review titled "Survey of General End-to-End Autonomous Driving: A Unified Perspective," which aims to bridge the gap in understanding between paradigms [3][4].

Group 2: Paradigms of Autonomous Driving
- General End-to-End (GE2E) is defined as any model that maps raw sensor inputs to planning trajectories or control actions, regardless of whether it includes vision-language models (VLMs) [4][14].
- The three main paradigms unified under GE2E are:
  - Conventional E2E, which relies on structured scene representations for precise trajectory planning [9][17].
  - VLM-centric E2E, which uses pre-trained vision-language models to improve generalization and reasoning in complex scenarios [11][33].
  - Hybrid E2E, which combines the strengths of both to balance high-level semantic understanding with low-level control precision [12][39].

Group 3: Performance Comparison
- In open-loop tests, the hybrid paradigm outperformed the others, demonstrating the value of world knowledge in handling long-tail scenarios [54].
- Conventional E2E methods still dominate in numerical trajectory-prediction accuracy, indicating their robustness in structured environments [54].
- In closed-loop tests, conventional methods maintain a stronghold, particularly in complex driving tasks, while VLA methods show potential but need further refinement in fine-grained trajectory control [55][56].

Group 4: Data and Learning Strategies
- The evolution of datasets from geometric annotations to semantically rich ones is crucial for training models capable of logical reasoning about complex traffic contexts [46][48].
- Chain-of-Thought (CoT) annotations in datasets support advanced reasoning tasks, moving beyond simple input-output mappings [47].

Group 5: Model Architecture and Details
- The article provides a detailed comparison of mainstream model architectures, including their inputs, backbone networks, intermediate tasks, and output forms, to clarify the distinctions among paradigms [57].
Some Recent Lessons from Working on VLA
自动驾驶之心· 2025-12-11 00:05
Core Insights
- The article discusses challenges and advances in vision-language models (VLMs) for autonomous driving, highlighting issues such as hallucination, 3D spatial understanding, and inference speed [3].

Group 1: Challenges in VLMs
- Hallucination manifests as generating non-existent information and failing to perceive relevant data; it can be mitigated with dynamic perception techniques [3].
- Weak 3D spatial understanding stems from predominantly 2D pre-training tasks, suggesting the addition of spatial localization tasks during training [3].
- Inference speed is a concern; potential solutions include KV caching, visual token compression, and mixed-data training [3].

Group 2: Learning Paradigms and Model Improvements
- The learning paradigm should shift from imitation learning (SFT) to preference learning (DPO, GRPO), and simultaneous multi-task training yields better results than sequential single-task training [3].
- To prevent catastrophic forgetting in the foundation model, mixing in pre-training data is a simple and effective remedy [3].
- Richer supervisory signals lead to better model representations, achieved by adding auxiliary task heads to the VLM [3].

Group 3: Interaction and Evaluation
- Current VLMs exhibit insufficient interaction between vision and language, limiting their effectiveness as base models; improving this interaction is crucial [3].
- Trajectory output formats are flexible, with several approaches yielding satisfactory results, though diffusion heads are preferred in industry for speed [3].
- Evaluation remains challenging because training and testing conditions are inconsistent, requiring better alignment of objectives and data distributions [3].
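The shift from imitation learning to preference learning mentioned above can be made concrete with the DPO objective. The sketch below is a minimal, framework-free illustration on scalar log-probabilities; the function name and example values are hypothetical and not from the article.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_*     : log-prob of the chosen/rejected response under the policy
    ref_logp_* : the same quantities under the frozen reference model
    beta       : strength of the implicit KL constraint
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)); small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The loss shrinks as the policy's preference margin over the reference grows.
flat   = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # no preference learned yet
better = dpo_loss(-8.0, -12.0, -10.0, -10.0)    # policy now favors the chosen answer
assert better < flat
```

With a zero margin the loss is exactly log 2; it decays toward zero as the policy separates chosen from rejected responses faster than the reference model does.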
Waymo's Latest Foundation-Model Talk: Fast-Slow Dual-System End-to-End & World-Model Simulation
自动驾驶之心· 2025-12-10 01:28
Core Insights
- Waymo is advancing its autonomous driving technology by prioritizing "verifiably safe AI" as a core principle, reducing the accident rate by more than tenfold compared with human drivers [2][5][19].
- The company has logged over 100 million miles of fully autonomous driving, continuously improving road safety in its operating areas [2][5].

Group 1: Waymo's AI Strategy
- Waymo's AI ecosystem integrates a driver, a simulator, and an evaluator, all powered by the Waymo Foundation Model, making safety a foundational element rather than an afterthought [5][12].
- The Waymo Foundation Model serves as a multifunctional "world model," providing a robust interface among components and supporting end-to-end signal backpropagation during training [8][10].

Group 2: Components of the AI Ecosystem
- The driver model generates safe, compliant action sequences, with its capabilities distilled into more efficient student models for real-time, in-vehicle deployment [14].
- The simulator creates high-fidelity virtual environments for testing the driver model under diverse, challenging scenarios, while the evaluator analyzes driving behavior and provides feedback for continuous improvement [14][15].

Group 3: Learning and Optimization Mechanisms
- Waymo employs a dual learning loop: an inner loop driven by the simulator and evaluator for reinforcement learning, and an outer loop that uses real-world driving data to improve the driver model [17][19].
- The company has amassed a vast amount of fully autonomous driving data, crucial for training and optimizing its systems, reducing reliance on human driving data [19].
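The distillation of the driver model into efficient student models mentioned above typically relies on matching the teacher's softened output distribution. The sketch below shows a generic temperature-scaled distillation loss; the logits and function names are hypothetical illustrations, not Waymo's actual interface.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    A generic knowledge-distillation objective: the student is trained to
    match the teacher's softened scores over candidate actions. Softening
    with temperature > 1 exposes the teacher's relative preferences among
    non-top actions, which hard labels would discard.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(t, s))

teacher = [2.0, 1.0, 0.1]                      # teacher scores over candidate actions
aligned = distillation_kl(teacher, [2.0, 1.0, 0.1])
off     = distillation_kl(teacher, [0.1, 1.0, 2.0])
assert aligned < 1e-12 < off                   # loss vanishes when student matches teacher
```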
Feed-Forward GS Work Has Exploded Recently
自动驾驶之心· 2025-12-10 00:04
Core Viewpoint
- The article emphasizes the rapid advances in 3D Gaussian Splatting (3DGS) in the autonomous driving sector, highlighting the need for structured learning pathways for newcomers to the field [2][4].

Group 1: Technology Highlights
- Tesla's presentation of 3D Gaussian Splatting at ICCV has garnered significant attention, signaling a shift toward feed-forward GS algorithms for scene reconstruction [2].
- 3DGS has iterated through static 3D reconstruction, dynamic 4D reconstruction, and surface reconstruction, showcasing its evolving nature [4].

Group 2: Course Offering
- A comprehensive course titled "3DGS Theory and Algorithm Practical Tutorial" provides a structured learning roadmap for 3DGS, covering both theoretical foundations and practical applications [4].
- The course is taught by an expert with extensive experience in 3D reconstruction and algorithm development [5].

Group 3: Course Structure
- The course consists of six chapters, starting with foundational knowledge in computer graphics and progressing through principles, algorithms, and specific applications in autonomous driving [8][9][10][11][12].
- Each chapter builds on the previous one, culminating in discussions of current industry needs and research directions in 3DGS [11][12][13].

Group 4: Target Audience and Prerequisites
- The course targets individuals with backgrounds in computer graphics, visual reconstruction, and programming, particularly those pursuing careers in the autonomous driving industry [17].
- Participants are expected to have a foundational grasp of the relevant mathematical concepts and programming languages [17].
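At the core of 3DGS rendering, depth-sorted Gaussians are projected to the image plane and blended front to back with alpha compositing. The sketch below illustrates only that blending rule for a single pixel and a single color channel, assuming the 2D footprint weights are already computed; it is a simplified illustration, not a full renderer.

```python
def composite(splats):
    """Front-to-back alpha compositing for one pixel.

    splats: list of (color, alpha) pairs for Gaussians already projected
    and depth-sorted near to far; alpha is the Gaussian's opacity times
    its 2D footprint weight at this pixel.
    """
    color, transmittance = 0.0, 1.0
    for c, a in splats:
        color += transmittance * a * c      # this splat's visible contribution
        transmittance *= (1.0 - a)          # light remaining for splats behind
    return color

# A nearly opaque near splat hides most of the splat behind it.
px = composite([(1.0, 0.9), (0.0, 1.0)])
assert abs(px - 0.9) < 1e-12
```

Feed-forward GS methods change how the Gaussians are predicted (a single network pass instead of per-scene optimization), but the compositing step above stays the same.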
Horizon Robotics' Su Qing: There Was a Time I Saw Little Hope for Autonomous Driving...
自动驾驶之心· 2025-12-10 00:04
The following article is from RoboX, by RoboX.

RoboX: from AI cars to robots, we follow the most promising super-intelligent agents!

Author | RoboX
Source | RoboX
Original link: 地平线苏箐演讲全文提炼:自动驾驶的曙光、痛苦与轮回

This article is shared for academic purposes only; if there is any infringement, contact us for removal.

Speaker: Su Qing | Vice President & Chief Architect, Horizon Robotics
Date: 2025-12-09
Venue: 2025 Horizon Technology Ecosystem Conference

Key excerpts:

This year we can indeed see that the technical path for autonomous driving is fairly clear, but we can also see harder problems ahead. You know these problems can be solved, but today we do not yet know how to solve them.

Most people outside the industry probably do not understand the difficulty and pressure autonomous driving teams face. The double squeeze on intellect and stamina is extremely painful: SOP deadlines bear down, methodologies keep changing, and there are all kinds of corner cases to resolve. When a system runs continuously in a dense world, every case has to be solved; that is what makes this industry so painful.

Dawn: the emergence of a major watershed

When I was about to join Horizon, I spoke several times with Dr. Yu Kai, ...
A 304-Page Code Agent Survey from Beihang! Nearly 30 Institutions Involved
自动驾驶之心· 2025-12-10 00:04
Core Insights
- The article discusses the transformative shift of code intelligence from "assistive tool" to "autonomous developer," driven by advances in large language models (LLMs) [2][8].
- A comprehensive review paper by 28 institutions outlines the evolution of code models and establishes a complete technical framework for intelligent software engineering [2][8].

Evolution of Code Intelligence
- The evolution spans six distinct phases, from manual coding in the 1960s to the anticipated AI-autonomous era after 2025, highlighting key technological advances at each stage [8][9].
- The core driving force is the transition from rule-based systems to transformer-based models, enabling significant improvements in code understanding and generation [9][11].

Code Foundation Models
- Current mainstream models fall into general LLMs and code-specialized LLMs, each with unique advantages and technological synergies [11][12].
- Code-specialized models have emerged through focused data, architectural innovations, and task-specific fine-tuning, surpassing general models on coding tasks [15][18].

Training and Evaluation
- The paper outlines a comprehensive evaluation system for code tasks, categorized into statement/function/class-level tasks, repository-level tasks, and agent-system tasks [18][19].
- Evaluation metrics have evolved to include execution-based indicators, emphasizing that generated code must not only exist but also run correctly [19][22].

Alignment Techniques
- Two primary alignment techniques are discussed: supervised fine-tuning (SFT) and reinforcement learning (RL), both crucial for ensuring models meet human requirements [22][28].
- Data-synthesis methods for alignment include single- and multi-round SFT, as well as RL methods that leverage human and AI feedback [25][27].

Software Engineering Agents (SWE Agents)
- SWE Agents are advanced systems capable of autonomously completing complex engineering tasks across the software development lifecycle [31][32].
- The paper identifies four key application stages for SWE Agents: requirements engineering, software development, software testing, and software maintenance [31].

Future Trends
- The article identifies three core trends for the next 3-5 years: the shift from general to specialized models, increased autonomy of SWE Agents, and the integration of multimodal inputs for enhanced code intelligence [33][34][35].
- The ultimate goal of code intelligence is to automate repetitive coding tasks, allowing human developers to focus on higher-level creative work [37][38].
The University of Macau's First World-Model-Driven Visual Grounding Framework!
自动驾驶之心· 2025-12-10 00:04
Paper authors | Haicheng Liao et al.
Editor | 自动驾驶之心

In interactive autonomous driving scenarios, the most awkward moment is this: a passenger points at a complex intersection ahead and says, "Follow that SUV." The autonomous driving system looks at three nearly identical vehicles and thinks: "Which one? The one on the left? Or the one that is changing lanes?"

Most existing visual grounding models for autonomous driving behave like novices who can only "describe what they see": they stare at the current frame and try to find the answer in the pixels. Once the instruction is ambiguous or the target is occluded, they easily misidentify targets and can even trigger faulty reasoning.

Why do human drivers not make this mistake? Because we anticipate. When we hear the instruction, our brain instantly plays out future scenes: the car on the left is about to turn, which does not fit "follow"; only the car in the middle, accelerating straight ahead, matches the most likely intent.

"Think about the future before acting." Inspired by this, a research team from the University of Macau proposes a new framework, ThinkDeeper, the first work to introduce world models into visual grounding for autonomous driving. This work not only ...