自动驾驶之心
Waymo's Latest Explorations in Autonomous Driving: World Models, the Long-Tail Problem, and What Matters Most
自动驾驶之心· 2025-10-10 23:32
Core Insights
- Waymo has developed a large-scale AI model, the Waymo Foundation Model, which supports vehicle perception, behavior prediction, scene simulation, and driving decision-making [5][11]
- The model fuses data from multiple sensors to understand the environment, in a way comparable to how large language models operate [5][11]
- Data quality and data selection are crucial for ensuring that the model addresses the right problems [25][30]

Group 1: World Model Development
- Waymo's world model encodes all sensor data and incorporates world knowledge, enabling it to decode driving-related tasks [11]
- The model performs real-time perception and decision-making on the vehicle, while in the cloud it simulates real driving environments for testing [7][11]
- The long-tail problem in autonomous driving, which includes complex scenarios such as adverse weather and construction, remains a significant challenge [11][12]

Group 2: Addressing Long-Tail Problems
- Weather conditions such as rain and snow present unique challenges for autonomous driving and require highly precise judgment [12][14]
- Low-visibility scenarios require multi-modal sensors to detect objects reliably [15]
- Occlusion reasoning is critical for inferring hidden objects and ensuring driving safety [18][21]

Group 3: Complex Scene Understanding
- Understanding complex scenes such as construction zones and dynamic environments requires advanced reasoning capabilities [24]
- Real-time responses to dynamic signals, such as traffic officer gestures, are essential for safe navigation [24]
- Large language models are being explored to improve scene understanding and decision-making [24]

Group 4: Importance of Data, Algorithms, and Computing Power
- The three critical components of successful autonomous driving are data, algorithms, and computing power, with a strong emphasis on data quality [25][30]
- Efficient data mining from vast video datasets is vital for understanding driving events [30]
- Quick decision-making is essential for safety and smooth operation, so the focus is on reducing response times across the whole algorithmic chain [30][31]

Group 5: Operational Infrastructure
- Waymo's operational facilities, including depots and modification workshops, are crucial for deploying Level 4 autonomous vehicles efficiently [33]
- After sensor installation, vehicles can navigate autonomously to charging stations and begin operations [33]
- Scaling autonomous driving technology poses engineering challenges that require collaboration with traditional automotive engineers [34]

Group 6: Sensor and Algorithm Response
- Sensor responsiveness, such as camera frame rate, is critical for effective autonomous driving [36]
- Algorithms must process data at high frequencies so that driving commands are executed in time [36]
- Vehicle control systems are evolving toward higher-frequency responses, particularly in electric and electronically controlled systems [36]
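
To make the link between response time and safety concrete, the following is a minimal sketch of the arithmetic involved: how long one camera frame lasts at a given frame rate, and how far a vehicle travels while the system is still reacting. The function names and all numeric values (30 fps, 60 km/h, 100 ms) are illustrative assumptions, not figures from Waymo or from the article.

```python
def frame_period_ms(frames_per_second: float) -> float:
    """Time between two consecutive camera frames, in milliseconds."""
    return 1000.0 / frames_per_second


def distance_travelled_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance covered at constant speed during a given latency, in metres."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms * latency_ms / 1000.0


# Hypothetical numbers, chosen only for illustration.
period = frame_period_ms(30.0)                 # one frame at 30 fps
blind_run = distance_travelled_m(60.0, 100.0)  # 60 km/h with 100 ms end-to-end latency
print(f"frame period: {period:.1f} ms, distance during latency: {blind_run:.2f} m")
```

Under these assumed values the vehicle covers roughly 1.7 m before any command takes effect, which is why latency is budgeted across sensors, algorithms, and actuation together rather than per component.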
The Full Story Behind the Departure of a New-Force Automaker's Head of Intelligent Driving......
自动驾驶之心· 2025-10-10 23:32
Core Insights
- A recent OTA incident involving a new-force automaker's autonomous driving system catalyzed the departure of its top autonomous driving executive and exposed significant problems with user satisfaction and brand reputation [5][6]
- The company's internal dynamics have shifted: a new leader for autonomous driving has been appointed, signaling an urgent need for change to regain competitive advantage [6][7]

Summary by Sections

Incident Overview
- The latest OTA update drew strong user dissatisfaction because of numerous bugs, damaging the company's reputation [5]
- A previous OTA incident last year led to the dismissal of the technical development head and a reorganization of the testing department, raising questions about who is accountable for the recent failure [5]

Internal Dynamics
- The autonomous driving executive was already in a precarious position, overshadowed by a newly appointed head of world modeling who has taken control of key algorithm development [6]
- The executive's absence from a recent high-level meeting signaled a decisive leadership change and reflected the company's urgent need for transformation in a competitive landscape [6]

Competitive Landscape
- The company faces intensified competition not only from Huawei but also from leading autonomous driving firms such as Momenta, Yuanrong, and Horizon, which have shown strong performance in advanced algorithms [7]
- Historically, the company and Huawei led in algorithm development, but they now risk being outperformed by these emerging competitors, which could be disastrous for their market position [7]
Traditional Perception Falls Out of Favor as VLA Becomes the Rising Star...
自动驾驶之心· 2025-10-10 23:32
Core Insights
- Academia and industry are shifting their focus toward VLA (Vision-Language-Action) to enhance autonomous driving capabilities by bringing human-like reasoning into vehicle decision-making [1][4]
- Traditional perception and lane detection methods have matured and interest in them is declining, while major players in autonomous driving see VLA as a critical area for development [4][6]
- A comprehensive VLA learning roadmap has been designed, covering foundational principles through practical applications [6]

Summary by Sections

Course Overview
- The course, titled "Autonomous Driving VLA and Large Model Practical Course", aims to deepen understanding of VLA through detailed explanations of cutting-edge algorithms and practical assignments [6][22]

Chapter 1: Introduction to VLA Algorithms
- Provides a conceptual overview of VLA algorithms and their historical development, and introduces open-source benchmarks and evaluation metrics relevant to VLA [13]

Chapter 2: Algorithm Fundamentals of VLA
- Covers foundational knowledge of the Vision, Language, and Action modules, and includes a section on deploying and using popular open-source large models [14]

Chapter 3: VLM as an Autonomous Driving Interpreter
- Discusses the role of the VLM (Vision-Language Model) in scene understanding before the introduction of VLA, covering classic and recent algorithms such as DriveGPT4 and TS-VLM [15]

Chapter 4: Modular and Integrated VLA
- Explores the evolution of language models from passive describers to active planning components, details modular and integrated VLA approaches, and includes practical coding exercises [16]

Chapter 5: Reasoning-Enhanced VLA
- Concentrates on the reasoning-enhanced VLA subfield, introducing new reasoning modules and discussing various algorithms and their applications in autonomous driving [17][19]

Chapter 6: Major Project
- Emphasizes hands-on practice, guiding participants through network construction, dataset customization, and model training using the ms-swift framework [20]

Learning Requirements and Outcomes
- Participants are expected to have a foundational understanding of autonomous driving, large models, and the relevant mathematics; the course is designed to equip them to understand and apply VLA algorithms in practical scenarios [24]
If Only I Had Published a Few More Papers in My Second Year of Grad School, I Wouldn't Be This Stuck Now......
自动驾驶之心· 2025-10-10 04:00
Core Insights
- The article discusses the challenges graduate students face in publishing academic papers and offers a professional tutoring service to help them overcome these obstacles [1][2][5].

Group 1: Challenges Faced by Graduate Students
- Many graduate students struggle with topics, frameworks, and arguments while writing papers, especially when their advisors leave them without guidance [2][8].
- High-quality research papers matter for both academic and career advancement, particularly for master's and doctoral students [1][5].

Group 2: Tutoring Service Overview
- The tutoring service, "Automatic Driving Heart", provides personalized guidance that takes students through the entire research and publication process [5][10].
- The service includes a structured timeline for completing a paper, from selecting a research topic to submitting for publication [4][10].

Group 3: Target Audience and Benefits
- The service is aimed at students in computer science and related fields who lack guidance, need research experience, or want to strengthen their academic credentials [9][10].
- Successful participants may receive recommendations from prestigious institutions and direct job referrals to leading tech companies [17].
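Diffusion²: A Dual Diffusion Model That Tackles the "Ghost Probe" Problem (Pedestrians Suddenly Emerging from Blind Spots) in Autonomous Driving!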
自动驾驶之心· 2025-10-09 23:32
Core Insights
- The article presents Diffusion², a novel framework designed specifically for momentary trajectory prediction in autonomous driving, which addresses pedestrian trajectory prediction when only limited observational data is available [1][52].

Background and Contributions
- Accurate pedestrian trajectory prediction is crucial for vehicle safety, especially in human-vehicle interaction scenarios. Traditional methods often rely on longer observation periods, which may not be available in real-world situations where pedestrians suddenly appear from blind spots [2][52].
- The study highlights how frequently momentary observations occur in datasets, with rates of 2.22 s⁻¹ in the SDD dataset and 1.02 s⁻¹ in the ETH/UCY dataset, underlining the need for models that can predict trajectories from limited data [2].
- The proposed Diffusion² model consists of two sequential diffusion models: one for backward prediction of unobserved historical trajectories and another for forward prediction of future trajectories, capturing the causal dependencies between the two [6][7].

Model Architecture
- Diffusion² employs a dual diffusion architecture with a dual-headed parameterization mechanism that quantifies the aleatoric uncertainty of the predicted historical trajectories. This mechanism improves the model's ability to handle noise in those predictions [4][5][7].
- A time-adaptive noise scheduling module dynamically adjusts the noise scale during the forward diffusion process based on the estimated uncertainty, allowing for more robust trajectory predictions [5][22].

Experimental Results
- Diffusion² achieved state-of-the-art (SOTA) performance on momentary trajectory prediction across multiple datasets, including ETH/UCY and the Stanford Drone Dataset (SDD), outperforming existing methods [7][44].
- The results show significant improvements in average displacement error (ADE) and final displacement error (FDE) over previous models [44].

Limitations and Future Work
- Despite its successes, Diffusion² has inherent limitations, particularly in interactive and dense scenarios, where its adaptability may decrease. Future work aims to improve the model's efficiency and robustness in more complex traffic environments [52][54].
- The article suggests exploring more efficient training and inference methods to reduce computational cost while maintaining prediction quality [53].
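
The ADE and FDE metrics mentioned above have standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the final step. The sketch below computes both for a single 2D trajectory. The function name and the sample coordinates are illustrative assumptions; the summary does not say whether the paper reports single-sample or best-of-K variants, so this shows only the basic form.

```python
import math


def displacement_errors(pred, gt):
    """Return (ADE, FDE) for one trajectory.

    pred, gt: equal-length sequences of (x, y) positions, one per time step.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between prediction and ground truth.
    dists = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    ade = sum(dists) / len(dists)   # average over all steps
    fde = dists[-1]                 # error at the final step only
    return ade, fde


# Hypothetical three-step trajectories, for illustration only.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
print(ade, fde)
```

Here the per-step errors are 0, 1, and 2, so ADE is their mean and FDE is the last one; lower is better for both.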
NIO's Ren Shaoqing: World Models Solve Spatiotemporal Cognition, Which VLA Cannot Do.
自动驾驶之心· 2025-10-09 23:32
Core Viewpoint
- The article discusses the importance of world models in intelligent driving, arguing that truly understanding the environment requires a high-bandwidth cognitive system that goes beyond language models [2][3][5].

Summary by Sections

World Model vs. Language Model
- The world model focuses on spatiotemporal cognition, while the language model addresses conceptual cognition. Language is low-bandwidth and sparse, which makes language models ineffective for modeling the real world's four-dimensional space-time [2][3].
- The world model aims to build capabilities directly at the video level rather than converting information into language first [3][5].

VLA and WA
- VLA (Vision-Language-Action) is essentially an extension of language models: it adds new modalities but remains rooted in language. The world model, by contrast, is not an add-on to language but a comprehensive cognitive system [3][5].
- The ultimate goal of autonomous driving is open-set interaction, allowing users to express commands freely without being limited to a fixed set of instructions [3][4].

Importance of Language
- Language remains crucial for three main reasons:
  1. Incorporation of physical laws such as gravity and inertia into the model [6].
  2. Ability to understand and predict object movements in three-dimensional space over time [6].
  3. The vast amount of internet data absorbed by language models aids in training autonomous driving systems [7].

Industry Trends
- The autonomous driving industry is experiencing intense competition, and many professionals are considering moving to other fields. The ongoing debate between VLA and WA is part of a larger industry transformation [9].
- The article suggests that those who stay in the industry will need to be versatile, with broad technical backgrounds, as the market is expected to change significantly [9].

Community and Learning Resources
- A community platform has been established to provide resources for learning and sharing knowledge about autonomous driving, including video tutorials, technical discussions, and job opportunities [11][12][24].
- The community aims to bring together people from a range of academic and industrial backgrounds to foster collaboration and knowledge sharing [25].
Led by Industry and Academic Experts! Master End-to-End and VLA Thoroughly
自动驾驶之心· 2025-10-09 23:32
Core Insights
- The article traces the evolution of autonomous driving algorithms from modular production algorithms to end-to-end approaches, and now to Vision-Language-Action (VLA) models [1][3]
- It emphasizes the rich technology stack behind end-to-end algorithms, including BEV perception, vision-language models (VLM), diffusion models, reinforcement learning, and world models [3][10]

Summary by Sections

End-to-End Algorithms
- End-to-end algorithms fall into two main paradigms, single-stage and two-stage, with UniAD as a representative single-stage approach [1]
- The single-stage paradigm branches into several subfields, particularly those based on VLA, which have seen a surge in publications and industrial applications in recent years [1]

VLA and Course Offerings
- The article announces courses, built through collaboration between industry and academia, that aim to help people learn end-to-end and VLA for autonomous driving quickly and efficiently [3]
- The "VLA and Large Model Practical Course" focuses on VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA approaches [3]

Course Structure and Faculty
- The course gives a comprehensive overview of VLA, with detailed theoretical foundations in Vision, Language, and Action, plus practical assignments in which participants build VLA models and datasets from scratch [3][10]
- The teaching team consists of experienced professionals from top academic institutions and industry, with backgrounds in multimodal perception, autonomous driving, and large model frameworks [7][9][10]

Target Audience and Requirements
- The courses are designed for people with a foundational understanding of autonomous driving and familiarity with key technologies such as transformer models, reinforcement learning, and BEV perception [13]
- Participants are expected to have basic knowledge of probability theory and linear algebra, and programming skills in Python and PyTorch [13]
A Mediocre Algorithm Engineer's Job-Hopping Story
自动驾驶之心· 2025-10-09 23:32
Core Viewpoint
- The primary goal of switching jobs is to obtain better conditions than the current position offers, judged by a higher salary, lower work pressure, or better career prospects [4].

Preparation Checklist
- A comprehensive algorithm question bank of 150-200 questions is recommended, with specific resources provided for deep learning and coding practice [5].
- A structured academic resume should cover project background, core contributions, and quantifiable results [5].

Practical Guide
- Understanding the salary structure is crucial, including benchmarks for salary increases, negotiation strategies, and a checklist for avoiding pitfalls during the offer process [6].
- Important operational considerations include keeping social security contributions continuous and managing non-compete agreements [7][8].

Conclusion
- The greatest benefit of switching jobs is an awakened sense of workplace bargaining power, which includes letting go of the belief that one must stay a certain number of years and developing a calm attitude toward job changes [8].

Communication Strategies
- A suggested formula for explaining the motivation to switch jobs is "development space + technical match" [9].
- Job-seeking channels are ranked by priority, with direct communication with HR being the most effective [9].

Salary Negotiation
- Reference points for salary negotiation include bank statements and housing provident fund contribution ratios; a reasonable increase is typically around 20% or more [10].
- Key tactical points involve understanding the salary structure, the details of benefits, and hidden clauses in contracts [10].

Algorithm Questions
- A selection of algorithm questions asked by various companies is provided, including topics such as k-means clustering and binary search [11][12].
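
As an illustration of the kind of question listed above, here is a minimal sketch of iterative binary search over a sorted list. The summary does not reproduce the actual interview questions, so the function name, the return convention (-1 when the target is absent), and the sample data are assumptions made for this example.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1    # target can only be in the right half
        else:
            hi = mid - 1    # target can only be in the left half
    return -1


# Hypothetical sample data.
print(binary_search([1, 3, 5, 7, 9], 7))
print(binary_search([1, 3, 5, 7, 9], 4))
```

Each iteration halves the remaining search range, so the running time is logarithmic in the list length. The loop condition `lo <= hi` and the `mid + 1` / `mid - 1` updates are the details interviewers most often probe.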
Community Engagement
- The company has established numerous technical discussion groups related to autonomous driving, covering a wide range of technologies and involving a large community of participants [13][14].
Breaking: Head of Intelligent Driving at a New-Force Automaker Resigns Today
自动驾驶之心· 2025-10-09 12:30
The head of intelligent driving at a leading new-force automaker resigned today; 自动驾驶之心 is continuing to follow the story ...
Apple's Recent Prolific Streak! RL4HS: Pinpointing LLM Hallucinations, Surpassing GPT-5 and o3!
自动驾驶之心· 2025-10-09 07:30
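Original article: "Apple Publishes Another Paper: Pinpointing LLM Hallucinations, Something Neither GPT-5 nor o3 Can Do"

Apple has been in a highly productive period for papers lately, with new research appearing every so often. Just recently, it released another major paper that has drawn attention from both academia and industry.

The paper is an interesting one: it uses reinforcement learning to train a model to mark accurately which parts of an answer are hallucinated.

Its core breakthrough is that the model no longer just gives a general warning that errors exist; it can point directly to the specific span of text that is wrong. For users who need to revise outputs or fact-check them, this saves a great deal of time.

The proposed method, RL4HS, uses span-level rewards and Class-Aware GRPO (Class-Aware Group Relative Policy Optimization) to keep the model from taking the lazy route of only outputting "no error" predictions.

On the span-level hallucination detection task, the method even surpasses GPT-5 and o3.

Overall, span-level rewards plus a class-balancing mechanism let ...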
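
To illustrate what a span-level reward could look like, the sketch below scores predicted hallucination spans against reference spans using character-overlap F1. This is an assumption made for illustration: the excerpt above does not specify the exact reward RL4HS uses, and the function name, the span representation (start and end character offsets, end exclusive), and the sample spans are all hypothetical.

```python
def span_f1(pred_spans, gold_spans):
    """Character-overlap F1 between predicted and reference spans.

    Each span is a (start, end) pair of character offsets, end exclusive.
    """
    pred_chars = {i for start, end in pred_spans for i in range(start, end)}
    gold_chars = {i for start, end in gold_spans for i in range(start, end)}
    if not pred_chars and not gold_chars:
        return 1.0   # correctly predicting "no hallucination" on a clean answer
    overlap = len(pred_chars & gold_chars)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: the prediction covers half of the reference span.
print(span_f1([(0, 10)], [(5, 15)]))
# Predicting nothing on a clean answer earns full reward under this scheme.
print(span_f1([], []))
```

The second call shows why a class-balancing mechanism is useful: if most training answers contain no hallucination, a policy that always predicts an empty span list collects a high average reward without learning to localize errors.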