自动驾驶之心
自动驾驶之心 platform-wide course and Knowledge Planet promotions now underway
自动驾驶之心· 2025-09-28 23:33
Group 1
- The article promotes course discounts, including a 70% discount and reductions of 80 or 99 yuan on specific courses [1][3][4].
- A one-year subscription to the "Big Model Planet" is available for 99 yuan and covers technology, industry insights, and job-hunting resources [1].
- The platform offers 1v1 tutoring with up to 1,000 yuan off a 5,000 yuan fee, and 1v6 paper tutoring with a 1,000 yuan reduction [1].
Group 2
- The 自动驾驶之心 (Autonomous Driving Heart) section highlights cutting-edge self-driving technology, with nearly 40 learning routes covering topics such as world models, closed-loop simulation, and BEV perception [6].
- The community arranges face-to-face exchanges with industry leaders and top-conference authors, who share their views on where self-driving technology is heading [6].
- The article stresses continuous learning and keeping up with the latest advances in the self-driving sector [6].
It Is AGI, Not Scaling Laws, That Has Hit the Wall
自动驾驶之心· 2025-09-28 23:33
Core Viewpoint
- The article posits that scaling laws do not necessarily lead to AGI (Artificial General Intelligence) and may even diverge from it, and that the structure of the underlying data is a critical factor in how effective AI models are [1].
Group 1: Data and Scaling Laws
- Scaling laws are described as an intrinsic property of the underlying data: model performance depends heavily on the quality and distribution of the training data [14].
- The raw internet data mix is unlikely to be the optimal data distribution for reaching AGI. Not all tokens are equally valuable, yet training spends the same compute on every token [15].
- Internet data is abundant but sparse in useful content, so models often achieve only superficial improvements rather than addressing core issues [8].
Group 2: Model Development and Specialization
- GPT-4 is said to have largely exhausted the available internet data, producing an intelligence rooted in language expression rather than specialized knowledge of specific fields [9].
- Anthropic's use of synthetic data in models such as Claude Opus 3 improved coding ability, marking a shift towards more specialized training data [10].
- The trend continues with GPT-5, which is characterized as a smaller but more specialized model, at the cost of the general conversational ability users had come to expect [12].
Group 3: Economic Considerations and Industry Trends
- Under cost pressure, AI companies are likely to move away from general-purpose models and focus on high-value areas such as coding and search, which are projected to carry significant market valuations [7][12].
- The article questions whether a single language model is a sustainable path to AGI, arguing that the "you feed me" deep learning paradigm limits the broader global impact of AI [12].
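The per-token compute point in Group 1 can be made concrete with a widely used rule of thumb. The article does not state this formula; it is assumed here for illustration. For a dense transformer with $N$ parameters trained on $D$ tokens, the training cost $C$ in FLOPs is roughly:

```latex
C \approx 6\,N\,D
\quad\Longrightarrow\quad
\frac{C}{D} \approx 6\,N \ \text{FLOPs per token}
```

Under this approximation the compute spent on a token depends only on the model size $N$, not on how informative the token is, which is the mismatch the article describes.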
Taught by a Tsinghua research team: build your own autonomous driving VLA model from scratch in two months
自动驾驶之心· 2025-09-28 07:21
Core Viewpoint
- After end-to-end systems, academia and industry are focusing on VLA (Vision-Language-Action), which provides human-like reasoning for safer and more reliable autonomous driving [1][4].
Summary by Sections
Introduction to Autonomous Driving VLA
- VLA is divided into modular VLA, integrated VLA, and reasoning-enhanced VLA, all of which are essential for advancing autonomous driving technology [1][4].
Technical Maturity and Employment Demand
- Major companies have strong demand for autonomous driving VLA solutions and are investing in in-house research and development [4].
Course Overview
- A comprehensive learning roadmap for autonomous driving VLA has been designed, running from principles to practical applications [4][6].
Core Content of Autonomous Driving VLA
- Key topics include visual perception, large language models, action modeling, model deployment, and dataset creation, along with cutting-edge techniques such as CoT, MoE, RAG, and reinforcement learning [6].
Course Collaboration
- The course is developed with Tsinghua University's research team and features detailed explanations of algorithms and practical assignments [6].
Course Structure
- The course has six chapters: algorithm introduction, foundational algorithms, VLM as an interpreter, modular and integrated VLA, reasoning-enhanced VLA, and a final project [12][20].
Chapter Details
- Chapter 1 covers the concept and history of VLA algorithms, including benchmarks and evaluation metrics [13].
- Chapter 2 focuses on foundational algorithms for Vision, Language, and Action, along with model deployment [14].
- Chapter 3 discusses the role of the VLM as an interpreter in autonomous driving and highlights key algorithms [15].
- Chapter 4 covers modular and integrated VLA, with emphasis on the evolving role of language models in planning [16].
- Chapter 5 explores reasoning-enhanced VLA and introduces new modules for decision-making and action output [17].
- Chapter 6 is a hands-on project in which participants build and fine-tune their own models [20].
Learning Outcomes
- The course aims to deepen understanding of current VLA advances and core algorithms, and to equip participants with practical skills for research and applications in autonomous driving [22][26].
Course Schedule
- The course begins on October 20, with a structured release timeline for each chapter [23].
Prerequisites
- Participants are expected to have foundational knowledge of autonomous driving, large models, and reinforcement learning, plus programming skills in Python and PyTorch [26].
Why has reinforcement learning not been deployed well in autonomous driving?
自动驾驶之心· 2025-09-28 03:50
Core Viewpoint
- The article discusses the challenges of applying reinforcement learning (RL) to autonomous driving, focusing on reward hacking and the balance between safety and efficiency [2][3].
Group 1: Challenges in Reinforcement Learning for Autonomous Driving
- RL suffers from reward hacking: raising the safety requirement can lower efficiency, and vice versa [2].
- Designing a balanced reward system that improves overall performance is hard, because equilibrium among multiple reward terms is difficult to reach [2].
- Autonomous driving must obey many driving rules throughout a drive, unlike embodied intelligence, where the focus is mainly on local motion [2].
Group 2: Need for a Suitable Framework
- Successful RL in autonomous driving depends on a robust architecture that integrates well with RL [3].
- Existing autonomous driving models are unlikely to work with RL directly without significant modification [3].
Group 3: Community and Resources
- The "Autonomous Driving Knowledge Planet" community, with over 4,000 members, aims to be a comprehensive platform for technical exchange and learning in autonomous driving [6][10].
- The community offers learning routes, technical discussions, and access to industry experts for both beginners and advanced practitioners [6][10].
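The safety-versus-efficiency tension described in Group 1 can be illustrated with a toy weighted reward. This is a minimal sketch, not the article's method: the `StepOutcome` fields, the weights, and the 5 m safe gap are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step summary of a driving rollout."""
    progress_m: float      # distance travelled along the route this step
    min_gap_m: float       # closest distance to any other agent
    rule_violations: int   # e.g. lane or signal violations this step


def driving_reward(o: StepOutcome, w_safety: float, w_efficiency: float,
                   w_rules: float = 1.0, safe_gap_m: float = 5.0) -> float:
    """Weighted sum of safety, efficiency, and rule-compliance terms."""
    # Safety term: penalty grows as the gap shrinks below safe_gap_m.
    safety = -max(0.0, safe_gap_m - o.min_gap_m)
    efficiency = o.progress_m
    rules = -float(o.rule_violations)
    return w_safety * safety + w_efficiency * efficiency + w_rules * rules


# Two hypothetical behaviours at a single time step.
parked = StepOutcome(progress_m=0.0, min_gap_m=10.0, rule_violations=0)
assertive = StepOutcome(progress_m=2.0, min_gap_m=4.0, rule_violations=0)

for w_safety in (0.5, 5.0):
    r_parked = driving_reward(parked, w_safety, w_efficiency=1.0)
    r_assertive = driving_reward(assertive, w_safety, w_efficiency=1.0)
    best = "assertive" if r_assertive > r_parked else "parked"
    print(f"w_safety={w_safety}: preferred behaviour is {best}")
```

With a low safety weight the reward prefers the assertive step; with a high safety weight it prefers never moving, which is perfectly safe and useless. Adding the rule-compliance term on top only makes the balance harder to tune.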
New from UCLA: a comprehensive survey of time series reasoning with large models and agentic systems
自动驾驶之心· 2025-09-27 23:33
Core Insights
- The article discusses the emergence of Time Series Reasoning (TSR), a new field that combines large language models (LLMs) with time series analysis to address the limitations of traditional methods [2][8][39].
- TSR adds explicit reasoning, causal inference, and decision-making to time series analysis, moving beyond prediction and classification alone [2][8][39].
Summary by Sections
Traditional Time Series Analysis Limitations
- Traditional methods such as ARIMA and LSTM excel at specific tasks but have three key limitations: poor interpretability, inability to handle causal relationships, and insufficient dynamic response [8][14].
- LLMs offer new tools to overcome these limitations: they expose explicit reasoning processes, generate causal hypotheses, and can interact with external tools [2][8].
Emergence of Time Series Reasoning
- TSR is defined as explicit, structured reasoning over time-indexed data using LLMs, integrating multimodal context and agent systems [8][39].
- A recent survey from a collaborative team gives a clear definition of TSR and presents a three-dimensional classification framework covering reasoning structure, task objectives, and technical features [3][9].
Three-Dimensional Classification Framework
- The framework has three dimensions: reasoning topology (how reasoning is conducted), core objectives (why reasoning is performed), and attribute labels (auxiliary features of a method) [9][24].
- Reasoning topology has three types, direct reasoning, linear chain reasoning, and branch-structured reasoning, which differ in complexity and capability [12][22].
Reasoning Topology
- Direct reasoning is the simplest form: it returns a result without showing intermediate steps, which limits interpretability [15].
- Linear chain reasoning introduces ordered steps, improving interpretability and modularity [18].
- Branch-structured reasoning allows multiple paths and self-correction, increasing flexibility and adaptability [22].
Core Objectives of Time Series Reasoning
- The core objectives fall into four types: traditional time series analysis, explanation and understanding, causal inference and decision-making, and time series generation [24][28].
- Each objective aims to improve the performance and flexibility of traditional tasks through LLM integration [28].
Attribute Labels
- Attribute labels add further features for classifying methods, including control flow operations, execution agents, information sources, and LLM alignment methods [29][30].
- These labels help researchers position their work and understand the nuances of different approaches [29].
Resources and Tools
- The article groups resources and tools into reasoning-first benchmarks, reasoning-ready benchmarks, and general-purpose benchmarks [33][36].
- Researchers need these resources to test and validate their methods effectively [33].
Future Directions and Challenges
- Open challenges include standardizing evaluation metrics for reasoning quality, integrating multimodal data, and ensuring the robustness and safety of agent systems [38][39].
- How these challenges are addressed will shape the field, whose goal is reliability at scale in critical sectors such as finance, healthcare, and energy [39].
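The three reasoning topologies from the Reasoning Topology section can be sketched on a toy forecasting task. This is an illustrative sketch only: plain Python functions stand in for LLM calls, and every function name and hypothesis below is hypothetical rather than taken from the survey.

```python
from statistics import mean
from typing import Callable

Series = list[float]


def direct(series: Series) -> float:
    """Direct reasoning: return an answer with no visible intermediate steps."""
    return series[-1] + (series[-1] - series[0]) / (len(series) - 1)


def linear_chain(series: Series) -> tuple[float, list[str]]:
    """Linear chain: ordered steps, each recorded so the result can be audited."""
    trace = []
    slope = (series[-1] - series[0]) / (len(series) - 1)
    trace.append(f"step 1: average slope = {slope:.2f}")
    forecast = series[-1] + slope
    trace.append(f"step 2: last value + slope = {forecast:.2f}")
    return forecast, trace


def branch_structured(series: Series,
                      hypotheses: dict[str, Callable[[Series], float]]) -> tuple[str, float]:
    """Branching: explore several hypotheses, check each one, keep the best."""
    history, held_out = series[:-1], series[-1]
    # Self-check: how well would each hypothesis have predicted the last known point?
    errors = {name: abs(fn(history) - held_out) for name, fn in hypotheses.items()}
    best = min(errors, key=errors.get)
    return best, hypotheses[best](series)


hypotheses = {
    "trend": direct,
    "persistence": lambda s: s[-1],
    "mean_reversion": lambda s: mean(s),
}

data = [10.0, 12.0, 14.0, 16.0]
print(direct(data))
print(linear_chain(data))
print(branch_structured(data, hypotheses))
```

The direct call yields only a number, the chain also returns a readable trace, and the branching version can reject a hypothesis that fails its own back-test, which is the self-correction the survey attributes to branch-structured reasoning.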
A VLA that checks itself! ReflectDrive: an end-to-end framework for safer, more efficient scaling (Li Auto & Tsinghua)
自动驾驶之心· 2025-09-27 23:33
Core Viewpoint
- ReflectDrive is a novel learning framework that uses a reflective mechanism with discrete diffusion to generate safe trajectories, addressing open challenges in end-to-end autonomous driving systems [4][46].
Group 1: Introduction and Background
- Autonomous driving is moving the transportation industry towards a safer and more efficient future, and end-to-end (E2E) systems are becoming a mainstream alternative to traditional modular designs [4].
- Vision-Language-Action (VLA) models draw on the pre-trained knowledge of vision-language models (VLMs) to adapt better to complex scenarios [4][5].
- Current learning-based methods have not resolved the core challenges of imitation-learning driving systems, in particular how to encode physical rules such as collision avoidance [4][5].
Group 2: ReflectDrive Framework
- ReflectDrive applies a reflective mechanism to discrete diffusion in order to generate safe trajectories [3][12].
- The framework first discretizes the two-dimensional driving space to build an action codebook, so that pre-trained diffusion language models can be fine-tuned for planning tasks [3][14].
- The reflective mechanism requires no gradient computation and performs iterative self-correction inspired by joint spatiotemporal planning [3][8].
Group 3: Methodology and Mechanism
- Reflective inference has two stages: goal-conditioned trajectory generation and safety-guided regeneration [20][25].
- Safety metrics are used to evaluate the generated multimodal trajectories, and local search identifies and repairs unsafe waypoints [8][25].
- The iterative optimization loop runs until the trajectory is judged safe or the compute budget is exhausted, which keeps inference efficient enough for real-time use [31][32].
Group 4: Experimental Results
- ReflectDrive was evaluated on the NAVSIM benchmark and showed significant improvements in safety metrics such as collision rate and drivable-area compliance [32][38].
- Adding safety-guided regeneration improved the safety indicators over the baseline: DAC (drivable area compliance) by 3.9%, TTC (time to collision) by 1.3%, NC (no at-fault collisions) by 0.8%, and EP (ego progress) by 7.9% [37][38].
- With ground-truth agent information, ReflectDrive approached human driving performance, reaching an NC of 99.7% and a DAC of 99.5% [38][39].
Group 5: Conclusion
- ReflectDrive combines a reflective mechanism with discrete diffusion for safe trajectory generation, and its NAVSIM results validate the approach [46].
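The action codebook and the gradient-free repair loop from Groups 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not ReflectDrive's implementation: the grid size, the circular safety radius, and all function names are hypothetical, and the sketch simply swaps an unsafe token for a nearby safe one instead of regenerating the trajectory with a diffusion model.

```python
import math

# Hypothetical grid: 0.5 m cells covering x in [0, 20) m and y in [-5, 5) m.
CELL, X_MIN, Y_MIN, NX, NY = 0.5, 0.0, -5.0, 40, 20


def encode(x: float, y: float) -> int:
    """Map a 2D waypoint to a discrete token id in the action codebook."""
    ix = min(NX - 1, max(0, int((x - X_MIN) / CELL)))
    iy = min(NY - 1, max(0, int((y - Y_MIN) / CELL)))
    return iy * NX + ix


def decode(token: int) -> tuple[float, float]:
    """Map a token id back to the centre of its grid cell."""
    iy, ix = divmod(token, NX)
    return X_MIN + (ix + 0.5) * CELL, Y_MIN + (iy + 0.5) * CELL


def is_safe(token: int, obstacles: list[tuple[float, float]], radius: float = 1.0) -> bool:
    """A waypoint is safe if it keeps at least `radius` metres from every obstacle."""
    x, y = decode(token)
    return all(math.hypot(x - ox, y - oy) >= radius for ox, oy in obstacles)


def local_search(token: int, obstacles: list[tuple[float, float]], reach: int = 4) -> int:
    """Return the nearest safe token within `reach` cells, or the input if none exists."""
    iy, ix = divmod(token, NX)
    candidates = [
        (dx * dx + dy * dy, (iy + dy) * NX + (ix + dx))
        for dy in range(-reach, reach + 1)
        for dx in range(-reach, reach + 1)
        if 0 <= ix + dx < NX and 0 <= iy + dy < NY
    ]
    safe = [c for c in candidates if is_safe(c[1], obstacles)]
    return min(safe)[1] if safe else token


def reflect(trajectory: list[int], obstacles: list[tuple[float, float]],
            max_iters: int = 10) -> list[int]:
    """Repair unsafe waypoints one at a time until safe or the budget runs out."""
    trajectory = list(trajectory)
    for _ in range(max_iters):
        unsafe = [i for i, t in enumerate(trajectory) if not is_safe(t, obstacles)]
        if not unsafe:
            break
        trajectory[unsafe[0]] = local_search(trajectory[unsafe[0]], obstacles)
    return trajectory


obstacles = [(6.0, 0.2)]
plan = [encode(x, 0.0) for x in (2.0, 4.0, 6.0, 8.0)]
fixed = reflect(plan, obstacles)
print([decode(t) for t in fixed])
```

Because the trajectory is a sequence of discrete tokens, the safety check and the search both operate on integers and need no gradients, and `max_iters` plays the role of the compute limit mentioned in Group 3.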
NeurIPS 2025 | The SURDS dataset and GRPO comprehensively strengthen spatial reasoning for autonomous driving
自动驾驶之心· 2025-09-27 23:33
Core Insights
- The article discusses why accurate spatial reasoning in driving scenes remains hard for Vision Language Models (VLMs), and points to the lack of large-scale benchmarks in this area [2][20].
- A new benchmark, SURDS, systematically evaluates the spatial reasoning of VLMs and reveals significant shortcomings in current models [4][20].
Benchmark Overview
- SURDS is a large-scale benchmark built on the nuScenes dataset, with 41,080 visual-question training instances and 9,250 evaluation samples across six spatial categories: direction recognition, pixel-level localization, depth estimation, distance comparison, left-right ordering, and front-back relationships [4][20].
- The dataset contains diverse multimodal data collected in urban Boston and Singapore, which makes for a realistic test setting [6][20].
Model Training and Evaluation
- The work emphasizes data generation and introduces an automated process for producing high-quality reasoning chains, which strengthens the model's spatial reasoning [8][10].
- A reinforcement learning framework that combines spatial localization rewards with logical consistency objectives yields significant gains across tasks [11][20].
Experimental Results
- Models differ markedly on spatial reasoning tasks. The proposed model improves depth estimation accuracy by nearly 60% over the second-best model [14][20].
- Most existing models struggle with single-object tasks and often perform near chance, which indicates that they need to learn absolute pose and metric information better [16][20].
Training Strategy Insights
- Ablation studies show that combining localization and logical rewards significantly improves performance, underscoring that localization ability is foundational to spatial reasoning [16][18].
- Parameter count does not correlate directly with spatial understanding, so simply scaling up the model is not enough [16][20].
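The combination of localization and logical-consistency rewards, together with the group-relative normalization that GRPO is known for, can be sketched in a few lines. This is a hedged illustration, not the paper's reward design: the equal weights, the score ranges, and the function names are assumptions.

```python
from statistics import mean, pstdev


def combined_reward(loc_score: float, logic_score: float,
                    w_loc: float = 0.5, w_logic: float = 0.5) -> float:
    """Blend a localization reward with a logical-consistency reward (weights assumed)."""
    return w_loc * loc_score + w_logic * logic_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: standardise each reward against its own sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses sampled for the same question:
# (localization score in [0, 1], logical-consistency score in [0, 1]).
group = [(0.9, 1.0), (0.9, 0.0), (0.1, 1.0), (0.1, 0.0)]
rewards = [combined_reward(loc, logic) for loc, logic in group]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Only the response that localizes well and reasons consistently ends up with the largest positive advantage, while responses that satisfy just one criterion stay close to the group average.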
Partners wanted! 4D annotation, world models, VLA, model deployment, and more
自动驾驶之心· 2025-09-27 23:33
Group 1
- The article announces the recruitment of 10 partners in the autonomous driving field, to work on course development, paper guidance, and hardware research [2][5].
- The recruitment targets people with expertise in advanced autonomous driving models and technologies, such as large models, multimodal models, and 3D object detection [3].
- Candidates should preferably hold a master's degree or higher from a university ranked in the QS top 200, with priority given to those with significant conference publications [4].
Group 2
- Partner benefits include shared resources for job seeking, doctoral study, and overseas study recommendations, along with substantial cash incentives [5].
- Opportunities to collaborate on entrepreneurial projects are also highlighted [5].
- Interested parties can make contact via WeChat to discuss collaboration in the autonomous driving field [6].
The legacy automakers left behind by the times really do need to catch up on their homework...
自动驾驶之心· 2025-09-27 06:13
Core Viewpoint
- The automotive industry is experiencing anxiety due to an impending price war starting in December 2024, prompting traditional automakers to focus on smart driving technologies [3][12].
Group 1: Industry Dynamics
- The second half of 2025 may be a turning point for traditional automakers as they adapt to new market conditions [5].
- Leadership is shifting across the sector: 13 companies changed executives in September alone [10].
- The smart driving sector is accelerating, and traditional automakers are determined not to fall behind [12].
Group 2: Strategic Approaches
- Traditional automakers are adopting several strategies to become more competitive:
  - Acquisition Strategy: Companies such as FAW are buying stakes in tech firms directly, for example a 35.8% stake in Zhuoyu Technology [6][17].
  - Broad Investment Strategy: Some companies spread their partnerships across multiple suppliers, as with GAC's collaborations with Huawei and others [18].
  - Partnership Strategy: Companies such as Seres are exploring symbiotic relationships with tech firms [19].
  - Self-Development Strategy: BYD and Changan are focusing on in-house development, and Changan recently made significant progress in its parking technology [24].
Group 3: Market Trends
- The penetration rate of new energy vehicles in China is expected to exceed 50% by July 2025, a critical milestone for the industry [27].
- Competition in smart driving is intensifying, and market dynamics are shifting as traditional manufacturers step up their efforts [28].
A leading automaker's in-house R&D faces its big test......
自动驾驶之心· 2025-09-26 16:03
Core Viewpoint
- The article describes the pressure on a leading automaker's in-house self-driving team as it approaches critical deadlines for advanced autonomous driving technology, and highlights the competitive landscape and the role of effective management in delivering technical progress [6][8][14].
Group 1: Development Goals and Challenges
- The team has set ambitious internal goals: a no-map urban Navigation on Autopilot (NOA) by September 30 and an end-to-end system by December 30 [6].
- The company currently trails new entrants and leading autonomous driving firms by at least a year in research and development [8].
- Pressure on the smart driving leaders is high, because missing these deadlines could bring accountability reviews and organizational turmoil [7][8].
Group 2: Investment and Talent Acquisition
- The company has significantly increased its investment in autonomous driving, now exceeding that of some new entrants, and is willing to pay competitive salaries to attract top talent [9].
- Unlike some new entrants whose packages are tied to stock performance, the company pays more in cash, so employee compensation does not fluctuate with the stock price [9].
Group 3: Technical and Management Issues
- Despite substantial investment, the company faces difficulties in end-to-end development, particularly in data management, which is crucial for training models effectively [10].
- Leaders at traditional automakers often lack algorithmic expertise, which limits their ability to manage and innovate in autonomous driving [13].
- Management at traditional firms tends to measure coding output rather than the algorithmic thinking behind it, which contributes to lower technical output than at new entrants [14].
Group 4: Future Outlook and User Experience
- The company plans to roll out high-level urban NOA across numerous models next year, provided its self-developed end-to-end system succeeds [15].
- The coming year is expected to be pivotal for end-to-end systems, as both new entrants and leading firms reach performance levels that meet consumer expectations [15].
- Emphasis will shift from whether the technology works to whether it delivers a satisfactory user experience, since performance differences between end-to-end systems can significantly affect consumer perception [16].