自动驾驶之心
Feed-Forward 3D Survey: 3D Vision Enters the "One-Pass" Era
自动驾驶之心· 2025-10-31 16:03
In the field of 3D vision, how to recover the three-dimensional world quickly and accurately from two-dimensional images has long been one of the most central problems in computer vision and computer graphics. From early Structure-from-Motion (SfM) to Neural Radiance Fields (NeRF) and then 3D Gaussian Splatting (3DGS), the evolution of these techniques has brought us ever closer to real-time, general-purpose 3D understanding. However, previous methods typically rely on per-scene optimization, which is both slow and lacks generalization. In the new AI-driven era, an entirely new paradigm is rising: Feed-Forward 3D.

Paper title: Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey
Paper link: https://arxiv.org/abs/2507.14501
Project page: https://fnzhan.com/projects/Feed-Forward-3D/

This survey, by ...
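The paradigm shift described above can be sketched in a few lines. This is a toy illustration, not anything from the survey: per-scene optimization runs a gradient-descent loop for every new scene, while a feed-forward model amortizes that cost into a single trained mapping. All names and the scene "representation" here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_scene_optimize(target, steps=200, lr=0.1):
    """NeRF/3DGS-style: fit a per-scene parameter vector by gradient descent.

    Toy objective ||params - target||^2 stands in for a photometric loss.
    """
    params = rng.normal(size=target.shape)
    for _ in range(steps):
        grad = 2.0 * (params - target)   # gradient of the toy loss
        params -= lr * grad              # repeated for EVERY scene -> slow
    return params

class FeedForward3D:
    """Amortized alternative: one trained mapping, one forward pass per scene."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))  # "trained" weights

    def __call__(self, images):
        # images: (batch, in_dim) flattened pixels -> 3D representation, no
        # test-time optimization loop at all
        return np.maximum(images @ self.W, 0.0)
```

The contrast is the whole point of the survey's title: the inner loop of `per_scene_optimize` disappears, replaced by a single `__call__`.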
World Models and VLA Are Gradually Converging Toward Unification
自动驾驶之心· 2025-10-31 00:06
Core Viewpoint
- The integration of Vision-Language Action (VLA) and World Model (WM) technologies is becoming increasingly evident, suggesting a trend towards unification rather than opposition in the field of autonomous driving [3][5][7].

Technology Development Trends
- Recent discussions highlight that VLA and WM should not be seen as mutually exclusive but rather as complementary technologies that can advance the development of Artificial General Intelligence (AGI) [3].
- The combination of VLA and WM is supported by various academic explorations, including models like DriveVLA-W0, which demonstrate the feasibility of their integration [3].

Industry Insights
- The ongoing debate within the industry regarding VLA and WA (World Action) is more about differing promotional narratives than fundamental technological differences [7].
- Tesla's recent presentations at ICCV are expected to influence domestic perspectives on the integration of VLA and WA [7].

Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" community has been established to provide a comprehensive platform for learning and sharing knowledge in the autonomous driving sector, with over 4,000 members and plans to expand to nearly 10,000 [10][23].
- The community offers a variety of resources, including video content, learning routes, and Q&A sessions, aimed at both beginners and advanced practitioners in the field [10][12][28].

Technical Learning Paths
- The community has compiled over 40 technical learning routes covering various aspects of autonomous driving, including perception, simulation, planning, and control [24][44].
- Specific learning paths are available for newcomers, including full-stack courses suitable for those with no prior experience [20][17].
Networking and Career Opportunities
- The community facilitates connections between members and industry leaders, providing job referral mechanisms and insights into career opportunities within the autonomous driving sector [19][10].
- Members can engage in discussions about research directions, job choices, and industry trends, fostering a collaborative environment for knowledge exchange [97][101].
ICCV 2025 | Amap's SeqGrowGraph: A New Incremental Generation Paradigm for Lane Graphs
自动驾驶之心· 2025-10-31 00:06
Core Insights
- The article presents SeqGrowGraph, an innovative framework for autoregressive lane-graph modeling, which addresses the challenges of constructing high-precision lane maps for autonomous driving systems [18].

Group 1: Background and Motivation
- The construction of local high-precision maps (online mapping) has become a hot topic in the industry, with lane graph generation being a critical component [2].
- Current mainstream technical routes for lane graph generation can be categorized into detection-based and generation-based methods [2].

Group 2: Methodology
- SeqGrowGraph defines the lane graph as a directed graph G=(V, E), where V represents intersections or key topological nodes, and E represents the lane centerlines connecting the nodes [6].
- The core method involves a chain of graph expansions, where the graph construction is completed incrementally by introducing new nodes and updating adjacency and geometry matrices [8][10].
- The model architecture follows a mainstream encoder-decoder structure, utilizing a BEV encoder to extract features and a Transformer decoder for autoregressive sequence generation [10][11].

Group 3: Experimental Validation
- SeqGrowGraph was comprehensively evaluated on the large-scale autonomous driving datasets nuScenes and Argoverse 2, demonstrating superior performance compared to leading methods in the field [13][14].
- Quantitative analysis showed that SeqGrowGraph achieved state-of-the-art performance in topology accuracy metrics such as Landmark and Reachability on both standard and challenging dataset partitions [14][15].

Group 4: Qualitative Analysis
- Visual results highlighted the advantages of SeqGrowGraph, showcasing its ability to generate topologically continuous, structurally complete, and geometrically accurate lane graphs, while effectively merging redundant nodes from real-world map data [16].

Group 5: Conclusion
- The SeqGrowGraph framework not only aligns more closely with human structured reasoning but also effectively overcomes inherent limitations of existing methods in handling complex topologies, such as loops [18].
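The "chain of graph expansions" over G=(V, E) can be sketched as follows. This is a hedged illustration of the incremental idea only: the data structures and the `expand` step are assumptions, and the paper's actual serialization (geometry matrices, Bezier control points, decoder tokens) is considerably richer.

```python
import numpy as np

class GrowingLaneGraph:
    """Toy directed lane graph G=(V, E) built one node at a time."""

    def __init__(self):
        self.nodes = []                          # V: node coordinates (x, y)
        self.adj = np.zeros((0, 0), dtype=int)   # E: directed adjacency matrix

    def expand(self, coord, in_edges=(), out_edges=()):
        """One expansion step: add a node, pad adjacency by one row/column,
        and wire edges to/from already-generated nodes."""
        self.nodes.append(coord)
        n = len(self.nodes)
        new_adj = np.zeros((n, n), dtype=int)
        new_adj[: n - 1, : n - 1] = self.adj     # keep the existing graph
        for src in in_edges:                     # existing node -> new node
            new_adj[src, n - 1] = 1
        for dst in out_edges:                    # new node -> existing node
            new_adj[n - 1, dst] = 1
        self.adj = new_adj
        return self

# A three-step expansion chain: two lane segments 0 -> 1 -> 2
g = GrowingLaneGraph()
g.expand((0.0, 0.0))
g.expand((10.0, 0.0), in_edges=[0])
g.expand((10.0, 5.0), in_edges=[1])
```

An autoregressive decoder would emit one such `expand` call per step, conditioned on the BEV features and the graph built so far.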
RAD: End-to-End Autonomous Driving Combining 3DGS with Reinforcement Learning
自动驾驶之心· 2025-10-31 00:06
Core Insights
- The paper addresses challenges in deploying end-to-end autonomous driving (AD) algorithms in real-world scenarios, focusing on causal confusion and the open-loop gap [1][2].
- It proposes a closed-loop reinforcement learning (RL) training paradigm based on 3D Gaussian Splatting (3DGS) technology to enhance the robustness of AD strategies [2][8].

Summary by Sections

Problem Statement
- The paper identifies two main issues: causal confusion, where imitation learning (IL) captures correlations rather than causal relationships, and the open-loop gap, where IL strategies trained in an open-loop manner perform poorly in real-world closed-loop scenarios [1][2][6].

Related Research
- The paper references various fields related to the study, including dynamic scene reconstruction, end-to-end autonomous driving, and reinforcement learning, highlighting existing methods and their limitations [3][4][5][7].

Proposed Solution
- The proposed RAD framework integrates 3DGS technology with RL and IL, employing a three-stage training paradigm: perception pre-training, planning pre-training, and reinforced post-training [8][24].
- It includes a specially designed safety-related reward function to guide the AD strategy in handling safety-critical events [11][24].

Experimental Validation
- The paper details extensive experiments, including the collection of 2000 hours of human expert driving demonstrations and the creation of 4305 high-collision-risk traffic clips for training and evaluation [15][24].
- Nine key performance indicators (KPIs) are used to assess the AD strategy, including dynamic collision ratio (DCR) and static collision ratio (SCR) [12][15][24].

Key Findings
- The RAD framework outperforms existing IL methods, achieving a threefold reduction in collision rate (CR) and demonstrating superior performance in complex dynamic environments [9][12][24].
- The optimal RL-IL ratio of 4:1 was found to balance safety and trajectory consistency effectively [12][15].

Future Directions
- The paper suggests further exploration in areas such as enhancing the interactivity of the 3DGS environment, improving rendering techniques, and expanding the application of RL [17][21][22][29].
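The two design choices reported above, a safety-related reward and a 4:1 RL-to-IL mix, can be sketched as follows. Everything here is an assumption for illustration: the component names, weights, and interleaving scheme are not RAD's actual implementation, only a minimal rendering of "penalize dynamic/static collisions, regularize deviation, and interleave RL and IL updates at 4:1".

```python
def safety_reward(dynamic_collision, static_collision, deviation, progress,
                  w_dc=1.0, w_sc=1.0, w_dev=0.2, w_prog=0.1):
    """Toy safety-shaped reward: collisions dominate, deviation is a mild
    regularizer, and forward progress is weakly encouraged."""
    r = 0.0
    if dynamic_collision:
        r -= w_dc              # hit a moving agent (relates to DCR)
    if static_collision:
        r -= w_sc              # hit a static obstacle (relates to SCR)
    r -= w_dev * deviation     # lateral/heading deviation from reference
    r += w_prog * progress     # progress along the route
    return r

def mixed_update_schedule(step, rl_steps=4, il_steps=1):
    """Interleave RL and IL gradient updates at the reported 4:1 ratio."""
    return "rl" if step % (rl_steps + il_steps) < rl_steps else "il"
```

The IL updates act as a trajectory-consistency anchor while the RL updates, driven by the safety reward inside the 3DGS closed loop, push the policy away from collision-prone behavior.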
HIT's Latest 33-Page Survey on Industrial Agents
自动驾驶之心· 2025-10-31 00:06
Core Insights
- The article discusses the rapid evolution of Large Language Models (LLMs) into Industrial Agents, emphasizing their application in high-risk industries such as finance, healthcare, and manufacturing, and the challenges of transforming their potential into practical productivity [2][4].

Group 1: Key Technologies
- Industrial agents require a "cognitive loop" for real-world interaction, relying on three core technologies: Memory, Planning, and Tool Use, which together enhance their decision-making and collaborative capabilities [5][18].
- Memory mechanisms evolve through five stages, from simple working memory to collective knowledge bases, enabling long-term task coherence and collaborative learning among agents [11][12].
- Planning capabilities progress from linear task execution to autonomous goal generation, reflecting the depth of decision-making in complex problem-solving [15][16].
- Tool usage evolves from passive invocation to active creation, allowing agents to design new tools to address capability gaps [18][19].

Group 2: Capability Maturity Framework
- The article introduces a five-level capability maturity framework for industrial agents, defining their core abilities and application boundaries at each level, from basic process execution to adaptive social systems [18][20].
- Level 1 focuses on process execution systems that translate instructions, while Level 5 represents adaptive social systems capable of autonomous goal generation and environmental collaboration [18][20].

Group 3: Evaluation of Industrial Agents
- Evaluating industrial agents involves two main dimensions: foundational capability verification and industry practice adaptation, with standardized benchmarks established for memory, planning, and tool usage [20][23].
- The evaluation framework includes various tests for memory accuracy, planning decision-making, and tool usage efficiency, ensuring agents meet industry-specific requirements [23][24].

Group 4: Application Areas
- Industrial agents demonstrate significant potential across various sectors, enhancing efficiency and reducing risks by automating complex tasks and standardizing processes [25][26].
- In software development, agents can manage the entire process from requirement analysis to deployment, while in scientific research they assist in data analysis and autonomous exploration [26][27].
- The healthcare sector benefits from agents that support diagnostic reasoning and treatment planning, ensuring safety and reliability in high-stakes environments [25][26].

Group 5: Challenges and Future Directions
- Despite advancements, industrial agents face challenges in technology, evaluation, and organizational integration, requiring breakthroughs in several areas to achieve widespread adoption [31][34].
- Future trends include enhancing the integration of generative and predictive modeling, improving real-time capabilities, and addressing ethical concerns related to autonomous decision-making [31][34].
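The memory / planning / tool-use "cognitive loop" from Group 1 can be sketched as a minimal agent skeleton. All class and method names here are illustrative assumptions, and the keyword-matching "planner" is a deliberate stand-in for the LLM-driven planning the survey actually describes.

```python
class IndustrialAgent:
    """Toy cognitive loop: plan -> use tools -> write results to memory."""

    def __init__(self, tools):
        self.memory = []      # working memory: (tool, result) pairs from past steps
        self.tools = tools    # tool use: name -> callable

    def plan(self, goal):
        """Stand-in planner: pick every registered tool the goal mentions.
        A real agent would ask an LLM to decompose the goal into steps."""
        return [name for name in self.tools if name in goal]

    def act(self, goal):
        results = []
        for tool_name in self.plan(goal):
            out = self.tools[tool_name](goal)     # invoke the tool
            self.memory.append((tool_name, out))  # persist for later coherence
            results.append(out)
        return results

# Illustrative tools; any real deployment would wrap actual APIs here.
agent = IndustrialAgent({
    "search": lambda g: f"docs retrieved for {g!r}",
    "summarize": lambda g: "summary of retrieved docs",
})
agent.act("search then summarize the incident report")
```

The five maturity levels in Group 2 can be read as progressively replacing these stubs: richer memory stores, planners that set their own goals, and agents that synthesize new tools instead of choosing from a fixed dict.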
The Patent Battle Between Hesai Technology and Tudatong
自动驾驶之心· 2025-10-30 03:31
Core Viewpoint
- Hesai Technology has officially filed a lawsuit against Tudatong for patent infringement related to the Lingque E1X newly showcased at CES 2025, which bears a striking resemblance to Hesai's AT series products [3][4].

Group 1: Patent Infringement Case
- The lawsuit cites similarities in appearance and interface between Tudatong's Lingque E1X and Hesai's ATX, as well as the adoption of the same "905nm wavelength + one-dimensional scanning" technology [3][4].
- Hesai has reported that several of its North American employees have joined Tudatong, including a senior director [3].
- The case arises as Tudatong transitions from a focus on 1550nm technology to a dual strategy spanning both 1550nm and 905nm products, at a critical phase for its IPO [4].

Group 2: Market Dynamics and Competition
- The lidar industry has seen intense price competition, particularly affecting new entrants, which is detrimental to the industry's overall development [5].
- Hesai's ATX, launched in April 2024, has secured partnerships with over ten leading domestic automakers and has commenced large-scale production [5].
- Hesai produced its one-millionth lidar unit by the end of September 2025, becoming the first company to reach this annual production volume [5].
Li Auto at ICCV'25 on World Models: From Data Closed-Loop to Training Closed-Loop
自动驾驶之心· 2025-10-30 00:56
Core Insights
- The article discusses advancements in autonomous driving technology, focusing on the transition from data closed-loop systems to training closed-loop systems, which marks a new phase in autonomous driving development [17][20].

Group 1: Development of Li Auto's VLA Model
- Li Auto's VLA driver model has evolved through various stages, from rule-based systems to AI-driven E2E+VLM systems, with a strong emphasis on navigation as a key module [6].
- The end-to-end mass-production version's MPI has surpassed 220, a 19-fold increase over the July 2024 version [12].

Group 2: Data Closed-Loop Value
- The data closed-loop process includes shadow-mode validation, cloud-side data mining, automatic labeling of effective samples, and model training, with a data return time of one minute [9][10].
- Li Auto has accumulated 1.5 billion kilometers of driving data, using over 200 triggers to produce 15-45 second clips [10].

Group 3: Transition to Training Closed-Loop
- The core of the L4 training loop involves VLA, reinforcement learning (RL), and world models (WM), optimizing trajectories through diffusion and reinforcement learning [22].
- Key technologies for closed-loop autonomous driving training include regional simulation, synthetic data, and reinforcement learning [24].

Group 4: Simulation and Generation Techniques
- Simulation relies on scene reconstruction, including visual and lidar reconstruction, while synthetic data generation uses multimodal techniques [25].
- Li Auto's recent advances in reconstruction and generation have led to significant improvements, with multiple top-conference papers published in the last two years [26][29][31].

Group 5: Interactive Agents and System Capabilities
- The development of interactive agents is highlighted as a critical challenge in the training closed-loop [37].
- System capabilities are enhanced through world models providing simulation environments, diverse scene construction, and accurate feedback from reward models [38].

Group 6: Community and Collaboration
- The article mentions the establishment of nearly a hundred technical discussion groups related to various autonomous driving technologies, with a community of around 4,000 members and over 300 companies and research institutions involved [44][45].
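The trigger-based mining step in the data closed-loop (over 200 triggers producing 15-45 second clips) can be sketched as a windowing pass over a frame stream. The trigger functions, field names, and window sizes below are assumptions for illustration, not Li Auto's actual pipeline.

```python
def mine_clips(frames, triggers, pre_s=15, post_s=30, fps=10):
    """Return (start, end) frame windows around any frame that fires a trigger.

    Each window spans up to pre_s seconds before and post_s seconds after the
    event, i.e. clips in the 15-45 s range depending on clip boundaries.
    """
    clips = []
    for i, frame in enumerate(frames):
        if any(trig(frame) for trig in triggers):
            start = max(0, i - pre_s * fps)
            end = min(len(frames), i + post_s * fps)
            clips.append((start, end))
    return clips

# Illustrative triggers; a production system would have hundreds of these.
hard_brake = lambda f: f.get("decel", 0.0) > 4.0        # m/s^2 threshold (assumed)
takeover = lambda f: f.get("driver_takeover", False)    # shadow-mode disagreement
```

Mined clips would then flow to cloud-side auto-labeling and back into model training, closing the data loop.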
Traditional Planning and Control Roles Are Getting Harder to Land...
自动驾驶之心· 2025-10-30 00:04
Core Viewpoint
- The article emphasizes the evolving landscape of autonomous driving, highlighting the shift from traditional planning-and-control methods to end-to-end approaches, which are increasingly favored in the industry [2][29].

Summary by Sections

Course Offerings
- The company has designed a specialized course on end-to-end planning and control in autonomous driving, aimed at addressing real-world challenges and enhancing employability [6][12].
- The course will cover essential algorithms and frameworks used in the industry, focusing on practical applications and the integration of traditional and modern methods [6][21].

Course Structure
- The course consists of six chapters, each focusing on different aspects of planning and control, including foundational algorithms, decision-making frameworks, and handling uncertainty in environments [20][24][29].
- The course will also include interview preparation, resume enhancement, and mock interviews to support participants in securing job offers [31][10].

Target Audience
- The course is designed for individuals with backgrounds in vehicle engineering, automation, computer science, and related fields, particularly those seeking to transition into autonomous driving roles [37][39].
- Participants are expected to have a basic understanding of programming and relevant mathematical concepts to fully benefit from the course [43].

Instructor Expertise
- The course will be led by an experienced instructor with a strong background in autonomous driving algorithms and practical implementation, ensuring that participants receive high-quality guidance [34][10].

Additional Benefits
- Participants will have access to supplementary resources, including code and development environments, to enhance their learning experience [13][15].
- The course aims to provide a comprehensive understanding of the industry, equipping participants with the skills needed to tackle complex problems in autonomous driving [6][13].
The Boom Fades: Humanoid Robots May Be Heading Into a Winter
自动驾驶之心· 2025-10-30 00:04
This article is reposted from 天南AI茶馆 (author: 天南).

Meta's Chief AI Scientist LeCun has said that the robotics industry is far from achieving real intelligence, and the head of Google DeepMind recently noted that humanoid robots are at least 5-10 years away from entering the home market.

Lately I have watched the humanoid robot industry fall short of expectations again and again. Many people ask me whether, from a technical standpoint, the industry is about to enter a winter. Today, through rational analysis, let us look at the real state of the industry's development.

Overseas, neither company performance nor the predictions of leading figures are very optimistic:
- Tesla's Gen2 was forced to pause this year's mass-production plan because of overheating and the short lifespan of its dexterous hands, while Gen3 has slipped again, delayed to Q1 next year.
- Figure 03 was eagerly anticipated, but Time magazine revealed that its demos had been shot and edited across multiple takes.

At home, by contrast, there is some false prosperity: orders are soaring, but most have been exposed as round-tripping deals between related parties, ...
IROS'25 Championship Solution: X-VLA Open-Sourced, Sweeping Robotics SOTA!
自动驾驶之心· 2025-10-30 00:04
Core Viewpoint
- The article discusses the launch of X-VLA, a groundbreaking open-source model in the field of embodied intelligence, which has achieved significant performance improvements in autonomous tasks such as folding clothes, showcasing its robustness and generalization capabilities [2][5][7].

Group 1: Model Performance and Achievements
- X-VLA is the first open-source model to accomplish a 120-minute autonomous clothing-folding task without assistance, achieving state-of-the-art (SOTA) performance with only 0.9 billion parameters across five authoritative simulation benchmarks [2][7].
- The X-VLA team won first place in the IROS-AGIBOT World Challenge, competing against 431 teams from 23 countries, demonstrating exceptional performance in real physical tasks such as grasping, folding, cooking, and pouring [4][5].

Group 2: Technical Innovations
- The model employs a Soft-Prompt mechanism to enhance adaptability across different robotic platforms, allowing for improved stability and efficiency when training on heterogeneous data [16].
- A multi-modal encoding strategy is introduced to handle diverse visual inputs, optimizing resource allocation while maintaining information integrity [16].
- The use of flow matching in the action decoder enhances the smoothness and robustness of action trajectories, crucial for executing long-sequence tasks [17].

Group 3: Data and Training Strategies
- X-VLA utilizes a balanced data-sampling strategy to ensure equitable training across heterogeneous datasets, preventing model bias [21].
- A rigorous data cleaning and temporal alignment pipeline is implemented to enhance the quality and consistency of the state-action sequences [21].
- The model's training process includes a customized post-training workflow that allows for efficient adaptation to specific tasks using smaller datasets [23][26].

Group 4: Experimental Results
- In various authoritative simulation environments, X-VLA achieved SOTA performance, significantly outperforming existing models [24].
- The model demonstrated strong performance in real-world robotic tasks, successfully completing complex long-duration tasks like autonomous clothing folding [27].
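The flow-matching objective mentioned in Group 2 can be sketched in its standard form: regress a velocity field toward the straight-line displacement between noise and data, then integrate that field at inference to generate action trajectories. This is the generic technique only; the toy velocity network and function names below are assumptions, and X-VLA's actual decoder is a learned transformer, not a linear stub.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(v_net, actions):
    """Conditional flow matching with a linear path: x_t = (1-t)x0 + t*x1,
    whose target velocity is the constant x1 - x0."""
    x1 = actions                              # expert action chunk (data)
    x0 = rng.normal(size=x1.shape)            # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))    # per-sample time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1              # point on the straight-line path
    target_v = x1 - x0                        # velocity of that path
    pred_v = v_net(xt, t)
    return float(((pred_v - target_v) ** 2).mean())

def sample_actions(v_net, shape, steps=10):
    """Generate an action trajectory by Euler-integrating the learned ODE
    from noise (t=0) toward data (t=1)."""
    x = rng.normal(size=shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((shape[0], 1), k * dt)
        x = x + dt * v_net(x, t)              # one Euler step
    return x
```

The deterministic ODE integration is one reason flow-matching decoders tend to produce smooth trajectories, which matches the long-horizon smoothness claim in the summary.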