Horizon's HSD Is Indeed Worth Watching
自动驾驶之心· 2025-10-29 03:30
Core Insights
- The article compares the performance of Horizon's HSD system with Li Auto's VLA system, highlighting the strengths and weaknesses of both [5][6].

Group 1: Technology Comparison
- Horizon's HSD architecture relies on visual information for trajectory output, with lidar serving as a safety redundancy, while the VLA approach is criticized for its high computational and bandwidth requirements [5].
- During a test drive of a Horizon HSD engineering vehicle, the experience was reported to be significantly better than the current production version of Li Auto's VLA, particularly in comfort and smoothness in traffic [6].
- Feedback from the Horizon team indicated that HSD performs well in controlled environments but has limitations in extreme weather and complex scenarios, suggesting a need for further development [7].

Group 2: Community and Collaboration
- Nearly a hundred technical discussion groups covering various aspects of autonomous driving have been established, with a community of around 4,000 members and over 300 companies and research institutions involved [8].
- The collaboration between Horizon and vehicle manufacturers is emphasized, including user-interface choices that defer to manufacturer preferences, which can affect the overall driving experience [7].

Group 3: Future Outlook
- While HSD shows promise, it is still in development and may not yet reach full autonomous driving capability, estimated at around 60% of the level of Li Auto's V13 system [7].
ICCV 2025 "End-to-End Autonomous Driving" Champion Solution Shared!
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article highlights the victory of Inspur's AI team in the Autonomous Grand Challenge 2025, where they achieved a score of 53.06 in the end-to-end autonomous driving track using their innovative framework "SimpleVSF" [2][7][13].
- The framework integrates bird's-eye-view perception trajectory prediction with a vision-language multimodal model, enhancing decision-making capabilities in complex traffic scenarios [2][5][8].

Summary by Sections

Competition Overview
- The ICCV 2025 Autonomous Driving Challenge is a significant international event focusing on autonomous driving and embodied intelligence, featuring three main tracks [4].
- The end-to-end driving challenge evaluates trajectory prediction and behavior planning using a data-driven simulation framework, emphasizing safety and efficiency across nine key metrics [4].

Technical Challenges
- End-to-end autonomous driving aims to reduce the errors and information loss of traditional modular approaches, yet it still struggles with decision-making in complex real-world scenarios [5].
- Current methods can identify basic scene elements but fail to understand higher-level semantics and situational awareness, leading to suboptimal decisions [5].

Innovations in the SimpleVSF Framework
- SimpleVSF bridges the gap between traditional trajectory planning and semantic understanding through a vision-language model (VLM) [7][8].
- The VLM-enhanced scoring mechanism improves decision quality and scene adaptability, resulting in a 2% performance increase for single models and up to 6% in fusion decision-making [8][11].

Decision-Making Mechanism
- The dual fusion decision mechanism combines quantitative and qualitative assessments, ensuring optimal trajectory selection based on both numerical and semantic criteria (see the sketch after this summary) [10][11].
- The framework employs advanced models for generating diverse candidate trajectories and extracting robust environmental features, enhancing overall system performance [13].

Achievements and Future Directions
- SimpleVSF's success in the challenge sets a new benchmark for end-to-end autonomous driving technology, supporting further advances in the field [13].
- Inspur's AI team aims to leverage its algorithmic and computational strengths to drive innovation in autonomous driving technology [13].
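To make the "quantitative plus qualitative" selection concrete, here is a minimal, generic sketch of score fusion over candidate trajectories. The weights, normalization, and field names are illustrative assumptions; the summary does not specify SimpleVSF's actual fusion rule.

```python
# Illustrative sketch only: a generic score-fusion selector for candidate
# trajectories, assuming precomputed quantitative scores (e.g., from
# metric-based scorers) and qualitative scores (e.g., a VLM's rating of each
# candidate's semantic plausibility). Weights and normalization are
# hypothetical, not taken from the SimpleVSF paper.
from dataclasses import dataclass

@dataclass
class Candidate:
    trajectory_id: int
    quantitative: float  # e.g., combined safety/comfort/progress metrics
    qualitative: float   # e.g., VLM preference score for the same trajectory

def select_trajectory(candidates, w_quant=0.7, w_qual=0.3):
    """Pick the candidate with the highest fused score after min-max normalization."""
    def normalize(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.5 for v in values]

    q = normalize([c.quantitative for c in candidates])
    s = normalize([c.qualitative for c in candidates])
    fused = [w_quant * qi + w_qual * si for qi, si in zip(q, s)]
    best = max(range(len(candidates)), key=lambda i: fused[i])
    return candidates[best], fused[best]

if __name__ == "__main__":
    cands = [Candidate(0, 0.82, 0.40), Candidate(1, 0.78, 0.90), Candidate(2, 0.65, 0.55)]
    best, score = select_trajectory(cands)
    print(best.trajectory_id, round(score, 3))
```

The min-max normalization simply puts the numeric and semantic scores on a comparable scale before weighting; in practice the qualitative score would come from a VLM's ranking or rating of each candidate.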
Dream4Drive: A World-Model Generation Framework That Improves Downstream Perception Performance
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article discusses the development of Dream4Drive, a new synthetic data generation framework aimed at enhancing downstream perception tasks in autonomous driving, emphasizing the importance of high-quality, controllable multimodal video generation [1][2][5].

Group 1: Background and Motivation
- 3D perception tasks like object detection and tracking are critical for decision-making in autonomous driving, but their performance heavily relies on large-scale, manually annotated datasets [4].
- Existing methods for synthetic data generation often overlook evaluation on downstream perception tasks, leading to a misrepresentation of the effectiveness of synthetic data [5][6].
- Diverse and extreme-scenario data is needed, as current data collection methods are time-consuming and labor-intensive [4].

Group 2: Dream4Drive Framework
- Dream4Drive decomposes input videos into multiple 3D-aware guidance maps, rendering 3D assets onto these maps to generate edited, multi-view realistic videos for training perception models [1][9].
- The framework utilizes a large-scale 3D asset dataset, DriveObj3D, which includes typical categories from driving scenarios and supports diverse 3D perception video editing [2][9].
- Experiments show that Dream4Drive can significantly enhance perception model performance with only 420 synthetic samples, less than 2% of the real sample size [6][27].

Group 3: Experimental Results
- Comparative results demonstrate that Dream4Drive outperforms existing models across training epochs, achieving higher mean Average Precision (mAP) and nuScenes Detection Score (NDS) [27][28].
- High-resolution synthetic data (512×768) leads to significant performance improvements, with mAP increasing by 4.6 percentage points (12.7%) and NDS by 4.1 percentage points (8.6%); the arithmetic behind these figures is sketched after this summary [29][30].
- The position of inserted assets affects performance, with distant insertions generally yielding better results due to reduced occlusion issues [37][38].

Group 4: Conclusions and Implications
- The study concludes that existing evaluations of synthetic data in autonomous driving are biased, and Dream4Drive provides a more effective approach for generating high-quality synthetic data for perception tasks [40][42].
- The results emphasize the importance of using assets that match the style of the dataset to minimize the domain gap between synthetic and real data, enhancing model training [42].
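As a quick consistency check on the reported numbers, the absolute and relative gains imply approximate baseline scores, and the "420 samples < 2% of real data" claim implies a lower bound on the real sample count. The short sketch below reproduces that arithmetic; the baselines are inferred here, not quoted from the paper.

```python
# Back-of-the-envelope check: from "+4.6 pp (12.7%)" for mAP and
# "+4.1 pp (8.6%)" for NDS, the implied baseline is delta / relative_gain.
# These baselines are inferred from the summary, not quoted from the paper.
def implied_baseline(delta_pp: float, relative_gain: float) -> float:
    return delta_pp / relative_gain

print(f"implied baseline mAP ~ {implied_baseline(4.6, 0.127):.1f}")  # ~36.2
print(f"implied baseline NDS ~ {implied_baseline(4.1, 0.086):.1f}")  # ~47.7

# "420 synthetic samples < 2% of the real sample size" implies
# a real sample count of more than 420 / 0.02 = 21,000.
print(f"implied real-sample lower bound ~ {420 / 0.02:.0f}")
```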
In RL Training, Why Does Entropy Reduction Often Mean Convergence?
自动驾驶之心· 2025-10-29 00:04
Author | skydownacai  Reposted from | "In RL Training, Why Does Entropy Reduction Often Mean Convergence?"  Original link: https://zhuanlan.zhihu.com/p/1950579532802270647

This article is shared for academic exchange only; if it infringes any rights, please contact us for removal. You are welcome to add the assistant on WeChat (AIDriver004) for further inquiries.

Over the past six months there has been a great deal of research on RL + entropy. For a discrete action space, the entropy of policy $\pi$ at state $s$ is

$${\mathcal{H}}\left(\pi\left(\cdot\mid s\right)\right):=\mathbb{E}_{a\sim\pi\left(\cdot\mid s\right)}\left[-\log\pi\left(a\mid s\right)\right]$$
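To connect the definition to code, here is a minimal numerical illustration of the discrete-action entropy above; it is a generic example, not code from the original post.

```python
# Minimal illustration of H(pi(.|s)) = E_{a~pi(.|s)}[-log pi(a|s)] for a
# discrete action space, computed from raw action logits in a numerically
# stable way.
import numpy as np

def policy_entropy(logits: np.ndarray) -> float:
    logits = logits - logits.max()          # stabilize the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())

# A near-uniform policy has high entropy; a peaked policy has low entropy.
print(policy_entropy(np.array([0.1, 0.0, -0.1])))   # ~1.095, close to ln(3) ~ 1.099
print(policy_entropy(np.array([10.0, 0.0, 0.0])))   # ~0.001, close to 0
```

As training converges and the policy concentrates its probability mass on a few high-reward actions, this quantity drops toward zero, which is the empirical link between entropy reduction and convergence that the post examines.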
Bosch's Dino-Diffusion: End-to-End Parking Unfazed by Weather, Closing the Cross-Domain Gap
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article discusses advances in autonomous parking systems, focusing on a modular approach that combines visual foundation models and diffusion models to enhance cross-domain generalization capabilities [8][33].

Group 1: Autonomous Driving Technology
- Autonomous driving technology has developed rapidly, with nearly 60% of new cars globally equipped with some form of autonomous driving features [6].
- Parking-related accidents account for 20% of all vehicle accidents in the U.S., with 91% occurring during reverse maneuvers, highlighting the need for precise perception, planning, and control [6].

Group 2: Proposed System and Methodology
- The proposed Dino-Diffusion Parking (DDP) system integrates a robust perception module based on the DINOv2 model with a diffusion model for trajectory planning, enhancing the system's ability to generalize across domains [8][9].
- The DDP system includes several modules: robust perception using DINOv2, target fusion through re-labeling, trajectory planning via diffusion models, and precise tracking using a Stanley controller (a sketch of the Stanley update follows this summary) [9][10][14].

Group 3: Experimental Results
- The system was tested in the CARLA simulator with 800 expert trajectories across various weather conditions, demonstrating significant improvements in success rates and reduced errors compared to existing methods [20][27].
- The combination of the diffusion model and Stanley controller improved success rates by 16% under severe domain shifts, showcasing the system's robustness in complex environments [27].

Group 4: Future Directions
- Future work includes integrating video world models to further bridge the gap between simulation and reality, collecting human demonstration data in 3DGS environments, and deploying the system in real vehicles to validate performance across diverse scenarios [34].
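For reference, the Stanley controller mentioned above follows the well-known lateral-control law: steering = heading error + arctan(k · cross-track error / speed). The sketch below shows that textbook form; the gain, saturation limit, and sign conventions are assumptions, not the DDP paper's actual parameters.

```python
# Textbook Stanley lateral-control law, shown for illustration; the gain,
# saturation limit, and sign conventions here are assumptions, not the
# DDP system's actual controller settings.
import math

def stanley_steering(heading_error: float,
                     cross_track_error: float,
                     speed: float,
                     k: float = 1.0,
                     eps: float = 1e-3,
                     max_steer: float = math.radians(35)) -> float:
    """Return a steering angle (rad) that corrects heading and cross-track error."""
    steer = heading_error + math.atan2(k * cross_track_error, speed + eps)
    return max(-max_steer, min(max_steer, steer))

# Example: small heading error, 0.5 m off the path, crawling in a parking lot.
print(stanley_steering(heading_error=0.05, cross_track_error=0.5, speed=1.5))
```

The arctan term makes the cross-track correction aggressive at low speed and gentle at high speed, which is why this controller is popular for low-speed maneuvers such as parking.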
Some Advice for Newcomers to the Autonomous Driving Industry
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article describes a comprehensive community, the "Autonomous Driving Heart Knowledge Planet," aimed at bridging the gap between academia and industry in the field of autonomous driving [1][3][14].

Group 1: Community Development
- The community has grown to over 4,000 members and aims to reach nearly 10,000 within two years, providing a platform for technical sharing and communication among beginners and advanced learners [3][14].
- It offers videos, articles, learning paths, Q&A sessions, and job-exchange opportunities, making it a one-stop hub for autonomous driving practitioners [1][3][5].

Group 2: Learning Resources
- Over 40 technical learning paths have been compiled, covering topics such as end-to-end learning, multimodal large models, and data annotation practice, significantly reducing the time needed to get up to speed on research [5][14].
- Members can access a variety of video tutorials and beginner-oriented courses covering the essentials of autonomous driving technology [9][15].

Group 3: Industry Engagement
- The community collaborates with numerous industry leaders and academic experts to discuss trends, technological advances, and productionization challenges in autonomous driving [6][10][14].
- A job-referral mechanism connects members with leading companies in the autonomous driving sector [10][12].

Group 4: Technical Focus Areas
- Resources are organized around technical areas including 3D object detection, multi-sensor fusion, and high-precision mapping, all crucial to the development of autonomous driving technologies [27][29][31].
- Particular attention is given to emerging topics such as vision-language models (VLM) and world models, with detailed summaries and resources available to members [37][39][45].
At This Year's CVPR, Which Autonomous Driving Directions Are Still Worth Going For?
自动驾驶之心· 2025-10-28 00:03
Core Viewpoint
- The article emphasizes the importance of targeted guidance and mentorship for students aiming to publish high-quality papers in top conferences like CVPR and ICLR, highlighting the need for strategic effort in the final stage of the submission process [1][2][4].

Group 1: Submission Guidance
- The majority of accepted papers at past conferences focus on localized breakthroughs and verifiable improvements, aligned closely with each year's main themes [1].
- The main theme for CVPR 2026 is likely to be "world models," indicating a strategic direction for potential submissions [1].
- Students are encouraged to draw on the experience of predecessors to raise submission quality, particularly in the final stage of preparation [2].

Group 2: Mentorship and Support
- The organization, "Autonomous Driving Heart," is described as the largest AI technology media platform in China, with extensive academic resources and a deep understanding of the challenges in interdisciplinary fields like autonomous driving and robotics [3].
- The mentorship program reports a 96% acceptance rate for students over the past three years, indicating the effectiveness of its guidance [5].
- Personalized support includes assistance with research thinking, familiarization with the research process, and practical application of theoretical models [7][13].

Group 3: Program Structure and Offerings
- Structured support includes personalized paper guidance, real-time interaction with mentors, and unlimited access to recorded sessions for review [13].
- The program caters to various academic levels and goals, from foundational courses for beginners to advanced mentorship for experienced researchers [17][19].
- Outstanding students may receive recommendations to prestigious institutions and direct referrals to leading tech companies [19].
The End of Autonomous Driving's Spring and Autumn Period
自动驾驶之心· 2025-10-28 00:03
Core Insights
- The autonomous driving industry is transitioning from a "Spring and Autumn" period to a "Warring States" phase, shifting from competitive recognition to a struggle for dominance in which only the leading players will survive [2][3].

Technical Route Dispute
- Competition in autonomous driving has evolved from a ranking contest into a life-and-death battle, with losers losing access to the resources needed for continued R&D [3].
- The 2022 Tesla AI Day II has significantly influenced the development direction of autonomous driving technology, leading to a divergence in technical paths among companies [4].
- Companies are exploring differentiated technical routes, with some abandoning LiDAR in favor of pure-vision solutions, while others experiment with various mapping and planning algorithms [4][5].

Supplier Model Counterattack
- As the technology experience reaches a plateau, the gap between leading autonomous driving teams is narrowing, fueling a price war in the automotive industry [6].
- Traditional automakers and smaller brands are increasingly opting for supplier solutions to reduce costs and enhance product capabilities, a trend described as "handing over their soul" to survive [6].

Data Barrier as the Key to a Reversal
- The current plateau in autonomous driving technology is attributed to the immaturity of data-driven solutions and a heavy reliance on rule-based algorithms [7][9].
- The release of Tesla's FSD V14 highlights the importance of real-world data in advancing autonomous driving AI, despite progress in generative AI technologies [7][9].
TeraSim World: Rebuilding a "Tesla-Style" World Model the Open-Source Way
自动驾驶之心· 2025-10-28 00:03
Core Viewpoint
- Tesla has showcased its internal World Model, a neural-network-driven virtual world generator that synthesizes high-resolution video from eight camera perspectives based on vehicle states and control inputs, enabling real-time environmental prediction and closed-loop validation [2][6].

Group 1: Tesla's World Model
- Tesla's World Model allows historical problem scenarios to be replayed and new adversarial events to be injected in a virtual environment for testing and reinforcement learning [2].
- The model learns a general mapping of "perception-action-world change," making it applicable to other platforms like robots and forming a basis for general physical intelligence [2].

Group 2: TeraSim World Framework
- A research team from the University of Michigan, SaferDrive AI, the University of Hong Kong, and Tsinghua University has developed TeraSim World, an open-source framework that achieves generation and evaluation capabilities similar to Tesla's World Model without requiring real maps or sensor backgrounds [5][6].
- TeraSim World automatically generates city environments and traffic behaviors using AI, creating a fully data-driven, reproducible, and scalable world-model platform [5].

Group 3: System Features
- TeraSim World features a modular, fully automated data-synthesis pipeline for generating realistic and safety-critical data for end-to-end autonomous driving [7].
- The system retrieves real-world road maps and converts them into simulation-ready formats, allowing digital maps to be generated automatically from user input [10][11].
- It can simulate realistic traffic conditions by automatically obtaining real-time traffic data, thus reflecting local traffic patterns [13].

Group 4: Agent and Sensor Simulation
- The agent simulation component enables virtual vehicles, pedestrians, and cyclists to behave like their real-world counterparts, incorporating human driving characteristics [16].
- TeraSim World introduces safety-critical scenarios based on real-world accident probabilities, ensuring the generated events are both risky and realistic (a sampling sketch follows this summary) [17].
- The sensor simulation generates realistic camera inputs and can be extended to other sensor types, utilizing NVIDIA's open-source Cosmos models for high-resolution, time-synchronized multi-view video generation [19][22][25].

Group 5: Automated Stress Testing
- TeraSim World supports automated full-stack stress testing, generating and validating various risk scenarios to assess the stability and safety boundaries of autonomous driving systems [30].
- The framework can inject dynamic and static risks, such as sudden stops or environmental changes, to evaluate system responses under diverse conditions [30].

Group 6: Conclusion and Future Plans
- TeraSim World combines agent and sensor simulation to provide a comprehensive data-generation process for training and testing autonomous driving systems without real-world data collection [31].
- The project aims to create a large-scale synthetic driving dataset, expand to multimodal sensor simulation, and establish an open virtual testing ground for researchers and developers [32].
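To illustrate what probability-grounded risk injection can look like, here is a small generic sketch that decides, per simulation step, whether to trigger an adversarial event based on an assumed per-kilometer rate. The event names, rates, and sampling scheme are placeholders, not TeraSim World's actual implementation.

```python
# Generic sketch of probability-weighted risk injection for a simulation step.
# Event names and per-kilometer rates are placeholders, not real accident
# statistics and not TeraSim World's configuration.
import random

ADVERSARIAL_EVENTS = {
    "lead_vehicle_hard_brake": 1e-3,   # assumed events per simulated km
    "pedestrian_jaywalk":      5e-4,
    "cut_in_from_adjacent":    2e-3,
}

def maybe_inject_event(km_travelled_this_step: float, rng: random.Random):
    """Return the name of an injected event for this step, or None."""
    for event, rate_per_km in ADVERSARIAL_EVENTS.items():
        if rng.random() < rate_per_km * km_travelled_this_step:
            return event
    return None

rng = random.Random(0)
events = [maybe_inject_event(0.05, rng) for _ in range(200_000)]
print(sum(e is not None for e in events), "events injected over ~10,000 simulated km")
```

Tying injection rates to accident frequencies keeps stress tests risky but statistically plausible, rather than flooding the simulation with events that almost never occur on real roads.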
A New 76-Page Survey on Agentic AI
自动驾驶之心· 2025-10-28 00:03
Core Insights
- The article discusses the evolution of Agentic AI from pipeline-based systems to model-native paradigms, emphasizing the internalization of reasoning, memory, and action capabilities within the models themselves [1][44].
- It highlights the role of reinforcement learning (RL) as the driving force in transforming static models into adaptive, goal-oriented agents capable of learning from interactions with their environment [1][44].

Background
- The rapid advancement of generative AI has primarily produced reactive outputs, lacking long-term reasoning and environmental interaction; the shift toward Agentic AI emphasizes three core capabilities: planning, tool usage, and memory [3].
- Early systems relied on pipeline paradigms in which these capabilities were externally orchestrated, leading to passive models that struggled in unexpected scenarios; the new model-native paradigm integrates these capabilities directly into the model parameters, allowing for proactive decision-making [3][6].

Reinforcement Learning for LLMs
- The scarcity of programmatic data and vulnerability to out-of-distribution scenarios necessitate result-driven RL to internalize planning and other capabilities, moving away from prompt-induced behaviors [6][7].
- RL offers advantages over supervised fine-tuning (SFT) by enabling dynamic exploration and relative value learning, transforming models from passive imitators into active explorers [8][9].

Unified Paradigm and Algorithm Evolution
- Early RLHF methods excelled at single-turn alignment but struggled with long-horizon, multi-turn, sparse-reward settings; newer result-driven RL methods like GRPO and DAPO enhance training stability and efficiency (a simplified GRPO-style sketch follows this summary) [12].
- The evolution of algorithms leverages foundation-model priors while refining capabilities through interaction and rewards in task environments [12].

Core Capabilities: Planning
- The pipeline paradigm views planning as automated reasoning and action-sequence search, which is limited in flexibility and stability on complex tasks [14][15].
- The model-native paradigm integrates planning capabilities directly into model parameters, enhancing flexibility and robustness in open environments [15][18].

Core Capabilities: Tool Usage
- Early systems embedded models in fixed nodes, lacking flexibility; the model-native transition internalizes decisions about tool usage, forming a multi-objective decision problem [21][22].
- Challenges remain in credit assignment and environmental noise, which can destabilize training; modular training approaches aim to isolate execution noise and improve sample efficiency [22].

Core Capabilities: Memory
- Memory capabilities have evolved from external modules to integral components of task execution, emphasizing action-oriented evidence governance [27][30].
- Short-term memory utilizes techniques like sliding windows and retrieval-augmented generation (RAG), while long-term memory focuses on external libraries and parameter-based internalization [30].

Future Directions
- The trajectory of Agentic AI indicates a shift toward deeper integration between models and their environments, moving from systems designed to use intelligence to those that grow intelligence through experience and collaboration [44].
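As a concrete anchor for the GRPO reference above, here is a simplified sketch of the group-relative advantage step used by GRPO-style methods: sample several responses to the same prompt, score each, and normalize the rewards within the group. The clipped policy ratio, KL penalty, and token-level details are omitted, so this is only the advantage computation, not the full algorithm.

```python
# Simplified sketch of GRPO-style group-relative advantages: the group's own
# reward statistics serve as the baseline, so no learned value model is needed.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one prompt, scored by an outcome reward (1 = correct).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # positive for correct, negative for incorrect
```

Compared with PPO-style training, this removes the need for a separate value network, which is one reason such result-driven methods are easier to scale to long-horizon agentic tasks.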