自动驾驶之心
Li Auto's ICCV'25 Talk on World Models: From the Data Closed-Loop to the Training Closed-Loop
自动驾驶之心· 2025-10-30 00:56
Core Insights
- The article discusses advances in autonomous driving technology, focusing on the transition from data closed-loop systems to training closed-loop systems, which marks a new phase in autonomous driving development [17][20].

Group 1: Development of Li Auto's VLA Model
- Li Auto's VLA driver model has evolved through several stages, from rule-based systems to AI-driven E2E+VLM systems, with navigation emphasized as a key module [6].
- The MPI of the end-to-end mass-production version has exceeded 220, a 19-fold increase over the July 2024 version [12].

Group 2: Data Closed-Loop Value
- The data closed-loop process spans shadow-mode validation, cloud-side data mining, automatic labeling of effective samples, and model training, with a data return time of one minute [9][10].
- Li Auto has accumulated 1.5 billion kilometers of driving data and uses over 200 triggers to produce 15-45 second clips [10].

Group 3: Transition to the Training Closed-Loop
- The core of the L4 training loop combines VLA, reinforcement learning (RL), and world models (WM), optimizing trajectories through diffusion and reinforcement learning [22].
- Key technologies for closed-loop autonomous driving training include regional simulation, synthetic data, and reinforcement learning [24].

Group 4: Simulation and Generation Techniques
- Simulation relies on scene reconstruction, including visual and lidar reconstruction, while synthetic data generation draws on multimodal techniques [25].
- Li Auto's recent advances in reconstruction and generation have yielded significant improvements, with multiple top-conference papers published over the last two years [26][29][31].

Group 5: Interactive Agents and System Capabilities
- Developing interactive agents is highlighted as a critical challenge in the training closed-loop [37].
- System capabilities are strengthened by world models that provide simulation environments, diverse scene construction, and accurate feedback from reward models [38].

Group 6: Community and Collaboration
- The article mentions the establishment of nearly a hundred technical discussion groups on autonomous driving topics, with a community of around 4,000 members and over 300 companies and research institutions involved [44][45].
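The trigger-driven data mining described above (an event fires, and a 15-45 second clip around it is cut for cloud-side processing) can be sketched minimally. This is an illustrative sketch, not Li Auto's actual pipeline; the single `hard_brake` trigger, the pre-event context window, and the padding rule are all assumptions.

```python
from dataclasses import dataclass

CLIP_MIN_S, CLIP_MAX_S = 15, 45   # clip bounds from the article; the rest is assumed

@dataclass
class Frame:
    t: float            # timestamp in seconds
    hard_brake: bool    # one example trigger signal (real fleets use 200+ triggers)

def mine_clips(frames, pre_s=5.0):
    """Cut one clip per trigger firing: pre_s seconds of context before the
    event, padded forward to at least CLIP_MIN_S and capped at CLIP_MAX_S."""
    clips = []
    for f in frames:
        if f.hard_brake:
            start = max(0.0, f.t - pre_s)
            end = min(start + CLIP_MAX_S, max(start + CLIP_MIN_S, f.t + pre_s))
            clips.append((start, end))
    return clips

# Usage: a trigger at t=100 s yields a single 15 s clip starting 5 s earlier.
clips = mine_clips([Frame(99.0, False), Frame(100.0, True), Frame(101.0, False)])
```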
Traditional Planning-and-Control Jobs Are Getting Harder to Find...
自动驾驶之心· 2025-10-30 00:04
Core Viewpoint
- The article emphasizes the shift in autonomous driving from traditional planning-and-control methods to end-to-end approaches, which are increasingly favored in the industry [2][29].

Summary by Sections

Course Offerings
- The company has designed a specialized course on end-to-end planning and control in autonomous driving, aimed at addressing real-world challenges and enhancing employability [6][12].
- The course covers the essential algorithms and frameworks used in industry, focusing on practical applications and the integration of traditional and modern methods [6][21].

Course Structure
- The course consists of six chapters covering different aspects of planning and control, including foundational algorithms, decision-making frameworks, and handling uncertainty in the environment [20][24][29].
- It also includes interview preparation, resume enhancement, and mock interviews to help participants secure job offers [31][10].

Target Audience
- The course targets individuals with backgrounds in vehicle engineering, automation, computer science, and related fields, particularly those transitioning into autonomous driving roles [37][39].
- Participants are expected to have basic programming skills and the relevant mathematical background to benefit fully [43].

Instructor Expertise
- The course is led by an experienced instructor with a strong background in autonomous driving algorithms and practical implementation [34][10].

Additional Benefits
- Participants get supplementary resources, including code and development environments, to support their learning [13][15].
- The course aims to provide a comprehensive view of the industry and the skills needed to tackle complex problems in autonomous driving [6][13].
The Boom Fades: Humanoid Robots May Be Entering a Winter
自动驾驶之心· 2025-10-30 00:04
Core Viewpoint
- The humanoid robot industry faces significant challenges and may be entering a period of stagnation due to unmet expectations and technological limitations [4][10][20].

Industry Performance
- International companies are experiencing setbacks: Tesla's Gen2 production has been halted over overheating and durability issues, and Gen3 has been delayed until Q1 of next year [5][20].
- Domestic companies project a facade of prosperity, with many orders being mere internal transfers or non-deliverable framework orders [7].

Technological Limitations
- Advances in AI have not yet delivered the expected general intelligence in humanoid robots, raising doubts about the industry's future [10][11].
- Showcase applications such as sorting packages and folding clothes generalize poorly, which could lead to failures in more complex environments like homes [14][17].
- Video learning, while touted as a future solution, remains largely at the research stage; no company has demonstrated a practical application for dexterous tasks [19].

Potential Upsides
- Two uncertain factors could change the industry's trajectory:
1. The performance of Tesla's Optimus Gen3, seen as a potential game-changer; if it fails to meet expectations, it could trigger widespread pessimism about the industry [20][22].
2. The success of companies like Yushun, which focus on hardware optimization and entertainment applications, suggesting that certain segments may thrive even in a downturn [26].

Conclusion
- The industry is in a period of reevaluation that may precede technological advancement, much like the early struggles of the electric vehicle sector [27][28].
IROS'25 Champion Solution: X-VLA Open-Sourced, Setting New Robot SOTA Across the Board!
自动驾驶之心· 2025-10-30 00:04
Core Viewpoint
- The article covers the launch of X-VLA, a groundbreaking open-source model for embodied intelligence that achieves significant performance gains on autonomous tasks such as clothing folding, showcasing robustness and generalization [2][5][7].

Group 1: Model Performance and Achievements
- X-VLA is the first open-source model to complete a 120-minute autonomous clothing-folding task without assistance, and it reaches state-of-the-art (SOTA) performance with only 0.9 billion parameters across five authoritative simulation benchmarks [2][7].
- The X-VLA team won first place in the IROS-AGIBOT World Challenge against 431 teams from 23 countries, excelling at real physical tasks such as grasping, folding, cooking, and pouring [4][5].

Group 2: Technical Innovations
- A Soft-Prompt mechanism improves adaptability across robotic platforms, stabilizing and speeding up training on heterogeneous data [16].
- A multi-modal encoding strategy handles diverse visual inputs, optimizing resource allocation while preserving information [16].
- Flow matching in the action decoder improves the smoothness and robustness of action trajectories, which is crucial for long-sequence tasks [17].

Group 3: Data and Training Strategies
- A balanced data-sampling strategy ensures equitable training across heterogeneous datasets and prevents model bias [21].
- A rigorous data-cleaning and temporal-alignment pipeline improves the quality and consistency of state-action sequences [21].
- A customized post-training workflow enables efficient adaptation to specific tasks with smaller datasets [23][26].

Group 4: Experimental Results
- Across authoritative simulation environments, X-VLA achieves SOTA performance, significantly outperforming existing models [24].
- On real robots, it completes complex long-duration tasks such as autonomous clothing folding [27].
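The flow-matching idea behind the action decoder can be illustrated with a minimal sketch: training regresses a velocity field toward the constant displacement (x1 − x0) along straight-line interpolations between noise and an expert action chunk, and inference integrates that field with Euler steps. This is a generic conditional-flow-matching sketch under assumed shapes (a 7-dimensional action), not X-VLA's actual decoder.

```python
import numpy as np

def fm_pair(x0, x1, t):
    """Conditional flow matching: the point on the straight path at time t,
    and the constant target velocity the network should regress to."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0

def euler_sample(v_field, x0, steps=20):
    """Generate an action chunk by integrating the velocity field from noise."""
    x, dt = x0.astype(float).copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * v_field(x, i * dt)
    return x

# Usage: with the ideal field v(x, t) = x1 - x0, Euler integration from x0
# recovers the target action chunk x1 exactly, since the path is a straight line.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(7)         # noise; the 7-DoF action dim is an assumption
x1 = np.linspace(-1.0, 1.0, 7)      # stand-in "expert" action chunk
out = euler_sample(lambda x, t: x1 - x0, x0)
```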
Horizon's HSD Really Deserves Attention
自动驾驶之心· 2025-10-29 03:30
Core Insights
- The article compares the performance of Horizon's HSD system with Li Auto's VLA system, weighing the strengths and weaknesses of both [5][6].

Group 1: Technology Comparison
- Horizon's HSD architecture outputs trajectories from visual information, with lidar positioned as a safety redundancy, while the VLA approach is criticized for its high compute and bandwidth requirements [5].
- In a test drive of a Horizon HSD engineering vehicle, the experience was reported to be markedly better than the current production version of Li Auto's VLA, particularly in comfort and smoothness in traffic [6].
- Feedback from the Horizon team indicated that HSD performs well in controlled environments but struggles in extreme weather and complex scenarios, suggesting further development is needed [7].

Group 2: Community and Collaboration
- The article mentions the establishment of nearly a hundred technical discussion groups on various aspects of autonomous driving, with a community of around 4,000 members and over 300 companies and research institutions involved [8].
- Collaboration between Horizon and vehicle manufacturers is emphasized, including user-interface choices that defer to manufacturer preferences, which can affect the overall driving experience [7].

Group 3: Future Outlook
- While HSD shows promise, it is still in development and may not yet reach full autonomous driving capability; the author estimates it at roughly 60% of the level of Li Auto's V13 system [7].
ICCV 2025 "End-to-End Autonomous Driving" Champion Solution Shared!
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article highlights the victory of Inspur's AI team in the Autonomous Grand Challenge 2025, where they scored 53.06 on the end-to-end autonomous driving track with their innovative "SimpleVSF" framework [2][7][13].
- The framework couples bird's-eye-view perception and trajectory prediction with a vision-language multimodal model, strengthening decision-making in complex traffic scenarios [2][5][8].

Summary by Sections

Competition Overview
- The ICCV 2025 Autonomous Driving Challenge is a major international event covering autonomous driving and embodied intelligence, featuring three main tracks [4].
- The end-to-end driving track evaluates trajectory prediction and behavior planning in a data-driven simulation framework, scoring safety and efficiency across nine key metrics [4].

Technical Challenges
- End-to-end autonomous driving aims to cut the errors and information loss of traditional modular pipelines, yet still struggles with decision-making in complex real-world scenes [5].
- Current methods can identify basic elements but fail to grasp higher-level semantics and situational context, leading to suboptimal decisions [5].

Innovations in the SimpleVSF Framework
- SimpleVSF bridges traditional trajectory planning and semantic understanding through a vision-language model (VLM) [7][8].
- The VLM-enhanced scoring mechanism improves decision quality and scene adaptability, yielding a 2% performance gain for single models and up to 6% with fusion decision-making [8][11].

Decision-Making Mechanism
- A dual fusion decision mechanism combines quantitative and qualitative assessments, so the selected trajectory is optimal by both numerical and semantic criteria [10][11].
- Advanced models generate diverse candidate trajectories and extract robust environmental features, lifting overall system performance [13].

Achievements and Future Directions
- SimpleVSF's success in the challenge sets a new benchmark for end-to-end autonomous driving technology and supports further advances in the field [13].
- Inspur's AI team aims to leverage its algorithmic and computational strengths to drive innovation in autonomous driving technology [13].
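The dual fusion decision idea, blending a quantitative planner score with a qualitative VLM-style semantic score before picking a trajectory, can be sketched generically. The weights, score ranges, and candidate format below are illustrative assumptions, not SimpleVSF's published design.

```python
def fuse_and_select(candidates, w_quant=0.7, w_sem=0.3):
    """Pick the trajectory with the best weighted blend of a quantitative
    planner score and a qualitative (e.g. VLM-derived) semantic score.
    Both scores are assumed to be normalized to [0, 1]."""
    best = max(candidates, key=lambda c: w_quant * c["quant"] + w_sem * c["sem"])
    return best["traj"]

# Usage: a trajectory that scores well numerically but poorly semantically
# (say, it cuts off a cyclist the VLM flags) loses to a safer alternative.
candidates = [
    {"traj": "keep_lane",        "quant": 0.90, "sem": 0.20},
    {"traj": "yield_then_merge", "quant": 0.80, "sem": 0.95},
]
choice = fuse_and_select(candidates)
```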
Dream4Drive: A World-Model Generation Framework That Boosts Downstream Perception
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article introduces Dream4Drive, a new synthetic-data generation framework for improving downstream perception tasks in autonomous driving, emphasizing high-quality, controllable multimodal video generation [1][2][5].

Group 1: Background and Motivation
- 3D perception tasks such as object detection and tracking are critical to driving decisions, but their performance depends heavily on large-scale, manually annotated datasets [4].
- Existing synthetic-data methods often skip evaluation on downstream perception tasks, misrepresenting how effective the synthetic data actually is [5][6].
- Diverse and extreme-scenario data is needed, yet current collection methods are time-consuming and labor-intensive [4].

Group 2: Dream4Drive Framework
- Dream4Drive decomposes input videos into multiple 3D-aware guidance maps and renders 3D assets onto them, producing edited, multi-view realistic videos for training perception models [1][9].
- The framework draws on a large-scale 3D asset dataset, DriveObj3D, covering typical categories in driving scenes to support diverse 3D-aware video editing [2][9].
- Experiments show Dream4Drive can significantly improve perception models with only 420 synthetic samples, under 2% of the real-sample count [6][27].

Group 3: Experimental Results
- Comparative results show Dream4Drive outperforms existing models across training epochs, achieving higher mean Average Precision (mAP) and nuScenes Detection Score (NDS) [27][28].
- High-resolution synthetic data (512×768) brings large gains: mAP up 4.6 percentage points (12.7%) and NDS up 4.1 percentage points (8.6%) [29][30].
- The position of inserted assets matters: distant insertions generally work better because they cause fewer occlusion issues [37][38].

Group 4: Conclusions and Implications
- Existing evaluations of synthetic data in autonomous driving are biased, and Dream4Drive offers a more effective way to generate high-quality synthetic data for perception tasks [40][42].
- Assets should match the dataset's style to minimize the domain gap between synthetic and real data and improve model training [42].
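The finding that distant insertions help because they occlude less can be illustrated with a toy placement check: project a candidate asset to an image-space box (farther away means smaller on screen) and reject placements that overlap existing object boxes too much. The projection model, box sizes, and IoU threshold are assumptions for illustration, not Dream4Drive's actual rendering pipeline.

```python
def project_box(cx, cy, depth, base_size=100.0):
    """Toy projection: apparent box size shrinks inversely with depth."""
    s = base_size / depth
    return (cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def placement_ok(candidate, existing, max_iou=0.1):
    """Accept an insertion only if it barely occludes existing objects."""
    return all(iou(candidate, e) < max_iou for e in existing)

# Usage: the same insertion point passes at 30 m depth but fails at 5 m,
# because the nearer (larger) projected box overlaps an existing object.
existing = [(90.0, 90.0, 130.0, 130.0)]
near = project_box(100.0, 100.0, depth=5.0)
far = project_box(100.0, 100.0, depth=30.0)
```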
Why Does Falling Entropy in RL Training Usually Signal Convergence?
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article examines the relationship between entropy reduction and convergence in reinforcement learning (RL) training, focusing on softmax policies and their implications for policy optimization [3][4].

Group 1: Entropy and Policy Convergence
- Entropy converging to zero means the policy is polarizing toward a deterministic solution and can no longer easily escape local optima, which is a hallmark of convergence [3][4].
- The first theoretical result: for softmax policies, the expected gradient norm of the policy at state s is directly tied to the Rényi-2 entropy, so as entropy approaches zero the expected gradient norm approaches zero as well [6][7].
- The second theoretical result: as entropy decreases, the upper bound on how far the policy can move, measured in reverse KL divergence, also decreases, so successive policies stay ever closer together [8][16].

Group 2: Implications of Softmax Parameterization
- The unique curvature of the softmax parameterization makes learning less efficient as entropy drops, which can trap the model in local optima [17].
- Alternative parameterizations, such as Newton's method or Hadamard parameterization, may help overcome these limitations in RL training [17].
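The low-entropy / vanishing-gradient link for softmax policies can be checked numerically. For a one-step bandit with policy π = softmax(θ) and objective J = E_π[r], the exact gradient is ∂J/∂θ_b = π_b (r_b − J), which collapses to zero as π approaches a one-hot. The rewards and temperatures below are arbitrary illustrative values, not from the article.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def policy_grad_norm(logits, r):
    """Exact gradient norm of J = E_pi[r] w.r.t. softmax logits,
    using dJ/dtheta_b = pi_b * (r_b - J)."""
    pi = softmax(logits)
    J = pi @ r
    return np.linalg.norm(pi * (r - J))

r = np.array([1.0, 0.5, 0.2])
theta = np.array([2.0, 1.0, 0.0])
soft, sharp = theta / 1.0, theta / 0.1   # high vs. low temperature
# As the temperature drops, the policy sharpens: its entropy and its
# gradient norm both collapse, so updates shrink and the policy freezes.
```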
Bosch's Dino-Diffusion: End-to-End Parking Unfazed by Weather, Closing the Cross-Domain Gap
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article presents advances in autonomous parking systems, focusing on a modular approach that combines visual foundation models and diffusion models to improve cross-domain generalization [8][33].

Group 1: Autonomous Driving Technology
- Autonomous driving technology has developed rapidly, with nearly 60% of new cars globally equipped with some form of autonomous driving features [6].
- Parking-related accidents account for 20% of all vehicle accidents in the U.S., 91% of them during reverse maneuvers, underlining the need for precise perception, planning, and control [6].

Group 2: Proposed System and Methodology
- The proposed Dino-Diffusion Parking (DDP) system pairs a robust perception module built on the DINOv2 model with a diffusion model for trajectory planning, improving the system's ability to generalize across domains [8][9].
- DDP comprises several modules: robust perception using DINOv2, target fusion through re-labeling, trajectory planning via diffusion models, and precise tracking using a Stanley controller [9][10][14].

Group 3: Experimental Results
- The system was tested in the CARLA simulator with 800 expert trajectories across varied weather conditions, showing markedly higher success rates and lower errors than existing methods [20][27].
- The combination of the diffusion planner and the Stanley controller improved success rates by 16% under severe domain shift, demonstrating robustness in complex environments [27].

Group 4: Future Directions
- Future work includes integrating video world models to further close the sim-to-real gap, collecting human demonstration data in 3DGS environments, and deploying the system on real vehicles to validate performance across diverse scenarios [34].
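The Stanley controller used for tracking in the DDP stack is a standard geometric steering law: the steering angle is the heading error plus the arctangent of gain times cross-track error over speed. A minimal sketch follows; the gain, speed-softening term, and steering limit are assumed tuning values, not the paper's.

```python
import math

def stanley_steer(heading_err, cross_track_err, v, k=1.0, eps=1e-3, max_steer=0.6):
    """Stanley control law: combine heading error (rad) with a cross-track
    correction that weakens at higher speed, then clip to actuator limits."""
    delta = heading_err + math.atan2(k * cross_track_err, v + eps)
    return max(-max_steer, min(max_steer, delta))

# Usage: on the path and aligned -> zero steering; offset to one side ->
# a corrective steer toward the path that softens as speed increases.
```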
Some Advice for Newcomers to the Autonomous Driving Industry
自动驾驶之心· 2025-10-29 00:04
Core Insights
- The article describes the "Autonomous Driving Heart Knowledge Planet," a comprehensive community built to bridge the gap between academia and industry in autonomous driving [1][3][14].

Group 1: Community Development
- The community has grown past 4,000 members and aims to reach nearly 10,000 within two years, providing a platform for technical sharing among beginners and advanced learners alike [3][14].
- Resources include videos, articles, learning paths, Q&A sessions, and job-exchange opportunities, making it a one-stop hub for autonomous driving enthusiasts [1][3][5].

Group 2: Learning Resources
- Over 40 technical learning paths have been compiled, covering end-to-end learning, multimodal large models, data annotation practice, and more, significantly cutting research ramp-up time [5][14].
- Members can access video tutorials and beginner courses covering the essentials of autonomous driving technology [9][15].

Group 3: Industry Engagement
- The community collaborates with numerous industry leaders and academic experts to discuss trends, technological advances, and production challenges in autonomous driving [6][10][14].
- A job-referral mechanism connects members with leading companies in the autonomous driving sector [10][12].

Group 4: Technical Focus Areas
- Resources are organized across technical areas including 3D object detection, multi-sensor fusion, and high-precision mapping, all crucial to autonomous driving development [27][29][31].
- Emerging topics such as vision-language models (VLM) and world models receive dedicated summaries and resources [37][39][45].