自动驾驶之心
Embedding 3DGS into Diffusion: A Fast, High-Resolution 3D Generation Framework (ICCV'25)
自动驾驶之心· 2025-11-01 16:04
Core Viewpoint - The article introduces DiffusionGS, a novel pixel-level 3D diffusion model for the image-to-3D generation task that maintains 3D view consistency and applies to both object-centric and larger-scale scene-level generation [2][17].

Group 1: Methodology
- DiffusionGS predicts a 3D Gaussian point cloud at each timestep to ensure consistency among generated views, enhancing the quality of both object and scene generation [2][30].
- The model operates in pixel space rather than latent space, better preserving the 3D representation and supporting higher spatial resolution [26][30].
- A scene-object mixed training strategy generalizes 3D priors across various datasets, improving the model's performance [32][34].

Group 2: Performance Metrics
- DiffusionGS achieves a PSNR of 25.89 and an SSIM of 0.8880, outperforming current state-of-the-art methods by 2.20 dB in PSNR and 23.25 in FID [40].
- The model generates images in 6 seconds at 256x256 resolution and 24 seconds at 512x512 resolution, 7.5 times faster than Hunyuan-v2.5 [16][40].
- Generated images show superior clarity and 3D consistency, with fewer artifacts and less blur than existing techniques [44].

Group 3: Technical Contributions
- The Reference-Point Plücker Coordinate (RPPC) enhances spatial perception by incorporating camera-pose information into the model [32][37].
- The architecture includes two different MLPs for decoding Gaussian primitives, one tailored for object-level and one for scene-level generation [39].
- A point-distribution loss is designed to improve object-level training, ensuring better convergence and performance [39].
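As background for the RPPC idea, a Plücker embedding represents each camera ray by its unit direction and its moment about a chosen point. The NumPy sketch below is a minimal illustration under assumed conventions; the function name and the way the reference point enters are hypothetical, not taken from the paper:

```python
import numpy as np

def plucker_rays(origin, direction, ref_point=None):
    """Plücker coordinates (d, m) of a ray, with m = (o - p) x d.

    With ref_point=None this is the standard Plücker embedding (moment
    about the world origin); passing a reference point p sketches the
    "reference-point" variant the article describes.
    """
    d = direction / np.linalg.norm(direction, axis=-1, keepdims=True)
    p = np.zeros(3) if ref_point is None else np.asarray(ref_point)
    m = np.cross(origin - p, d)              # moment: invariant along the ray
    return np.concatenate([d, m], axis=-1)   # 6-D per-ray embedding
```

Because the moment depends only on the line, not the specific origin point, sliding the camera center along the ray leaves the embedding unchanged, which is what makes it a convenient per-pixel pose encoding.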
Li Auto Recalls 11,411 2024-Model MEGA Vehicles: Free Battery Replacement, but the Battery Fire in the Recent Incident Was Not a Cell Defect
自动驾驶之心· 2025-10-31 16:03
Li Auto has just announced a recall of the MEGA. The recall covers 11,411 vehicles, and Li Auto will replace the power battery and related supporting equipment for owners free of charge. The company also responded further to the MEGA fire incident on the evening of October 23 and explicitly apologized.

Source | 智能车参考

The recall notice states that the affected vehicles are 2024-model MEGAs produced between February 18, 2024 and December 27, 2024, totaling 11,411 units, whose owners will receive a new power battery and related supporting equipment free of charge. The notice explains: "the anti-corrosion performance of this batch of coolant is insufficient; under specific conditions this causes corrosion and leakage of the cooling aluminum plates of the power battery and the front motor controller in the cooling circuit, leading to fault-light illumination, limited power, and inability to power on, and in extreme cases thermal runaway of the power system, posing a safety hazard." Li Auto said it will replace the coolant, power battery, and front motor controller free of charge for vehicles within the recall scope. The notice also lists emergency measures: for battery safety risks caused by coolant leakage, Li Auto's cloud-based early-warning program will proactively ...
自动驾驶之心's Double Eleven Sale Has Started: 20% Off Courses, 30% Off Knowledge Planet Membership
自动驾驶之心· 2025-10-31 16:03
Core Viewpoint - The article highlights promotional discounts on autonomous-driving courses and community memberships during the Double Eleven shopping festival, aiming to attract new learners and retain existing members [2][4][5].

Group 1: Course Discounts
- Single courses are offered at 20% off, excluding specific classes such as planning-and-control and trajectory prediction [4][5].
- A super discount card provides a 30% discount on courses for one year, excluding certain classes [7].
- The knowledge community offers a 30% discount for new members and a 50% discount for renewals, with limited availability [9].

Group 2: Course Offerings
- The autonomous-driving series includes seven premium courses covering topics such as world models, trajectory prediction, and 3D detection [12].
- The platform features nearly 40 learning routes, emphasizing continuous learning and engagement with industry experts [12].
- The community facilitates direct interaction with top professionals in the field, enhancing knowledge sharing and networking opportunities [12].
Calling for Collaborators! Seeking Autonomous-Driving Enthusiasts Everywhere (Product Managers, 4D Annotation, and More)
自动驾驶之心· 2025-10-31 16:03
Core Viewpoint - The article calls for collaboration in the autonomous-driving sector, inviting professionals to contribute to training, course development, and research support to advance the industry [2].

Group 1: Collaboration and Opportunities
- The company seeks partnerships with professionals in the autonomous-driving field to strengthen its training and job-guidance initiatives [2].
- Collaborators will receive high compensation and access to abundant industry resources [3].
- The main focus areas for collaboration include autonomous-driving product management, 4D annotation / data closed loop, world models, VLA, autonomous-driving large models, reinforcement learning, and end-to-end systems [4].

Group 2: Training and Development
- The positions primarily target business-facing (to-B) training for enterprises, universities, and research institutions, as well as consumer-facing (to-C) training for students and job seekers [5].
- Interested parties are encouraged to reach out via WeChat for further consultation [6].
A Survey of Feed-Forward 3D: 3D Vision Enters the "One-Pass" Era
自动驾驶之心· 2025-10-31 16:03
Core Insights - The article discusses the evolution of 3D vision technologies, highlighting the transition from traditional methods like Structure-from-Motion (SfM) to advanced techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), and emphasizing the emergence of Feed-Forward 3D as a new paradigm in the AI-driven era [2][6].

Summary by Categories

1. Technological Evolution
- The article outlines the historical progression of 3D vision, noting that previous methods often required per-scene optimization, which was slow and lacked generalization capabilities [2][6].
- Feed-Forward 3D is introduced as a new paradigm that aims to overcome these limitations, enabling faster and more generalized 3D understanding [2].

2. Classification of Feed-Forward 3D Methods
The article categorizes Feed-Forward 3D methods into five main architectures, each contributing to significant advancements in the field:
1. NeRF-based Models: These models utilize a differentiable framework for volume rendering but face efficiency issues due to scene-specific optimization; conditional NeRF approaches have emerged to allow direct prediction of radiance fields [8].
2. PointMap Models: Led by DUSt3R, these models predict pixel-aligned 3D point clouds directly within a Transformer framework, eliminating the need for camera-pose input [10].
3. 3D Gaussian Splatting (3DGS): This representation uses Gaussian point clouds to balance rendering quality and speed, with recent advances allowing direct output of Gaussian parameters [11][13].
4. Mesh / Occupancy / SDF Models: These methods combine traditional geometric modeling with modern techniques like Transformers and diffusion models [14].
5. 3D-Free Models: These models learn mappings from multi-view inputs to new perspectives without relying on explicit 3D representations [15].

3. Applications and Tasks
The article highlights diverse applications of feed-forward models, including:
- Pose-free reconstruction and view synthesis
- Dynamic 4D reconstruction and video diffusion
- SLAM and visual localization
- 3D-aware image and video generation
- Digital human modeling
- Robotic manipulation and world modeling [19]

4. Benchmarking and Evaluation Metrics
- The article includes over 30 commonly used 3D datasets, covering various types of scenes and modalities, and summarizes standard evaluation metrics such as PSNR, SSIM, and Chamfer Distance for future model comparisons [20][21].

5. Future Challenges and Trends
- The article identifies four major open questions for future research: the need for multi-modal data, improvements in reconstruction accuracy, challenges in free-viewpoint rendering, and the limitations of long-context reasoning over extensive frame sequences [25][26].
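Since the survey's benchmarks lean on PSNR, SSIM, and Chamfer Distance, here is a minimal NumPy sketch of two of those metrics. This is a brute-force illustration only; real evaluations use optimized implementations (e.g. scikit-image for PSNR/SSIM, KD-tree or GPU batching for Chamfer):

```python
import numpy as np

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((img_a - img_b) ** 2)
    return float(10 * np.log10(max_val ** 2 / mse))

def chamfer_distance(pts_a, pts_b):
    """Symmetric Chamfer distance between (N,3) and (M,3) point sets,
    using squared distances; O(N*M) pairwise version for illustration."""
    d2 = np.sum((pts_a[:, None, :] - pts_b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Note that conventions vary: some papers use unsquared distances or average the two directions instead of summing, so reported Chamfer numbers are only comparable under the same definition.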
World Models and VLA Are Gradually Converging Toward Unification
自动驾驶之心· 2025-10-31 00:06
Core Viewpoint - The integration of Vision-Language-Action (VLA) and World Model (WM) technologies is becoming increasingly evident, suggesting a trend towards unification rather than opposition in the field of autonomous driving [3][5][7].

Technology Development Trends
- Recent discussions highlight that VLA and WM should not be seen as mutually exclusive but rather as complementary technologies that can enhance the development of Artificial General Intelligence (AGI) [3].
- The combination of VLA and WM is supported by various academic explorations, including models like DriveVLA-W0, which demonstrate the feasibility of their integration [3].

Industry Insights
- The ongoing debate within the industry regarding VLA and WA (World Action) is more about different promotional narratives than fundamental technological differences [7].
- Tesla's recent presentations at ICCV are expected to influence domestic perspectives on the integration of VLA and WA [7].

Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" community has been established as a comprehensive platform for learning and sharing knowledge in the autonomous-driving sector, with over 4,000 members and plans to expand to nearly 10,000 [10][23].
- The community offers a variety of resources, including video content, learning routes, and Q&A sessions, aimed at both beginners and advanced practitioners [10][12][28].

Technical Learning Paths
- The community has compiled over 40 technical learning routes covering various aspects of autonomous driving, including perception, simulation, planning, and control [24][44].
- Specific learning paths are available for newcomers, including full-stack courses suitable for those with no prior experience [17][20].

Networking and Career Opportunities
- The community facilitates connections between members and industry leaders, providing job-referral mechanisms and insights into career opportunities within the autonomous-driving sector [10][19].
- Members can discuss research directions, job choices, and industry trends, fostering a collaborative environment for knowledge exchange [97][101].
ICCV 2025 | Amap's SeqGrowGraph: A New Incremental-Generation Paradigm for Lane Graphs
自动驾驶之心· 2025-10-31 00:06
Core Insights - The article presents SeqGrowGraph, an innovative framework for autoregressive lane-graph modeling, which addresses the challenges of constructing high-precision lane maps for autonomous driving systems [18].

Group 1: Background and Motivation
- The construction of local high-precision maps (online mapping) has become a hot topic in the industry, with lane-graph generation being a critical component [2].
- Current mainstream technical routes for lane-graph generation can be categorized into detection-based and generation-based methods [2].

Group 2: Methodology
- SeqGrowGraph defines the lane graph as a directed graph G=(V, E), where V represents intersections or key topological nodes, and E represents the lane centerlines connecting the nodes [6].
- The core method involves a chain of graph expansions, in which the graph is constructed incrementally by introducing new nodes and updating the adjacency and geometry matrices [8][10].
- The model architecture follows a mainstream encoder-decoder structure, utilizing a BEV encoder to extract features and a Transformer decoder for autoregressive sequence generation [10][11].

Group 3: Experimental Validation
- SeqGrowGraph was comprehensively evaluated on the large-scale autonomous-driving datasets nuScenes and Argoverse 2, demonstrating superior performance compared to leading methods in the field [13][14].
- Quantitative analysis showed that SeqGrowGraph achieved state-of-the-art performance in topology-accuracy metrics such as Landmark and Reachability on both standard and challenging dataset partitions [14][15].

Group 4: Qualitative Analysis
- Visual results highlighted the advantages of SeqGrowGraph, showcasing its ability to generate topologically continuous, structurally complete, and geometrically accurate lane graphs, while effectively merging redundant nodes from real-world map data [16].

Group 5: Conclusion
- The SeqGrowGraph framework not only aligns more closely with human structured reasoning but also effectively overcomes inherent limitations of existing methods in handling complex topologies, such as loops [18].
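To make the "chain of graph expansions" concrete, here is a toy Python sketch in which each step appends one node together with its row and column of the directed adjacency matrix. This is a deliberate simplification: the real method also predicts edge geometry (the geometry matrix), and the class and method names here are hypothetical:

```python
import numpy as np

class GrowingLaneGraph:
    """Toy incremental lane-graph builder: each expansion step adds one node
    plus one row/column of the directed adjacency matrix, so G=(V, E) is
    constructed as a chain of expansions rather than detected in one shot."""

    def __init__(self):
        self.nodes = []                          # V: node coordinates
        self.adj = np.zeros((0, 0), dtype=int)   # E: directed adjacency

    def expand(self, node_xy, in_edges, out_edges):
        n = len(self.nodes)
        assert len(in_edges) == n and len(out_edges) == n
        self.nodes.append(node_xy)
        new_adj = np.zeros((n + 1, n + 1), dtype=int)
        new_adj[:n, :n] = self.adj       # keep the existing graph
        new_adj[:n, n] = in_edges        # edges: existing nodes -> new node
        new_adj[n, :n] = out_edges       # edges: new node -> existing nodes
        self.adj = new_adj
        return self
```

In the actual model an autoregressive decoder would emit the `(node_xy, in_edges, out_edges)` tuples as tokens, conditioned on BEV features and the graph built so far.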
RAD: End-to-End Autonomous Driving Combining 3DGS with Reinforcement Learning
自动驾驶之心· 2025-10-31 00:06
Core Insights
- The paper addresses challenges in deploying end-to-end autonomous driving (AD) algorithms in real-world scenarios, focusing on causal confusion and the open-loop gap [1][2].
- It proposes a closed-loop reinforcement learning (RL) training paradigm based on 3D Gaussian Splatting (3DGS) technology to enhance the robustness of AD policies [2][8].

Summary by Sections

Problem Statement
- The paper identifies two main issues: causal confusion, where imitation learning (IL) captures correlations rather than causal relationships, and the open-loop gap, where policies trained with open-loop IL perform poorly in real-world closed-loop scenarios [1][2][6].

Related Research
- The paper references related fields including dynamic scene reconstruction, end-to-end autonomous driving, and reinforcement learning, highlighting existing methods and their limitations [3][4][5][7].

Proposed Solution
- The RAD framework integrates 3DGS technology with RL and IL, employing a three-stage training paradigm: perception pre-training, planning pre-training, and reinforced post-training [8][24].
- It includes a specially designed safety-related reward function to guide the AD policy in handling safety-critical events [11][24].

Experimental Validation
- The paper details extensive experiments, including the collection of 2,000 hours of human expert driving demonstrations and the creation of 4,305 high-collision-risk traffic clips for training and evaluation [15][24].
- Nine key performance indicators (KPIs) are used to assess the AD policy, including dynamic collision ratio (DCR) and static collision ratio (SCR) [12][15][24].

Key Findings
- The RAD framework outperforms existing IL methods, achieving a threefold reduction in collision rate (CR) and demonstrating superior performance in complex dynamic environments [9][12][24].
- An RL-to-IL ratio of 4:1 was found to best balance safety and trajectory consistency [12][15].

Future Directions
- The paper suggests further exploration in areas such as enhancing the interactivity of the 3DGS environment, improving rendering techniques, and expanding the application of RL [17][21][22][29].
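The 4:1 RL-to-IL ratio amounts to a fixed interleaving of the two kinds of updates. The Python sketch below shows one simple way such a schedule could be generated; the function name and the step-level interleaving are illustrative assumptions, not the paper's exact implementation (which could instead mix the two objectives within a batch):

```python
def mixed_training_schedule(num_steps, rl_to_il=(4, 1)):
    """Label each training step "RL" or "IL" by cycling through a block of
    rl_to_il[0] RL steps followed by rl_to_il[1] IL steps. In a real trainer
    each "RL" step would roll out in the 3DGS closed-loop environment and
    each "IL" step would fit expert demonstrations."""
    rl_n, il_n = rl_to_il
    cycle = ["RL"] * rl_n + ["IL"] * il_n
    return [cycle[i % len(cycle)] for i in range(num_steps)]
```

Keeping the IL steps in the loop (rather than pure RL post-training) is what anchors the policy to human-like trajectories while the RL steps optimize the safety reward.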
A New 33-Page Survey on Industrial Agents from Harbin Institute of Technology
自动驾驶之心· 2025-10-31 00:06
Core Insights - The article discusses the rapid evolution of Large Language Models (LLMs) into industrial agents, emphasizing their application in high-risk industries such as finance, healthcare, and manufacturing, and the challenge of transforming their potential into practical productivity [2][4].

Group 1: Key Technologies
- Industrial agents require a "cognitive loop" for real-world interaction, relying on three core technologies: Memory, Planning, and Tool Use, which together enhance their decision-making and collaborative capabilities [5][18].
- Memory mechanisms evolve through five stages, from simple working memory to collective knowledge bases, enabling long-term task coherence and collaborative learning among agents [11][12].
- Planning capabilities progress from linear task execution to autonomous goal generation, reflecting the depth of decision-making in complex problem-solving [15][16].
- Tool usage evolves from passive invocation to active creation, allowing agents to design new tools to address capability gaps [18][19].

Group 2: Capability Maturity Framework
- The article introduces a five-level capability maturity framework for industrial agents, defining their core abilities and application boundaries at each level, from basic process execution to adaptive social systems [18][20].
- Level 1 covers process-execution systems that translate instructions, while Level 5 represents adaptive social systems capable of autonomous goal generation and environmental collaboration [18][20].

Group 3: Evaluation of Industrial Agents
- Evaluating industrial agents involves two main dimensions: foundational capability verification and industry-practice adaptation, with standardized benchmarks established for memory, planning, and tool usage [20][23].
- The evaluation framework includes tests for memory accuracy, planning decision-making, and tool-usage efficiency, ensuring agents meet industry-specific requirements [23][24].

Group 4: Application Areas
- Industrial agents demonstrate significant potential across sectors, enhancing efficiency and reducing risk by automating complex tasks and standardizing processes [25][26].
- In software development, agents can manage the entire process from requirement analysis to deployment, while in scientific research they assist in data analysis and autonomous exploration [26][27].
- The healthcare sector benefits from agents that support diagnostic reasoning and treatment planning, ensuring safety and reliability in high-stakes environments [25][26].

Group 5: Challenges and Future Directions
- Despite these advances, industrial agents face challenges in technology, evaluation, and organizational integration, requiring breakthroughs in several areas to achieve widespread adoption [31][34].
- Future trends include tighter integration of generative and predictive modeling, improved real-time capabilities, and attention to ethical concerns around autonomous decision-making [31][34].
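The "cognitive loop" of memory, planning, and tool use can be sketched as a minimal control loop. Everything below is illustrative, not from the survey: `plan_fn` stands in for an LLM planner, and the names and signatures are hypothetical:

```python
def agent_loop(goal, tools, plan_fn, max_steps=5):
    """Minimal memory/planning/tool-use loop. `plan_fn(goal, memory)` returns
    a (tool_name, args) decision, or None when the planner judges the goal
    achieved; `tools` maps names to callables. Each result is written back
    into working memory so later planning steps can condition on it."""
    memory = []                                   # working memory
    for _ in range(max_steps):
        decision = plan_fn(goal, memory)          # planning step
        if decision is None:
            break
        tool_name, args = decision
        result = tools[tool_name](*args)          # tool use
        memory.append((tool_name, args, result))  # memory write-back
    return memory
```

In the survey's terms this sits at the low end of the maturity ladder (straightforward process execution); higher levels would add persistent memory stores, multi-step plan revision, and the ability to create new tools.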
The Patent Battle Between Hesai Technology and Tudatong
自动驾驶之心· 2025-10-30 03:31
Core Viewpoint - Hesai Technology has officially filed a patent-infringement lawsuit against Tudatong over the Lingque E1X showcased at CES 2025, which bears a striking resemblance to Hesai's AT series products [3][4].

Group 1: Patent Infringement Case
- The lawsuit cites similarities in appearance and interface between Tudatong's Lingque E1X and Hesai's ATX, as well as the adoption of the same "905nm wavelength + one-dimensional scanning" technology [3][4].
- Hesai has reported that several of its North American employees, including a senior director, have joined Tudatong [3].
- The case comes as Tudatong shifts from a 1550nm-only focus to a dual strategy spanning both 1550nm and 905nm products, a critical phase for its IPO [4].

Group 2: Market Dynamics and Competition
- The lidar industry has seen intense price competition, particularly affecting new entrants, which is detrimental to the industry's overall development [5].
- Hesai's ATX, launched in April 2024, has secured partnerships with more than ten leading domestic automakers and has entered large-scale production [5].
- Hesai produced its one-millionth lidar unit by the end of September 2025, becoming the first company to reach this annual production volume [5].