自动驾驶之心

ICCV'25 | HUST Proposes HERMES: The First Unified Driving World Model!
自动驾驶之心· 2025-07-25 10:47
Core Viewpoint - The article introduces HERMES, a unified driving world model that integrates 3D scene understanding and future scene generation, significantly reducing generation errors by 32.4% compared to existing methods [4][17].
Group 1: Model Overview
- HERMES addresses the fragmentation in existing driving world models by combining scene generation and understanding capabilities [3].
- The model utilizes a BEV (Bird's Eye View) representation to integrate multi-view spatial information and introduces a "world query" mechanism to enhance scene generation with world knowledge [3][4].
Group 2: Challenges and Solutions
- The model overcomes the challenge of multi-view spatiality by employing a BEV-based world tokenizer, which compresses multi-view images into BEV features, thus preserving key spatial information while adhering to token length limitations [5].
- To address the integration of understanding and generation, HERMES introduces world queries that enhance the generated scenes with world knowledge, bridging the gap between understanding and generation [8].
Group 3: Performance Metrics
- HERMES demonstrates superior performance on the nuScenes and OmniDrive-nuScenes datasets, achieving an 8.0% improvement in the CIDEr metric for understanding tasks and significantly lower Chamfer distances in generation tasks [4][17].
- The model's world query mechanism contributes to a 10% reduction in Chamfer distance for 3-second point cloud predictions, showcasing its effectiveness in enhancing generation performance [20].
Group 4: Experimental Validation
- The experiments utilized datasets such as nuScenes, NuInteract, and OmniDrive-nuScenes, employing metrics like METEOR, CIDEr, ROUGE for understanding tasks, and Chamfer distance for generation tasks [19].
- Ablation studies confirm the importance of the interaction between understanding and generation, with the unified framework outperforming separate training methods [18].
Group 5: Qualitative Results
- HERMES is capable of accurately generating future point cloud evolutions and understanding complex scenes, although challenges remain in scenarios involving complex turns, occlusions, and nighttime conditions [24].
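The generation results in this summary are reported as Chamfer distance between predicted and ground-truth point clouds. As a rough illustration, here is a minimal sketch of one common definition: the symmetric sum of mean squared nearest-neighbor distances. The exact variant and units used in the HERMES evaluation are not stated in this summary, so the function name `chamfer_distance` and the squared-distance convention are assumptions, and the brute-force pure-Python loop is written for clarity rather than speed.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(pred: List[Point], target: List[Point]) -> float:
    """Symmetric Chamfer distance with squared Euclidean nearest-neighbor terms.

    Assumption: this is one common convention; papers differ on whether the
    distances are squared and whether the two directions are summed or averaged.
    """
    if not pred or not target:
        raise ValueError("both point sets must be non-empty")

    def one_way(src: List[Point], dst: List[Point]) -> float:
        # For every source point, take the squared distance to its nearest
        # neighbor in dst, then average over the source points.
        return sum(min(dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(pred, target) + one_way(target, pred)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(predicted, ground_truth))
```

A lower value means the predicted cloud lies closer to the ground truth. Real evaluations on LiDAR-scale clouds would normally use a vectorized or KD-tree nearest-neighbor search instead of this O(N·M) loop.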
I Tried Grok 4, and It Feels Like the Sky Is Falling on Academia
自动驾驶之心· 2025-07-25 10:47
Core Viewpoint - The article highlights the capabilities of Grok 4, an advanced AI tool that can efficiently assist in academic tasks, such as referencing over 100 scholarly articles in a short time, showcasing its potential to enhance research productivity [1].
Group 1: Technology and Innovation
- Grok 4 can accurately cite references from academic papers, providing direct links to each source, which eliminates the need for manual searches [1].
- The tool is noted for its efficiency, completing tasks in a fraction of the time compared to traditional methods, thus representing a significant advancement in AI capabilities [1][9].
Group 2: Accessibility and Usage
- Grok 4 is available for a subscription fee of $30 per month, but there are free alternatives that utilize its API, making it accessible to a broader audience [9][12].
- Users can interact with Grok 4 through a web interface, allowing for easy integration into their research workflow [9][10].
Group 3: Community and Learning
- The article mentions a community of nearly 4,000 members focused on autonomous driving, which includes over 300 companies and research institutions, indicating a robust network for knowledge sharing and collaboration [14].
- Various learning paths in autonomous driving technology are available, covering topics from perception to mapping and control, which can benefit those entering the field [14][16].
We've Set Up a Job-Hunting Group for Mutual Support
自动驾驶之心· 2025-07-25 10:47
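Scan the WeChat QR code to add our assistant, who will invite you to the group; note "autonomous driving + nickname + job hunting" in your request. We have recently talked with many people preparing for campus recruitment and found that everyone is anxious: their schools do not allow internships, their labs lack compute, or they work in traditional planning and control and want to switch fields but do not know where to start. Many people with years of work experience say they are also looking at opportunities, moving from perception to large models or from rule-based approaches to embodied AI. Behind all of this is a wish to take the next step and fight for a better future. Everyone senses that the autonomous driving technology stack is converging. Dozens of directions, large and small, used to need algorithm engineers; now it is one model, VLM, and VLA, and behind these unified solutions lies a higher technical barrier. We have always encouraged everyone to persevere and keep exchanging ideas, but in the end an individual's power is limited. We hope to build a large community that grows together with everyone, genuinely helps those who need it, and becomes a comprehensive platform that gathers talent from across the industry. So we have started formally operating a community for job hunting and industry topics. Discussion inside the community centers on the relevant industries, companies, product development, job hunting, and job changes. If you want to meet more peers in the industry and hear industry news first, you are welcome to join us! ...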
Traditional Perception Falls Out of Favor as VLA Becomes the Rising Star......
自动驾驶之心· 2025-07-25 08:17
Core Insights
- The article discusses the advancements in end-to-end autonomous driving algorithms, highlighting the emergence of various models and approaches in recent years, such as PLUTO, UniAD, OccWorld, and DiffusionDrive, which represent different technical directions in the field [1].
- It emphasizes the shift in academic focus towards large models and Vision-Language-Action (VLA) methodologies, suggesting that traditional perception and planning tasks are becoming less prominent in top conferences [1].
- The article encourages researchers to align their work with large models and VLA, indicating that there are still many subfields to explore despite the challenges for beginners [1].
Summary by Sections
Section 1: VLA Research Topics
- The article introduces VLA research topics aimed at helping students systematically grasp key theoretical knowledge and expand their understanding of the specified direction [6].
- It addresses the need for students to combine theoretical models with practical coding skills to develop new models and enhance their research capabilities [6].
Section 2: Enrollment Information
- The program has a limited enrollment capacity of 6 to 8 students per session [5].
- It targets students at various academic levels (bachelor's, master's, and doctoral) who are interested in enhancing their research skills in autonomous driving and AI [7].
Section 3: Course Outcomes
- Participants will analyze classic and cutting-edge papers, understand key algorithms, and learn about writing and submission methods for academic papers [8][10].
- The course includes a structured timeline of 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period [10].
Section 4: Course Highlights
- The program features a "2+1" teaching model with experienced instructors providing comprehensive support throughout the learning process [13].
- It emphasizes high academic standards and aims to equip students with a rich set of outputs, including a paper draft and a project completion certificate [13].
Section 5: Technical Requirements
- Students are expected to have a foundational understanding of deep learning, basic programming skills in Python, and familiarity with PyTorch [11].
- Hardware requirements include access to high-performance machines, preferably with multiple GPUs [11].
Section 6: Service and Support
- The program includes dedicated supervisors to track student progress and provide assistance with academic and non-academic issues [17].
- The course will be conducted via Tencent Meeting and recorded for later access [18].
These Directions Make for a Smooth Transition from Autonomous Driving to Embodied AI......
自动驾驶之心· 2025-07-25 06:47
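Which directions are open to someone moving from autonomous driving to embodied AI? In other words, what are the paths to get there? If we divide embodied tasks by robot body, there are roughly a few types: robotic arms, quadrupeds, humanoids, and simulation-only. For more embodied AI content, follow our embodied AI community, the 具身智能之心 official account. If you want to study further, you are also welcome to join our 具身智能之心 Knowledge Planet. As the largest embodied AI technical community in China, it has long supplied the industry and individuals with talent and with industry and academic information. So far it has brought together almost all mainstream embodied AI companies at home and abroad and most well-known research institutions. If you want to be the first to hear about the industry, job openings, and industry pain points, you are welcome to join us. A community that takes content seriously, a place that cultivates future leaders. Internally we have organized 30+ technical roadmaps; whether you are looking for benchmarks, surveys, or beginner learning paths, they greatly shorten search time. The Planet has also invited dozens of guests from the embodied AI field, all experts active on the front lines of industry (they often appear at top conferences and in interviews). Feel free to ask questions at any time and they will answer them. In addition, we have prepared many roundtables and livestreams covering everything from robot bodies and data to algorithms, gradually sharing what is actually happening in the embodied AI industry and what problems remain. First, robotic arms: modules such as VA, VLA, and diffusion can all be used on this platform, model side + data ...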
Registration Now Open! Join the Third CCF Intelligent Vehicle Academic Annual Conference (CIVS 2025)
自动驾驶之心· 2025-07-24 09:42
Core Viewpoint - The third CCF Intelligent Vehicle Academic Annual Conference (CIVS 2025) will be held from August 16 to 18, 2025, in Hangzhou, focusing on the theme of "Research and Industry Co-Progress, Education and Popular Science Together" [2][3].
Group 1: Conference Overview
- The conference aims to gather experts from various sectors, including government, industry, academia, and research, to discuss the high-quality development of intelligent vehicles in China [2].
- Key topics include new sensors for harsh weather, RISC-V chips, quantum secure communication, intelligent car lights, international cooperation under the Belt and Road Initiative, and autonomous driving simulation [2].
- The event will feature over ten technical sub-forums, connecting researchers, industry practitioners, government officials, and the public [2].
Group 2: Educational Initiatives
- Four scientific education lectures will be established to help students understand the application of mathematics and physics in autonomous driving and enhance their scientific literacy [2].
- These lectures will cover topics such as the role of chips in intelligent vehicles and automotive design, aimed at students of various educational levels [2].
Group 3: Competitions
- The first CCF Intelligent Vehicle Competition (CCF IVC 2025) will be held concurrently, featuring events like automotive safety defense and autonomous driving simulation [3].
Group 4: Sponsorship and Collaboration
- The conference offers various sponsorship levels, including Diamond (3 million yuan), Platinum (1 million yuan), Gold (500,000 yuan), and Silver (300,000 yuan), encouraging enterprises to collaborate [8].
- CCF ranks fifth globally and second in China according to the 2022 Global Technology Society Development Index Report [8].
Group 5: Organizational Structure
- The conference is guided by a committee of esteemed academicians and industry leaders, ensuring high academic standards and relevance [9].
- The organizing committee includes representatives from various universities and research institutions, promoting collaboration across sectors [9][10].
A Summary of Papers on Closed-Loop Autonomous Driving Simulation Based on 3DGS and Diffusion
自动驾驶之心· 2025-07-24 09:42
Core Viewpoint - The article discusses advancements in autonomous driving simulation technology, highlighting the integration of various components such as scene rendering, data collection, and intelligent agents to create realistic driving environments [1][2][3].
Group 1: Simulation Components
- The first step involves creating a static environment using 3D Gaussian Splatting and Diffusion Models to build a realistic cityscape, capturing intricate details [1].
- The second step focuses on data collection from panoramic views to extract dynamic assets like vehicles and pedestrians, enhancing the realism of simulations [2].
- The third step emphasizes relighting techniques to ensure that assets appear natural under various lighting conditions, simulating different times of day and weather scenarios [2].
Group 2: Intelligent Agents and Weather Systems
- The fourth step introduces intelligent agents that mimic real-world behaviors, allowing for complex interactions within the simulation [3].
- The fifth step incorporates weather systems to enhance the atmospheric realism of the simulation, enabling scenarios like rain or fog [4].
Group 3: Advanced Features
- The sixth step includes advanced features that challenge autonomous vehicles with unexpected obstacles, simulating real-world driving complexities [4].
Is There a Gap? The Evolution of Autonomous Driving Directions at ICCV 2025...
自动驾驶之心· 2025-07-24 09:42
Core Insights - The article highlights the latest advancements in autonomous driving technologies, focusing on various research papers and frameworks that contribute to the field [2][3].
Multimodal Models & VLA
- ORION presents a holistic end-to-end framework for autonomous driving, utilizing vision-language instructed action generation [5].
- An all-in-one large multimodal model for autonomous driving is introduced, showcasing its potential applications [6][7].
- MCAM focuses on multimodal causal analysis for ego-vehicle-level driving video understanding [9].
- AdaDrive and VLDrive emphasize self-adaptive systems and lightweight models for efficient language-grounded autonomous driving [10].
Simulation & Reconstruction
- ETA proposes a dual approach to self-driving with large models, enhancing efficiency through forward-thinking [13].
- InvRGB+L introduces inverse rendering techniques for complex scene modeling [14].
- AD-GS and BézierGS focus on object-aware scene reconstruction and dynamic urban scene reconstruction, respectively [18][19].
End-to-End & Trajectory Prediction
- Epona presents an autoregressive diffusion world model for autonomous driving, enhancing trajectory prediction capabilities [25].
- World4Drive introduces an intention-aware physical latent world model for end-to-end autonomous driving [30].
- MagicDrive-V2 focuses on high-resolution long video generation for autonomous driving with adaptive control [35].
Occupancy Networks
- The article discusses advancements in 3D semantic occupancy prediction, highlighting the transition from binary to semantic data [44].
- GaussRender and GaussianOcc focus on learning 3D occupancy with Gaussian rendering techniques [52][54].
Object Detection
- Several papers address 3D object detection, including MambaFusion, which emphasizes height-fidelity dense global fusion for multi-modal detection [64].
- OcRFDet explores object-centric radiance fields for multi-view 3D object detection in autonomous driving [69].
Datasets
- The ROADWork Dataset aims to improve recognition and analysis of work zones in driving scenarios [73].
- Research on driver attention prediction and motion planning is also highlighted, showcasing the importance of understanding driver behavior in autonomous systems [74][75].
Goodbye Artifacts! HKU Open-Sources GS-SDF: Gaussian Initialization with SDF Can Be This Stable
自动驾驶之心· 2025-07-24 06:46
Core Viewpoint - The article presents a unified LiDAR-visual system that addresses geometric inconsistencies in Gaussian splatting for robotic applications, successfully combining Gaussian splatting with Neural Signed Distance Fields (NSDF) to achieve geometrically consistent rendering and reconstruction [52].
Group 1: Unified LiDAR-Visual System
- The proposed system aims to utilize registered images and low-cost LiDAR data to reconstruct both the appearance and surface structure of scenes under arbitrary trajectories [5][6].
- The importance of Gaussian initialization in achieving good structure is emphasized, highlighting its role in the optimization process [22].
Group 2: Geometric Regularization
- The article discusses the introduction of geometric regularization into the 3D Gaussian Splatting (3DGS) framework to address geometric inconsistencies that manifest as rendering distortions [3][6].
- It suggests that depth cameras and LiDAR can provide direct structural priors, which can be integrated into the 3DGS framework for improved geometric regularization [3].
Group 3: Methodology
- The overall process includes three stages: training a Neural Signed Distance Field (NSDF) using point clouds, initializing Gaussian primitives from the NSDF, and optimizing both Gaussian primitives and NSDF through SDF-assisted shape regularization [8][6].
- The use of 2D Gaussian splatting to represent 3D scenes is detailed, with each disk defined by parameters such as center point, orthogonal tangent vectors, scaling factor, opacity, and view-dependent color [10].
Group 4: Experimental Results
- The proposed method demonstrates superior reconstruction accuracy and rendering quality across various trajectories, as evidenced by extensive experiments [52].
- Quantitative results indicate that the method outperforms existing techniques in metrics such as C-L1, F-Score, SSIM, and PSNR across multiple datasets [46][49].
Group 5: Limitations and Future Work
- The method is limited in extrapolated novel view synthesis, which suggests a need to explore more advanced neural rendering techniques [53].
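The second stage of the pipeline described under Group 3 initializes Gaussian primitives from the NSDF. One plausible way to picture this, sketched here under stated assumptions, is to sample points near the zero level set and project them onto it along the SDF gradient, using the normalized gradient as the disk normal. This is not the authors' implementation: `sphere_sdf` is an analytic stand-in for a trained NSDF, and `init_gaussian_centers`, `band`, and `bounds` are hypothetical names chosen for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
SDF = Callable[[Vec3], float]


def sphere_sdf(p: Vec3, radius: float = 1.0) -> float:
    """Analytic SDF of an origin-centered sphere (stand-in for a trained NSDF)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sdf_gradient(sdf: SDF, p: Vec3, eps: float = 1e-4) -> Vec3:
    """Central finite-difference gradient of the SDF at p."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf((hi[0], hi[1], hi[2])) - sdf((lo[0], lo[1], lo[2]))) / (2 * eps))
    return (grad[0], grad[1], grad[2])


def unit(v: Vec3) -> Vec3:
    """Normalize a vector; raises if the vector is (near) zero."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm < 1e-12:
        raise ValueError("zero-length vector")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def project_to_surface(sdf: SDF, p: Vec3, steps: int = 5) -> Vec3:
    """Move p along the SDF gradient by its signed distance, a few times."""
    for _ in range(steps):
        d = sdf(p)
        n = unit(sdf_gradient(sdf, p))
        p = (p[0] - d * n[0], p[1] - d * n[1], p[2] - d * n[2])
    return p


def init_gaussian_centers(
    sdf: SDF, count: int, band: float = 0.2, bounds: float = 1.5, seed: int = 0
) -> List[Tuple[Vec3, Vec3]]:
    """Return (center, normal) pairs lying on the SDF zero level set.

    Hypothetical helper: rejection-samples points within `band` of the surface
    inside the cube [-bounds, bounds]^3, then projects them onto the surface.
    """
    rng = random.Random(seed)
    primitives: List[Tuple[Vec3, Vec3]] = []
    while len(primitives) < count:
        p = (rng.uniform(-bounds, bounds), rng.uniform(-bounds, bounds), rng.uniform(-bounds, bounds))
        if abs(sdf(p)) > band:
            continue  # too far from the surface to be a useful seed
        center = project_to_surface(sdf, p)
        normal = unit(sdf_gradient(sdf, center))
        primitives.append((center, normal))
    return primitives


if __name__ == "__main__":
    for center, normal in init_gaussian_centers(sphere_sdf, 3):
        print(center, normal)
```

In a full system each (center, normal) pair would still need tangent vectors, scale, opacity, and color before it becomes a 2D Gaussian disk; those parameters are left out here because the summary does not say how they are initialized.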
My First Year of Grad School Is Over, and I Still Don't Really Understand Anything...
自动驾驶之心· 2025-07-24 06:46
Core Viewpoint - The article emphasizes the evolving landscape of the autonomous driving industry, highlighting the need for professionals to adapt their skill sets to align with current industry demands, particularly in areas like end-to-end VLA (Vision-Language Action) models and traditional control systems [4][6].
Summary by Sections
Industry Trends
- The demand for talent in autonomous driving is shifting towards candidates with strong backgrounds and skills in cutting-edge technologies, such as end-to-end VLA models, while traditional control systems still have job opportunities [2][4].
- The article notes that the technology stack in autonomous driving is becoming more standardized, reducing the diversity of recruitment directions compared to previous years [3][4].
Skill Development
- Professionals are encouraged to upgrade their technical skills to meet the evolving demands of the industry, with a focus on continuous learning and adaptation [4][6].
- The article suggests that anxiety about job prospects can be mitigated by actively seeking out learning resources and engaging with communities that focus on the latest advancements in autonomous driving technology [4][6].
Learning Resources
- The article mentions various learning modules available in the "Autonomous Driving Heart Knowledge Planet," which includes cutting-edge topics such as world models, trajectory prediction, and large models [5][11].
- It highlights the availability of videos and materials for beginners and advanced learners, aimed at helping individuals navigate the complexities of the autonomous driving field [4][5].
Community Engagement
- The "Autonomous Driving Heart Knowledge Planet" is described as a significant community for knowledge sharing, featuring nearly 4000 members and over 100 industry experts, providing a platform for discussion and problem-solving [8][11].
- The community focuses on various subfields within autonomous driving, including perception, mapping, planning, and control, offering a comprehensive approach to learning and professional development [11][13].