All in One Place! A Roundup of Diffusion Models in Autonomous Driving Foundation Models, Covering 30+ Works
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article surveys the significant role of diffusion models in the development of autonomous driving technologies, highlighting their ability to enhance data diversity, improve perception-system robustness, and assist decision-making under uncertainty [2][3].

Group 1: Diffusion Models in Autonomous Driving
- Diffusion models have shown promising applications in autonomous driving, particularly in generating diverse, physically constrained results from complex data distributions [2].
- The Dual-Conditioned Temporal Diffusion Model (DcTDM) enables generation of realistic long-duration driving videos, addressing challenges such as limited data quality and high collection costs [3][4].
- In evaluation, DcTDM demonstrates over 25% improvement in consistency and frame quality compared with other video diffusion models [3].

Group 2: Applications in Perception and Decision-Making
- In perception, diffusion models significantly outperform traditional methods in 3D occupancy prediction, especially in occluded or low-visibility areas, thereby supporting downstream planning tasks [4].
- The Stable Diffusion model effectively predicts vehicle trajectories, enhancing the predictive capabilities of autonomous driving systems [4].
- The DiffusionDrive framework uses diffusion models to capture multimodal action distributions, bringing this approach to end-to-end autonomous driving to address uncertainty in driving decisions [4].

Group 3: Data Generation and Quality Improvement
- Diffusion models are crucial for generating high-quality synthetic data, addressing the insufficient diversity and authenticity of natural driving datasets [4].
- Controllable generation techniques are particularly important for overcoming 3D data-annotation challenges, with future work on video generation aimed at further improving data quality [4].

Group 4: Advanced Frameworks and Innovations
- LD-Scene combines large language models with latent diffusion models to generate adversarial driving scenarios, improving the controllability and robustness of generated scenes [9].
- DualDiff introduces a dual-branch diffusion model for multi-view driving-scene generation, using occupancy ray sampling to inject rich semantic information [30].
- DiVE employs a diffusion-transformer framework to generate high-fidelity, temporally coherent multi-view videos, achieving state-of-the-art performance in multi-view video generation [19][20].

Group 5: Safety and Critical Scenario Generation
- AVD2 improves understanding of accident scenarios by generating videos aligned with detailed natural-language descriptions, aiding accident analysis and prevention [36].
- AdvDiffuser generates adversarial safety-critical driving scenarios with improved transferability across systems while maintaining authenticity and diversity [68][69].
- The Causal Composition Diffusion Model (CCDiff) improves controllability and realism when generating closed-loop traffic scenarios, significantly outperforming existing methods [41].
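To make the mechanics concrete: the generation methods surveyed above share one core loop, starting from Gaussian noise and iteratively denoising toward a sample from the learned distribution. Below is a minimal sketch of that reverse-sampling loop for trajectory generation; the noise schedule, tensor shapes, and placeholder denoiser are illustrative assumptions, not the implementation of DiffusionDrive or any other work cited here.

```python
import torch

# Minimal DDPM-style reverse sampling for 2-D trajectories. Everything here
# (step count, schedule, shapes) is illustrative; real systems such as
# DiffusionDrive condition the denoiser on perception features and use far
# more elaborate schedules and architectures.

T = 50                                   # diffusion steps
betas = torch.linspace(1e-4, 0.02, T)    # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def sample_trajectories(denoiser, n_modes=6, horizon=12):
    """Draw n_modes candidate trajectories of `horizon` (x, y) waypoints."""
    x = torch.randn(n_modes, horizon, 2)          # start from pure noise
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t)                  # predicted noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise   # ancestral sampling step
    return x  # a multimodal set of denoised trajectories

# Placeholder denoiser; a real one is a trained, scene-conditioned network.
dummy_denoiser = lambda x, t: torch.zeros_like(x)
print(sample_trajectories(dummy_denoiser).shape)  # torch.Size([6, 12, 2])
```

In a deployed system the denoiser would be conditioned on scene context (maps, surrounding agents, camera or LiDAR features), and the resulting multimodal candidate set is what a planner then scores and selects from.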
High-Fidelity Real-World Reconstruction! The Most Cost-Effective 3D Laser Scanner
自动驾驶之心· 2025-07-31 23:33
Core Viewpoint
- GeoScan S1 is presented as the most cost-effective handheld 3D laser scanner in China, featuring a lightweight design, one-button operation, and efficient 3D scene reconstruction with centimeter-level accuracy [1][4].

Group 1: Product Features
- The GeoScan S1 generates point clouds at 200,000 points per second, with a maximum measurement range of 70 meters and 360° coverage, supporting large scenes of over 200,000 square meters [1][24].
- It integrates multiple sensors and supports cross-platform integration, providing flexibility for scientific research and development [1][39].
- The device ships with a handheld Ubuntu system and various sensor devices, allowing for easy power supply and operation [1][4].

Group 2: Performance and Specifications
- The system supports real-time 3D point-cloud mapping, color fusion, and real-time preview, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [17].
- The device measures 14.2 cm x 9.5 cm x 45 cm and weighs 1.3 kg without the battery (1.9 kg with it); the 88.8 Wh battery provides approximately 3 to 4 hours of operation [17][18].
- Microsecond-level synchronization of multi-sensor data ensures high precision in complex indoor and outdoor environments [29][30].

Group 3: Market Position and Pricing
- The initial launch price starts at 19,800 yuan, with several versions (basic, depth camera, and 3DGS) available to meet different user needs [4][53].
- The product is positioned as offering the best price-performance ratio in the industry, integrating multiple sensors and advanced technology [2][53].

Group 4: Applications and Use Cases
- GeoScan S1 suits applications such as urban planning, construction monitoring, and environmental surveying, accurately constructing 3D scene maps in settings like office buildings, industrial parks, and tunnels [33][42].
- An optional 3D Gaussian data-collection module enables high-fidelity real-world reconstruction, allowing complete digital replication of real environments [46].
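For a sense of the data volume implied by those specifications, here is a back-of-the-envelope estimate; the 16-bytes-per-point figure is our assumption for raw XYZ-plus-intensity storage, not a published spec.

```python
# Back-of-the-envelope data-volume estimate from the GeoScan S1 specs above.
# 16 bytes per point (x, y, z as float32 plus packed intensity/color) is an
# assumption, not a published figure.
points_per_second = 200_000
bytes_per_point = 16                # assumed raw storage per point
scan_hours = 3                      # lower end of the quoted 3-4 h battery life

total_points = points_per_second * 3600 * scan_hours
total_gib = total_points * bytes_per_point / 2**30
print(f"{total_points / 1e9:.2f} billion points, ~{total_gib:.1f} GiB raw")
# -> 2.16 billion points, ~32.2 GiB raw
```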
The Development Direction of Embodied Intelligence as Seen at This Year's WAIC 2025!
自动驾驶之心· 2025-07-31 10:00
Core Viewpoint
- The article highlights the development direction of embodied intelligence showcased at the World Artificial Intelligence Conference (WAIC) 2025, emphasizing the growing diversity of products and companies in the field, particularly in embodied intelligence and autonomous driving [1].

Summary by Sections

Embodied Intelligence Showcase
- The event featured a large number of companies and diverse product forms related to embodied intelligence, including a notable demonstration by a robot named "Iron Fist King" showcasing agility and stability [1].
- Many service and industrial robots were on display, indicating a growing trend toward mobile manipulation, although challenges in cognitive recognition under human intervention remain [3].

Technological Advancements
- Companies are transitioning from merely showcasing demos to establishing industrial closed loops, indicating progress in commercializing embodied intelligence technologies [8].
- An integrated pipeline spanning data, policy, and system deployment is emerging, with many companies now prioritizing a unified-model approach [8].

Community and Knowledge Sharing
- The article introduces the "Embodied Intelligence Heart" knowledge community, which facilitates technical exchange among nearly 200 companies and institutions in the field [10].
- The community offers resources such as technical roadmaps, project solutions, and access to industry experts, enhancing learning and collaboration opportunities [10][21].

Job Opportunities and Industry Insights
- The community provides job sharing and recruitment channels, connecting members with potential employers in the embodied intelligence sector [20][25].
- It also compiles research reports, open-source projects, and datasets relevant to embodied intelligence, aiding members' professional development [30][41].
That Little Matter of Research Papers: By the Time It Clicks, It's Already Too Late...
自动驾驶之心· 2025-07-31 10:00
Still waiting for your advisor to "spoon-feed" you? Still thinking "I'll publish once my foundations are solid"? Wake up! In research, things need to click early; rejection letters and delayed graduation won't wait for you to be ready!

Does the phrase "delayed graduation" make your chest tighten? Every year, plenty of talented master's students, by no means short on ability, get stuck at the paper hurdle. Not for lack of effort, but because things clicked too late.

Typical profiles of the late starter:

The "waiting for the advisor" type: feels there is no way to start without an explicit direction or task from the advisor, and waits passively while time quietly slips away.

The "perfectionist" type: always wants to "learn all the material", "build a flawless foundation", and "produce an earth-shattering result" before starting to write. The result? The foundations are never finished, and the experiments are never perfect.

The "fear-driven procrastinator" type: the mere thought of reading papers, tuning models, writing papers, and facing rejection is overwhelming, so they instinctively escape into coursework, side projects, or even games.

The "cycle underestimator" type: naively assumes that writing, submitting, revising, and getting accepted can be wrapped up in a few months. In reality, going from idea to acceptance routinely takes six months to a year or more. Rejected in review? Double the cycle!

So what is the core of things clicking in research? Just four characters: act early! Treat "publishing" as a core goal running through the entire master's program, not a final-sprint task.

Do the time math:

Start investing in the summer of your first year: you have nearly two years to polish one or two high-quality papers (submission cycles included), with room to spare.

Only start panicking in the spring of your second year: your effective time may be under a year, while you still face coursework and ...
4,000 Members In: What Exactly Has This Tech-Obsessed "Whampoa Academy" of Autonomous Driving Been Doing?
自动驾驶之心· 2025-07-31 06:19
Core Viewpoint
- The article emphasizes the importance of building an engaging learning environment for autonomous driving and AI, aiming to bridge the gap between industry and academia while providing valuable resources for students and professionals [1].

Group 1: Community and Resources
- The community has established a closed loop across industry, academia, job seeking, and Q&A exchange, built around the question of what kind of community is actually needed [1][2].
- The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, streamlining the search for resources [2][3].
- A comprehensive technical roadmap with over 40 learning routes has been organized, covering interests from consulting applications to the latest VLA benchmarks [2][14].

Group 2: Educational Content
- The community provides a series of original live courses and video tutorials covering topics such as auto-labeling, data processing, and simulation engineering [4][10].
- Learning paths are available for beginners alongside advanced resources for active researchers, ensuring a supportive environment at every level [8][10].
- A wealth of open-source projects and datasets related to autonomous driving has been compiled, giving quick access to essential materials [25][27].

Group 3: Job Opportunities and Networking
- A job-referral mechanism with multiple autonomous driving companies lets members submit resumes directly to desired employers [4][11].
- Continuous job sharing and position updates contribute to a complete ecosystem for autonomous driving professionals [11][14].
- Members can freely ask questions about career choices and research directions and receive guidance from industry experts [75].

Group 4: Technical Focus Areas
- The community covers a wide range of technical areas including perception, simulation, planning, and control, with detailed learning routes for each [15][29].
- Specific topics such as 3D object detection, BEV perception, and online HD mapping are thoroughly organized, reflecting current industry trends and research hotspots [42][48].
- Emerging directions such as vision-language models (VLM) and diffusion models are also covered, with insights into their applications in autonomous driving [35][40].
QCNet -> SmartRefine -> DONUT: The Road to SOTA on Argoverse v2
自动驾驶之心· 2025-07-31 06:19
Author | Sakura
Editor | 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1933901730589962575

This article is shared for academic purposes only; in case of infringement, contact us for removal.

Preface: why this article
The author recently read the ICCV 2025 paper DONUT: A Decoder-Only Model for Trajectory Prediction. Using QCNet as its baseline, DONUT pairs a decoder-only architecture with an overprediction strategy and achieves SOTA on Argoverse v2. This recalled SmartRefine, a previously read paper that also builds on QCNet (improving its refinement stage) and likewise reached SOTA on Argoverse v2. In the spirit of learning, the author therefore gives a brief summary of all three papers here:
- Query-Centric Trajectory Prediction (QCNet), CVPR 2023
- SmartRefine ...
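As a concrete illustration of the overprediction idea mentioned above (propose more trajectory modes than the benchmark scores, then keep the most confident ones), here is a minimal decoder-head sketch. The layer sizes, mode count, and selection rule are illustrative assumptions; the actual DONUT decoder is considerably more elaborate.

```python
import torch
import torch.nn as nn

# Sketch of "overprediction": propose more trajectory modes than the benchmark
# evaluates (Argoverse 2 scores K=6), then keep the k most confident ones.
# Sizes and the selection rule here are illustrative, not DONUT's design.

class OverpredictingDecoder(nn.Module):
    def __init__(self, d_model=128, n_modes=12, horizon=60):
        super().__init__()
        self.n_modes, self.horizon = n_modes, horizon
        self.traj_head = nn.Linear(d_model, n_modes * horizon * 2)
        self.score_head = nn.Linear(d_model, n_modes)

    def forward(self, agent_emb, k=6):
        # agent_emb: (batch, d_model) per-agent feature from the encoder
        b = agent_emb.shape[0]
        trajs = self.traj_head(agent_emb).view(b, self.n_modes, self.horizon, 2)
        scores = self.score_head(agent_emb)              # (b, n_modes)
        topk = scores.topk(k, dim=-1).indices            # keep k best modes
        idx = topk[..., None, None].expand(-1, -1, self.horizon, 2)
        return trajs.gather(1, idx), scores.gather(1, topk)

decoder = OverpredictingDecoder()
trajs, scores = decoder(torch.randn(4, 128))
print(trajs.shape, scores.shape)  # torch.Size([4, 6, 60, 2]) torch.Size([4, 6])
```

The design intuition is that letting the model hedge with extra modes during decoding, then pruning, tends to improve coverage of rare maneuvers without hurting the scored top-k set.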
ICCV 2025! InvRGB+L, the First Tightly Coupled RGB-LiDAR Inverse Rendering Framework for Autonomous Driving, Goes Straight to SOTA
自动驾驶之心· 2025-07-30 23:33
Core Insights
- The article introduces InvRGB+L, a novel inverse rendering model that incorporates LiDAR intensity to reconstruct large-scale, relightable dynamic scenes from RGB+LiDAR sequences [4][26].

Group 1: Introduction of InvRGB+L
- InvRGB+L is the first model to apply LiDAR intensity in inverse rendering, enhancing material estimation under varying lighting conditions [4].
- Traditional methods rely primarily on RGB inputs, often producing inaccurate material estimates due to visible-light interference [4].

Group 2: Key Innovations
- The model introduces two key innovations: a physics-based LiDAR shading model and an RGB-LiDAR material consistency loss, which together improve the rendering of complex scenes [4][7].
- The physics-based LiDAR shading model accurately captures the relationship between LiDAR intensity values and surface material properties [7].

Group 3: Framework Components
- The inverse rendering framework includes a relightable scene representation that supports both decoupled and joint modeling of geometry, material, and lighting [10].
- It uses 3D Gaussian splats to represent scene geometry and color, incorporating physical material properties for realistic lighting interactions [13].

Group 4: Experimental Results
- Quantitatively, InvRGB+L significantly outperforms existing methods like UrbanIR on relighting tasks on the Waymo dataset, achieving a PSNR of 30.42 versus UrbanIR's 28.84 [17][18].
- The model also demonstrates effective LiDAR intensity modeling, reaching an average intensity RMSE of 0.063 and outperforming other methods [19][20].

Group 5: Qualitative Results
- Qualitative comparisons show that InvRGB+L effectively separates shadows from reflectance, yielding smoother reflectance estimates than UrbanIR and FEGR [22].
- The model supports versatile scene editing, including relighting and object insertion, with inserted elements blending seamlessly into the environment [23].

Group 6: Limitations and Future Work
- Despite its advances, InvRGB+L has limitations, such as potential inaccuracies in shadow rendering due to the opaque nature of Gaussian splats and limited handling of complex nighttime environments [26].
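To make the two key ingredients tangible, here is a hedged stand-in: a generic Lambertian-style LiDAR return model (intensity proportional to albedo times the cosine of the incidence angle over range squared) plus a simple L1 consistency penalty tying the LiDAR-inferred albedo to the RGB branch's diffuse albedo. This is a sketch under those assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Hedged stand-in for the two ingredients named above; not the paper's exact
# formulation. Generic LiDAR return model: intensity ~ albedo * cos(theta) / r^2,
# plus an L1 penalty making the RGB and LiDAR branches agree on albedo.

def lidar_intensity(albedo, normals, ray_dirs, ranges):
    """Predicted LiDAR intensity per ray.
    albedo: (N,) reflectance; normals, ray_dirs: (N, 3) unit vectors;
    ranges: (N,) meters."""
    cos_theta = (-ray_dirs * normals).sum(-1).clamp(min=0.0)  # incidence term
    return albedo * cos_theta / ranges.clamp(min=1e-3) ** 2   # 1/r^2 falloff

def material_consistency_loss(albedo_rgb, albedo_lidar):
    """Encourage the RGB and LiDAR branches to agree on per-point albedo."""
    return (albedo_rgb - albedo_lidar).abs().mean()

n = 1024
albedo = torch.rand(n, requires_grad=True)      # LiDAR-branch albedo estimate
albedo_rgb = torch.rand(n)                      # stand-in RGB-branch albedo
normals = F.normalize(torch.randn(n, 3), dim=-1)
rays = F.normalize(torch.randn(n, 3), dim=-1)
ranges = torch.rand(n) * 70.0 + 1.0

pred = lidar_intensity(albedo, normals, rays, ranges)
observed = torch.rand(n)                        # stand-in measured returns
loss = (pred - observed).abs().mean() + material_consistency_loss(albedo_rgb, albedo)
loss.backward()                                 # gradients flow to albedo
```

Because LiDAR intensity is insensitive to ambient visible light, a term like this can anchor albedo estimates that RGB-only inverse rendering tends to confuse with shading.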
My Advisor Asked Me to Build an Autonomous Driving Research Platform, and After Seeing This I No Longer Wanted to Build It Myself...
自动驾驶之心· 2025-07-30 23:33
A student recently asked us about research platforms for autonomous driving: his advisor wanted him to build one himself, but the system was complex and he had no idea where to start. Then we recommended the Black Warrior to him, a dream machine that meets every research requirement.

Black Warrior 001 (黑武士001) is the lightweight teaching-and-research solution from the 自动驾驶之心 team, supporting perception, localization, fusion, navigation, planning, and other functional modules on an Ackermann chassis.

The Black Warrior 001, a full-stack autonomous driving vehicle for research and teaching, officially went on sale three months ago. The world is too dull; come do something interesting with us. Originally priced at 36,999 yuan, orders now include three free courses (model deployment + point-cloud 3D detection + multi-sensor fusion), with priority assembly and shipping for early orders.

1) Black Warrior 001
The platform supports secondary development and modification, with numerous reserved mounting positions and interfaces for adding cameras, millimeter-wave radar, and other sensors.

2) Demonstrations
We tested perception, localization, fusion, and navigation/planning functions across indoor, outdoor, and underground-garage scenarios. Demonstrated capabilities include: outdoor park driving, point-cloud 3D object detection, indoor garage 2D LiDAR mapping, indoor garage 3D LiDAR mapping, uphill/downhill tests, and large-scale outdoor 3D mapping.

Target users: undergraduates advancing their studies and entering competitions; graduate students doing research and publishing papers; graduate students job hunting and building projects; university lab teaching; training companies and vocational schools.

3) Hardware
| Main sensors | Sensor description |
| --- | - ...
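For readers new to the platform side: an Ackermann chassis like the one above is typically commanded through a kinematic bicycle model. Below is a minimal sketch; the 0.32 m wheelbase is an assumed value for illustration, not a published spec of Black Warrior 001.

```python
import math

# Minimal kinematic bicycle model for an Ackermann-steered platform.
# The 0.32 m wheelbase is an assumption for illustration, not a spec.

WHEELBASE = 0.32  # meters (assumed)

def step(x, y, yaw, v, steer, dt=0.05):
    """Advance pose (x, y, yaw) given speed v (m/s) and steering angle (rad)."""
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v * math.tan(steer) / WHEELBASE * dt   # yaw rate from steering
    return x, y, yaw

# Drive a gentle left arc for 2 seconds.
pose = (0.0, 0.0, 0.0)
for _ in range(40):
    pose = step(*pose, v=0.5, steer=0.2)
print(tuple(round(p, 3) for p in pose))
```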
How Should You Prepare for Fall Recruitment in End-to-End/Large Models/World Models? We've Set Up a Job-Hunting Exchange Group...
自动驾驶之心· 2025-07-30 23:33
Core Viewpoint
- There is a growing gap between academic knowledge and the practical skills required in the workplace, particularly for job seekers preparing for campus recruitment [1].

Group 1: Industry Observations
- Many practitioners with work experience are exploring opportunities in large models and world models, indicating a shift in industry focus [1].
- Engineers from traditional rule-based planning-and-control backgrounds are reconsidering their direction as the industry moves toward embodied approaches [1].

Group 2: Community Building
- The company aims to create a comprehensive platform that connects talent across the industry, facilitating growth and collaboration [1].
- A new community has been established to discuss industry-related topics, including company developments, product research, and job seeking [1].
- The community encourages networking among industry peers and aims to provide timely insight into industry trends [1].
22 Q&As on Li Auto's VLA Driver Large Model
自动驾驶之心· 2025-07-30 23:33
Core Viewpoint
- The article discusses the potential of the VLA (Vision-Language-Action) architecture in autonomous driving, emphasizing its long-term viability and alignment with human cognitive processes [2][12].

Summary by Sections

VLA Architecture and Technical Potential
- VLA has strong long-term technical potential, marking the transition from manually engineered to AI-driven autonomous driving, and is expected to support urban driving scenarios [2].
- The architecture draws on robotics and embodied intelligence, suggesting it will remain relevant even after robots become widespread [2].

Performance Metrics and Chip Capabilities
- The Thor-U chip currently runs the model at 10 Hz, with potential upgrades to 20 Hz or 30 Hz through further optimization [2].
- The VLA model is designed to be platform-agnostic, ensuring consistent performance across different hardware [2].

Language Integration and Cognitive Abilities
- Language understanding is crucial for advanced autonomous driving capabilities, enhancing the model's ability to handle complex scenarios [2].
- VLA's capacity to generalize and learn from experience is likened to human learning, allowing it to adapt to new situations without repeated failures [2].

Model Upgrade and Iteration
- The 3.2B MoE on-vehicle model follows a structured upgrade cycle, with both pre-training and post-training updates used to enhance different capabilities [3].

User Experience and Trust
- The article highlights the importance of user trust and experience, noting that different user groups will accept the technology gradually [2].
- Future iterations aim to improve driving speed and responsiveness, addressing current limitations in specific scenarios [5][12].

Competitive Landscape and Differentiation
- The company closely monitors competitors such as Tesla, aiming to differentiate itself through gradual iteration and a focus on full-scenario autonomous driving [12].
- VLA's architecture is designed to support unique product experiences, setting it apart from competitors [13].

Safety Mechanisms
- The AEB (Automatic Emergency Braking) function is emphasized as a critical safety feature, maintained at high frame rates for emergency scenarios [14].
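For readers unfamiliar with the "3.2B MoE" phrasing: a mixture-of-experts model routes each token through only a few of many expert sub-networks, so a large total parameter count can fit a tight real-time budget (10 Hz means roughly 100 ms per frame end to end). Below is a minimal top-k MoE layer sketch; the expert count, sizes, and routing rule are illustrative assumptions, not Li Auto's actual configuration.

```python
import torch
import torch.nn as nn

# Minimal top-k mixture-of-experts layer. Expert count, widths, and k are
# illustrative assumptions; only k of the experts run per token, which is
# what keeps compute well below the total parameter count.

class MoELayer(nn.Module):
    def __init__(self, d_model=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.gate(x)                   # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # route each token to k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(16, 256)).shape)  # torch.Size([16, 256])
```

Production MoE inference replaces the Python loops with batched expert dispatch, but the routing idea is the same: activate a small, input-dependent subset of the network per step.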