自动驾驶之心
We put together a full-stack 3DGS learning roadmap, covering feed-forward GS......
自动驾驶之心· 2025-12-16 03:16
Core Insights
- The article highlights Tesla's adoption of 3D Gaussian Splatting (3DGS) technology, marking a significant advance in autonomous driving through feed-forward GS algorithms [1][3]
- There is industry consensus that 3DGS technology is iterating rapidly, with various companies actively hiring for related positions [1][3]

Group 1: Course Overview
- A new course, "3DGS Theory and Algorithm Practical Tutorial," provides a structured learning path for newcomers to the 3DGS field, covering both theoretical and practical aspects [3][7]
- The course is designed to help participants master point cloud processing, deep learning, real-time rendering, and coding practice [3][7]

Group 2: Course Structure
- The course consists of six chapters, starting with foundational computer graphics and progressing to advanced topics such as dynamic reconstruction and surface reconstruction [7][8]
- Each chapter includes practical assignments and discussion of relevant algorithms and frameworks, such as NVIDIA's open-source 3DGRUT framework [8][9]

Group 3: Target Audience and Requirements
- The course targets individuals with a background in computer graphics, visual reconstruction, and programming, specifically those familiar with Python and PyTorch [16]
- A GPU at the level of an RTX 4090 or better is recommended to engage effectively with the course content [16]

Group 4: Learning Outcomes
- By the end of the course, participants will have a comprehensive understanding of the 3DGS technology stack, including algorithm development and the ability to train open-source models [16]
- The course also offers networking opportunities with peers from academia and industry, enhancing career prospects in the field [16]
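The core piece of 3DGS math the course's early chapters build toward can be shown in a few lines: splatting a 3D Gaussian's covariance into screen space via the EWA projection Sigma_2D = J W Sigma_3D W^T J^T. The sketch below is illustrative only (plain NumPy with a toy camera I chose), not course material.

```python
import numpy as np

def project_covariance(cov3d, W, J):
    """EWA splatting: Sigma_2D = J W Sigma_3D W^T J^T."""
    T = J @ W                 # combine view rotation and projection Jacobian
    return T @ cov3d @ T.T    # 2x2 screen-space covariance

# Toy setup: isotropic Gaussian, identity view rotation, and a Jacobian
# that simply drops the depth axis (an orthographic simplification).
cov3d = 0.5 * np.eye(3)
W = np.eye(3)
J = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
cov2d = project_covariance(cov3d, W, J)
```

With these toy values the projected covariance is simply the isotropic 2x2 slice of the original Gaussian; a real perspective Jacobian would stretch it with depth.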
WorldLens, jointly proposed by ten-plus institutions: benchmarking all open-source autonomous-driving world models (Chinese Academy of Sciences, National University of Singapore, et al.)
自动驾驶之心· 2025-12-16 00:03
Paper authors | WorldBench team  Editor | 自动驾驶之心

Existing world models are already quite photorealistic in visual generation, but they still show clear deficiencies in geometric consistency, temporal stability, and behavioral plausibility, and these problems are hard to surface with conventional video-quality metrics. To address this, the WorldBench team proposes WorldLens.

WorldLens is a comprehensive benchmark for evaluating a model's ability to construct, understand, and act within the world it generates. It covers five core dimensions: generation quality, reconstruction performance, instruction following, downstream-task adaptability, and human preference, together spanning visual realism, geometric consistency, physical plausibility, and functional reliability. Evaluation shows that no existing world model is optimal across all dimensions: some models excel at texture yet violate physical laws, while geometrically stable models lack behavioral credibility. To align objective metrics with human judgment, WorldLens further builds the WorldLens-26K dataset, a large collection of human-annotated videos with quantitative scores and textual explanations, and develops the WorldLens-Agent evaluation model, distilled from these annotations for scalable, interpretable scoring. The benchmark, dataset, and evaluation agent together form a unified ecosystem ...
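The observation that no model wins on every axis can be made concrete: a multi-dimension benchmark like WorldLens reports one score per dimension, and any single leaderboard number is just an aggregation choice. A toy sketch follows; the dimension names echo the article, but the equal-weight aggregation and the example scores are assumptions made up for illustration, not WorldLens code.

```python
DIMENSIONS = ("generation_quality", "reconstruction", "instruction_following",
              "downstream_adaptability", "human_preference")

def composite(scores, weights=None):
    """Weighted mean over the five benchmark dimensions."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total

# A model that excels at texture but violates physics might score like this:
model_a = {"generation_quality": 0.9, "reconstruction": 0.4,
           "instruction_following": 0.7, "downstream_adaptability": 0.6,
           "human_preference": 0.8}
score_a = composite(model_a)
```

Changing the weights reorders such a leaderboard, which is one reason the benchmark reports the per-dimension breakdown rather than a single number.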
SOTA! FaithFusion: a plug-and-play unified generation-reconstruction framework (Baidu & Nanjing University)
自动驾驶之心· 2025-12-16 00:03
Paper authors | YuAn Wang et al.  Editor | 自动驾驶之心

Cracking the core pain point: balancing geometric consistency and creativity in generative reconstruction

Whether at the object or scene level, fusing "reconstruction" with "generation" faces a core tension: how to preserve the creativity and diversity of generation while guaranteeing geometric fidelity to the original observations. In 3D scene reconstruction, combining the high-fidelity geometry of 3D Gaussian Splatting (3DGS) with the appearance-generation power of diffusion models is now the mainstream route to novel view synthesis. But for lack of a pixel-level, 3D-consistent editing criterion, it often suffers from over-inpainting (tampering with trustworthy regions) and geometric drift (distortion in unobserved regions).

To ease this tension, existing methods mostly adopt an "external constraint" paradigm: either injecting external priors such as LiDAR or HD maps on the generation side to limit the diffusion model's freedom, or modifying the 3DGS reconstruction side to strengthen fidelity. Such schemes rely on extra inputs or bespoke modifications, which raises deployment cost and limits generality.

FaithFusion's core breakthrough is to step outside this "external dependence" and mine intrinsic guidance signals from the 3DGS model itself. It abandons empirical ...
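To make the "internal guidance" idea concrete: one signal a 3DGS model already produces for free is per-pixel accumulated opacity, which indicates how well a pixel is covered by observed geometry. The sketch below uses it to gate where a generator may edit. This illustrates the general idea only, not FaithFusion's actual criterion; all names and the 0.5 threshold are assumptions.

```python
import numpy as np

def fuse(render, generated, acc_alpha, thresh=0.5):
    """Keep reconstructed pixels where coverage is high; fall back to
    generated content where the scene was never observed."""
    mask = (acc_alpha >= thresh).astype(render.dtype)[..., None]
    return mask * render + (1.0 - mask) * generated

render = np.zeros((2, 2, 3))        # stand-in for the 3DGS rendering
generated = np.ones((2, 2, 3))      # stand-in for the diffusion output
acc_alpha = np.array([[0.9, 0.1],
                      [0.8, 0.2]])  # per-pixel accumulated opacity
out = fuse(render, generated, acc_alpha)
```

A hard mask like this directly encodes the trade-off in the text: well-observed pixels stay faithful to reconstruction, while unobserved regions are delegated to the generator.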
We hand-built a full-stack autonomous driving mini car, aimed at research......
自动驾驶之心· 2025-12-16 00:03
Core Viewpoint
- The article introduces the "Black Warrior 001," a cost-effective, user-friendly autonomous-driving educational vehicle designed for research and teaching, priced at 36,999 yuan including various advanced features and training courses [2][4]

Group 1: Product Overview
- The Black Warrior 001 is a lightweight platform supporting perception, localization, fusion, navigation, and planning, built on an Ackermann chassis [4]
- It suits multiple educational levels, including undergraduate study, graduate research, and vocational-school training [4]

Group 2: Performance Demonstration
- The vehicle has been tested indoors, outdoors, and in parking garages, demonstrating its perception, localization, fusion, navigation, and planning capabilities [6][8][12][14][16][18][20]

Group 3: Hardware Specifications
- Key sensors:
  - 3D LiDAR: Mid 360
  - 2D LiDAR: from Lidar Technology
  - Depth camera: from Orbbec, with built-in IMU
  - Main control chip: Nvidia Orin NX 16G
  - Display: 1080p [22][23]
- The vehicle weighs 30 kg, carries a 50 W battery, runs on 24 V, and has a maximum speed of 2 m/s [25][26]

Group 4: Software and Functionality
- The software stack includes ROS, C++, and Python, with one-click startup and a provided development environment [28]
- Supported functions include 2D and 3D SLAM, point cloud processing, vehicle navigation, and obstacle avoidance [29]

Group 5: After-Sales and Maintenance
- The company offers one year of after-sales support (excluding man-made damage), with free repairs during the warranty period for damage caused by operational errors or code modifications [52]
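An Ackermann chassis like this one is commonly approximated by the kinematic bicycle model, which is how navigation stacks typically convert speed and steering commands into pose updates. The sketch below is the generic textbook model, not the product's actual controller; the 0.3 m wheelbase is an assumed illustrative value.

```python
import math

def step(x, y, yaw, v, steer, dt, wheelbase=0.3):
    """One Euler step of the kinematic bicycle model for Ackermann steering."""
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt
    return x, y, yaw

# Drive straight for one second at the chassis's advertised 2 m/s top speed.
x, y, yaw = 0.0, 0.0, 0.0
for _ in range(10):
    x, y, yaw = step(x, y, yaw, v=2.0, steer=0.0, dt=0.1)
```

With zero steering angle the heading never changes and the pose advances 2 m along the x-axis; a nonzero steer makes the yaw rate proportional to tan(steer)/wheelbase, which is why a shorter wheelbase turns tighter.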
聊聊关于 Agentic RL 训推框架的一点看法和思考
自动驾驶之心· 2025-12-16 00:03
Core Viewpoint
- The article surveys the current landscape of reinforcement learning (RL) training frameworks, highlighting the diversity of open-source options and their specific strengths and weaknesses, with a focus on the challenges of adapting these frameworks to multi-modal models in real-world environments [2][3]

Summary by Sections

Overview of RL Frameworks
- The open-source community offers a wide variety of RL training frameworks, from established ones such as OpenRLHF, TRL, Unsloth, and verl to newer entries like slime, AReaL, RLinf, RL2, and ROLL [2]

Framework Selection Criteria
- The author wanted a community-active framework requiring minimal code modification for environment adaptation, ultimately selecting AReaL for its flexibility in handling multi-turn interactions [3]

GPU Management in RL Training
- RL training poses GPU-orchestration challenges; traditional frameworks often follow a synchronous training model, which can lead to inefficiencies and wasted resources [5][12]

Data Flow and Structure
- Data flow is crucial in RL training frameworks; verl uses a dedicated data format called DataProto for efficient data transfer, though this can become a burden in agentic RL scenarios [10][11]

Asynchronous vs. Synchronous Training
- Asynchronous RL training frameworks are more efficient, but they introduce challenges such as data-staleness issues and higher GPU resource consumption compared to synchronous models [11][12]

Control Flow in RL Training
- Control flow remains primarily on the training side; the training process resembles standard LLM training, differing mainly in the loss function used [15]

Weight Transfer Between Engines
- Transferring model weights from the training engine to the inference engine is complex, particularly when the two engines use different model-partitioning schemes [16][19]

Gaps in RL Training
- Two significant gaps are identified: the need for on-policy data in RL training, and discrepancies in token distributions between rollout and prefill, which complicate the importance-sampling computation [20][23]

Environment Adaptation and Reward Management
- Environment adaptation and reward calculation are central to agentic RL training; different frameworks handle these aspects differently, with AReaL and slime offering the more flexible solutions [24][26]

Asynchronous Training Solutions
- AReaL's asynchronous training approach is presented as a mature solution, using a producer-consumer model to manage data flow efficiently [29][30]

Partial Rollout Management
- Partial rollout is introduced as a way to manage in-flight tasks during model weight updates, enabling efficient training without interrupting the inference process [37][38]

Insights on RL Algorithms
- The article closes with reflections on RL algorithms, discussing the challenges of reward structuring and the potential benefits of staged training approaches [39][40]

Code Complexity and Usability
- Frameworks like AReaL and verl are well engineered but complex, posing a steep learning curve for new users [43][44]
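The producer-consumer pattern attributed to AReaL above reduces to a few lines of standard-library Python: rollout workers push trajectories into a bounded queue while the trainer drains it, so neither side blocks on the other. Real frameworks add weight syncing, staleness control, and partial-rollout handling; everything below is a toy illustration with fabricated trajectory data.

```python
import queue
import threading

buf = queue.Queue(maxsize=8)   # bounded buffer decouples rollout from training
STOP = object()                # sentinel marking the end of the stream

def rollout_worker(n_episodes):
    """Producer: generate (fake) trajectories and enqueue them."""
    for i in range(n_episodes):
        buf.put({"episode": i, "reward": float(i)})
    buf.put(STOP)

def trainer():
    """Consumer: pop trajectories and accumulate a (fake) training signal."""
    total = 0.0
    while True:
        item = buf.get()
        if item is STOP:
            return total
        total += item["reward"]

t = threading.Thread(target=rollout_worker, args=(5,))
t.start()
result = trainer()   # consumes rewards 0 + 1 + 2 + 3 + 4
t.join()
```

The bounded `maxsize` is the staleness knob: a small buffer keeps training close to on-policy at the cost of idle producers, which is exactly the trade-off the async-vs-sync discussion above describes.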
Without solid research ability, forget about doing autonomous driving in industry......
自动驾驶之心· 2025-12-15 11:33
Core Viewpoint
- The article discusses the high demand for skilled talent in the autonomous driving sector, highlighting competitive salaries and the importance of comprehensive research capability for candidates [2]

Group 1: Talent Demand and Requirements
- High-end autonomous driving talent is in great demand, with some companies offering annual packages of up to 700,000 yuan for master's degree holders [2]
- Candidates are expected to possess complete research capability, including problem identification, definition, and solution proposal, rather than academic knowledge alone [2]

Group 2: Research Challenges
- Many students face research obstacles such as unfamiliarity with the field, lack of real data, and difficulty with experimental design [7]
- The fastest way to improve research skills is to work alongside experienced researchers, hence the introduction of a 1-on-1 research mentoring service [3]

Group 3: Mentoring Services Offered
- Guidance spans research areas including end-to-end systems, reinforcement learning, 3D object detection, and more [4]
- Services include paper topic selection, full-process paper guidance, experimental guidance, and support for doctoral applications [12]

Group 4: Publication Success
- The mentoring service reports a high publication rate, with multiple papers accepted at top conferences and journals such as CVPR, AAAI, and ICLR [9]
45 trillion yuan! A new boom for China's intelligent driving has arrived
自动驾驶之心· 2025-12-15 11:33
Core Viewpoint
- The commercialization of L4-level autonomous driving is accelerating significantly, driven by policy, technology, and application scenarios; the "Universal Smart Driving" era is expected to begin by 2025, with vehicle ownership exceeding 100,000 across five pilot cities and the related industry scale surpassing 20 billion yuan [2][3]

Policy Perspective
- National-level planning and pilot projects in five cities clarify accident liability, removing institutional barriers [2]
- Policy dividends, core technological advances, and expanding application scenarios are jointly fostering the growth of L4 autonomous driving [2]

Technological Development
- Continuously falling system costs and stronger vehicle-road-cloud collaboration are improving reliability in complex environments [2]
- Standardization of "vehicle-cloud" and "vehicle-road-cloud" collaboration is becoming essential, with a rise in patents covering perception, decision-making, and control [3]

Application Scenarios
- L4 autonomous driving is in the commercial-model exploration and full-scale application phase for low-speed semi-open and closed scenarios, while mid-to-high-speed open scenarios remain at an early stage [6]
- Application scenarios include Robotaxi, unmanned delivery, and trunk logistics, transitioning from low-speed closed to mid-speed open environments [2][3]

Business Models
- The main business models are product sales and operational agency, with product sales the primary focus [8]
- In industrial parks, an intelligent heavy forklift can save roughly 180,000 yuan per year, while a smart patrol vehicle in a commercial park can save about 70,000 yuan per year [11]

Cost Savings
- In urban sanitation, L4 autonomous vehicles cut costs by 11% versus manual cleaning, and electric autonomous sanitation vehicles cut costs by 21% versus traditional diesel vehicles [15]
- Expected annual cost savings for logistics operations using L4 technology can reach 170,000 yuan compared with traditional vehicles after large-scale operation [29]

Future Trends
- L4 autonomous driving is moving from the technology-validation phase to commercialization, facing challenges such as technical bottlenecks, regulatory gaps, and data-ethics issues [45]
- By 2035, China's market for L4-and-above autonomous driving is projected to exceed 45 trillion yuan, with a penetration rate above 13% [45]
XPeng's latest paper, FutureX, is built on a latent chain-of-thought world model that on-vehicle systems can learn from...
自动驾驶之心· 2025-12-15 06:00
Paper authors | Hongbin Lin et al.  Editor | 自动驾驶之心

A very interesting new piece of work from CUHK and XPeng: it uses a latent chain-of-thought world model to strengthen end-to-end capability, with several improvements worth trying in industry.

1. Background

End-to-end (E2E) autonomous driving refers to a fully differentiable pipeline that maps multi-modal raw sensor streams directly to motion plans or low-level control commands. The field has developed rapidly on both the algorithm and benchmark fronts, and despite inherent challenges, existing methods have made notable progress.

Behind these successes, current E2E systems map sensor input to control output through a single neural network in one efficient forward pass, without any further "thinking." This leaves them short on adaptability and interpretability in complex environments (Fig. 1, second row). In human cognition, a driver mentally simulates possible futures before acting: predicting the motion trends of surrounding vehicles, how the scene will evolve, and the potential outcome of each candidate action (Fig. 1, first row). This inner reasoning lets humans make safe, context-appropriate decisions. For end-to-end systems, therefore, inferring future scenes in highly dynamic traffic environments ...
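The "simulate before acting" idea can be sketched in a dozen lines: instead of a one-shot forward pass, the planner first unrolls a latent dynamics model a few steps and conditions its output on the imagined latents. This is a hypothetical toy with random linear maps, not FutureX's actual architecture; all shapes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HORIZON = 32, 3
W_world = 0.1 * rng.standard_normal((DIM, DIM))               # latent dynamics
W_plan = 0.1 * rng.standard_normal((DIM * (HORIZON + 1), 2))  # -> (steer, accel)

def imagine_and_plan(z):
    """Unroll the latent world model HORIZON steps, then plan on all latents."""
    states = [z]
    for _ in range(HORIZON):
        states.append(np.tanh(states[-1] @ W_world))   # z_{t+1} = f(z_t)
    return np.concatenate(states, axis=-1) @ W_plan    # plan sees imagined futures

action = imagine_and_plan(rng.standard_normal((4, DIM)))   # batch of 4 scenes
```

The contrast with a plain E2E planner is the inner loop: the extra rollout steps are the "latent chain of thought," trading some inference latency for a plan that is conditioned on predicted scene evolution rather than the current latent alone.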
World models and autonomous driving: the latest algorithms & hands-on projects (Tesla, video, OCC, etc.)
自动驾驶之心· 2025-12-15 06:00
Core Viewpoint
- The article introduces a new course on world models in autonomous driving, highlighting the topic's relevance and the collaboration with industry experts to provide comprehensive training in this emerging field [2][4]

Course Overview
- The course covers world models broadly: their historical development, current applications, and methodologies such as pure simulation, simulation plus planning, and generative sensor input [7]
- It aims to equip participants with the skills and knowledge needed to understand and implement world models in autonomous driving [12]

Course Structure
- **Chapter 1: Introduction to World Models** — an overview of world models and their connection to end-to-end autonomous driving, covering the main research streams and their industrial applications [7]
- **Chapter 2: Background Knowledge of World Models** — foundational material including scene representation, Transformer technology, and BEV perception, all crucial for understanding world models [8]
- **Chapter 3: Discussion on General World Models** — popular models such as Marble and Genie 3, exploring their core technologies and design philosophies [9]
- **Chapter 4: Video Generation-Based World Models** — video generation algorithms, highlighting significant works and recent advances in the field [10]
- **Chapter 5: OCC-Based World Models** — OCC generation methods and their applications in trajectory planning and end-to-end systems [11]
- **Chapter 6: World Model Job Topics** — industry applications, open challenges, and interview preparation for world-model roles [11]

Target Audience and Learning Outcomes
- The course is designed for individuals aiming to advance in end-to-end autonomous driving and world models, targeting a level equivalent to one year of experience in the field [15]
- Participants will gain a deep understanding of key technologies such as video generation, OCC generation, and BEV perception, enabling them to apply these concepts in real-world projects [15]
"King of CV top conferences" Li Hongyang joins the embodied intelligence race!
自动驾驶之心· 2025-12-15 00:04
Core Insights
- RoboX is focusing on embodied intelligence and has entered the robotics-manipulation sector, led by researcher Li Hongyang of the University of Hong Kong and Shanghai AI Laboratory [3][4]
- The company has formed a research team of several dozen members covering VLA, robotics, autonomous driving, and edge-computing chips [4]
- Li Hongyang's research has significantly advanced autonomous driving technology, particularly through the UniAD framework, which integrates various tasks into a single end-to-end network [6][7]

Research Achievements
- The UniAD framework has outperformed state-of-the-art methods on the nuScenes dataset, showcasing its effectiveness for autonomous driving [6]
- His earlier method, BEVFormer, was recognized as one of the top 100 AI papers of 2022 and set an industry benchmark for visual detection [7]
- The team built "AgiBot World," a large-scale real-robot manipulation dataset with impact across multiple industry scenarios [7]

Future Directions
- The upcoming paper "UniVLA: Learning to Act Anywhere with Task-centric Latent Actions" introduces a task-centric latent-action framework that enhances robot policy learning across different environments [9][10]
- UniVLA reduces reliance on labeled data, achieving optimal performance with minimal data on multi-task benchmarks, and supports efficient transfer from internet videos to real robots [10]
- The company aims to establish a full-stack in-house research route, with a vision of stronger few-shot generalization for humanoid robots across applications [10][11]

Industry Trends
- Embodied intelligence is gaining traction, with several experts from academia transitioning into the area, indicating growing interest and investment in the sector [11]