
New SOTA for driving VLA! Alibaba's AutoDrive-R²: self-reflective chain-of-thought & physics-based rewards break through the VLA generalization bottleneck
自动驾驶之心· 2025-09-03 23:33
Core Viewpoint
- The article introduces AutoDrive-R², a novel Vision-Language-Action (VLA) framework developed by Alibaba and the University of Queensland, aimed at enhancing the reasoning and trajectory-planning capabilities of autonomous driving systems through a two-stage training approach [2][49].

Group 1: Framework Overview
- AutoDrive-R² integrates a structured reasoning process with self-reflection capabilities to improve decision-making in complex driving scenarios [8][10].
- The framework comprises two training phases: supervised fine-tuning on the nuScenesR²-6K dataset, followed by reinforcement learning (RL) with a physics-based reward framework [17][49].

Group 2: Dataset and Training
- A new dataset, nuScenesR²-6K, was created for supervised fine-tuning; it contains 6,000 image-trajectory pairs annotated with reasoning and self-reflection steps [19][20].
- Training emphasizes a four-step logical chain (visualization, computation, logic, and reflection) that strengthens the model's reasoning capabilities [20][43].

Group 3: Performance and Results
- AutoDrive-R² achieved state-of-the-art (SOTA) performance on both the nuScenes and Waymo datasets, with significant reductions in L2 error over existing methods [35][37].
- The model's average L2 error on nuScenes was reduced by 86.9% compared to previous leading methods, demonstrating strong generalization [35][39].

Group 4: Reinforcement Learning and Reward Mechanism
- The RL phase uses Group Relative Policy Optimization (GRPO) to optimize trajectory planning, with a physics-based reward framework that ensures generated trajectories are physically feasible and comfortable [21][26].
- The reward framework includes components for spatial alignment, vehicle dynamics, and temporal smoothness, which together guide the model toward safe, realistic driving strategies [27][30][31].

Group 5: Future Directions
- Future research will focus on multi-agent collaboration and real-time sensor-fusion integration to further improve adaptability in complex environments [49].
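The three reward components described above can be sketched as a single composite function. This is a minimal illustration, not the paper's actual formulation: the weights, comfort limits, and component definitions are all assumptions for demonstration.

```python
import numpy as np

def physics_reward(pred, ref, dt=0.5,
                   w_align=1.0, w_dyn=0.5, w_smooth=0.5,
                   a_max=4.0, j_max=2.0):
    """Toy composite reward: spatial alignment + vehicle dynamics +
    temporal smoothness.  pred and ref are (T, 2) waypoint arrays;
    all weights and limits are illustrative assumptions."""
    # Spatial alignment: negative mean L2 distance to the reference trajectory.
    align = -np.mean(np.linalg.norm(pred - ref, axis=1))
    # Vehicle dynamics: penalize acceleration beyond a comfort limit.
    vel = np.diff(pred, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    dyn = -np.mean(np.maximum(np.linalg.norm(acc, axis=1) - a_max, 0.0))
    # Temporal smoothness: penalize jerk (rate of change of acceleration).
    jerk = np.diff(acc, axis=0) / dt
    smooth = -np.mean(np.maximum(np.linalg.norm(jerk, axis=1) - j_max, 0.0))
    return w_align * align + w_dyn * dyn + w_smooth * smooth
```

A perfectly tracked, constant-velocity trajectory scores zero; any deviation or harsh motion only lowers the reward, which is the property GRPO-style ranking of sampled trajectories relies on.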
The 自动驾驶之心 Super Discount Card is here: 30% off all courses!
自动驾驶之心· 2025-09-03 06:44
Core Viewpoint
- The company has launched a "Super Discount Card" in response to feedback about high course prices in autonomous driving, offering a 30% discount on all courses for a limited time [2][4].

Group 1: Course Offerings
- Several new autonomous driving courses have been introduced, including "End-to-End and VLA Autonomous Driving Small Class," "End-to-End and Planning Control (Third Session)," and "4D Annotation Algorithm Employment Small Class," all of which have received positive feedback [2].
- Future plans include additional courses on VLA and model deployment [2].

Group 2: Discount Card Details
- The "Super Discount Card" is priced at 299 yuan and provides a 30% discount on all self-developed courses in autonomous driving and embodied intelligence, including future new courses [4].
- The card is valid for one year from the date of purchase and is on sale for a limited time, September 1 through September 14 [4].
- A full refund is available if no courses are purchased within one year of buying the card [4].
AI Day Livestream | MemoryVLA: Empowering Long-Horizon Robotic Manipulation Tasks
自动驾驶之心· 2025-09-03 03:19
Core Viewpoint
- The article presents MemoryVLA, a cognitive-memory-action framework inspired by human memory systems, aimed at improving the performance of Vision-Language-Action (VLA) models on long-horizon robotic manipulation tasks [3][7].

Group 1: VLA Challenges and Solutions
- Existing VLA models rely mainly on current observations, leading to poor performance on long-horizon, temporally dependent tasks [7].
- Cognitive science indicates that humans manage such tasks with a memory system spanning transient neural activity and the hippocampus, which inspired MemoryVLA [7].

Group 2: MemoryVLA Framework
- MemoryVLA uses a pre-trained Vision-Language Model (VLM) to encode observations into perceptual and cognitive tokens, forming a working memory [3].
- A Perceptual-Cognitive Memory Bank stores consolidated low-level details and high-level semantics, allowing adaptive retrieval of relevant entries for decision-making [3].

Group 3: Implications for Robotics
- The framework aims to strengthen robots' ability to perform tasks requiring temporal awareness and memory, addressing the inherently sequential nature of robotic manipulation [3][7].
- The article also touches on the roles of memory and reasoning within VLA models, suggesting these areas warrant further exploration [7].
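The write-then-retrieve pattern of a memory bank can be illustrated with a toy version. The class and method names below are hypothetical, not MemoryVLA's actual API; it only demonstrates the general idea of storing token embeddings and adaptively retrieving the most relevant past entries by similarity.

```python
import numpy as np

class MemoryBank:
    """Toy perceptual-cognitive memory bank (illustrative names, not
    MemoryVLA's API): stores per-step embeddings as lookup keys and
    retrieves the most relevant past entries by cosine similarity."""

    def __init__(self):
        self.keys = []     # cognitive-token embeddings used for lookup
        self.values = []   # paired entries (e.g. perceptual details) to recall

    def write(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(value)

    def retrieve(self, query, k=2):
        if not self.keys:
            return []
        q = np.asarray(query, dtype=float)
        keys = np.stack(self.keys)
        # Cosine similarity between the query and every stored key.
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(sims)[::-1][:k]
        return [self.values[i] for i in top]
```

In a real system the values would be feature tensors consumed by the action decoder, and consolidation would merge redundant entries to bound memory growth.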
Autonomous Driving Paper Express | DriveQA, Closed-Loop Simulation, AIGC, World Models, and More
自动驾驶之心· 2025-09-03 03:19
Core Insights
- The article discusses the DriveQA dataset, which pairs driving manuals from various U.S. states with visual scenarios from the CARLA simulation environment, forming a comprehensive driving-rules question-answering benchmark with 474K samples [2][3].
- It highlights DriveQA's advantages over existing multimodal datasets in covering traffic rules and improving model generalization and reasoning [2][3].

Contribution Summary

DriveQA Multimodal Driving Knowledge Benchmark
- DriveQA has two components: DriveQA-T, with 26K QA pairs from 51 U.S. states covering 19 question categories, and DriveQA-V, with 68K images and 448K QA pairs based on CARLA simulations, supporting a range of evaluation tasks [3].

Systematic Evaluation of SOTA Models
- Testing mainstream LLMs (e.g., GPT-4o, Llama-3.1) and MLLMs (e.g., LLaVA-1.5) revealed good performance on basic traffic rules but significant deficits in numerical reasoning, complex right-of-way scenarios, and understanding traffic-sign variants [3].

Model Optimization Value of DriveQA
- LoRA fine-tuning on DriveQA significantly improved accuracy in recognizing regulatory signs and making intersection decisions, demonstrating effective generalization to downstream driving tasks [3].

Analysis of Model Sensitivity and Generalization Limits
- Controlled variables in DriveQA-V revealed model sensitivity to environmental factors, and negative sampling exposed weaknesses in understanding complex rules, offering guidance for optimizing rule reasoning in autonomous driving AI [3].

Generative AI in Autonomous Driving System Testing
- The article surveys the application of generative AI to testing autonomous driving systems (ADS), categorizing existing research into six core scenario-based testing tasks [9][11].
- It reviews the generative models used in testing, including LLMs, VLMs, diffusion models, GANs, and VAEs, detailing their mechanisms across different testing tasks [11][14].

Evaluation Resources and Benchmark Integration
- A comprehensive reference framework is provided covering datasets, simulators, ADS systems, evaluation metrics, and benchmark methods for ADS testing [14].

Limitations and Future Directions
- The article identifies 27 core limitations of generative AI in ADS testing, such as hallucination in LLMs and the computational overhead of diffusion models, and suggests targeted directions for improvement [14].
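The LoRA fine-tuning mentioned above works by adding a trainable low-rank delta to a frozen pretrained weight. A minimal sketch follows; the class name, initialization, and scaling are illustrative assumptions and do not reflect DriveQA's actual training setup (which would use a framework such as PEFT over a full model).

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA adapter over a frozen linear layer (illustrative):
    y = x @ W + alpha * (x @ A) @ B, where W stays frozen and only the
    low-rank factors A (d x r) and B (r x k) are trained."""

    def __init__(self, W, r=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d, k = W.shape
        self.W = W                             # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (d, r)) # trainable down-projection
        self.B = np.zeros((r, k))              # trainable up-projection, zero init
        self.alpha = alpha

    def forward(self, x):
        return x @ self.W + self.alpha * (x @ self.A) @ self.B
```

Because B starts at zero, the adapted layer initially reproduces the pretrained layer exactly; fine-tuning then learns a rank-r correction with far fewer trainable parameters than full fine-tuning.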
HKUST, Horizon Robotics & Zhejiang University jointly open-source SAIL-Recon: reconstruct a city in three minutes
自动驾驶之心· 2025-09-02 23:33
Anchor-frame neural maps upend traditional SfM

Structure-from-Motion (SfM) algorithms jointly estimate camera poses and scene structure from an unordered set of images and are central to many computer vision applications. Traditional SfM follows either an incremental or a global pipeline, both relying on feature extraction, matching, triangulation, and bundle adjustment; these modules tend to fail in low-texture, blurry, or repetitive-texture scenes.

Recent work proposes end-to-end learnable SfM pipelines that regress scene structure and camera poses directly from images. DUSt3R pioneered using a Transformer to regress scene coordinate maps (SCM) from two unposed images; follow-up work extends this to multiple images and even video, but remains limited by GPU memory and struggles with large-scale scenes of thousands of images.

Meanwhile, existing scene-regression methods overlook visual localization, a fundamental task in 3D vision that can provide key support for scaling SfM systems. SLAM systems build maps only at keyframes and merely localize the remaining frames, saving compute and memory.

Paper title: SAIL-Recon: Large SfM by Augmenting Scene Regression with Local ...
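The keyframe strategy SLAM systems use to save compute can be sketched in a few lines. This is an illustrative toy (the distance threshold and selection rule are assumptions, not SAIL-Recon's method): only frames that have moved far enough from the last keyframe enter the map, while the rest would be handled by cheap localization against it.

```python
import numpy as np

def select_keyframes(positions, min_dist=2.0):
    """Toy distance-based keyframe selection: keep a frame only when the
    camera has moved at least `min_dist` since the last keyframe.
    positions: (N, D) array of camera positions; returns keyframe indices."""
    keyframes = [0]  # the first frame always seeds the map
    for i in range(1, len(positions)):
        if np.linalg.norm(positions[i] - positions[keyframes[-1]]) >= min_dist:
            keyframes.append(i)
    return keyframes
```

With a 2 m threshold, a camera creeping forward at 1 m per frame maps only every other frame, halving the frames that pay the full mapping cost.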
A leading autonomous driving company may list on US markets as early as November, with a valuation possibly exceeding 6 billion USD
自动驾驶之心· 2025-09-02 23:33
Core Viewpoint
- The article discusses the recent developments and prospects of a leading autonomous driving company, referred to as "M," covering its financing activities, market positioning, and growth potential [6][10][12].

Financing and Market Position
- Company M has completed two financing rounds this year totaling several billion USD, with investors including a state-owned fund and a Middle Eastern sovereign fund [6][10].
- M is expected to go public in the US by November 2025, at a projected valuation exceeding 6 billion USD [6][10].
- The company has moved relatively slowly in capital markets compared to peers that have already listed on various exchanges [9][10].

Revenue Growth and Profitability
- M has sustained rapid revenue and gross-profit growth for three consecutive years; though currently loss-making, it expects to break even by 2026 [7][12].
- Revenue consists primarily of Non-Recurring Engineering (NRE) fees and licensing fees, the latter carrying high gross margins, potentially above 90% [12][15].

Strategic Partnerships and Product Development
- M has established partnerships with major automotive brands, growing its production-model collaborations to 130 [12][14].
- It has launched a chip subsidiary, "X," which has attracted significant investment and is road-testing its first chip [12][14].
- Strategic moves, including a partnership with Uber for autonomous vehicle operations in Europe, are seen as critical steps ahead of its IPO [12][14].

Market Dynamics and Competitive Landscape
- M's fast delivery and responsiveness to customer needs have positioned it favorably among traditional automakers, driving strong demand for its services [14].
- The company expects to pass a delivery milestone of over 1 million vehicles next year, reflecting its growing market presence [13][14].
- The competitive landscape in autonomous driving is high-stakes, marked by heavy financial investment and the potential for consolidation among companies [16].
Got the offer, but can't feel happy about it...
自动驾驶之心· 2025-09-02 23:33
Group 1
- The article reflects on the autumn recruitment season through a student's experience of receiving an offer from a tier-1 company yet feeling unfulfilled, wanting to transition to a more advanced algorithm position [1].
- It encourages perseverance and self-challenge, arguing that pushing oneself reveals personal limits and potential [2].

Group 2
- A learning package is introduced: a 499-yuan discount card granting a year of courses at 30% off, plus various course benefits and hardware discounts [4][6].
- The focus is on cutting-edge autonomous driving technologies for 2025, particularly end-to-end (E2E) and VLA systems, which are becoming central to the industry [7][8].

Group 3
- The article outlines the development of end-to-end autonomous driving algorithms, which requires knowledge of multimodal large models, BEV perception, reinforcement learning, and more [8].
- It highlights the difficulty beginners face in synthesizing fragmented research papers and the lack of practical guidance for moving from theory to practice [8].

Group 4
- A 4D annotation algorithm course is introduced to address the growing complexity of training-data requirements for autonomous driving, emphasizing the importance of automated 4D annotation [11][12].
- The course is designed to help newcomers enter the field and optimize their learning paths [12].

Group 5
- The article notes the rise of multimodal large models in autonomous driving, the rapid growth of related job opportunities, and the need for systematic learning platforms [14].
- It stresses the importance of practical experience and project involvement for job seekers in the sector [21].

Group 6
- Various specialized courses are mentioned, covering perception, model deployment, planning and control, and simulation in autonomous driving [16][18][20].
- Community engagement and support through VIP groups help course participants discuss and solve problems [26].
Xiaomi EV is hiring cloud large-model algorithm engineers (BEV/3DGS/OCC, etc.)
自动驾驶之心· 2025-09-02 23:33
Xiaomi EV Cloud Large-Model Algorithm Engineer

Job Description
1. Research and optimize data-driven cloud large-model algorithms; develop generative algorithms for scene and label generation, including but not limited to automated 4D ground-truth annotation and multimodal large models.
2. Develop unsupervised/self-supervised algorithms on massive production data, continuously exploring large models' semantic understanding and spatial perception capabilities.
3. Design data-driven iteration pipelines for autonomous driving algorithms and efficient self-training pipelines to improve data closed-loop efficiency.

Job Requirements
1. Solid C++ or Python skills and a solid grasp of data structures and algorithms.
2. In-depth research experience in one or more autonomous-driving perception areas (BEV perception / 3D detection / segmentation / occupancy networks / multi-sensor fusion / NeRF / monocular or multi-view depth estimation / 3D reconstruction).
3. Background in computer science, mathematics, machine learning, robotics, autonomous driving, or a related field preferred.
4. Research or application experience with NeRF, 3D scene generation, and sensor simulation preferred.
5. Experience with autonomous driving projects preferred.

How to apply: https://xiaomi.jobs.f.mioffice.cn/index/position/748309880141642 ...
The 自动驾驶之心 back-to-school season is here (Super Discount Card / courses / hardware / paper-tutoring benefits)
自动驾驶之心· 2025-09-02 09:57
Core Viewpoint
- The article reflects on the evolution of autonomous driving over the past decade, highlighting major technological advances and the ongoing need for innovation and talent [2][3][4].

Group 1: Evolution of Autonomous Driving
- Autonomous driving has progressed from basic image classification to advanced perception systems, including 3D detection and end-to-end models [3].
- The industry has seen both failures and successes, with companies like Tesla, Huawei, and NIO building strong technological foundations [3].
- Progress has come from continuous effort rather than sudden breakthroughs, underscoring the importance of sustained innovation [3].

Group 2: Importance of Talent and Innovation
- The future of autonomous driving relies on a steady influx of talent dedicated to improving safety and performance [4].
- Innovation is identified as the core of sustainable business growth, with a focus on practical applications and real-world problem-solving [6].
- The article encourages continuous learning and adaptation to keep pace with rapid technological change [6].

Group 3: Educational Initiatives and Resources
- The company has developed educational resources, including video tutorials and courses covering nearly 40 subfields of autonomous driving [8][9].
- Collaborations with industry leaders and academic institutions help bridge the gap between theory and practice [8].
- Various courses aim to equip learners with the skills needed for careers at leading autonomous driving companies [9][10].

Group 4: Future Directions in Technology
- Key technological directions for 2025 include end-to-end autonomous driving and the integration of large models [12][20].
- Multimodal large models are significant for enhancing the capabilities of autonomous systems [20].
- Advanced data-annotation techniques, such as automated 4D labeling, are highlighted as crucial for improving training-data quality [16].
Autonomous driving multi-sensor fusion perception 1-on-6 small class is here (camera / LiDAR / mmWave radar)
自动驾驶之心· 2025-09-02 06:51
With the rapid development of autonomous driving, robot navigation, and intelligent surveillance, the perception capability of any single sensor (camera, LiDAR, or mmWave radar) can no longer meet the demands of complex scenes.

To overcome this bottleneck, researchers fuse data from LiDAR, mmWave radar, and cameras to build a more complete and robust environment perception system. The core idea is complementarity: cameras provide rich semantic information and texture detail, essential for recognizing lane lines and traffic signs; LiDAR produces high-precision 3D point clouds with accurate range and depth information, excelling at night or in low light; and mmWave radar penetrates adverse weather (rain, fog, snow), reliably measuring object velocity and range at relatively low cost. By fusing these sensors, a system can achieve reliable all-weather, all-scenario perception, significantly improving the robustness and safety of autonomous driving.

Multimodal fusion is evolving from traditional schemes toward deeper end-to-end fusion and Transformer-based architectures. Traditional fusion comes in three flavors: early fusion concatenates raw data at the input, but is computationally expensive; mid-level fusion combines per-modality feature vectors after initial feature extraction, and is the current mainstream, for example unifying all sensor features in a bird's-eye view (BEV) representation before processing, which solves the problem of different sensor data ...
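The mid-level fusion scheme described above can be sketched as follows. This is a minimal illustration under assumed shapes: each sensor branch is taken to have already been projected into a shared BEV grid, and fusion is simple channel concatenation (real systems typically add learned fusion layers or attention on top).

```python
import numpy as np

def mid_fusion_bev(camera_feat, lidar_feat, radar_feat):
    """Toy mid-level BEV fusion (shapes and names are illustrative):
    each input is an (H, W, C_i) feature grid already projected to the
    same bird's-eye-view frame; fusion concatenates along channels and
    leaves a downstream head to weigh the complementary modalities."""
    for f in (lidar_feat, radar_feat):
        # All modalities must share the same BEV grid resolution.
        assert f.shape[:2] == camera_feat.shape[:2], "BEV grids must align"
    return np.concatenate([camera_feat, lidar_feat, radar_feat], axis=-1)
```

Aligning every modality to one BEV grid first is what makes the concatenation meaningful: each cell of the fused tensor describes the same patch of ground, seen through camera semantics, LiDAR geometry, and radar velocity/range cues at once.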