自动驾驶之心

Dead set on technology: the "Whampoa Military Academy" of autonomous driving turns three~
自动驾驶之心· 2025-08-28 03:22
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving enthusiasts, aiming to facilitate knowledge sharing, technical discussions, and job opportunities in the field of autonomous driving and AI [1][13].

Group 1: Community Development
- The "Autonomous Driving Heart Knowledge Planet" has grown to over 4,000 members, with a goal to reach nearly 10,000 in the next two years, providing a platform for exchange and technical sharing [1].
- The community offers a variety of resources, including video content, articles, learning paths, Q&A sessions, and job exchange opportunities [1][2].

Group 2: Learning Resources
- The community has organized nearly 40 technical routes for members, covering various aspects of autonomous driving, including end-to-end learning, multi-modal models, and data annotation practices [2][5].
- A complete learning stack and roadmap for beginners have been prepared, making it suitable for those with no prior experience [7][9].

Group 3: Industry Insights
- The community regularly invites industry leaders and experts to discuss trends in autonomous driving, technology directions, and production challenges [4][62].
- Members can engage in discussions about job opportunities, industry developments, and academic advancements, fostering a collaborative environment [59][64].

Group 4: Technical Focus Areas
- Key focus areas include end-to-end autonomous driving, multi-sensor fusion, 3DGS, and NeRF technologies, with detailed resources and discussions available for each topic [31][32][33].
- The community also provides insights into the latest advancements in vision-language models (VLM) and their applications in autonomous driving [35][36].
End-to-end without a data closed loop is only half a product! An authoritative breakdown of nine major topics~
自动驾驶之心· 2025-08-27 23:33
51Sim End-to-End Data Closed-Loop Ecosystem Forum (51Sim端到端数据闭环生态论坛)
2025.8.28 (Thursday), 13:00-17:00
Conference Hall 2, Level B1, Shanghai World Expo Exhibition Hall, 1099 Guozhan Road, Pudong New Area, Shanghai
Scan the QR code to register for free.

Agenda:
- Opening remarks (致开场欢迎词)
- "New Challenges and a New Journey! A Full Upgrade of the Data-Driven Closed Loop in the End-to-End Era": 张晓娜, General Manager, Vehicle Business Unit, 51Sim
- "Reflections and Explorations on Simulation Testing in the End-to-End Era": 敬, Senior Intelligent Driving Expert, 长城汽车 (Great Wall Motor)
- 鲍世强, CEO, 51Sim
- 张敏超, Director of Intelligent Driving System Development, 东风汽车股份有限公司 (Dongfeng Motor)
- "Building and Practicing BAIC's Intelligent Driving Simulation Capability": 总轩, Senior Expert, Intelligent Driving Department, 北京汽车研究总院有限公司
- "Research on Credibility Assessment Technology for Compliance-Oriented Driving Automation Simulation Testing": 陈硕, Chief Engineer, 中汽智能科技(天津)有限公司
- "Design and Validation of Autonomous Driving Systems for Light Commercial Vehicles"

See you on August 28!

Venue (Shanghai World Expo Exhibition Hall): South Gate, 1099 Guozhan Road, Pudong New Area, Shanghai; North Gate, 85 … Bocheng Road (博成路), Pudong New Area, Shanghai ...
An ultra-cost-effective 3D scanner! Point-cloud/visual reconstruction of full scenes with high-precision, centimeter-level accuracy
自动驾驶之心· 2025-08-27 23:33
Core Viewpoint
- The article introduces the GeoScan S1, a highly cost-effective 3D laser scanner designed for industrial and research applications, emphasizing its lightweight design, ease of use, and advanced features for real-time 3D scene reconstruction.

Group 1: Product Features
- The GeoScan S1 offers centimeter-level precision in 3D scene reconstruction using a multi-modal sensor-fusion algorithm, generating point clouds at a rate of 200,000 points per second and covering distances up to 70 meters [1][29].
- It supports scanning areas exceeding 200,000 square meters and can be equipped with a 3D Gaussian data-collection module for high-fidelity scene restoration [1][50].
- The device is designed for easy operation with a one-button start feature, allowing users to quickly initiate scanning tasks without complex setups [5][42].

Group 2: Technical Specifications
- The GeoScan S1 integrates various sensors, including RTK, IMU, and dual wide-angle cameras, and features a compact design with dimensions of 14.2 cm x 9.5 cm x 45 cm and a weight of 1.3 kg (excluding battery) [22][12].
- It operates on a power input of 13.8 V - 24 V with a power consumption of 25 W, and has a battery capacity of 88.8 Wh, providing approximately 3 to 4 hours of operation [22][26].
- The system supports multiple data-export formats, including PCD, LAS, and PLY, and runs on Ubuntu 20.04, compatible with ROS (see the sketch after this summary) [22][42].

Group 3: Market Positioning
- The GeoScan S1 is positioned as the most cost-effective handheld 3D laser scanner on the market, with a starting price of 19,800 yuan for the basic version [9][57].
- The product is backed by extensive research and validation from teams at Tongji University and Northwestern Polytechnical University, with over a hundred projects demonstrating its capabilities [9][38].
- The device is designed to facilitate unmanned operations and can be integrated with platforms such as drones and robotic vehicles, enhancing its versatility across different operational environments [44][46].
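Since the scanner exports standard point-cloud formats, a quick sanity check of a captured scan can be done with off-the-shelf tooling. The sketch below is an assumption for illustration, not vendor software: it reads a PCD export (the file name `scan.pcd` is a placeholder) with the open-source Open3D library, which the article itself does not mention.

```python
# Hedged sketch: inspect a PCD point cloud exported from the scanner
# using Open3D. "scan.pcd" is a placeholder file name.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")       # LAS/PLY need other readers
print(pcd)                                      # point-count summary
print(pcd.get_axis_aligned_bounding_box())      # rough scene extent

# Downsample for faster viewing; 5 cm voxels are an arbitrary choice
# for interactive display of centimeter-level data.
down = pcd.voxel_down_sample(voxel_size=0.05)
o3d.visualization.draw_geometries([down])
```

For actual measurement work you would operate on the raw cloud rather than the downsampled copy; the voxel step is only there to keep the viewer responsive on large scans.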
A detailed explanation of MindVLA, Li Auto's intelligent driving solution
自动驾驶之心· 2025-08-27 23:33
Author | 跃来跃好  Source | 地平线开发者 (Horizon Developer)

01 Introduction

MindVLA mainly comprises a spatial intelligence module, a language intelligence module, an action policy module, and a reinforcement learning module, with the following functions:

- Spatial intelligence module: takes multimodal sensor data as input, uses a 3D encoder to extract spatiotemporal features, and fuses all sensor and semantic information into a unified representation.
- Language intelligence module: MindGPT, a large language model deployed on the embedded side, performs joint spatial-plus-language reasoning, supports voice commands and feedback, and may enable human-vehicle interaction.
- Action policy module: uses a diffusion model to generate the vehicle's future behavior trajectories, injecting noise to guide the diffusion process toward diverse action plans (see the sketch after this section).
- Reinforcement learning module: uses a World Model to simulate the external environment's response and evaluate the consequences of behaviors; uses a Reward Model to …

2.1 Shortcomings of traditional end-to-end autonomous driving

Traditional end-to-end autonomous driving generates 3D bounding boxes (3D Boxes) through perception (Perception); a prediction module then uses the 3D objects and the map to predict motion trajectories; and a planning module plans a trajectory based on those predictions.
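To make the action-policy description concrete, here is a minimal sketch of diffusion-based trajectory generation: sampling starts from pure noise and is iteratively denoised conditioned on scene features, and different noise seeds yield diverse candidate plans. Everything here (network shape, horizon, conditioning width, step count) is an illustrative assumption, not Li Auto's implementation.

```python
import torch
import torch.nn as nn

class TrajectoryDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise on a trajectory given
    fused scene context and the diffusion timestep."""
    def __init__(self, horizon=30, dim=2, ctx_dim=256, hidden=512):
        super().__init__()
        self.horizon, self.dim = horizon, dim
        self.net = nn.Sequential(
            nn.Linear(horizon * dim + ctx_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * dim),
        )

    def forward(self, traj, ctx, t):
        x = torch.cat([traj.flatten(1), ctx, t[:, None].float()], dim=-1)
        return self.net(x).view(-1, self.horizon, self.dim)

@torch.no_grad()
def sample_trajectories(denoiser, ctx, steps=50, n=8):
    """DDPM-style reverse process; n noise seeds -> n diverse plans."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    traj = torch.randn(n, denoiser.horizon, denoiser.dim)  # start from noise
    ctx = ctx.expand(n, -1)
    for t in reversed(range(steps)):
        eps = denoiser(traj, ctx, torch.full((n,), t))
        mean = (traj - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        traj = mean + betas[t].sqrt() * torch.randn_like(traj) if t > 0 else mean
    return traj

denoiser = TrajectoryDenoiser()
scene_ctx = torch.randn(1, 256)          # stand-in for fused scene features
plans = sample_trajectories(denoiser, scene_ctx)
print(plans.shape)                        # torch.Size([8, 30, 2])
```

With a trained denoiser, the eight sampled trajectories would be distinct, plausible maneuvers; a downstream scorer (e.g., the reward model mentioned above) would pick among them.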
InternVL 3.5 is here! Shanghai AI Lab's latest open-source model takes on GPT-5 and gets efficiency right
自动驾驶之心· 2025-08-27 23:33
Core Viewpoint
- Shanghai AI Lab has launched the open-source multimodal model InternVL 3.5, which significantly advances the performance of the InternVL series in generality, reasoning ability, and inference efficiency compared to its predecessors [2].

Model Architecture
- InternVL 3.5 consists of three core components: a dynamic high-resolution text tokenizer, an InternViT visual encoder, and a connector that integrates the visual and language modalities [5].
- The model employs a two-stage training paradigm, comprising a large-scale pre-training phase and a multi-stage post-training phase [5][6].

Training Objectives
- The pre-training phase utilizes a large-scale multimodal corpus to learn general visual-language representations, with approximately 1.16 billion samples corresponding to about 250 billion tokens [7].
- The post-training strategy includes three stages: Supervised Fine-Tuning (SFT), Cascade Reinforcement Learning (Cascade RL), and Visual Consistency Learning (ViCO) [9].

Performance Metrics
- InternVL 3.5 shows superior performance across benchmarks, achieving notable scores on tasks such as MMStar, MMVet, and MMBench V1.1 [14].
- Its performance is competitive with top commercial models like GPT-5, demonstrating significant improvements on multimodal reasoning and mathematical tasks [14][15].

Testing and Deployment
- The model incorporates a test-time scaling method to enhance reasoning capabilities, particularly for complex tasks requiring multi-step reasoning [11].
- The Decoupled Vision-Language Deployment (DvD) framework optimizes hardware costs and facilitates seamless integration of new modules without modifying the language-server deployment (a sketch of the idea follows this summary) [12].
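As a rough illustration of the DvD idea, the sketch below separates a vision encoder and a language model into two processes that exchange visual embeddings rather than raw images, so either side can be swapped or scaled independently. All names, shapes, and the queue-based transport are assumptions for illustration; InternVL's actual serving stack is not described at this level in the article.

```python
# Hypothetical sketch of decoupled vision-language serving: the vision
# encoder and the language model run as separate services that exchange
# visual embeddings instead of raw pixels.
import numpy as np
from multiprocessing import Process, Queue

EMBED_DIM = 1024  # assumed visual-embedding width

def vision_server(requests: Queue, embeddings: Queue):
    # Stand-in for an InternViT-style encoder: image -> patch embeddings.
    while True:
        image = requests.get()
        if image is None:
            break
        tokens = np.random.randn(256, EMBED_DIM).astype(np.float32)  # fake features
        embeddings.put(tokens)

def language_server(embeddings: Queue, answers: Queue):
    # Stand-in for the LLM: consumes visual tokens plus an implicit prompt.
    while True:
        tokens = embeddings.get()
        if tokens is None:
            break
        answers.put(f"caption for {tokens.shape[0]} visual tokens")

if __name__ == "__main__":
    req, emb, ans = Queue(), Queue(), Queue()
    vis = Process(target=vision_server, args=(req, emb))
    lang = Process(target=language_server, args=(emb, ans))
    vis.start()
    lang.start()
    req.put(np.zeros((448, 448, 3), dtype=np.uint8))  # one dummy image
    print(ans.get())
    req.put(None)   # shut down the vision worker
    emb.put(None)   # shut down the language worker
    vis.join()
    lang.join()
```

The design point the article attributes to DvD is visible even in this toy: upgrading the vision side only requires redeploying `vision_server`; the language process and its interface stay untouched.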
A full-stack autonomous driving learning community dead set on technology, with technical roadmaps for nearly 40 directions~
自动驾驶之心· 2025-08-27 01:26
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving enthusiasts, aiming to connect learners and professionals in the field, providing resources, networking opportunities, and industry insights.

Group 1: Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" has over 4,000 members and aims to grow to nearly 10,000 in two years, serving as a hub for communication and technical sharing [1][12].
- The community offers a variety of resources including video content, articles, learning paths, Q&A, and job exchange opportunities [1][2].
- Nearly 40 technical routes have been organized within the community, catering to various interests such as industry applications and the latest benchmarks [2][5].

Group 2: Learning and Development
- The community provides structured learning paths for beginners, including full-stack courses suitable for those with no prior experience [7][9].
- Members can access detailed information on end-to-end autonomous driving, multi-modal models, and various datasets for training and fine-tuning [3][26].
- Regular discussions with industry leaders are held to explore trends, technological directions, and production challenges in autonomous driving [4][58].

Group 3: Job Opportunities and Networking
- The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job placements for members [9][11].
- Members are encouraged to engage in discussions about career choices and research directions, receiving guidance from experienced professionals [55][60].
- The platform aims to connect members with job openings and industry opportunities, enhancing their career prospects in the autonomous driving sector [1][62].
An autonomous driving VLA technology discussion group has been launched (data/models/deployment and more)
自动驾驶之心· 2025-08-26 23:32
The 自动驾驶之心 large-model VLA technology discussion group has been launched. Everyone is welcome to join and discuss VLA-related topics, including VLA dataset construction, one-stage VLA, hierarchical VLA, LLM-based end-to-end solutions, VLM+DP-based approaches, production deployment, and job hunting. Interested readers can add the assistant on WeChat to join: AIDriver005, with the note: nickname + VLA.
An analysis of Li Auto's efficient MoE + Sparse Attention structure
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article discusses the advanced technologies used in Li Auto's autonomous driving solutions, specifically focusing on the "MoE + Sparse Attention" efficient structure that enhances the performance and efficiency of large models in 3D spatial understanding and reasoning [3][6].

Group 1: Introduction to Technologies
- The article introduces a series of posts that delve deeper into the advanced technologies involved in Li Auto's VLM and VLA solutions, which were only briefly discussed in previous articles [3].
- The focus is on the "MoE + Sparse Attention" structure, which is crucial for improving the efficiency and performance of large models [3][6].

Group 2: Sparse Attention
- Sparse Attention limits the complexity of the attention mechanism by focusing only on key parts of the input, rather than computing globally, which is particularly beneficial in 3D scenarios [6][10].
- The structure combines local attention and strided attention to create a sparse yet effective attention mechanism, ensuring that each token can quickly propagate information while maintaining local modeling capability [10][11].

Group 3: MoE (Mixture of Experts)
- The MoE architecture divides computation across multiple expert sub-networks, activating only a subset of experts for each input, thus enhancing computational efficiency without significantly increasing inference cost [22][24].
- The article outlines the core components of MoE: a Gate module for selecting experts, an Experts module of independent networks, and a Dispatcher for optimizing computation (both mechanisms are illustrated in the sketch after this summary) [24][25].

Group 4: Implementation and Communication
- The article provides insights into implementing MoE with DeepSpeed, highlighting its flexibility and efficiency in handling large models [27][29].
- It discusses the communication mechanisms required for efficient data distribution across multiple GPUs, emphasizing the importance of the all-to-all communication strategy in distributed training [34][37].
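The following sketch illustrates both mechanisms in plain PyTorch: a causal attention mask combining a local window with strided "summary" positions, and a top-k gated MoE layer with a naive per-expert dispatch loop. Window size, stride, expert count, and dimensions are illustrative assumptions, not Li Auto's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_strided_mask(seq_len: int, window: int = 8, stride: int = 8) -> torch.Tensor:
    """True where attention is allowed: each query sees a recent local
    window plus every stride-th position, instead of all positions."""
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    causal = j <= i
    local = (i - j) < window          # recent neighborhood
    strided = (j % stride) == 0       # periodic long-range hops
    return causal & (local | strided)

def sparse_attention(q, k, v, mask):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

class TopKMoE(nn.Module):
    """Gate scores every expert per token; only the top-k experts run."""
    def __init__(self, dim: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                  # x: (tokens, dim)
        weights, idx = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):          # dispatcher loop
            rows, slot = (idx == e).nonzero(as_tuple=True)
            if rows.numel():
                out[rows] += weights[rows, slot, None] * expert(x[rows])
        return out

q = k = v = torch.randn(2, 16, 32)
print(sparse_attention(q, k, v, local_strided_mask(16)).shape)  # (2, 16, 32)
print(TopKMoE(32)(torch.randn(16, 32)).shape)                   # (16, 32)
```

In a real multi-GPU deployment the Python dispatch loop is replaced by an all-to-all exchange (as in DeepSpeed's MoE support), so that each device hosts only a subset of the experts and tokens are routed between devices.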
One article to cover it all! Breakthrough directions in fusing VLA and RL across 2025 papers
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article discusses a significant revolution in the field of robotic embodied intelligence, focusing on the integration of Vision-Language-Action (VLA) models with Reinforcement Learning (RL) to address core challenges in real-world robotic decision-making and task execution [2][58].

Summary by Sections

GRAPE: Generalizing Robot Policy via Preference Alignment
- The GRAPE framework enhances VLA model generalization and adaptability by aligning trajectories, decomposing tasks, and modeling preferences with flexible spatiotemporal constraints [5][6].
- GRAPE shows a 51.79% increase in success rates for seen tasks and a 58.20% increase for unseen tasks, while also reducing collision rates by 37.44% under safety objectives [8][9].

VLA-RL: Towards Masterful and General Robotic Manipulation
- The VLA-RL framework addresses the failure of VLA models in out-of-distribution scenarios by utilizing trajectory-level RL formulations and fine-tuned reward models to handle sparse rewards [11][13].
- VLA-RL significantly improves performance on 40 challenging robotic tasks, demonstrating the potential for early reasoning expansion in robotic applications [15].

ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
- The ReWiND framework allows for task adaptation using pre-trained language-based reward functions, eliminating the need for new demonstrations for unseen tasks [18][19].
- ReWiND exhibits a 2.4x improvement in reward generalization and a 5x increase in new-task adaptation efficiency compared to baseline methods [21].

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
- ConRFT employs a two-phase reinforced fine-tuning approach to stabilize VLA model performance during supervised learning [24][26].
- The method achieves a 96.3% success rate across eight practical tasks, improving performance by 144% compared to previous supervised-learning methods [29].

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
- RLDG enhances generalist policies by generating high-quality training data through reinforcement learning, addressing performance and generalization issues [33][34].
- The method shows a 40% increase in success rates for precise manipulation tasks, demonstrating improved adaptability to new tasks [39].

TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
- TGRPO introduces online reinforcement learning to VLA models, enhancing robustness and efficiency in policy learning; a minimal sketch of its group-relative advantage computation appears after this overview [40][42].
- The method outperforms various baseline approaches in ten manipulation tasks, validating its effectiveness in improving VLA model adaptability [44].

Improving Vision-Language-Action Model with Online Reinforcement Learning
- The iRe-VLA framework optimizes VLA models by iterating between reinforcement learning and supervised learning, addressing stability and computational challenges [45][47].
- The framework demonstrates effective performance improvements in interactive scenarios, providing a viable path for optimizing large VLA models [51].

Interactive Post-Training for Vision-Language-Action Models
- RIPT-VLA offers a scalable, reinforcement-learning-based interactive post-training approach to enhance VLA models in low-data regimes [52][53].
- The method achieves a 97% success rate with minimal supervision, showcasing its robustness and adaptability across various tasks [57].
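As a concrete anchor for the TGRPO entry above, here is a minimal sketch of trajectory-wise group-relative advantage estimation in the GRPO style: each sampled trajectory's return is normalized against the other trajectories in its own group, so no learned critic is needed. The reward values and group size below are made up for illustration; this is the general technique, not the paper's code.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (groups, trajs_per_group) scalar returns, one per sampled
    trajectory. Each trajectory is scored relative to its own group's
    mean and standard deviation."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 tasks, 4 trajectories sampled per task from the current policy.
returns = torch.tensor([[1.0, 0.0, 0.5, 0.0],
                        [0.2, 0.9, 0.9, 0.2]])
adv = group_relative_advantages(returns)
print(adv)
# Above-average trajectories get positive weight, below-average ones
# negative; these weights then scale the policy-gradient term for the
# corresponding action sequences.
```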
Conclusion
- The eight studies represent a significant advancement in robotic intelligence, focusing on overcoming industry challenges such as strategy generalization and dynamic-environment adaptation, with practical applications in home tasks, industrial assembly, and robotic manipulation [58].
Surpassing OmniRe! DriveSplat from the Chinese Academy of Sciences: a new SOTA for geometry-enhanced neural Gaussian driving-scene reconstruction
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article introduces DriveSplat, a new method for 3D reconstruction of driving scenes that significantly enhances the accuracy of both static and dynamic elements, achieving state-of-the-art performance in novel-view-synthesis tasks on two autonomous driving datasets [2][41].

Group 1: Background and Motivation
- Realistic closed-loop simulation of driving scenes has become a major research focus in both academia and industry, and must handle factors such as fast-moving vehicles and dynamic pedestrians [2][5].
- Traditional methods have struggled with motion blur and geometric accuracy in dynamic driving scenes, motivating the development of DriveSplat, which utilizes a decoupled approach for high-quality scene reconstruction [2][6][7].

Group 2: Methodology
- DriveSplat employs a neural Gaussian representation with a decoupled strategy for dynamic and static elements, enhancing the representation of close-range details through a partitioned voxel-initialization scheme [2][8][14].
- The framework incorporates deformable neural Gaussians to model non-rigid dynamic participants, with parameters adjusted over time by a learnable deformation network (see the sketch after this summary) [2][8][21].
- The method leverages depth and normal priors from pre-trained models to improve geometric accuracy during reconstruction [2][23][41].

Group 3: Performance Evaluation
- DriveSplat was evaluated on the Waymo and KITTI datasets, demonstrating superior performance in both scene reconstruction and novel view synthesis compared to existing methods [28][31].
- On the Waymo dataset, DriveSplat achieved a PSNR of 36.08, surpassing all baseline models, while also showing improvements in the SSIM and LPIPS metrics [28][29].
- The method also outperformed competitors on the KITTI dataset, particularly in maintaining background detail and accurately rendering dynamic vehicles [31][32].

Group 4: Ablation Studies
- Ablation studies indicated that combining SfM and LiDAR for point-cloud initialization yielded the best rendering results, highlighting the importance of effective initialization methods [33][34].
- The background-partition optimization module was shown to enhance performance, confirming its necessity in the reconstruction process [36].
- Introducing the deformable module significantly improved the rendering quality of non-rigid participants, demonstrating the effectiveness of the dynamic optimization approach [39][40].
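The deformable-Gaussian idea in Group 2 can be sketched as a small network that maps a Gaussian's canonical center plus a timestamp to a position offset, while static background Gaussians bypass it entirely. The architecture and sizes below are assumptions for illustration, not DriveSplat's actual network (which also deforms other Gaussian parameters).

```python
import torch
import torch.nn as nn

class DeformationNet(nn.Module):
    """Hypothetical deformation field for dynamic neural Gaussians:
    (canonical center, time) -> deformed center."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),             # xyz offset per Gaussian
        )

    def forward(self, centers: torch.Tensor, t: float) -> torch.Tensor:
        """centers: (N, 3) canonical Gaussian means; returns deformed means."""
        time = torch.full((centers.shape[0], 1), t, device=centers.device)
        return centers + self.mlp(torch.cat([centers, time], dim=-1))

# Dynamic Gaussians are advanced to the render timestamp before splatting;
# static ones skip the network, which is the decoupling described above.
gaussians = torch.randn(1024, 3)
moved = DeformationNet()(gaussians, t=0.5)
print(moved.shape)  # torch.Size([1024, 3])
```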