自动驾驶之心

Dead set on technology: the "Whampoa Military Academy" of autonomous driving turns three~
自动驾驶之心· 2025-08-28 03:22
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving enthusiasts, aiming to facilitate knowledge sharing, technical discussions, and job opportunities in the field of autonomous driving and AI [1][13].

Group 1: Community Development
- The "Autonomous Driving Heart Knowledge Planet" has grown to over 4,000 members, with a goal to reach nearly 10,000 in the next two years, providing a platform for exchange and technical sharing [1].
- The community offers a variety of resources, including video content, articles, learning paths, Q&A sessions, and job exchange opportunities [1][2].

Group 2: Learning Resources
- The community has organized nearly 40 technical routes for members, covering various aspects of autonomous driving, including end-to-end learning, multi-modal models, and data annotation practices [2][5].
- A complete learning stack and roadmap for beginners have been prepared, making it suitable for those with no prior experience [7][9].

Group 3: Industry Insights
- The community regularly invites industry leaders and experts to discuss trends in autonomous driving, technology directions, and production challenges [4][62].
- Members can engage in discussions about job opportunities, industry developments, and academic advancements, fostering a collaborative environment [59][64].

Group 4: Technical Focus Areas
- Key focus areas include end-to-end autonomous driving, multi-sensor fusion, 3DGS, and NeRF technologies, with detailed resources and discussions available for each topic [31][32][33].
- The community also provides insights into the latest advancements in vision-language models (VLM) and their applications in autonomous driving [35][36].
End-to-end without a data closed loop is only half a product! An authoritative breakdown of nine major topics~
自动驾驶之心· 2025-08-27 23:33
51Sim End-to-End Data Closed-Loop Ecosystem Forum (51Sim端到端数据闭环生态论坛)
2025.8.28 (Thursday), 13:00-17:00
Conference Hall 2, Level B1, Shanghai World Expo Exhibition Hall, 1099 Guozhan Road, Pudong New Area, Shanghai
Scan the QR code to register for free.

Agenda:
- Opening remarks (致开场欢迎词)
- "New Challenges and a New Journey! A Full Upgrade of the Data-Driven Closed Loop in the End-to-End Era": 张晓娜, General Manager, Vehicle Business Unit, 51Sim
- "Reflections and Explorations on Simulation Testing in the End-to-End Era": 敬, Senior Intelligent Driving Expert, 长城汽车 (Great Wall Motor)
- 鲍世强, CEO, 51Sim
- 张敏超, Director of Intelligent Driving System Development, 东风汽车股份有限公司 (Dongfeng Motor)
- "Building and Practicing BAIC's Intelligent Driving Simulation Capability": 总轩, Senior Expert, Intelligent Driving Department, 北京汽车研究总院有限公司
- "Research on Credibility Assessment Technology for Compliance-Oriented Driving Automation Simulation Testing": 陈硕, Chief Engineer, 中汽智能科技(天津)有限公司
- "Design and Validation of Autonomous Driving Systems for Light Commercial Vehicles"

See you on August 28!

Venue (Shanghai World Expo Exhibition Hall): South Gate, 1099 Guozhan Road, Pudong New Area, Shanghai; North Gate, 85 … Bocheng Road (博成路), Pudong New Area, Shanghai ...
An ultra-cost-effective 3D scanner! Point-cloud/visual reconstruction of full scenes with high-precision, centimeter-level accuracy
自动驾驶之心· 2025-08-27 23:33
Core Viewpoint
- The article introduces the GeoScan S1, a highly cost-effective 3D laser scanner designed for industrial and research applications, emphasizing its lightweight design, ease of use, and advanced features for real-time 3D scene reconstruction.

Group 1: Product Features
- The GeoScan S1 offers centimeter-level precision in 3D scene reconstruction using a multi-modal sensor-fusion algorithm, generating point clouds at a rate of 200,000 points per second and covering distances up to 70 meters [1][29].
- It supports scanning areas exceeding 200,000 square meters and can be equipped with a 3D Gaussian data-collection module for high-fidelity scene restoration [1][50].
- The device is designed for easy operation with a one-button start feature, allowing users to quickly initiate scanning tasks without complex setups [5][42].

Group 2: Technical Specifications
- The GeoScan S1 integrates various sensors, including RTK, IMU, and dual wide-angle cameras, and features a compact design with dimensions of 14.2 cm x 9.5 cm x 45 cm and a weight of 1.3 kg (excluding battery) [22][12].
- It operates on a power input of 13.8 V - 24 V with a power consumption of 25 W, and has a battery capacity of 88.8 Wh, providing approximately 3 to 4 hours of operation [22][26].
- The system supports multiple data-export formats, including PCD, LAS, and PLY, and runs on Ubuntu 20.04, compatible with ROS (see the sketch after this summary) [22][42].

Group 3: Market Positioning
- The GeoScan S1 is positioned as the most cost-effective handheld 3D laser scanner on the market, with a starting price of 19,800 yuan for the basic version [9][57].
- The product is backed by extensive research and validation from teams at Tongji University and Northwestern Polytechnical University, with over a hundred projects demonstrating its capabilities [9][38].
- The device is designed to facilitate unmanned operations and can be integrated with platforms such as drones and robotic vehicles, enhancing its versatility across different operational environments [44][46].
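Since the scanner exports standard point-cloud formats, a quick sanity check of a captured scan can be done with off-the-shelf tooling. The sketch below is an assumption for illustration, not vendor software: it reads a PCD export (the file name `scan.pcd` is a placeholder) with the open-source Open3D library, which the article itself does not mention.

```python
# Hedged sketch: inspect a PCD point cloud exported from the scanner
# using Open3D. "scan.pcd" is a placeholder file name.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")       # LAS/PLY need other readers
print(pcd)                                      # point-count summary
print(pcd.get_axis_aligned_bounding_box())      # rough scene extent

# Downsample for faster viewing; 5 cm voxels are an arbitrary choice
# for interactive display of centimeter-level data.
down = pcd.voxel_down_sample(voxel_size=0.05)
o3d.visualization.draw_geometries([down])
```

For actual measurement work you would operate on the raw cloud rather than the downsampled copy; the voxel step is only there to keep the viewer responsive on large scans.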
A detailed explanation of MindVLA, Li Auto's intelligent driving solution
自动驾驶之心· 2025-08-27 23:33
Author | 跃来跃好  Source | 地平线开发者 (Horizon Developer)

01 Introduction

MindVLA mainly comprises a spatial intelligence module, a language intelligence module, an action policy module, and a reinforcement learning module, with the following functions:

- Spatial intelligence module: takes multimodal sensor data as input, uses a 3D encoder to extract spatiotemporal features, and fuses all sensor and semantic information into a unified representation.
- Language intelligence module: MindGPT, a large language model deployed on the embedded side, performs joint spatial-plus-language reasoning, supports voice commands and feedback, and may enable human-vehicle interaction.
- Action policy module: uses a diffusion model to generate the vehicle's future behavior trajectories, injecting noise to guide the diffusion process toward diverse action plans (see the sketch after this section).
- Reinforcement learning module: uses a World Model to simulate the external environment's response and evaluate the consequences of behaviors; uses a Reward Model to …

2.1 Shortcomings of traditional end-to-end autonomous driving

Traditional end-to-end autonomous driving generates 3D bounding boxes (3D Boxes) through perception (Perception); a prediction module then uses the 3D objects and the map to predict motion trajectories; and a planning module plans a trajectory based on those predictions.
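To make the action-policy description concrete, here is a minimal sketch of diffusion-based trajectory generation: sampling starts from pure noise and is iteratively denoised conditioned on scene features, and different noise seeds yield diverse candidate plans. Everything here (network shape, horizon, conditioning width, step count) is an illustrative assumption, not Li Auto's implementation.

```python
import torch
import torch.nn as nn

class TrajectoryDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise on a trajectory given
    fused scene context and the diffusion timestep."""
    def __init__(self, horizon=30, dim=2, ctx_dim=256, hidden=512):
        super().__init__()
        self.horizon, self.dim = horizon, dim
        self.net = nn.Sequential(
            nn.Linear(horizon * dim + ctx_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * dim),
        )

    def forward(self, traj, ctx, t):
        x = torch.cat([traj.flatten(1), ctx, t[:, None].float()], dim=-1)
        return self.net(x).view(-1, self.horizon, self.dim)

@torch.no_grad()
def sample_trajectories(denoiser, ctx, steps=50, n=8):
    """DDPM-style reverse process; n noise seeds -> n diverse plans."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    traj = torch.randn(n, denoiser.horizon, denoiser.dim)  # start from noise
    ctx = ctx.expand(n, -1)
    for t in reversed(range(steps)):
        eps = denoiser(traj, ctx, torch.full((n,), t))
        mean = (traj - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        traj = mean + betas[t].sqrt() * torch.randn_like(traj) if t > 0 else mean
    return traj

denoiser = TrajectoryDenoiser()
scene_ctx = torch.randn(1, 256)          # stand-in for fused scene features
plans = sample_trajectories(denoiser, scene_ctx)
print(plans.shape)                        # torch.Size([8, 30, 2])
```

With a trained denoiser, the eight sampled trajectories would be distinct, plausible maneuvers; a downstream scorer (e.g., the reward model mentioned above) would pick among them.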
InternVL 3.5 is here! Shanghai AI Lab's latest open-source model takes on GPT-5 and gets efficiency right
自动驾驶之心· 2025-08-27 23:33
Core Viewpoint
- Shanghai AI Lab has launched the open-source multimodal model InternVL 3.5, which significantly advances the performance of the InternVL series in generality, reasoning ability, and inference efficiency compared to its predecessors [2].

Model Architecture
- InternVL 3.5 consists of three core components: a dynamic high-resolution text tokenizer, an InternViT visual encoder, and a connector that integrates the visual and language modalities [5].
- The model employs a two-stage training paradigm, comprising a large-scale pre-training phase and a multi-stage post-training phase [5][6].

Training Objectives
- The pre-training phase utilizes a large-scale multimodal corpus to learn general visual-language representations, with approximately 1.16 billion samples corresponding to about 250 billion tokens [7].
- The post-training strategy includes three stages: Supervised Fine-Tuning (SFT), Cascade Reinforcement Learning (Cascade RL), and Visual Consistency Learning (ViCO) [9].

Performance Metrics
- InternVL 3.5 shows superior performance across benchmarks, achieving notable scores on tasks such as MMStar, MMVet, and MMBench V1.1 [14].
- Its performance is competitive with top commercial models like GPT-5, demonstrating significant improvements on multimodal reasoning and mathematical tasks [14][15].

Testing and Deployment
- The model incorporates a test-time scaling method to enhance reasoning capabilities, particularly for complex tasks requiring multi-step reasoning [11].
- The Decoupled Vision-Language Deployment (DvD) framework optimizes hardware costs and facilitates seamless integration of new modules without modifying the language-server deployment (a sketch of the idea follows this summary) [12].
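As a rough illustration of the DvD idea, the sketch below separates a vision encoder and a language model into two processes that exchange visual embeddings rather than raw images, so either side can be swapped or scaled independently. All names, shapes, and the queue-based transport are assumptions for illustration; InternVL's actual serving stack is not described at this level in the article.

```python
# Hypothetical sketch of decoupled vision-language serving: the vision
# encoder and the language model run as separate services that exchange
# visual embeddings instead of raw pixels.
import numpy as np
from multiprocessing import Process, Queue

EMBED_DIM = 1024  # assumed visual-embedding width

def vision_server(requests: Queue, embeddings: Queue):
    # Stand-in for an InternViT-style encoder: image -> patch embeddings.
    while True:
        image = requests.get()
        if image is None:
            break
        tokens = np.random.randn(256, EMBED_DIM).astype(np.float32)  # fake features
        embeddings.put(tokens)

def language_server(embeddings: Queue, answers: Queue):
    # Stand-in for the LLM: consumes visual tokens plus an implicit prompt.
    while True:
        tokens = embeddings.get()
        if tokens is None:
            break
        answers.put(f"caption for {tokens.shape[0]} visual tokens")

if __name__ == "__main__":
    req, emb, ans = Queue(), Queue(), Queue()
    vis = Process(target=vision_server, args=(req, emb))
    lang = Process(target=language_server, args=(emb, ans))
    vis.start()
    lang.start()
    req.put(np.zeros((448, 448, 3), dtype=np.uint8))  # one dummy image
    print(ans.get())
    req.put(None)   # shut down the vision worker
    emb.put(None)   # shut down the language worker
    vis.join()
    lang.join()
```

The design point the article attributes to DvD is visible even in this toy: upgrading the vision side only requires redeploying `vision_server`; the language process and its interface stay untouched.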
A full-stack autonomous driving learning community dead set on technology, with technical roadmaps for nearly 40 directions~
自动驾驶之心· 2025-08-27 01:26
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving enthusiasts, aiming to connect learners and professionals in the field, providing resources, networking opportunities, and industry insights.

Group 1: Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" has over 4,000 members and aims to grow to nearly 10,000 in two years, serving as a hub for communication and technical sharing [1][12].
- The community offers a variety of resources including video content, articles, learning paths, Q&A, and job exchange opportunities [1][2].
- Nearly 40 technical routes have been organized within the community, catering to various interests such as industry applications and the latest benchmarks [2][5].

Group 2: Learning and Development
- The community provides structured learning paths for beginners, including full-stack courses suitable for those with no prior experience [7][9].
- Members can access detailed information on end-to-end autonomous driving, multi-modal models, and various datasets for training and fine-tuning [3][26].
- Regular discussions with industry leaders are held to explore trends, technological directions, and production challenges in autonomous driving [4][58].

Group 3: Job Opportunities and Networking
- The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job placements for members [9][11].
- Members are encouraged to engage in discussions about career choices and research directions, receiving guidance from experienced professionals [55][60].
- The platform aims to connect members with job openings and industry opportunities, enhancing their career prospects in the autonomous driving sector [1][62].
An autonomous driving VLA technology discussion group has been launched (data/models/deployment and more)
自动驾驶之心· 2025-08-26 23:32
The 自动驾驶之心 large-model VLA technology discussion group has been launched. Everyone is welcome to join and discuss VLA-related topics, including VLA dataset construction, one-stage VLA, hierarchical VLA, LLM-based end-to-end solutions, VLM+DP-based approaches, production deployment, and job hunting. Interested readers can add the assistant on WeChat to join: AIDriver005, with the note: nickname + VLA.
An analysis of Li Auto's efficient MoE + Sparse Attention structure
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article discusses the advanced technologies used in Li Auto's autonomous driving solutions, specifically focusing on the "MoE + Sparse Attention" efficient structure that enhances the performance and efficiency of large models in 3D spatial understanding and reasoning [3][6].

Group 1: Introduction to Technologies
- The article introduces a series of posts that delve deeper into the advanced technologies involved in Li Auto's VLM and VLA solutions, which were only briefly discussed in previous articles [3].
- The focus is on the "MoE + Sparse Attention" structure, which is crucial for improving the efficiency and performance of large models [3][6].

Group 2: Sparse Attention
- Sparse Attention limits the complexity of the attention mechanism by focusing only on key parts of the input, rather than computing globally, which is particularly beneficial in 3D scenarios [6][10].
- The structure combines local attention and strided attention to create a sparse yet effective attention mechanism, ensuring that each token can quickly propagate information while maintaining local modeling capability [10][11].

Group 3: MoE (Mixture of Experts)
- The MoE architecture divides computation across multiple expert sub-networks, activating only a subset of experts for each input, thus enhancing computational efficiency without significantly increasing inference cost [22][24].
- The article outlines the core components of MoE: a Gate module for selecting experts, an Experts module of independent networks, and a Dispatcher for optimizing computation (both mechanisms are illustrated in the sketch after this summary) [24][25].

Group 4: Implementation and Communication
- The article provides insights into implementing MoE with DeepSpeed, highlighting its flexibility and efficiency in handling large models [27][29].
- It discusses the communication mechanisms required for efficient data distribution across multiple GPUs, emphasizing the importance of the all-to-all communication strategy in distributed training [34][37].
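The following sketch illustrates both mechanisms in plain PyTorch: a causal attention mask combining a local window with strided "summary" positions, and a top-k gated MoE layer with a naive per-expert dispatch loop. Window size, stride, expert count, and dimensions are illustrative assumptions, not Li Auto's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_strided_mask(seq_len: int, window: int = 8, stride: int = 8) -> torch.Tensor:
    """True where attention is allowed: each query sees a recent local
    window plus every stride-th position, instead of all positions."""
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    causal = j <= i
    local = (i - j) < window          # recent neighborhood
    strided = (j % stride) == 0       # periodic long-range hops
    return causal & (local | strided)

def sparse_attention(q, k, v, mask):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

class TopKMoE(nn.Module):
    """Gate scores every expert per token; only the top-k experts run."""
    def __init__(self, dim: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                  # x: (tokens, dim)
        weights, idx = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):          # dispatcher loop
            rows, slot = (idx == e).nonzero(as_tuple=True)
            if rows.numel():
                out[rows] += weights[rows, slot, None] * expert(x[rows])
        return out

q = k = v = torch.randn(2, 16, 32)
print(sparse_attention(q, k, v, local_strided_mask(16)).shape)  # (2, 16, 32)
print(TopKMoE(32)(torch.randn(16, 32)).shape)                   # (16, 32)
```

In a real multi-GPU deployment the Python dispatch loop is replaced by an all-to-all exchange (as in DeepSpeed's MoE support), so that each device hosts only a subset of the experts and tokens are routed between devices.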
One article to cover it all! Breakthrough directions in fusing VLA and RL across 2025 papers
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article discusses a significant revolution in the field of robotic embodied intelligence, focusing on the integration of Vision-Language-Action (VLA) models with Reinforcement Learning (RL) to address core challenges in real-world robotic decision-making and task execution [2][58].

Summary by Sections

GRAPE: Generalizing Robot Policy via Preference Alignment
- The GRAPE framework enhances VLA model generalization and adaptability by aligning trajectories, decomposing tasks, and modeling preferences with flexible spatiotemporal constraints [5][6].
- GRAPE shows a 51.79% increase in success rates for seen tasks and a 58.20% increase for unseen tasks, while also reducing collision rates by 37.44% under safety objectives [8][9].

VLA-RL: Towards Masterful and General Robotic Manipulation
- The VLA-RL framework addresses the failure of VLA models in out-of-distribution scenarios by utilizing trajectory-level RL formulations and fine-tuned reward models to handle sparse rewards [11][13].
- VLA-RL significantly improves performance on 40 challenging robotic tasks, demonstrating the potential for early reasoning expansion in robotic applications [15].

ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
- The ReWiND framework allows for task adaptation using pre-trained language-based reward functions, eliminating the need for new demonstrations for unseen tasks [18][19].
- ReWiND exhibits a 2.4x improvement in reward generalization and a 5x increase in new-task adaptation efficiency compared to baseline methods [21].

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
- ConRFT employs a two-phase reinforced fine-tuning approach to stabilize VLA model performance during supervised learning [24][26].
- The method achieves a 96.3% success rate across eight practical tasks, improving performance by 144% compared to previous supervised-learning methods [29].

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
- RLDG enhances generalist policies by generating high-quality training data through reinforcement learning, addressing performance and generalization issues [33][34].
- The method shows a 40% increase in success rates for precise manipulation tasks, demonstrating improved adaptability to new tasks [39].

TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
- TGRPO introduces online reinforcement learning to VLA models, enhancing robustness and efficiency in policy learning; a minimal sketch of its group-relative advantage computation appears after this overview [40][42].
- The method outperforms various baseline approaches in ten manipulation tasks, validating its effectiveness in improving VLA model adaptability [44].

Improving Vision-Language-Action Model with Online Reinforcement Learning
- The iRe-VLA framework optimizes VLA models by iterating between reinforcement learning and supervised learning, addressing stability and computational challenges [45][47].
- The framework demonstrates effective performance improvements in interactive scenarios, providing a viable path for optimizing large VLA models [51].

Interactive Post-Training for Vision-Language-Action Models
- RIPT-VLA offers a scalable, reinforcement-learning-based interactive post-training approach to enhance VLA models in low-data regimes [52][53].
- The method achieves a 97% success rate with minimal supervision, showcasing its robustness and adaptability across various tasks [57].
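As a concrete anchor for the TGRPO entry above, here is a minimal sketch of trajectory-wise group-relative advantage estimation in the GRPO style: each sampled trajectory's return is normalized against the other trajectories in its own group, so no learned critic is needed. The reward values and group size below are made up for illustration; this is the general technique, not the paper's code.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (groups, trajs_per_group) scalar returns, one per sampled
    trajectory. Each trajectory is scored relative to its own group's
    mean and standard deviation."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 tasks, 4 trajectories sampled per task from the current policy.
returns = torch.tensor([[1.0, 0.0, 0.5, 0.0],
                        [0.2, 0.9, 0.9, 0.2]])
adv = group_relative_advantages(returns)
print(adv)
# Above-average trajectories get positive weight, below-average ones
# negative; these weights then scale the policy-gradient term for the
# corresponding action sequences.
```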
Conclusion
- The eight studies represent a significant advancement in robotic intelligence, focusing on overcoming industry challenges such as strategy generalization and dynamic-environment adaptation, with practical applications in home tasks, industrial assembly, and robotic manipulation [58].
Surpassing OmniRe! DriveSplat from the Chinese Academy of Sciences: a new SOTA for geometry-enhanced neural Gaussian driving-scene reconstruction
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article introduces DriveSplat, a new method for 3D reconstruction of driving scenes that significantly enhances the accuracy of both static and dynamic elements, achieving state-of-the-art performance in novel-view-synthesis tasks on two autonomous driving datasets [2][41].

Group 1: Background and Motivation
- Realistic closed-loop simulation of driving scenes has become a major research focus in both academia and industry, and must handle factors such as fast-moving vehicles and dynamic pedestrians [2][5].
- Traditional methods have struggled with motion blur and geometric accuracy in dynamic driving scenes, motivating the development of DriveSplat, which utilizes a decoupled approach for high-quality scene reconstruction [2][6][7].

Group 2: Methodology
- DriveSplat employs a neural Gaussian representation with a decoupled strategy for dynamic and static elements, enhancing the representation of close-range details through a partitioned voxel-initialization scheme [2][8][14].
- The framework incorporates deformable neural Gaussians to model non-rigid dynamic participants, with parameters adjusted over time by a learnable deformation network (see the sketch after this summary) [2][8][21].
- The method leverages depth and normal priors from pre-trained models to improve geometric accuracy during reconstruction [2][23][41].

Group 3: Performance Evaluation
- DriveSplat was evaluated on the Waymo and KITTI datasets, demonstrating superior performance in both scene reconstruction and novel view synthesis compared to existing methods [28][31].
- On the Waymo dataset, DriveSplat achieved a PSNR of 36.08, surpassing all baseline models, while also showing improvements in the SSIM and LPIPS metrics [28][29].
- The method also outperformed competitors on the KITTI dataset, particularly in maintaining background detail and accurately rendering dynamic vehicles [31][32].

Group 4: Ablation Studies
- Ablation studies indicated that combining SfM and LiDAR for point-cloud initialization yielded the best rendering results, highlighting the importance of effective initialization methods [33][34].
- The background-partition optimization module was shown to enhance performance, confirming its necessity in the reconstruction process [36].
- Introducing the deformable module significantly improved the rendering quality of non-rigid participants, demonstrating the effectiveness of the dynamic optimization approach [39][40].
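The deformable-Gaussian idea in Group 2 can be sketched as a small network that maps a Gaussian's canonical center plus a timestamp to a position offset, while static background Gaussians bypass it entirely. The architecture and sizes below are assumptions for illustration, not DriveSplat's actual network (which also deforms other Gaussian parameters).

```python
import torch
import torch.nn as nn

class DeformationNet(nn.Module):
    """Hypothetical deformation field for dynamic neural Gaussians:
    (canonical center, time) -> deformed center."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),             # xyz offset per Gaussian
        )

    def forward(self, centers: torch.Tensor, t: float) -> torch.Tensor:
        """centers: (N, 3) canonical Gaussian means; returns deformed means."""
        time = torch.full((centers.shape[0], 1), t, device=centers.device)
        return centers + self.mlp(torch.cat([centers, time], dim=-1))

# Dynamic Gaussians are advanced to the render timestamp before splatting;
# static ones skip the network, which is the decoupling described above.
gaussians = torch.randn(1024, 3)
moved = DeformationNet()(gaussians, t=0.5)
print(moved.shape)  # torch.Size([1024, 3])
```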