Is There a "Pure-Blood VLA" in Autonomous Driving? A Look at What VLMs Can Actually Do for Autonomous Driving

Core Viewpoint
- The article discusses the challenges and methods involved in building datasets for autonomous driving, focusing on the VLA (Vision-Language-Action) model and its use in trajectory prediction and scene understanding [1].

Dataset Handling
- Datasets differ in the number of cameras. The VLM handles this without an explicit camera count, because it simply processes however many image tokens it receives [2].
- Output trajectories are expressed in the vehicle's current coordinate system: predictions are relative (x, y) values, not image coordinates, so mapping them onto an image requires additional camera parameters [6].
- The VLA model generally adheres to the expected output format; occasional deviations are corrected by a Python post-processing step that normalizes the format [8][9].

Trajectory Prediction
- VLA trajectory prediction differs from traditional methods in that QA training gives the model scene-understanding capability, which improves its ability to predict the trajectories of dynamic objects such as vehicles and pedestrians [11].
- Dataset construction ran into data quality issues and inconsistent coordinate formats, which were addressed through data cleaning and standardization [14][15].

Data Alignment and Structure
- Data alignment converts the various dataset formats into a unified relative displacement in the vehicle's coordinate system, organized in a QA format that covers trajectory prediction and dynamic object forecasting [18].
- The input consists of images and the trajectory points from the previous 1.5 seconds; the model predicts the trajectory points for the next 5 seconds, adhering to the SANA standard [20].

Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" community focuses on cutting-edge autonomous driving technologies, covers nearly 40 technical directions, and fosters collaboration between industry and academia [22][24].
- The community offers a learning platform that includes video tutorials, Q&A sessions, and job opportunities in the autonomous driving sector [28][29].
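
The Dataset Handling notes say that occasional output-format deviations are corrected in Python. The article does not show that code, so the following is only a minimal sketch of what such a normalization step might look like. The function name `normalize_trajectory`, the canonical `[(x, y), ...]` target format, and the two-decimal precision are all assumptions made for illustration.

```python
import re

# A signed decimal number such as "3", "-0.25", or "+1.5".
_NUMBER = r"[-+]?\d+(?:\.\d+)?"
# One point written as "(x, y)" or "[x, y]", with arbitrary spacing.
_PAIR = re.compile(rf"[\(\[]\s*({_NUMBER})\s*,\s*({_NUMBER})\s*[\)\]]")


def normalize_trajectory(raw: str, precision: int = 2) -> str:
    """Rewrite free-form model output as a canonical '[(x, y), ...]' string.

    Hypothetical helper: the target format is an assumption, not the
    format used in the article.
    """
    points = [(float(x), float(y)) for x, y in _PAIR.findall(raw)]
    if not points:
        # Nothing recoverable: let the caller decide whether to drop the sample.
        raise ValueError("no (x, y) pairs found in model output")
    body = ", ".join(f"({x:.{precision}f}, {y:.{precision}f})" for x, y in points)
    return f"[{body}]"
```

A call such as `normalize_trajectory("Trajectory: [1,2] ; ( 3.5 , -4 )")` would bring mixed bracket styles and spacing into one form. Outputs containing no recognizable pairs raise an error instead of being silently repaired.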
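
The Data Alignment notes describe converting every dataset into relative displacements in the vehicle's coordinate system. As a rough illustration of that idea, the sketch below converts global (x, y) waypoints into the ego frame with a translation followed by a 2D rotation. The function name `to_ego_frame` and the axis convention (x forward, y left, yaw in radians measured counter-clockwise from the global x axis) are assumptions; individual datasets may use other conventions.

```python
import math


def to_ego_frame(points, ego_xy, ego_yaw):
    """Convert global (x, y) points into displacements in the ego frame.

    Assumed convention: ego x points forward, ego y points left, and
    ego_yaw is the heading in radians, counter-clockwise from global x.
    """
    ex, ey = ego_xy
    cos_yaw, sin_yaw = math.cos(ego_yaw), math.sin(ego_yaw)
    relative = []
    for px, py in points:
        dx, dy = px - ex, py - ey  # translate so the ego sits at the origin
        # Rotate by -yaw so the ego heading lines up with the +x axis.
        relative.append((cos_yaw * dx + sin_yaw * dy,
                         -sin_yaw * dx + cos_yaw * dy))
    return relative
```

With the ego at (10, 5) heading along global +y (yaw of π/2), the global point (10, 7) becomes roughly (2, 0), that is, 2 m straight ahead. Applying the same transform to every source dataset is one way to obtain a unified representation.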
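
Because predictions are relative (x, y) values in the vehicle frame, drawing them on an image needs camera parameters, as the Dataset Handling notes point out. The sketch below shows one common way to do this with an ideal pinhole model. It is not taken from the article: the function name `project_to_image`, the ego-to-camera `rotation` and `translation` arguments, the ground-plane assumption `z = 0`, and the camera axes (x right, y down, z forward) are all illustrative assumptions, and lens distortion is ignored.

```python
def project_to_image(points_xy, rotation, translation, fx, fy, cx, cy, z=0.0):
    """Project ego-frame (x, y) points to pixel coordinates (u, v).

    Assumptions: points lie at height z in the ego frame; `rotation`
    (3x3 nested lists) and `translation` (length 3) map ego coordinates
    to camera coordinates; the camera is an ideal pinhole with focal
    lengths fx, fy and principal point (cx, cy).
    """
    pixels = []
    for x, y in points_xy:
        p_ego = (x, y, z)
        # p_cam = R @ p_ego + t, written out without external libraries.
        xc, yc, zc = (
            sum(rotation[r][c] * p_ego[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        if zc <= 0:
            pixels.append(None)  # point is behind the camera, not visible
        else:
            pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels
```

For a forward-facing camera mounted 1.5 m above the ego origin, `rotation` would be `[[0, -1, 0], [0, 0, -1], [1, 0, 0]]` and `translation` would be `[0, 1.5, 0]` under these conventions. Real datasets ship their own calibration files, and those values should be used instead.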