SJTU & KargoBot's FastDrive! Structured labels make end-to-end large models faster and stronger
自动驾驶之心 · 2025-06-23 11:34
Core Viewpoint
- The integration of human-like reasoning capabilities into end-to-end autonomous driving systems is a cutting-edge research area, with a focus on vision-language models (VLMs) [1].

Group 1: Structured Dataset and Model
- A structured dataset called NuScenes-S has been introduced; it focuses on the key elements most relevant to driving decisions, eliminating redundant information and improving reasoning efficiency [4][5].
- The FastDrive model, with 0.9 billion parameters, mimics human reasoning strategies and aligns effectively with end-to-end autonomous driving frameworks [4][5].

Group 2: Dataset Description
- The NuScenes-S dataset provides a comprehensive view of driving scenarios, addressing issues often overlooked in existing datasets. It covers key elements such as weather, traffic conditions, driving areas, traffic lights, traffic signs, road conditions, lane markings, and time [7][8].
- Dataset construction combined GPT-based and human annotation of scene information, with the two sets of results compared against each other and refined [9].

Group 3: FastDrive Algorithm Model
- The FastDrive model follows a "ViT-Adapter-LLM" architecture, using a Vision Transformer for visual feature extraction and a token-packing module to speed up inference [18][19].
- A large language model (LLM) then generates scene descriptions, identifies key objects, predicts their future states, and makes driving decisions in a chained reasoning manner [19].

Group 4: Experimental Results
- Experiments on the NuScenes-S dataset, which contains 102,000 question-answer pairs, showed that FastDrive achieves competitive performance on scene-understanding tasks [21].
- FastDrive's metrics are strong across perception, prediction, and decision-making tasks, outperforming comparable models [25].
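The structured labels that distinguish NuScenes-S from free-form caption datasets can be pictured as fixed key-value records over the scene elements listed above. A minimal sketch, assuming a flat schema; the field names and values here are illustrative, not the dataset's actual annotation format:

```python
from dataclasses import dataclass, asdict

# Hypothetical structured scene label mirroring the key elements the
# summary lists for NuScenes-S: weather, traffic conditions, driving
# area, traffic lights, traffic signs, road conditions, lane markings,
# and time. Field names are illustrative, not the real schema.
@dataclass
class SceneLabel:
    weather: str
    traffic: str
    driving_area: str
    traffic_light: str
    traffic_sign: str
    road_condition: str
    lane_marking: str
    time_of_day: str

label = SceneLabel(
    weather="rainy",
    traffic="congested",
    driving_area="urban intersection",
    traffic_light="red",
    traffic_sign="speed limit 30",
    road_condition="wet",
    lane_marking="solid white",
    time_of_day="night",
)

# A fixed schema serializes to compact key-value pairs, which is what
# removes the redundant free-form text the summary says the dataset
# eliminates.
print(asdict(label))
```

Because every sample carries the same eight keys, a model (or an annotation-comparison script reconciling GPT and human labels) can diff two annotations field by field instead of parsing prose.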
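The token-packing idea behind the "ViT-Adapter-LLM" pipeline, concatenating groups of adjacent visual tokens before projecting them into the LLM's embedding space so the LLM attends over a shorter sequence, can be sketched as follows. The shapes, the packing factor, and the single linear projection are all assumptions; the summary does not give implementation details:

```python
import numpy as np

def pack_tokens(tokens: np.ndarray, k: int, w: np.ndarray) -> np.ndarray:
    """Merge every k adjacent visual tokens into one packed token.

    tokens: (num_tokens, dim) ViT output; num_tokens must be divisible by k.
    w:      (k * dim, out_dim) projection into the LLM embedding space.
    Returns (num_tokens // k, out_dim) -- a k-times shorter sequence.
    """
    n, d = tokens.shape
    assert n % k == 0, "token count must be divisible by the packing factor"
    grouped = tokens.reshape(n // k, k * d)  # concatenate k neighbours
    return grouped @ w                       # linear projection to LLM dim

rng = np.random.default_rng(0)
vit_tokens = rng.normal(size=(256, 64))  # e.g. a 16x16 patch grid (assumed)
w = rng.normal(size=(4 * 64, 128))       # packing factor k=4 (assumed)
packed = pack_tokens(vit_tokens, 4, w)
print(packed.shape)  # (64, 128): 4x fewer tokens for the LLM to attend over
```

Since self-attention cost grows quadratically with sequence length, shortening the visual token sequence this way is a plausible source of the inference-speed gain the summary attributes to the module.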
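The chained reasoning order the summary describes (scene description, then key objects, then future-state prediction, then the driving decision) can be sketched as a staged loop in which each answer is appended to the context before the next question. Everything here is an assumed interface; `generate` is a placeholder for the real model call:

```python
# Sketch of a staged reasoning chain in the order the summary describes.
# Each stage's answer is fed back into the context for the next stage.
STAGES = [
    ("description", "Describe the driving scene."),
    ("key_objects", "List the key objects relevant to the ego vehicle."),
    ("prediction", "Predict the future states of those key objects."),
    ("decision", "Give the driving decision."),
]

def generate(context: str, question: str) -> str:
    # Placeholder for the LLM; a real model would condition on the packed
    # visual tokens plus the accumulated text context.
    return f"<answer to: {question}>"

def reasoning_chain(visual_context: str) -> dict:
    context = visual_context
    answers = {}
    for name, question in STAGES:
        answer = generate(context, question)
        answers[name] = answer
        context += f"\nQ: {question}\nA: {answer}"  # feed forward
    return answers

out = reasoning_chain("<packed visual tokens>")
print(list(out))  # ['description', 'key_objects', 'prediction', 'decision']
```

The point of the staged structure is that the final decision is conditioned on the model's own intermediate outputs, mimicking the human perceive-predict-decide order rather than answering the decision question directly.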