西交利物浦&港科最新！轨迹预测基座大模型综述

Core Insights - The article discusses the application of large language models (LLMs) and multimodal large language models (MLLMs) in the paradigm shift for autonomous driving trajectory prediction, enhancing the understanding of complex traffic scenarios to improve safety and efficiency [1][20]. Summary by Sections Introduction and Overview - The integration of LLMs into autonomous driving systems allows for a deeper understanding of traffic scenarios, transitioning from traditional methods to LFM-based approaches [1]. - Trajectory prediction is identified as a core technology in autonomous driving, utilizing historical data and contextual information to infer future movements of traffic participants [5]. Traditional Methods and Challenges - Traditional vehicle trajectory prediction methods include physics-based approaches (e.g., Kalman filters) and machine learning methods (e.g., Gaussian processes), which struggle with complex interactions [8]. - Deep learning methods improve long-term prediction accuracy but face challenges such as high computational demands and poor interpretability [9]. - Reinforcement learning methods excel in interactive scene modeling but are complex and unstable [9]. LLM-Based Vehicle Trajectory Prediction - LFM introduces a paradigm shift by discretizing continuous motion states into symbolic sequences, leveraging LLMs' semantic modeling capabilities [11]. - Key applications of LLMs include trajectory-language mapping, multimodal fusion, and constraint-based reasoning, enhancing interpretability and robustness in long-tail scenarios [11][13]. Evaluation Metrics and Datasets - The article categorizes datasets for pedestrian and vehicle trajectory prediction, highlighting the importance of datasets like Waymo and ETH/UCY for evaluating model performance [16]. - Evaluation metrics for vehicles include L2 distance and collision rates, while pedestrian metrics focus on minADE and minFDE [17]. Performance Comparison - A performance comparison of various models on the NuScenes dataset shows that LLM-based methods significantly reduce collision rates and improve long-term prediction accuracy [18]. Discussion and Future Directions - The widespread application of LFMs indicates a shift from local pattern matching to global semantic understanding, enhancing safety and compliance in trajectory generation [20]. - Future research should focus on developing low-latency inference techniques, constructing motion-oriented foundational models, and advancing world perception and causal reasoning models [21].