Workflow
GALTraj
icon
Search documents
自动驾驶论文速递 | 扩散模型、轨迹预测、TopoLiDM、VLA等~
自动驾驶之心· 2025-08-05 03:09
Core Insights - The article discusses advancements in trajectory prediction using a generative active learning framework called GALTraj, which applies controllable diffusion models to address long-tail issues in data [1][2]. Group 1: GALTraj Framework - GALTraj is the first framework to apply generative active learning to trajectory prediction tasks, enhancing long-tail learning without modifying the model structure [2]. - The framework employs a tail-aware generation method that differentiates the diffusion guidance for tail, head, and related agents, producing realistic and diverse scenarios while preserving tail characteristics [2][3]. Group 2: Experimental Results - In experiments on WOMD and Argoverse2 datasets, GALTraj significantly improved long-tail sample prediction performance, reducing the long-tail metric FPR₅ by 47.6% (from 0.42 to 0.22) and overall prediction error minFDE₆ by 14.7% (from 0.654 to 0.558) [1][6]. - The results indicate that GALTraj outperforms traditional methods across various metrics, showcasing its effectiveness in enhancing prediction accuracy for rare scenarios [7][8]. Group 3: TopoLiDM Framework - The article also highlights the TopoLiDM framework developed by Shanghai Jiao Tong University and Twente University, which integrates topology-aware diffusion models for high-fidelity LiDAR point cloud generation [13][15]. - TopoLiDM achieved a 22.6% reduction in the Fréchet Range Image Distance (FRID) and a 9.2% reduction in Minimum Matching Distance (MMD) on the KITTI-360 dataset while maintaining a real-time generation speed of 1.68 samples per second [13][15]. Group 4: FastDriveVLA Framework - FastDriveVLA, developed by Peking University and Xiaopeng Motors, introduces a reconstruction-based visual token pruning framework that maintains 99.1% trajectory accuracy with a 50% pruning rate and reduces collision rates by 2.7% [21][22]. - The framework employs a novel adversarial foreground-background reconstruction strategy to enhance the identification of valuable tokens, achieving state-of-the-art performance on the nuScenes open-loop planning benchmark [27][28]. Group 5: PLA Framework - The article presents a unified Perception-Language-Action (PLA) framework proposed by TUM, which integrates multi-sensor fusion and GPT-4.1 enhanced visual-language-action reasoning for adaptive autonomous driving [34][35]. - The framework demonstrated a mean absolute error (MAE) of 0.39 m/s in speed prediction and an average displacement error (ADE) of 1.013 meters in trajectory tracking within urban intersection scenarios [42].