FlowDrive：一个具备软硬约束的可解释端到端框架（上交&博世）

Core Insights - The article introduces FlowDrive, a novel end-to-end driving framework that integrates energy-based flow field representation, adaptive anchor trajectory optimization, and motion-decoupled trajectory generation to enhance safety and interpretability in autonomous driving [4][45]. Group 1: Introduction and Background - End-to-end autonomous driving has gained attention for its potential to simplify traditional modular pipelines and leverage large-scale data for joint learning of perception, prediction, and planning tasks [4]. - A mainstream research direction involves generating Bird's Eye View (BEV) representations from multi-view camera inputs, which provide structured spatial views beneficial for downstream planning tasks [4][6]. Group 2: FlowDrive Framework - FlowDrive introduces energy-based flow fields in the BEV space to explicitly model geometric constraints and rule-based semantics, enhancing the effectiveness of BEV representations [7][15]. - The framework includes a flow-aware anchor trajectory optimization module that aligns initial trajectories with safe and goal-oriented areas, improving spatial effectiveness and intention consistency [15][22]. - A task-decoupled diffusion planner separates high-level intention prediction from low-level trajectory denoising, allowing for targeted supervision and flow field conditional decoding [9][27]. Group 3: Experimental Results - Experiments on the NAVSIM v2 benchmark dataset demonstrate that FlowDrive achieves state-of-the-art performance, with an Extended Predictive Driver Model Score (EPDMS) of 86.3, surpassing previous benchmark methods [3][40]. - FlowDrive shows significant advantages in safety-related metrics such as Drivable Area Compliance (DAC) and Time to Collision (TTC), indicating superior adherence to driving constraints and hazard avoidance capabilities [40][41]. - The framework's performance is validated through ablation studies, showing that removing any core component leads to significant declines in overall performance [43][47]. Group 4: Technical Details - The flow field learning module encodes dense, physically interpretable spatial gradients to provide fine-grained guidance for trajectory planning [20][21]. - The perception module utilizes a Transformer-based architecture to effectively fuse multi-modal sensor inputs into a compact and semantically rich BEV representation [18][37]. - The training process involves a composite loss function that supervises trajectory planning, anchor trajectory optimization, flow field modeling, and auxiliary perception tasks [30][31][32][34].