Li Auto Proposes the First World Model Incorporating Both Ego and Other-Vehicle Trajectories

Core Viewpoint
- The article discusses a driving world model developed by Li Auto that integrates the trajectories of both the ego vehicle and other vehicles. This enables more realistic simulation of driving scenarios and supports reinforcement-learning training of the company's VLA (Vision-Language-Action) model [1][6].

Group 1: Model Development
- The driving world model proposed by Li Auto addresses three main deficiencies of previous models: lack of interactivity, feature distribution mismatch, and spatial mapping difficulties [6].
- The new model, EOT-WM, projects trajectory points into the image coordinate system and renders them as trajectory videos, so that trajectories and scenes are represented in a single visual modality [6][8].
- A spatiotemporal variational autoencoder (STVAE) encodes both the scene videos and the trajectory videos, yielding aligned feature spaces for effective control [7].

Group 2: Technical Innovations
- The model introduces a diffusion Transformer (TiDiT) that injects motion guidance from the trajectory latent variables into the video latent variables to guide the denoising of noisy video latents [9].
- A new metric, based on the similarity of control latent variables, evaluates controllability by comparing predicted trajectory latent variables against the ground-truth ones [7][9].

Group 3: Contributions
- The model is the first to include both ego and other-vehicle trajectories, allowing more realistic simulation of interactions between the ego vehicle and the driving scene [8].
- It represents trajectories as videos and aligns each trajectory with its corresponding vehicle in a unified visual space [9].
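
The step of projecting trajectory points into the image coordinate system and turning them into a trajectory video can be illustrated with a minimal sketch. This is not EOT-WM's implementation: the pinhole camera model, the camera-frame input coordinates, the single-pixel markers, and the names `project_point` and `trajectory_to_frames` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed convention: points are already in the camera frame,
# x to the right, y downward, z forward, in metres.
Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (column u, row v)


def project_point(p: Point3D, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection of one point; returns None if it is behind the camera."""
    x, y, z = p
    if z <= 0.0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (round(u), round(v))


def trajectory_to_frames(traj: List[Point3D], fx: float, fy: float,
                         cx: float, cy: float,
                         width: int, height: int) -> List[List[List[int]]]:
    """Render one binary frame per trajectory point (hypothetical 'trajectory video').

    Each frame is a height x width grid of 0s with a single 1 where the
    projected point lands; points outside the image leave the frame empty.
    """
    frames = []
    for p in traj:
        frame = [[0] * width for _ in range(height)]
        pixel = project_point(p, fx, fy, cx, cy)
        if pixel is not None:
            u, v = pixel
            if 0 <= u < width and 0 <= v < height:
                frame[v][u] = 1
        frames.append(frame)
    return frames


# Usage: a vehicle moving away from the camera while drifting right.
trajectory = [(0.0, 0.0, 10.0), (1.0, 0.5, 10.0)]
video = trajectory_to_frames(trajectory, fx=100.0, fy=100.0,
                             cx=32.0, cy=24.0, width=64, height=48)
```

Because the resulting frames live on the same pixel grid as the camera images, one video encoder can process both scene and trajectory inputs, which is the motivation the summary gives for representing trajectories as videos.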