
Large Language Models (LLM) for Autonomous Driving


理想TOP2 · 2025-10-17 13:44

Core Viewpoint
- The article highlights Li Auto's advances in autonomous driving technology, focusing on solutions intended to improve the safety, efficiency, and sustainability of transportation [1].

Group 1: Autonomous Driving Technologies
- The company is developing a large language model (LLM) to interpret complex driving scenarios, so that autonomous vehicles can respond more intelligently and more quickly [2].
- A world model project aims to simulate real driving environments, so that autonomous driving algorithms can be tested and improved under varied conditions [3].
- The 3D geometric scene (3DGS) understanding project focuses on building detailed 3D maps of urban environments, improving the perception systems that autonomous vehicles rely on for navigation and decision-making [4].
- The company is pioneering an end-to-end neural network model that simplifies the full processing pipeline of an autonomous driving system, from perception to execution [5].

Group 2: Research and Development Projects
- DriveVLM is a dual-system architecture for autonomous driving that combines an end-to-end model with a vision-language model [7].
- TOD3Cap is a dataset that describes autonomous driving street scenes in natural language, containing 850 outdoor scenes, over 64,300 objects, and 2.3 million textual descriptions [7].
- Street Gaussians presents an efficient method for building realistic, dynamic urban street models for autonomous driving scenarios [8].
- DiVE is a model based on the Diffusion Transformer architecture that generates temporally consistent, multi-view consistent videos matching a given bird's-eye-view layout [8].
- GaussianAD represents and propagates scene information with sparse yet comprehensive 3D Gaussians, addressing the trade-off between information completeness and computational efficiency [8].
- 3DRealCar is a large-scale real-world 3D car dataset containing 2,500 cars scanned in 3D, with an average of 200 dense RGB-D views per car [8].
- DriveDreamer4D uses a video generation model as a data machine, producing video of vehicles executing complex maneuvers to supplement real data [8].
- DrivingSphere combines 4D world modeling with video generation to build a generative closed-loop simulation framework [8].
- StreetCrafter is a video diffusion model for street scene synthesis that uses precise LiDAR data for pixel-level control [8].
- GeoDrive uses 3D geometric information to generate highly realistic, temporally consistent driving scene videos [10].
- LightVLA is described as the first adaptive visual token pruning framework; it improves both the success rate and the operating efficiency of vision-language-action (VLA) models for robots [10].
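To give a rough sense of scale for the two datasets listed above, the quoted figures can be turned into per-scene and per-object ratios. The short sketch below uses only the numbers stated in the summary; because the object count is given as "over 64,300" and the view count as an average, the results are approximations rather than official statistics. The constant and function names are illustrative and do not come from any dataset release.

```python
# Figures as quoted in the summary above; treated as approximate values.
CAPTION_SCENES = 850                # outdoor scenes in the street-scene captioning dataset
CAPTION_OBJECTS = 64_300            # "over 64,300 objects"
CAPTION_DESCRIPTIONS = 2_300_000    # "2.3 million textual descriptions"
REALCAR_CARS = 2_500                # cars scanned in 3DRealCar
REALCAR_VIEWS_PER_CAR = 200         # average dense RGB-D views per car


def dataset_scale():
    """Derive rough ratios from the quoted dataset figures (hypothetical helper)."""
    return {
        "objects_per_scene": round(CAPTION_OBJECTS / CAPTION_SCENES, 1),
        "descriptions_per_object": round(CAPTION_DESCRIPTIONS / CAPTION_OBJECTS, 1),
        "realcar_total_views": REALCAR_CARS * REALCAR_VIEWS_PER_CAR,
    }


if __name__ == "__main__":
    for name, value in dataset_scale().items():
        print(f"{name}: {value}")
```

On these figures, the captioning dataset averages roughly 76 annotated objects per scene and about 36 descriptions per object, while 3DRealCar amounts to roughly half a million RGB-D views in total.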

3D Geometric Scene (3DGS) Understanding for Urban Scenes
