A Quick, Structured Deep Dive into Li Auto's AI / Autonomous Driving / VLA Handbook
理想TOP2· 2025-10-10 11:19
Core Insights
- The article traces the evolution of Li Xiang's vision for Li Auto, emphasizing its transition from a traditional automotive company to an artificial intelligence (AI) company, driven by his belief in the transformative potential of AI and autonomous driving [1][2].

Motivation
- Li Xiang considers founding Autohome his biggest mistake and set out to build a venture at least ten times larger [1].
- His conviction that autonomous driving is feasible, and that the industry is in a transformative phase, motivated the founding of Li Auto [1].

Timeline of Developments
- September 2022: Li Auto internally defined itself as an AI company [2].
- January 28, 2023: Li Xiang officially announced the company's identity as an AI company [2].
- March 2023: internal discussions around AI began, although the initial understanding of concepts such as pretraining and finetuning was limited [2].
- December 2024: Li Xiang articulated five key judgments about AI's role and potential, emphasizing the importance of foundation models [2][3].

Key Judgments
- Judgment 1: Li Xiang believes in OpenAI's five stages of AI and asserts that AI will democratize knowledge and capabilities [2].
- Judgment 2: The foundation model is the operating system of the AI era, crucial for building super products [2].
- Judgment 3: Current efforts target Level 3 (L3) autonomous driving and securing a ticket to Level 4 (L4) [2][3].
- Judgment 4: Integrating large language models with autonomous driving will create a new entity, termed VLA [3].
- Judgment 5: Li Auto aims to produce a car without a steering wheel within three years, contingent on the VLA foundation model and sufficient resources [3].

Technical Insights
- The design and training of the VLA foundation model focus on 3D spatial understanding and reasoning capabilities [5][6].
- Sparse modeling techniques are employed to enhance efficiency without significantly increasing computational load [7].
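The sparse modeling point can be made concrete: one common way to add model capacity without a proportional compute increase is mixture-of-experts (MoE) routing, where each token activates only its top-k experts. The article does not say which sparse technique Li Auto actually uses, so the NumPy sketch below is a generic top-k MoE layer; all names, shapes, and random weights are illustrative.

```python
import numpy as np

def moe_forward(x, experts, gate, top_k=2):
    """Sparse mixture-of-experts layer: each token activates only its top-k experts."""
    logits = x @ gate                              # (n_tokens, n_experts) router scores
    top = np.argsort(logits, axis=1)[:, -top_k:]   # indices of each token's best experts
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        sel = top[i]
        w = np.exp(logits[i, sel] - logits[i, sel].max())  # softmax over selected experts only
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[i] += weight * (token @ experts[e])        # run just k of the n experts
    return out

rng = np.random.default_rng(1)
tokens = rng.standard_normal((4, 8))       # 4 tokens, model width 8
experts = rng.standard_normal((4, 8, 8))   # 4 experts, each an 8x8 linear map
gate = rng.standard_normal((8, 4))         # router projection: width -> expert logits
y = moe_forward(tokens, experts, gate, top_k=2)
```

With `top_k=2` of 4 experts, each token touches only half of the expert parameters per forward pass, which is the efficiency argument behind sparse modeling.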
- The model incorporates future frame prediction and dense depth prediction tasks to mimic human thought processes [8].
- Diffusion techniques enable real-time trajectory generation and improve the model's ability to predict complex traffic scenarios [10].

Reinforcement Learning
- The company aims to surpass human driving capability through reinforcement learning, addressing earlier limitations in model training and interaction environments [11].

Future Directions
- Li Auto is actively developing further models and frameworks to strengthen its autonomous driving capabilities, including new methodologies for video generation and scene reconstruction [12][13].
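To make the diffusion-based trajectory generation concrete, here is a minimal DDPM-style ancestral sampling loop over 2-D waypoints. This is a generic sketch, not Li Auto's planner: the cosine schedule, horizon, step count, and the toy noise predictor (which merely steers samples toward a straight, evenly spaced path) are all assumptions.

```python
import numpy as np

def cosine_alpha_bar(T):
    # Cumulative signal fraction alpha_bar_t under a cosine noise schedule.
    t = np.arange(T + 1) / T
    f = np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2
    return f[1:] / f[0]

def sample_trajectory(noise_pred, horizon=16, T=50, seed=0):
    """DDPM ancestral sampling of a (horizon, 2) array of x/y waypoints."""
    rng = np.random.default_rng(seed)
    a_bar = cosine_alpha_bar(T)
    a = np.empty(T)
    a[0] = a_bar[0]
    a[1:] = a_bar[1:] / a_bar[:-1]
    a = np.clip(a, 1e-3, 1.0)               # clip per-step alpha, as in Nichol & Dhariwal
    x = rng.standard_normal((horizon, 2))    # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = noise_pred(x, t)               # network's estimate of the injected noise
        x = (x - (1.0 - a[t]) / np.sqrt(1.0 - a_bar[t]) * eps) / np.sqrt(a[t])
        if t > 0:                            # no noise is added on the final step
            x += np.sqrt(1.0 - a[t]) * rng.standard_normal(x.shape)
    return x

# Toy stand-in for the trained denoiser: steers samples toward a straight 15 m path.
STRAIGHT = np.stack([np.linspace(0.0, 15.0, 16), np.zeros(16)], axis=1)

traj = sample_trajectory(lambda x, t: x - STRAIGHT, horizon=16, T=50)
```

In a real planner the lambda would be a trained network conditioned on perception features; the loop structure (iterative denoising from Gaussian noise) is what makes real-time multi-modal trajectory generation possible.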
Li Auto Believes Language Matters More Than Vision for VLA Action Accuracy
理想TOP2· 2025-08-16 12:11
Core Viewpoint
- The article discusses the release of DriveAction, a benchmark for evaluating Vision-Language-Action (VLA) models, emphasizing that both visual and language inputs are needed for accurate action prediction [1][3].

Summary by Sections

DriveAction Overview
- DriveAction is the first action-driven benchmark designed specifically for VLA models, containing 16,185 question-answer pairs generated from 2,610 driving scenarios [3].
- The dataset is derived from real-world driving data collected from mass-produced assisted-driving vehicles [3].

Model Performance Evaluation
- Experiments indicate that even the most advanced vision-language models (VLMs) need guidance from both visual and language inputs to predict actions accurately: average accuracy drops by 3.3% without visual input, by 4.1% without language input, and by 8.0% when both are absent [3][6].
- Across the comprehensive evaluation modes, every model achieved its highest accuracy in the full V-L-A mode and its lowest in the no-information mode (A) [6].

Specific Task Performance
- Metrics for navigation, efficiency, and dynamic/static tasks reveal varying strengths among models [8].
- For instance, GPT-4o scored 66.8 on navigation-related visual questions, 75.2 on language questions, and 78.2 on execution questions, highlighting the models' diverse capabilities [8].

Stability Analysis
- Each setting was repeated three times to compute means and standard deviations; GPT-4.1 mini and Gemini 2.5 Pro showed strong stability, with standard deviations below 0.3 [9].
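The ablation and stability protocols described above reduce to a few lines. In the sketch below the per-mode accuracies are invented placeholders chosen only so that the drops reproduce the reported 3.3 / 4.1 / 8.0 percentage points; the mode labels are shorthand, not DriveAction's official names.

```python
import statistics

def ablation_drops(acc):
    """Accuracy drop (percentage points) of each ablated mode vs. the full V-L-A mode."""
    full = acc["V-L-A"]
    return {mode: round(full - a, 1) for mode, a in acc.items() if mode != "V-L-A"}

def stability(per_run_scores):
    """Mean and sample standard deviation over repeated evaluation runs."""
    return statistics.mean(per_run_scores), statistics.stdev(per_run_scores)

# Hypothetical per-mode average accuracies (%). Only the *drops* mirror the
# reported numbers (3.3 / 4.1 / 8.0 points); the absolute values are invented.
acc = {
    "V-L-A": 80.0,   # full mode: vision + language + action
    "L-A": 76.7,     # no visual input
    "V-A": 75.9,     # no language input
    "A": 72.0,       # no information
}
drops = ablation_drops(acc)

# Stability protocol: repeat a setting three times, report mean and std-dev.
mean_acc, sd = stability([0.782, 0.779, 0.784])
```

A standard deviation well under the 0.3 threshold cited for GPT-4.1 mini and Gemini 2.5 Pro indicates the ranking between modes is not an artifact of sampling noise.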
Deploying End-to-End VLA in Autonomous Driving: How Should the Algorithm Be Designed?
自动驾驶之心· 2025-06-22 14:09
Core Insights
- The article discusses the rapid advancements in end-to-end autonomous driving, focusing on Vision-Language-Action (VLA) models and their applications in the industry [2][3].

Group 1: VLA Model Developments
- AutoVLA, a new VLA model that integrates reasoning and action generation for end-to-end autonomous driving, shows promising results in semantic reasoning and trajectory planning [3][4].
- ReCogDrive addresses performance issues in rare and long-tail scenarios by utilizing a three-stage training framework that combines vision-language models with diffusion planners [7][9].
- Impromptu VLA introduces a dataset aimed at improving VLA models' performance in unstructured extreme conditions, demonstrating significant gains on established benchmarks [14][24].

Group 2: Experimental Results
- AutoVLA achieved competitive performance across scenarios, with its best-of-N method reaching a PDMS score of 92.12, indicating its effectiveness in planning and execution [5].
- ReCogDrive set a new state-of-the-art PDMS of 89.6 on the NAVSIM benchmark, showcasing robust and safe driving trajectories [9][10].
- OpenDriveVLA demonstrated superior results in open-loop trajectory planning and driving-related question answering, outperforming previous methods on the nuScenes dataset [28][32].

Group 3: Industry Trends
- Major automotive manufacturers such as Li Auto, Xiaomi, and XPeng are investing heavily in VLA research and development, indicating a competitive landscape in autonomous driving technology [2][3].
- Integrating large language models (LLMs) with VLA frameworks has become a focal point for enhancing decision-making in autonomous vehicles, as seen in models like ORION and VLM-RL [33][39].
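AutoVLA's best-of-N result reflects a simple decoding strategy: sample N candidate trajectories from the model and keep the one a scorer prefers. The sketch below is generic; the smoothness scorer is a hypothetical stand-in for a PDMS-style evaluator, whose actual sub-metrics the article does not spell out.

```python
import numpy as np

def best_of_n(candidates, score_fn):
    """Best-of-N decoding: score every sampled trajectory, keep the argmax."""
    scores = [score_fn(c) for c in candidates]
    i = int(np.argmax(scores))
    return candidates[i], scores[i]

def smoothness_score(traj):
    # Illustrative stand-in for a PDMS-style scorer: penalise jerky motion.
    accel = np.diff(traj, n=2, axis=0)   # discrete second difference of waypoints
    return -float(np.abs(accel).sum())

rng = np.random.default_rng(0)
# 16 candidate 8-step x/y trajectories, e.g. sampled from a stochastic planner head.
candidates = [np.cumsum(rng.normal(1.0, 0.2, size=(8, 2)), axis=0) for _ in range(16)]
best, best_score = best_of_n(candidates, smoothness_score)
```

The trade-off is the usual one: quality improves monotonically with N, but inference cost grows linearly, which is why best-of-N figures are typically reported separately from single-sample results.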