Tesla's Ashok on FSD at ICCV'25, with Q&A | 952-character condensed version / full illustrated text / full video
理想TOP2· 2025-10-23 15:33
Core Viewpoint - Tesla is shifting to a single, large end-to-end neural network that generates control actions directly from pixel and other sensor data, eliminating explicit perception modules [1][34].

Group 1: Reasons for Transition to End-to-End Neural Networks
- Encoding human values (such as driving smoothness and risk assessment) in hand-written code is extremely challenging [3].
- Poorly defined interfaces between traditional perception, prediction, and planning modules can lead to information loss [4].
- The end-to-end approach is easier to scale for handling long-tail problems in the real world [5].
- It allows for homogeneous computation with deterministic latency, which is crucial for real-time systems [6].

Group 2: Challenges in Learning "Pixel to Control"
- The primary challenges are the curse of dimensionality, interpretability and safety guarantees, and evaluation [7][8][9].
- The input context can be extensive, with a 30-second window potentially reaching 2 billion tokens [10][49].
- Tesla leverages its vast fleet data to extract valuable corner-case data through complex, trigger-based data collection methods [11][51][56].

Group 3: Solutions to Challenges
- For the curse of dimensionality, Tesla refines its extensive driving data to ensure the right correlations are captured [51][56].
- Interpretability is addressed by prompting the end-to-end model to predict various auxiliary outputs for debugging and safety assurance [12][60].
- Evaluation challenges are tackled by creating a neural network-based world simulator that can generate consistent video streams from multiple cameras [19][79].

Group 4: Future Developments
- The next step involves the Cybercab, a next-generation vehicle designed specifically for robotaxi services, utilizing the same neural network technology [25][83].
- The technology developed for autonomous driving is also being adapted for humanoid robots, such as Optimus [26][86].
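
The trigger-based data collection mentioned under Group 2 can be pictured with a minimal sketch. The summary does not describe Tesla's actual triggers or data schema, so everything here is a hypothetical illustration: the `Clip` fields, the trigger names, and the 6.0 m/s² braking threshold are assumptions chosen only to show the general idea of keeping the clips that fire at least one rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Clip:
    """A recorded driving clip with summary telemetry (hypothetical fields)."""
    clip_id: str
    max_decel_mps2: float    # hardest braking observed in the clip
    driver_took_over: bool   # whether a human intervened during the clip


# A trigger is any predicate over a clip; names and thresholds are illustrative.
Trigger = Callable[[Clip], bool]

TRIGGERS: Dict[str, Trigger] = {
    "hard_brake": lambda c: c.max_decel_mps2 >= 6.0,
    "takeover": lambda c: c.driver_took_over,
}


def select_clips(clips: List[Clip], triggers: Dict[str, Trigger]) -> Dict[str, List[str]]:
    """Map each clip that fires at least one trigger to the names of the triggers it fired."""
    selected: Dict[str, List[str]] = {}
    for clip in clips:
        fired = [name for name, trigger in triggers.items() if trigger(clip)]
        if fired:
            selected[clip.clip_id] = fired
    return selected


clips = [
    Clip("a", max_decel_mps2=1.2, driver_took_over=False),  # uneventful, dropped
    Clip("b", max_decel_mps2=7.5, driver_took_over=False),  # hard braking only
    Clip("c", max_decel_mps2=6.0, driver_took_over=True),   # both triggers fire
]
print(select_clips(clips, TRIGGERS))
```

Recording which triggers fired, rather than a bare keep/drop flag, makes it possible to balance the retained data by scenario type later. That is a design choice of this sketch, not something the summary attributes to Tesla.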