BigBite Thought Notes: The View That Tesla FSD Is a Single End-to-End Large Model
Tesla (US:TSLA) | 理想TOP2 · 2026-01-24 15:11

Core Viewpoint
- Tesla's Full Self-Driving (FSD) is characterized as a single end-to-end large model, challenging the claim that it is merely a combination of nearly 200 small scene-specific models [1][11].

Group 1: Model Architecture and Parameters
- The B-core's neural-network parameters significantly exceed the A-core's, with only 61 parameter files shared between them, indicating that the A/B-core redundancy design became impractical as the neural network scaled rapidly in Tesla's V12 [5].
- Many parameter files carry names like FSD E2E FACTORY PART X, suggesting they are shards of one large model distributed across chips, a deployment strategy common in the large-model era [6].
- HW3's memory bandwidth of 68 GB/s theoretically caps the model at about 1.8 GB of parameters if it is to sustain a 36 Hz output, while HW4, with 384 GB/s of bandwidth, could theoretically support around 10 billion parameters [7][8].

Group 2: Mixture of Experts (MoE) Architecture
- A Mixture of Experts (MoE) architecture would let Tesla run a large-scale end-to-end model at high frequency on relatively old chips: only a subset of the expert networks is activated per inference step, so far less weight data must be streamed from memory [8][10].
- Statements by Elon Musk and Ashok Elluswamy indicate that FSD employs an MoE architecture, which also supports region-localized parameters while keeping a single generalized model [9][10].

Group 3: Technological Advancement
- The assertion that FSD is backward technology is dismissed: technological advancement is defined not only by scientific discovery but also by engineering innovation, as exemplified by Tesla's achievements in rocket engineering [11].
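The bandwidth argument above is simple arithmetic worth making explicit: if every forward pass must stream the full set of weights from DRAM, then at F passes per second the weights cannot exceed bandwidth / F. A minimal sketch (the function name and the int8-per-parameter assumption are illustrative, not from the source):

```python
# Back-of-the-envelope check of the memory-bandwidth cap described above:
# at F Hz, each forward pass can stream at most bandwidth / F bytes of
# weights from DRAM, which bounds the size of a dense model.

def max_weight_gb(bandwidth_gb_s: float, hz: float) -> float:
    """Upper bound on weight data (GB) readable per forward pass."""
    return bandwidth_gb_s / hz

hw3 = max_weight_gb(68.0, 36.0)    # HW3: 68 GB/s at 36 Hz
hw4 = max_weight_gb(384.0, 36.0)   # HW4: 384 GB/s at 36 Hz

print(f"HW3 cap: {hw3:.2f} GB per pass")   # ~1.89 GB, matching the ~1.8 GB figure
print(f"HW4 cap: {hw4:.2f} GB per pass")   # ~10.67 GB
```

At roughly one byte per parameter (e.g. int8 quantization, an assumption here), HW4's ~10.7 GB per pass lines up with the article's "around 10 billion parameters" estimate.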
