BigBite Analysis: Tesla FSD Is an End-to-End Large Model
自动驾驶之心· 2026-01-27 09:40
Core Viewpoint
- Tesla's Full Self-Driving (FSD) is fundamentally a large model that uses one massive neural network to achieve end-to-end driving, contrary to claims that it relies on numerous smaller models for separate tasks [7][17].

Summary by Sections

FSD Model Architecture
- Tesla FSD is characterized as a large model, confirmed by Ashok at ICCV, using a massive neural network for all computation from Photon In to Control Out [7][14].
- The architecture includes numerous model parameter files, but these are primarily small task heads rather than independent models, indicating a more complex integration than previously assumed [6][10].

Parameter File Insights
- The discovery of hundreds of neural-network parameter files fueled skepticism that FSD is a large model; however, these files are largely associated with smaller tasks rather than the core end-to-end model [8][10].
- Parameter sizes have grown significantly from HW3 to HW4, with HW4's B core reaching 7.5GB, indicating a substantial increase in model complexity and capability [8][12].

Memory and Bandwidth Considerations
- HW3's limited memory bandwidth of 68GB/s restricts the model size to approximately 1.8 billion parameters, while HW4's bandwidth of 384GB/s allows a theoretical capacity of around 10 billion parameters [12][13].
- A mixture-of-experts (MoE) architecture lets Tesla optimize memory usage and enhance model performance without exceeding these bandwidth limits [13][16].

Technological Advancements
- The assertion that Tesla's technology is outdated is challenged by the argument that significant engineering innovations drive advancement, much as they did for reusable rockets [17].
- Advanced engineering practices and innovative architectures position Tesla as a leader in the autonomous-driving sector, countering claims of technological inferiority [17].
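The bandwidth ceiling above can be sanity-checked with simple arithmetic. A minimal sketch, assuming every weight must be streamed from DRAM once per inference frame at 1 byte per parameter (int8 weights) and a target rate of roughly 36Hz; these modeling assumptions are illustrative, not Tesla's published figures:

```python
def max_params(bandwidth_bytes_per_s: float, hz: float,
               bytes_per_param: int = 1) -> float:
    """Upper bound on parameter count if all weights are read once per frame."""
    return bandwidth_bytes_per_s / (hz * bytes_per_param)

# HW3: 68 GB/s memory bandwidth at ~36 Hz
hw3 = max_params(68e9, 36)    # ~1.9e9, in line with the ~1.8B figure cited
# HW4: 384 GB/s memory bandwidth at the same frame rate
hw4 = max_params(384e9, 36)   # ~10.7e9, in line with the ~10B figure cited

print(f"HW3 ceiling: {hw3 / 1e9:.1f}B params")
print(f"HW4 ceiling: {hw4 / 1e9:.1f}B params")
```

The estimate is a ceiling, not a design point: activations, KV-cache-style state, and any weights read more than once per frame would lower it further.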
BigBite's Reflections: The View That Tesla FSD Is an End-to-End Large Model
理想TOP2· 2026-01-24 15:11
Core Viewpoint
- Tesla's Full Self-Driving (FSD) is characterized as an end-to-end large model, challenging the notion that it is merely a combination of nearly 200 small scene models [1][11].

Group 1: Model Architecture and Parameters
- The B core's neural-network parameters significantly exceed the A core's, with only 61 shared parameter files, indicating that the A/B-core redundancy design has become impractical as the neural network scaled up rapidly in Tesla's V12 [5].
- Many parameter files turn out to be parts of one large model, as indicated by naming conventions like FSD E2E FACTORY PART X, suggesting a distributed deployment of model parameters across different chips, a common practice in the large-model era [6].
- Tesla's HW3 has a limited memory bandwidth of 68GB/s, theoretically allowing at most 1.8GB of model parameters while sustaining a 36Hz output, whereas HW4, with a bandwidth of 384GB/s, could theoretically support around 10 billion parameters [7][8].

Group 2: Mixture of Experts (MoE) Architecture
- A Mixture-of-Experts (MoE) architecture allows Tesla to run a large-scale end-to-end model at high frequency on relatively old chips by activating only a subset of expert networks per inference, optimizing memory-bandwidth usage [8][10].
- Elon Musk and Ashok Elluswamy have indicated that FSD employs a MoE architecture, which supports localized parameters for different regions while maintaining a generalized approach [9][10].

Group 3: Technological Advancement
- The assertion that FSD is a backward technology is dismissed: technological advancement is defined not solely by scientific discoveries but also by engineering innovations, as exemplified by Tesla's achievements in rocket technology and engineering [11].
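The MoE mechanism described in Group 2 can be sketched in a few lines: a gate scores all experts for the current input, but only the top-k are executed, so only their weights need to be streamed from memory each frame. Everything below is an illustrative toy (expert count, expert size, and the hard-coded gate scores are placeholders, not Tesla's design):

```python
NUM_EXPERTS = 8
TOP_K = 2
PARAMS_PER_EXPERT = 1_000_000_000  # hypothetical: 1B parameters per expert

def top_k_experts(gate_scores, k=TOP_K):
    """Return indices of the k highest-scoring experts; only these run."""
    order = sorted(range(len(gate_scores)),
                   key=lambda i: gate_scores[i], reverse=True)
    return order[:k]

# In a real system a small gate network would produce these scores from
# the input features; they are hard-coded here for the sketch.
gate_scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.02, 0.15]
active = top_k_experts(gate_scores)
print("active experts:", active)  # -> [3, 1]

# Only the active experts' weights cross the memory bus each frame.
streamed_fraction = (TOP_K * PARAMS_PER_EXPERT) / (NUM_EXPERTS * PARAMS_PER_EXPERT)
print(f"weights streamed per frame: {streamed_fraction:.0%} of total")  # -> 25%
```

This is why MoE relaxes the bandwidth ceiling from Group 1: total capacity scales with the number of experts, while per-frame memory traffic scales only with the k experts the router activates.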