CICC - Tech Hardware: AI Evolution (1): DeepSeek Drives the "Democratization of Large Models": Is It an Inflection Point for Training Compute, or a Black Hole?
(US:SKLTY) 2025-02-27 02:58

Investment Rating
- The report rates the industry as "Outperform", with stock recommendations for Nvidia and Broadcom, indicating a positive outlook for AI hardware and infrastructure demand [4][7].

Core Insights
- The report highlights DeepSeek's significant advances in generative AI, particularly its V3 model, which reportedly matches leading models such as GPT-4 at roughly one tenth of the training cost. This innovation is framed as a response to constraints on AI hardware procurement arising from US-China trade tensions, and is expected to increase industry-wide demand for computational resources [4][5].

Summary by Sections

Model Innovations
- DeepSeek continues to use the MoE (Mixture of Experts) architecture, which cuts computational cost by activating only a subset of expert models for each input; the V3 model scales the number of experts to 256 to optimize resource usage [10][11].
- FP8 precision training significantly lowers computational resource consumption compared with traditional FP16 methods, improving training efficiency [18][19].
- The MTP (Multi-token Prediction) method improves training efficiency by having the model predict multiple tokens at once, increasing data utilization and reducing overall training-data requirements [24][28].

Hardware Engineering Innovations
- The report stresses the importance of hardware engineering innovations in meeting the growing demands of large models. Distributed parallel strategies such as Expert Parallel (EP) deploy expert models efficiently across multiple GPUs while minimizing communication overhead [35][38].
- The DualPipe strategy improves efficiency during training by overlapping forward and backward computation with data transmission, reducing idle time on compute devices [44][47].
- Writing PTX-level code enables hardware-specific optimization, letting developers maximize the efficiency of a given model under fixed hardware conditions [49][51].

Market Demand and Trends
- The report anticipates robust growth in the AI hardware and infrastructure market, driven by the "democratization of large models", with demand for efficient computational resources continuing to rise [7][8].
- It emphasizes the need for customized chip architectures to support the evolving requirements of MoE models, signaling a shift in design priorities within the semiconductor industry [53].
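The MoE mechanism described under Model Innovations can be sketched in miniature. The snippet below is an illustrative top-k gating routine, not DeepSeek-V3's actual implementation; the choice of 8 active experts and the `route` function name are assumptions for illustration, while the 256-expert count comes from the report.

```python
import math
import random

NUM_EXPERTS = 256   # expert count cited for the V3 model
TOP_K = 8           # assumed number of experts activated per token

def route(gate_scores, top_k=TOP_K):
    """Return indices and normalized weights of the top-k scoring experts.

    Only these k experts run for this token, which is why MoE cuts
    per-token compute even as the total expert count grows.
    """
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:top_k]
    exps = [math.exp(gate_scores[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]
    return top, weights

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
experts, weights = route(scores)
print(len(experts))   # only 8 of 256 experts are active for this token
```

The key property is that compute per token scales with `TOP_K`, not with `NUM_EXPERTS`, so capacity can grow without a proportional rise in training cost.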
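The Expert Parallel (EP) idea under Hardware Engineering Innovations can likewise be sketched: experts are partitioned across GPUs, and a routed token only pays communication cost for experts that live on other devices. The 256-expert/32-GPU split, the block placement, and the helper names below are all assumptions for illustration, not the report's configuration.

```python
NUM_EXPERTS = 256
NUM_GPUS = 32
EXPERTS_PER_GPU = NUM_EXPERTS // NUM_GPUS  # 8 experts per device

def expert_to_gpu(expert_id):
    """Block placement: experts 0-7 on GPU 0, 8-15 on GPU 1, and so on."""
    return expert_id // EXPERTS_PER_GPU

def dispatch_cost(token_gpu, routed_experts):
    """Count routed experts whose activations must be sent to another GPU."""
    return sum(1 for e in routed_experts if expert_to_gpu(e) != token_gpu)

# A token resident on GPU 0, routed to 8 experts:
routed = [3, 5, 9, 64, 70, 128, 200, 255]
print(dispatch_cost(0, routed))  # experts 3 and 5 are local; the rest need communication
```

Minimizing this cross-device dispatch count is exactly the communication overhead that EP placement strategies (and overlap schemes like DualPipe) try to reduce.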