DeepSeek: Technical Origins and Frontier Exploration Report
Zhejiang University · 2025-05-22 01:20

Investment Rating
- The report does not assign a specific investment rating to the industry

Core Insights
- The report traces the evolution of large language models (LLMs) and highlights DeepSeek's role in narrowing the gap between open-source and closed-source AI models, cutting the open-source development lag from 6-12 months to 1-3 months [69]

Summary by Sections

Language Models
- A language model assigns a probability to a sequence of words, which is what lets machines process human language (see the chain-rule sketch after this summary) [6]
- The report outlines the basic tasks of language models, including encoding and word embedding, which represent words as vectors that capture their meanings (see the embedding sketch below) [13][17]

Transformer
- The Transformer architecture, introduced in 2017, reshaped deep learning with its self-attention mechanism, which enables parallel computation and models global context (see the attention sketch below) [32]
- The report emphasizes the Transformer as the foundational architecture for large models, highlighting how multi-head attention captures complex semantic relationships [33]

DeepSeek
- DeepSeek is positioned as a significant advance in AI: its architecture enables efficient model training and inference, addressing the computational demands of large models [70]
- The report details the stages of DeepSeek's training pipeline, including supervised fine-tuning and reinforcement learning, which strengthen its reasoning capabilities (see the RL sketch below) [117][119]

New Generation Agents
- The report describes the transition from generative models to reasoning models, marking a shift toward stronger logical reasoning in AI systems [107]
- It highlights the integration of LLMs with agent-based systems, where the LLM serves as the agent's brain, driving complex tasks through planning and tool use (see the agent-loop sketch below) [133]
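
To make the Language Models bullet concrete: an autoregressive language model scores a sequence by the chain rule, P(w1 … wn) = P(w1) · Π P(wi | w1 … wi-1). A minimal sketch, using toy bigram counts as a stand-in for a trained model (the corpus and numbers are invented for illustration):

```python
from collections import Counter

# Toy corpus; the counts stand in for a trained model (illustrative only).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_next(word, prev):
    """P(word | prev) estimated from bigram counts (no smoothing)."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sequence(words):
    """Chain rule under a bigram model: P(w1..wn) = P(w1) * prod P(wi | w_{i-1})."""
    p = unigrams[words[0]] / len(corpus)
    for prev, word in zip(words, words[1:]):
        p *= p_next(word, prev)
    return p

print(p_sequence("the cat sat".split()))   # higher for fluent sequences
print(p_sequence("sat the cat".split()))   # lower for disfluent ones
```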
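
The encoding and word-embedding step can be sketched the same way: each token is mapped to a vector, and related words end up close together in that vector space. The vocabulary and vectors below are hand-made for illustration; a real model learns them:

```python
import numpy as np

# Illustrative 4-dimensional embeddings; a trained model learns these values.
vocab = {"king": 0, "queen": 1, "apple": 2}
E = np.array([
    [0.90, 0.80, 0.10, 0.00],  # king
    [0.85, 0.82, 0.12, 0.05],  # queen
    [0.10, 0.00, 0.90, 0.80],  # apple
])

def embed(word):
    """Encoding step: map a token to its embedding vector."""
    return E[vocab[word]]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Semantically related words sit closer in embedding space.
print(cosine(embed("king"), embed("queen")))  # high
print(cosine(embed("king"), embed("apple")))  # low
```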
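
For the Transformer section, self-attention reduces to a few matrix products: every position queries every other position, so the whole sequence is processed in parallel and each output mixes in global context; multi-head attention runs several such maps side by side. A minimal numpy sketch with illustrative sizes (the projection matrices are random stand-ins for learned weights):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every position attends to every other
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d, dh = 5, 16, 8  # sequence length, model width, head width (illustrative sizes)
X = rng.normal(size=(n, d))

# Multi-head attention: independent projections per head, concatenated outputs.
heads = [
    self_attention(X, *(rng.normal(size=(d, dh)) for _ in range(3)))
    for _ in range(2)
]
out = np.concatenate(heads, axis=-1)
print(out.shape)  # (5, 16): each position now carries globally mixed context
```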
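
For DeepSeek's reinforcement-learning stage, a hedged sketch of the core idea: sample several answers per prompt, score each with a reward, and use each answer's advantage relative to its own group as the learning signal. This is group-relative advantage estimation in the spirit of GRPO, which DeepSeek's papers describe; the rewards below are invented and the details are simplified:

```python
import numpy as np

def group_relative_advantages(rewards):
    """Advantage of each sampled answer relative to its group's mean,
    normalized by the group's std; no learned value network is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# One prompt, several sampled answers scored by a rule-based reward
# (e.g., 1.0 if the final answer is correct). Numbers are invented.
rewards = [1.0, 0.0, 1.0, 0.0, 0.0]
print(group_relative_advantages(rewards))
# Positive advantages push the policy toward the better answers;
# negative advantages push it away from the worse ones.
```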
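
Finally, the agent pattern from the New Generation Agents section: the LLM acts as the planner ("brain"), chooses tools, and folds tool observations back into its context until it can answer. Everything below (`fake_llm`, the tool names, the message format) is a hypothetical stand-in, not DeepSeek's actual agent interface:

```python
# Minimal agent loop: the LLM plans, tools act, observations feed back.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy tool; never eval untrusted input
}

def fake_llm(history):
    """Stand-in planner: a real agent would query an LLM here."""
    if not any(msg.startswith("observation:") for msg in history):
        return "call calculator: 6 * 7"
    return "final: the answer is 42"

def run_agent(task, max_steps=5):
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action = fake_llm(history)
        if action.startswith("final:"):
            return action.removeprefix("final:").strip()
        tool, _, arg = action.removeprefix("call ").partition(": ")
        history.append(f"observation: {TOOLS[tool](arg)}")  # result returns to the planner
    return "gave up"

print(run_agent("what is 6 * 7?"))
```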