Computer Industry Special Report: MoE and Chain of Thought Help Large-Model Technology Break Through Its Bottleneck
Guotai Junan Securities·2024-09-20 06:21

Industry Investment Rating
- The report maintains an Overweight rating for the industry, consistent with the previous rating [2]

Core Viewpoints
- The Transformer architecture faces challenges from high computational costs, which hinder further innovation in large-model development and application [3]
- The release of the o1 model and the maturation of the MoE (Mixture of Experts) framework are expected to break through these technical bottlenecks [3]
- MoE optimizes the Transformer architecture by reducing computational demands while significantly enhancing model capabilities [4]
- OpenAI o1 introduces a "chain of thought" reasoning model, enabling deeper, more logical, and more systematic thinking that is particularly effective in specialized fields [4]

Investment Recommendations
- The report recommends companies such as iFlytek, ArcSoft, Wondershare, Foxit Software, Kingsoft Office, Digiwin Software, Unisplendour, and Inspur Information [4]
- Beneficiary companies include Kunlun Tech and Runda Medical [4]

Transformer Architecture Challenges
- The Transformer architecture requires massive computational resources, with training costs potentially rising to $10 billion to $100 billion within three years [4]
- The computational complexity of Transformer models increases quadratically with input sequence length, making scaling difficult [6]

MoE Framework Advantages
- MoE reduces computational costs by activating only a subset of experts during inference, significantly lowering the number of activated parameters (see the sketch after this list) [12]
- MoE models such as Mixtral 8x7B outperform dense Transformer models such as Llama 2 70B while activating only 13B parameters, reducing computational load [12]
- MoE frameworks are widely adopted by major players such as OpenAI, Google, and Microsoft, with domestic companies like DeepSeek also making significant progress [12]
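To make the expert-routing idea concrete, the following is a minimal, illustrative sketch of a top-k MoE feed-forward layer in PyTorch. It is not code from the report or from any specific model: the class name `TopKMoE` and all hyperparameters are assumptions, and it only demonstrates that each token is routed to `top_k` of `n_experts` expert networks, so only a fraction of the layer's parameters is active per token.

```python
# Minimal, hypothetical sketch of top-k expert routing (Mixtral-style gating assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Experts: independent feed-forward blocks; only top_k run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), flattened to (tokens, d_model) for routing.
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.router(tokens)                         # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)     # keep only the top_k experts
        top_w = F.softmax(top_w, dim=-1)                     # normalise their weights

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e)                            # tokens routed to expert e
            if mask.any():
                rows = mask.any(dim=-1)
                weight = (top_w * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += weight * expert(tokens[rows])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_ff=256, n_experts=8, top_k=2)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

With 8 experts and top-2 routing, roughly a quarter of the expert parameters are touched per token, which is the mechanism behind the "fewer activated parameters" claim above.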
OpenAI o1 Model Innovations
- OpenAI o1 introduces a "chain of thought" reasoning model, enabling the model to perform deep, logical, and systematic thinking before responding to complex queries [4]
- The model excels in fields such as mathematics, programming, and scientific reasoning, achieving 83.3% accuracy on the International Mathematical Olympiad qualification test [37]
- o1 leverages reinforcement learning to improve its reasoning capabilities, marking a fundamental shift in AI learning paradigms [36]

MoE Applications Across Industries
- MoE has shown excellent performance in NLP, CV, speech recognition, and robotics, with applications in AI agents, medical diagnostics, and autonomous driving [24][26][27]
- In robotics, MoE-based models like GeRM demonstrate lower parameter thresholds and higher performance, reducing computational costs [29]
- MoE is also being applied in gaming and education, enabling more realistic NPC interactions and personalized learning experiences [30][31]

Domestic and International MoE Adoption
- Domestic companies such as Alibaba, Tencent, and DeepSeek have adopted MoE frameworks, with DeepSeek-V2 showing strong performance on Chinese and English language tasks [34]
- Internationally, GPT-4, Gemini 1.5 Pro, and Claude 3 have embraced MoE, with GPT-4 charging $30 per million tokens, significantly higher than other models [32][33]

Future Outlook
- MoE and o1-style models are expected to drive the development of AGI (Artificial General Intelligence), with potential applications in scientific research, engineering, and financial analysis [44]
- The o1 model's high usage costs, at $15 per million input tokens and $60 per million output tokens, remain a challenge (see the cost sketch below) [42]
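As a rough illustration of what those rates imply per request, here is a short, hypothetical cost calculation using only the $15/$60 per-million-token prices quoted above; the function name and the token counts in the example are invented for illustration.

```python
# Back-of-the-envelope request cost at the o1 API rates quoted in the report:
# $15 per million input tokens and $60 per million output tokens.

O1_INPUT_USD_PER_MTOK = 15.0    # quoted input price (USD per 1M tokens)
O1_OUTPUT_USD_PER_MTOK = 60.0   # quoted output price (USD per 1M tokens)


def o1_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the quoted rates."""
    return (
        input_tokens / 1_000_000 * O1_INPUT_USD_PER_MTOK
        + output_tokens / 1_000_000 * O1_OUTPUT_USD_PER_MTOK
    )


if __name__ == "__main__":
    # Illustrative (made-up) request: a 2,000-token prompt and 10,000 output
    # tokens, which in o1's pricing also covers hidden reasoning tokens.
    print(f"${o1_request_cost(2_000, 10_000):.2f} per request")  # $0.63
```

Because chain-of-thought reasoning inflates the output-token count, per-request costs scale with how much hidden reasoning a query triggers, which is why the report flags pricing as an adoption hurdle.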