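CICC | AI Ten-Year Outlook (20): Taking Stock of 2024's Foundational Changes in Large Models, Where Inference Optimization and Engineering Are King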

Core Viewpoints
- In 2024 the focus has shifted from the rapid parameter expansion of 2023 to inference performance optimization and engineering improvements [3][5]
- Model iteration is increasingly driven by application deployment and edge-side needs, so parameter sizes are diverging between cloud and edge models [3][5]

Path Exploration: Quiet-STaR
- Quiet-STaR, introduced by Stanford in March 2024, uses reinforcement learning to optimize explicit intermediate reasoning, mimicking human reasoning and improving generalization [1][5]
- Built on Mistral 7B, Quiet-STaR significantly improves zero-shot accuracy and reduces perplexity on complex reasoning tasks [5][11]

Path Breakthrough: Native End-to-End Multimodal Models
- From Google Gemini to OpenAI GPT-4o, mainstream models have shifted from cross-modal to end-to-end multimodal architectures, which reduces latency and preserves multimodal information [2][6]
- Domestic companies such as SenseTime and MiniMax are also adopting native end-to-end multimodal approaches, with applications comparable to GPT-4o [6]

Algorithm Innovation: DeepSeek
- DeepSeek, backed by the quantitative hedge fund High-Flyer (幻方), introduced the MLA and DeepSeekMoE algorithms, significantly reducing inference costs and triggering a wave of price cuts across the industry [2][6]
- DeepSeek-V2's API pricing is about 1% of GPT-4 Turbo's, making AI applications more accessible [6][20]

Inference Optimization: Apple's Edge-Side Deployment
- Apple Intelligence integrates AI capabilities across apps, with the AFM-on-device model and techniques such as LLM in a flash optimizing edge-side inference performance [7][21]
- LazyLLM, a dynamic pruning method developed by Apple in collaboration with Meta, speeds up the prefill stage and improves user experience [25][26]

Engineering Improvements: Mooncake and Synthetic Data
- Mooncake separates the prefill and decoding stages, improving cluster efficiency and KV-Cache reuse [26][27]
- Synthetic data, generated through reinforcement learning and self-play, is key to overcoming the "data wall" and enhancing model performance [27][28][29]

Industry Trends: Synthetic Data Adoption
- Companies such as Meta, NVIDIA, Zhipu, and SenseTime are actively exploring synthetic data to enhance model capabilities in tasks such as coding, long-context understanding, and tool use [30]
- Synthetic data helps align models with human preferences and specific domain requirements, though risks such as model collapse and data leakage remain [31]
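
The Mooncake item above describes separating prefill from decoding and reusing the KV-Cache. A minimal toy sketch of that bookkeeping follows. It is not Mooncake's implementation: `PrefixKVCache`, `fake_kv`, `prefill`, and `decode` are hypothetical names introduced for illustration, and the "KV entries" are placeholder tuples rather than real tensors.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixKVCache:
    """Toy stand-in for a shared KV-cache pool, keyed by token prefix (hypothetical)."""
    store: dict = field(default_factory=dict)

    def longest_prefix(self, tokens):
        # Find the longest cached prefix; return its length and a copy of its entries.
        for n in range(len(tokens), 0, -1):
            key = tuple(tokens[:n])
            if key in self.store:
                return n, list(self.store[key])
        return 0, []

    def put(self, tokens, kv):
        # Store a copy so later decoding does not mutate the cached entries.
        self.store[tuple(tokens)] = list(kv)


def fake_kv(token):
    # Placeholder for the per-token key/value tensors a real model would compute.
    return ("kv", token)


def prefill(tokens, cache):
    """Prefill stage: build KV entries for the prompt, reusing any cached prefix."""
    reused, kv = cache.longest_prefix(tokens)
    kv.extend(fake_kv(t) for t in tokens[reused:])
    cache.put(tokens, kv)
    return kv, reused, len(tokens) - reused


def decode(kv, max_new_tokens):
    """Decode stage: take the handed-over KV entries and extend them token by token."""
    generated = []
    for step in range(max_new_tokens):
        token = f"<gen{step}>"  # placeholder for a sampled token
        kv.append(fake_kv(token))
        generated.append(token)
    return generated


cache = PrefixKVCache()
system = ["You", "are", "a", "helpful", "assistant"]

# First request: nothing cached yet, so every prompt token is computed.
kv1, reused1, computed1 = prefill(system + ["Hi"], cache)

# Second request shares a six-token prefix, so only one new token is computed.
kv2, reused2, computed2 = prefill(system + ["Hi", "there"], cache)

generated = decode(kv2, 3)
print(reused1, computed1, reused2, computed2, len(generated))
```

In this sketch the two stages communicate only through the KV entries, which is what would let them run on separate worker pools in a real deployment. The second request shows why reuse matters: a shared prompt prefix turns most of the prefill work into a cache lookup.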