Great Wall Fund's Liao Hanbo: Actively Seeking Industrial Change
Xin Lang Ji Jin · 2025-05-20 05:31
Group 1
- The A-share market experienced a volatile upward trend in Q1 2025, with the technology and manufacturing sectors leading the gains, particularly humanoid robots and domestic computing power [1]
- The core logic behind the technology sector's rise is technological advancement: the emergence of DeepSeek has enabled equitable access to large models, and increased capital expenditure by domestic cloud providers has boosted demand for domestic computing power, especially in AIDC-related industries [1]
- A foreign automotive company announced plans to begin mass-producing humanoid robots in 2025, marking a significant development year for the domestic robotics industry [1]

Group 2
- After a significant one-quarter rise, the market capitalization of stocks in hot sectors already reflects optimistic expectations for future industry development, although that growth may take time and could be fraught with challenges [1]
- Stock prices of popular companies are expected to revert to fundamentals, with potential differentiation in the next round of market gains [1]
- The company emphasizes embracing the era and seeking investment opportunities amid industrial change, maintaining a strategy of actively looking for industry shifts in order to invest in advantageous assets at reasonable prices [1]
CICC | AI Evolution (2): Model and Engineering Innovation Continue to Awaken Computing Power; DeepSeek Unlocks the Blue Ocean of Inference Demand
中金点睛 (CICC Insights) · 2025-02-27 23:34
CICC Research: In the first report of this series, we discussed in depth how the technical innovations of DeepSeek (hereafter DS) change demand for training hardware. Beyond training, in its latest series of open-source releases the DS team has also innovated along two dimensions for inference tasks: on one hand, model-level optimizations reduce hardware resource consumption; on the other, hardware-level engineering optimizations extract maximum performance from the hardware.

Abstract

Traditional Transformer models typically use Multi-Head Attention (MHA), but during generation, as the preceding sequence grows longer, the KV cache that must be read grows ever larger; data-transfer costs rise, and the KV cache becomes a limit on inference efficiency. Strategies for reducing the KV cache include MQA and GQA, which require a smaller cache but cannot match MHA's performance.

Figure 1: Architectural comparison of MHA, GQA, MQA, and MLA

Model innovation: accelerating inference with techniques such as MLA and NSA. In the previous report, which focused on training tasks, we analyzed the impact of the feed-forward network (FFN) portion of the DS large language model evolving from dense to sparse (MoE, mixture of experts); at the same time, DS also innovated in the attention mechanism. Addressing the fact that traditional attention must compute associations between all token pairs ...
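The KV-cache pressure described above can be made concrete with a back-of-envelope estimate. The sketch below uses the standard sizing formula (2 × layers × KV heads × head dim × sequence length × bytes per element, the factor 2 covering keys and values); the model dimensions are illustrative assumptions, not taken from any specific DeepSeek model card.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache for one sequence in fp16 (factor 2 covers K and V)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative 32-layer model with 32 query heads of dim 128 (assumed values).
layers, heads, head_dim, seq_len = 32, 32, 128, 4096

mha = kv_cache_bytes(layers, heads, head_dim, seq_len)  # every head keeps its own K/V
gqa = kv_cache_bytes(layers, 8, head_dim, seq_len)      # 8 KV groups shared across query heads
mqa = kv_cache_bytes(layers, 1, head_dim, seq_len)      # one K/V head shared by all query heads

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30:.2f} GiB per sequence")
```

With these assumed dimensions, MHA needs 2 GiB of cache per 4K-token sequence, GQA with 8 groups needs a quarter of that, and MQA a thirty-second, which is why the smaller-cache variants trade quality for memory and why MLA aims to shrink the cache without that trade-off.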