AI Memory (AI记忆)
GF Securities: SRAM Boosts AI Inference Speed; Related Architectures Enter the View of Mainstream Major Vendors
Zhitong Finance · 2026-02-27 07:35
GF Securities published a research report stating that, in large-model applications, SRAM can significantly reduce access latency and jitter for weights and activation data compared with relying on external HBM, improving Time-to-First-Token and tail-latency performance. Groq and Cerebras have both launched SRAM-based AI chips, and SRAM architectures are entering the mainstream: according to Groq's website and market media reports, Nvidia previously paid $20 billion for a non-exclusive license to Groq's intellectual property, and OpenAI signed a $10 billion contract with Cerebras to deploy up to 750 MW of custom AI chips.

GF Securities' main views are as follows:

SRAM is the on-chip high-bandwidth storage layer
The storage hierarchy runs from SRAM to HBM, DRAM, and SSD. SRAM (static random-access memory) is on-chip storage integrated close to CPU and GPU compute cores, offering nanosecond-level access latency and highly deterministic bandwidth: bandwidth is high, but capacity is small and cost is high.

SRAM can speed up AI inference
According to Cerebras' website, its Wafer Scale Engine 3 (WSE-3) chip integrates 44 GB of SRAM with 21 PB/s of on-chip memory bandwidth, achieving output speeds of more than 3,000 tokens/s on OpenAI GPT-OSS-120B inference tasks, roughly 15× faster than mainstream GPU cloud inference. In addition, in February 2026, OpenAI launched its first ... running on Cerebras Syst ...
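A back-of-envelope sketch (not from the report) of why on-chip bandwidth matters: single-stream decode is typically memory-bound, so tokens/s is bounded by memory bandwidth divided by the bytes streamed per token. The active-weight size and the HBM figure below are illustrative assumptions; only the 21 PB/s SRAM number comes from the report, and real systems fall well below these bounds. The point is the relative headroom.

```python
# Rough upper bound on bandwidth-bound decode speed.
# Assumption (not from the report): every generated token must stream the
# active weights from memory once, so tokens/s <= bandwidth / bytes_per_token.

def decode_tokens_per_s(bandwidth_bytes_per_s: float, bytes_per_token: float) -> float:
    """Upper bound on decode throughput for a memory-bound single request."""
    return bandwidth_bytes_per_s / bytes_per_token

# Illustrative numbers only: ~10 GB of active weights per token (hypothetical),
# HBM at ~5 TB/s (typical datacenter-GPU order of magnitude) vs. the 21 PB/s
# on-chip SRAM bandwidth the report cites for WSE-3.
ACTIVE_BYTES = 10e9
for name, bw in [("HBM (~5 TB/s)", 5e12), ("WSE-3 SRAM (21 PB/s)", 21e15)]:
    print(f"{name}: <= {decode_tokens_per_s(bw, ACTIVE_BYTES):,.0f} tokens/s")
```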
GF Securities: HBF Shows Clear Advantages in Read-Dominant Applications; Commercialization Is Accelerating
Zhitong Finance · 2026-02-27 02:03
Zhitong Finance APP learned that GF Securities published a research report stating that HBF (high-bandwidth flash) shows clear advantages in read-dominant applications and that its commercialization is accelerating. The technology effectively fills the gap between HBM and conventional SSDs, offering an attractive solution for read-intensive workloads that are sensitive to capacity and cost. AI memory keeps expanding the capability boundary of models, and applications such as AI Agents are being deployed faster. The value and importance of AI-memory-related upstream infrastructure will keep rising; the report suggests focusing on the core beneficiaries along the industry chain.

HBF commercialization is accelerating
According to TrendForce, in August 2025 Sandisk announced a partnership with SK hynix to build a standardized HBF ecosystem; according to SK hynix's official WeChat account, in February 2026 the two companies announced the launch of a next-generation HBF global standardization effort under the OCP (Open Compute Project). TrendForce data indicate that Sandisk plans to provide HBF module samples in the second half of 2026 and to launch the first AI inference servers integrating HBF in early 2027. At OCP 2025, SK hynix placed HBF in the AINB (Bandwidth) track of its AIN (AI-NAND) product line, and Samsung Electronics began early concept design of its own HBF products in 2025, showing that mainstream memory vendors continue to pay close attention to this technology path.

Risk warning
AI industry development and demand falling short of expectations; AI ser ...
AI's Memory Moment 7: SRAM Boosts AI Inference Speed
GF SECURITIES · 2026-02-26 07:02
Investment Rating
- The report provides a "Buy" rating for the industry, indicating an expectation of stock performance exceeding the market by more than 10% over the next 12 months [45].

Core Insights
- SRAM (Static Random Access Memory) is identified as a high-bandwidth on-chip storage layer that can significantly enhance AI inference speed by reducing latency and jitter compared to external HBM (High Bandwidth Memory) [3][11].
- The SRAM architecture is gaining mainstream attention, with significant investments and partnerships, such as Nvidia's $20 billion non-exclusive license to Groq's intellectual property and OpenAI's $10 billion contract with Cerebras [3][32].
- The report emphasizes the growing importance of AI memory-related upstream infrastructure, suggesting that investors should focus on key beneficiaries within the industry chain [3][39].

Summary by Sections

SRAM as a High-Bandwidth Storage Layer
- SRAM is positioned as an essential component in the multi-tier storage architecture, providing high bandwidth but with limited capacity and higher cost [3][11].

SRAM Enhancing AI Inference Speed
- SRAM can improve AI inference speed; for example, Groq's LPU chip achieves a bandwidth of 80 TB/s and maintains stable inference speeds of 275-276 tokens/s, outperforming other platforms [3][15][21].
- Cerebras' WSE-3 chip integrates 44 GB of SRAM, achieving over 3,000 tokens/s in inference tasks, significantly faster than mainstream GPU cloud inference [3][23][39].

SRAM Architecture Gaining Mainstream Attention
- Major companies are investing in SRAM technology, including Groq's partnership with Nvidia and Cerebras' funding round valuing the company at $23 billion [3][32][39].

Investment Recommendations
- The ongoing expansion of AI memory capabilities will enhance model performance and accelerate the deployment of AI applications; the report recommends focusing on core beneficiaries in the industry chain [3][39].
First Large-Scale Memory Lake Released; AI Infra Sprints into the "Memory" Era
QbitAI (量子位) · 2026-02-05 04:10
Tian Yanlin, reporting from Aofeisi. QbitAI | WeChat official account QbitAI

"Your brain is for having ideas, not holding them." (Tiago Forte, Building a Second Brain)

The LLM is AI's "first brain"; the memory platform is AI's "second brain."

In Building a Second Brain, bestselling author Tiago Forte shares a core idea: the biological brain is for thinking and creating, while external systems handle the reliable storage of information. This is highly instructive for understanding AI's "dual-brain" division of labor.

In fact, the LLM is like AI's "first brain" (the biological brain): it excels at thinking, reasoning, and on-the-spot generation, but not at storing vast amounts of facts precisely over the long term. The memory platform is AI's "second brain": it supplies the LLM with accurate "memories" on demand, freeing the LLM from the burden of remembering so it can focus on higher-level reasoning and creation, and together they deliver value that is more precise, personalized, and actionable. In combination, the memory platform "remembers everything" and the LLM "thinks about everything."

3.0 The productivity era (2025 to present): distilling tacit knowledge, consolidating core assets
The industry's focus has shifted to directly improving productivity. The key leap is whether employees' decision logic, experience-based trade-offs, and other tacit knowledge can be digitized and captured as trajectories. This is no longer simple Q&A, but through mem ...
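A minimal sketch of the division of labor described above, assuming a generic external memory store and a stubbed model call. All names here (MemoryStore, call_llm) are hypothetical and do not reflect any specific product's API: the store remembers facts, and the model reasons only over what is recalled.

```python
# Sketch of the "second brain" pattern: an external store remembers facts,
# the LLM reasons over whatever is retrieved for the current question.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    facts: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.facts.append(fact)  # the memory platform "remembers everything"

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Toy relevance scoring by keyword overlap; a real platform would use embeddings.
        overlap = lambda f: -len(set(f.lower().split()) & set(query.lower().split()))
        return sorted(self.facts, key=overlap)[:k]

def call_llm(prompt: str) -> str:
    return f"<model answer grounded in:\n{prompt}>"  # stub for any chat model

def answer(memory: MemoryStore, question: str) -> str:
    context = "\n".join(memory.recall(question))     # second brain supplies memory
    return call_llm(f"Known facts:\n{context}\n\nQuestion: {question}")  # first brain thinks

memory = MemoryStore()
memory.remember("The user prefers weekly summaries on Friday afternoons.")
print(answer(memory, "When should the weekly summary be sent?"))
```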
Zheng Youde: The Copyright Crisis Triggered by AI Memory and How to Defuse It
36Kr · 2026-02-04 00:41
Core Insights
- The research from Stanford and Yale serves as a warning and roadmap for the AI industry, emphasizing the need for responsible, transparent, and sustainable development in the face of copyright challenges posed by generative AI (GenAI) [1][2].

Group 1: Technical Truths Revealed
- A significant study revealed that large language models (LLMs) can reproduce copyrighted texts with over 95% accuracy, indicating a deep memory of training data [3][4].
- The study confirmed that all tested LLMs could extract long passages of copyrighted material, with Claude 3.7 showing a 95.8% extraction rate for specific works [5][6].
- The research highlighted the vulnerability of existing protective measures, as models like Gemini 2.5 Pro and Grok 3 could reproduce over 70% of copyrighted content without any circumvention [7][8].

Group 2: Industry Risk Orientation
- The AI industry faces systemic financial risks, with significant debt accumulation among major players, potentially reaching $1.5 trillion in the coming years [9][10].
- The reliance on fragile legal foundations for "fair use" raises concerns about the sustainability of the AI industry's financial ecosystem, especially if courts determine that AI operations constitute illegal copying [9][10].

Group 3: Judicial Conflicts
- There is a stark contrast in judicial interpretations between the UK and Germany regarding whether model learning constitutes copyright infringement, with UK courts denying that models store copies while German courts have ruled otherwise [10][11].
- The German court's ruling established that memory in AI models equates to illegal storage, directly challenging the UK perspective [12][13].

Group 4: Defense Strategies
- AI developers are likely to rely on the "fair use" doctrine in the U.S. legal framework, arguing that their training practices are transformative [13][14].
- The EU legal framework does not provide an open fair-use defense but offers statutory exemptions for text and data mining (TDM), which may not cover the extensive memory capabilities of LLMs [15][16].

Group 5: Regulatory Safety Evaluations
- The inherent memory characteristics of LLMs could lead to significant legal consequences, necessitating that AI developers take proactive measures to prevent access to copyrighted content [30][31].
- Current protective technologies are easily circumvented, raising questions about their effectiveness and the potential for models to act as illegal retrieval tools [30][31].

Group 6: Judicial Remedies and Consequences
- If AI models are determined to contain copies of copyrighted works, companies may face severe penalties, including the destruction of infringing copies and the requirement to retrain models using authorized materials [34][35].
- The legal debate centers on whether models merely contain instructions to create copies or substantively include copyrighted works, with significant implications for the AI industry's financial stability [32][34].

Group 7: Crisis Mitigation Strategies
- The AI industry must develop comprehensive internal compliance systems to address copyright risks, including stringent data sourcing and filtering mechanisms [40][41].
- Implementing a statutory licensing system and compensation mechanisms can help resolve the challenges posed by the massive data requirements of GenAI [42][43].
GF Securities: The Value and Importance of AI-Memory Upstream Infrastructure Are Rising; Focus Recommended on Core Beneficiaries in the Industry Chain
Zhitong Finance · 2026-02-03 06:05
Zhitong Finance APP learned that GF Securities published a research report stating that this is AI's Memory moment: AI memory has become the underlying capability that supports context continuity, personalization, and the reuse of historical information, continuously expanding the capability boundary of models and likely to accelerate the rollout of applications such as AI Agents. The value of AI memory is shifting from an "expense item" to an "asset item," and the value and importance of the related upstream infrastructure will keep rising. The report suggests focusing on core beneficiaries along the industry chain.

GF Securities' main views are as follows:

NVIDIA launches ICMS, a context-storage platform for AI inference
As KV cache accumulates across multi-turn user sessions and continuously running Agents, systems develop a hard requirement for a tiered KV cache that can be retained long-term and refilled on demand, pushing context to spill over from HBM into tiered media such as DRAM and SSD. To address this, NVIDIA introduced the context-memory storage architecture ICMS, which provides a "long-term context memory layer" for Agent and multi-turn inference scenarios: it holds a much larger KV cache, and it refills historical KV cache at low latency into multi-turn inference sessions across multi-GPU nodes. Its KV access pattern is high-concurrency, high-throughput random reads under a tight TTFT constraint.

The ICMS platform makes good use of SSDs
On economics and scalability, the unit cost of SSDs is far lower than that of GPU memory, and capacity can scale to terabytes or petabytes, making SSDs a natural carrier for long-term context. On feasibility, according to "Context Memor ...
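To make the access pattern concrete, here is a toy sketch of a tiered KV cache in which hot sessions stay in "HBM" and colder ones spill to "DRAM" and then "SSD", to be refilled on demand. The class and method names are hypothetical illustrations of the tiering idea, not NVIDIA's ICMS interface.

```python
# Toy tiered KV cache: new context lands in the hot tier, least-recently-used
# sessions spill downward, and a returning session is re-promoted on refill.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_slots: int, dram_slots: int):
        self.tiers = {"hbm": OrderedDict(), "dram": OrderedDict(), "ssd": OrderedDict()}
        self.capacity = {"hbm": hbm_slots, "dram": dram_slots, "ssd": float("inf")}

    def _evict_into(self, src: str, dst: str) -> None:
        while len(self.tiers[src]) > self.capacity[src]:
            key, kv = self.tiers[src].popitem(last=False)    # evict least recently used
            self.tiers[dst][key] = kv

    def put(self, session_id: str, kv_blocks: bytes) -> None:
        self.tiers["hbm"][session_id] = kv_blocks            # new context lands in HBM
        self._evict_into("hbm", "dram")
        self._evict_into("dram", "ssd")

    def refill(self, session_id: str) -> bytes:
        """Bring a session's historical KV cache back into the hot tier for the next turn."""
        for tier in ("hbm", "dram", "ssd"):
            if session_id in self.tiers[tier]:
                kv = self.tiers[tier].pop(session_id)
                self.put(session_id, kv)                     # re-promote to HBM
                return kv
        raise KeyError(session_id)

cache = TieredKVCache(hbm_slots=2, dram_slots=4)
for s in ["s1", "s2", "s3", "s4"]:
    cache.put(s, b"kv-" + s.encode())
print(cache.refill("s1"))   # an old session's KV refilled from a colder tier
```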
Full Tracking of Views (February, Issue 2): Morning Meeting Highlights - 20260203
GF SECURITIES · 2026-02-03 01:23
Investment Strategy | Commentary Report · February 3, 2026 · Securities Research Report
Morning Meeting Highlights: Full Tracking of Views (February, Issue 2)
Analysts: Zheng Kai (SAC S0260515090004; SFC CE No. BUU989), Geng Zheng (SAC S0260520090002); Contact: Bi Lulu (bilulu@gf.com.cn)

Report summary:
Electronics: AI memory is the core underlying capability of Agents. In the Agent era, Memory is responsible for state continuity across turns and tasks, accumulating the persona profile of "who I am," the interaction history of "where I came from," and the goal-and-feedback loop of "where I am going." Agent memory is usually divided into four categories, with working memory used for temporary information access and reasoning within the current task ( ...
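The excerpt above breaks off after the first memory type. For illustration only, the sketch below assumes a commonly cited four-way taxonomy (working, episodic, semantic, procedural); the report's own four categories may differ, so treat the names and fields as hypothetical.

```python
# Hypothetical container for the four kinds of agent memory described above.
# The taxonomy beyond "working memory" is an assumption, not from the report.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    working: list[str] = field(default_factory=list)        # scratchpad for the current task
    episodic: list[str] = field(default_factory=list)       # "where I came from": interaction history
    semantic: dict[str, str] = field(default_factory=dict)  # "who I am": persona and facts
    procedural: dict[str, str] = field(default_factory=dict)  # "where I am going": goals and routines

mem = AgentMemory()
mem.working.append("current step: compare SRAM and HBM bandwidth")
mem.episodic.append("2026-02-03: user asked for a memory-hierarchy summary")
mem.semantic["user_preference"] = "answers with concrete numbers"
mem.procedural["weekly_report"] = "draft -> review -> send on Friday"
print(mem.semantic["user_preference"])
```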
2026: Entering Year One of AI Memory
36Kr · 2026-01-27 10:28
Group 1
- The core finding indicates that since mid-2023 the iteration cycle of SOTA models has been compressed to 35 days, with a former SOTA model potentially falling out of the Top 5 in just 5 months and out of the Top 10 in 7 months, suggesting a stagnation in breakthrough innovation despite ongoing technical advances [1]
- The emergence of vector database products like Milvus, Pinecone, and Faiss in 2023 marked a significant shift in the AI memory landscape, followed by a proliferation of AI memory frameworks such as Letta (MemGPT), Mem0, MemU, and MemOS during 2024-2025 [2]
- The integration of memory capabilities into models has sparked discussion in the industry, with Claude and Google announcing advancements in model memory, indicating a growing focus on memory-enhanced AI applications across various sectors [2]

Group 2
- There are three common misconceptions about adding memory to large models, the first being the belief that memory equates to RAG (Retrieval-Augmented Generation) plus long context [3][4]
- The overemphasis on RAG performance has obscured its limitations, as it can address only about 60% of real user needs, highlighting the necessity for a comprehensive solution that includes dynamic memory capabilities [6][8]
- The second misconception is that factual retrieval is paramount, whereas emotional intelligence is crucial for effectively addressing user needs, as demonstrated by a case where AI was required to handle emotional support in sensitive situations [11][13]

Group 3
- The third misconception is the belief that the future of agents lies in standardization, while in reality non-standard solutions are essential for addressing the diverse needs of different industries [15][16]
- Red Bear AI has developed a memory system that incorporates emotional weighting and collaboration among agents, allowing tailored solutions that adapt to specific industry requirements [17][19]
- As the industry moves into 2026, memory capability is becoming the key differentiator among models and agents, marking a shift from a focus on scaling laws to a marathon-like race centered on memory [22]
2026: Entering Year One of AI Memory
36Kr · 2026-01-27 10:16
Making AI remember like a human: how this company secured its ticket to the second half of the AI race.

Not long ago, LMArena.ai ran statistics on shifts in the market positions of large models worldwide and found something interesting: since mid-2023, the iteration cycle of SOTA models has been compressed to 35 days; a former SOTA model can fall out of the Top 5 in just 5 months and miss even the Top 10 threshold after 7 months.

Behind the constant turnover of SOTA, the models are indeed improving, but eye-opening new products like ChatGPT or DeepSeek have become increasingly rare; technical progress has entered a bottleneck of continual small fixes without a real breakthrough.

In sharp contrast to this fading model evolution is the bustle around AI memory over the past two-plus years, with one player after another taking the stage.

The first movers were the vector database products that emerged in 2023, represented by Milvus, Pinecone, and Faiss. Over the following year, building on mature semantic search, knowledge graphs, and keyword retrieval, AI memory frameworks represented by Letta (MemGPT), Mem0, MemU, and MemOS sprang up like mushrooms during 2024-2025, and on GitHub all kinds of Me ...
AI's Memory Moment 5: AI NAND Supply Is Tight, Price Increases Still Have Room
GF SECURITIES · 2026-01-26 09:50
Industry Thematic Research | Electronics · January 26, 2026 · Securities Research Report
AI's Memory Moment 5: AI NAND Supply Is Tight, Price Increases Still Have Room
Analysts: Wang Liang (SAC S0260519060001; SFC CE No. BFS478), Geng Zheng (SAC S0260520090002), Jiao Ding (SAC S0260522120003), Zhang Dawei (SAC S0260523050001) ...