Core Insights

- SRAM significantly reduces latency and jitter for weight and activation data in large-model inference, improving Time-to-First-Token and tail-latency performance [1][2]
- Companies such as Groq and Cerebras have launched SRAM-based AI chips, marking the SRAM architecture's entry into the mainstream [1][4]

SRAM as an On-Chip High-Bandwidth Storage Layer

- SRAM (Static Random Access Memory) is integrated close to CPU and GPU cores, offering nanosecond-level access latency and highly deterministic bandwidth, although it has smaller capacity and higher cost than HBM, DRAM, and SSD [1]

Performance Gains with SRAM

- Groq's LPU chip integrates approximately 230 MB of on-chip SRAM with a memory bandwidth of 80 TB/s, far exceeding external HBM bandwidth of about 8 TB/s [2]
- In independent benchmarks, Groq's LPU sustains a stable inference speed of 275-276 tokens/s across different context lengths, outperforming other inference platforms [2]

Cerebras' Advances

- Cerebras' WSE-3 chip integrates 44 GB of SRAM with an on-chip memory bandwidth of 21 PB/s, achieving output speeds above 3,000 tokens/s on OpenAI's GPT-OSS 120B inference tasks, roughly 15 times faster than mainstream GPU cloud inference [3]
- OpenAI plans to launch GPT-5.3-Codex-Spark, its first model running on Cerebras Systems AI accelerators, in February 2026, supporting code-generation response speeds above 1,000 tokens/s [3]

Market Developments

- Nvidia invested $20 billion to acquire non-exclusive rights to Groq's intellectual property, including its language processing unit (LPU) and associated software libraries, and has absorbed Groq's core engineering team [4]
- Cerebras closed a $1 billion Series F round in February 2026 at a $23 billion valuation and signed a $10 billion contract with OpenAI to deploy up to 750 megawatts of custom AI chips [4]

Investment Recommendations

- The expansion of on-chip AI memory capacity is expected to enhance model performance and accelerate the deployment of applications such as AI Agents, pointing to the growing importance of upstream infrastructure in the industry [5]
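The bandwidth figures above translate directly into decode-speed ceilings for memory-bound inference, where every generated token requires streaming the model weights from memory. The sketch below is a back-of-envelope calculation only: the 8 TB/s HBM and 80 TB/s SRAM figures come from the text, while the 70 GB model footprint (a hypothetical 70B-parameter model at 8-bit weights) is an illustrative assumption, not a figure from the report.

```python
# Back-of-envelope: for a memory-bound decode step, throughput is capped by
# (memory bandwidth) / (bytes read per token). Illustrative only; assumes
# every weight byte is read exactly once per generated token.

def tokens_per_second(bandwidth_tb_s: float, model_gb: float) -> float:
    """Upper bound on decode throughput for a memory-bound workload."""
    bandwidth = bandwidth_tb_s * 1e12  # bytes/s
    bytes_per_token = model_gb * 1e9   # weight bytes streamed per token
    return bandwidth / bytes_per_token

# Hypothetical 70B-parameter model at 8-bit weights -> ~70 GB per token pass.
hbm_ceiling = tokens_per_second(8.0, 70.0)    # HBM at ~8 TB/s (per the text)
sram_ceiling = tokens_per_second(80.0, 70.0)  # on-chip SRAM at ~80 TB/s (Groq LPU figure)

print(f"HBM-bound ceiling:  ~{hbm_ceiling:.0f} tokens/s")
print(f"SRAM-bound ceiling: ~{sram_ceiling:.0f} tokens/s")
```

Under these assumptions the 10x bandwidth gap yields a 10x higher throughput ceiling, which is consistent with the order-of-magnitude speedups the report attributes to SRAM-based designs; real systems fall below these ceilings due to compute, interconnect, and batching effects.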
GF Securities: SRAM boosts AI inference speed; related architectures enter the sights of mainstream major vendors