SRAM Expands Its Applications in AI Inference; Stacking Solutions Can Help Grow Capacity
Orient Securities· 2026-03-07 07:59
Investment Rating
- The report maintains a "Positive" rating for the electronics industry, indicating a favorable outlook for the sector [5]

Core Insights
- SRAM is expanding its applications in AI inference, with stacking solutions aiding capacity expansion; this presents significant investment opportunities in related companies [3][8]
- The report highlights the potential of SRAM architecture in AI inference, emphasizing its high access speed and low latency, which are critical for data requiring quick access [7]
- The industry is seeing a trend toward 3D stacking solutions for SRAM, which can raise density and overcome traditional capacity limits [7]

Summary by Sections
Investment Recommendations and Targets
- Key companies to watch include:
  - Zhaoyi Innovation (兆易创新) and Beijing Junzheng (北京君正) for customized storage solutions [3][8]
  - Hengshuo Co., Ltd. (恒烁股份) for SRAM-based digital computing solutions [3][8]
  - Changdian Technology (长电科技) and Tongfu Microelectronics (通富微电) for advanced packaging [3][8]
- Shennan Circuits (深南电路) and WUS Printed Circuit (沪电股份) are expected to benefit from NVIDIA's new chip solutions [3][8]
- Upstream PCB-chain companies such as Shengyi Technology (生益科技) and Nanya New Materials (南亚新材) are also highlighted [3][8]

Industry Developments
- NVIDIA is set to unveil new AI inference chip solutions at the GTC 2026 conference, which could drive further SRAM adoption [7]
- SRAM architecture is gaining recognition for its potential in AI inference, particularly for small-parameter models and intermediate results [7]
- 3D stacking technology, such as AMD's 3D V-Cache, can significantly increase SRAM cache capacity [7][15]
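The stacking claim above comes down to simple capacity arithmetic: each stacked cache die adds its capacity on top of the base die. A minimal sketch, using AMD 3D V-Cache figures (32 MB base L3 plus a 64 MB stacked die, as on the first-generation Ryzen 7 5800X3D) as illustrative assumptions rather than numbers from the report:

```python
# Rough sketch: how die stacking multiplies on-chip SRAM capacity.
# The 3D V-Cache figures below (32 MB base L3 + 64 MB per stacked die)
# are illustrative assumptions, not taken from the report.

def stacked_capacity_mb(base_mb: int, per_layer_mb: int, layers: int) -> int:
    """Total SRAM capacity with `layers` cache dies stacked on the base die."""
    return base_mb + per_layer_mb * layers

base_l3 = 32        # MB of L3 on the compute die
v_cache_layer = 64  # MB added by one stacked SRAM die

total = stacked_capacity_mb(base_l3, v_cache_layer, layers=1)
print(f"{total} MB total, {total / base_l3:.0f}x the planar capacity")  # 96 MB, 3x
```

The same formula shows why stacking is attractive: capacity grows linearly with layer count without enlarging the base die's footprint.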
GF Securities: SRAM Boosts AI Inference Speed as the Architecture Enters Mainstream Vendors' Field of View
Zhi Tong Cai Jing· 2026-02-27 07:35
Core Insights
- SRAM significantly reduces latency and jitter for weight and activation data in large model applications, improving Time-to-First-Token and tail latency performance [1][2]
- Companies like Groq and Cerebras have launched SRAM-based AI chips, marking SRAM architecture's entry into the mainstream [1][4]

SRAM as On-Chip High-Bandwidth Storage Layer
- SRAM (Static Random Access Memory) is integrated near CPU and GPU cores, offering nanosecond-level access latency and highly deterministic bandwidth, though with smaller capacity and higher cost than HBM, DRAM, and SSD [1]

Performance Enhancements with SRAM
- Groq's LPU chip integrates approximately 230MB of on-chip SRAM with a storage bandwidth of 80TB/s, far exceeding external HBM bandwidth of about 8TB/s [2]
- In independent benchmark tests, Groq's LPU chip maintains a stable inference speed of 275-276 tokens/s across different context lengths, outperforming other inference platforms [2]

Cerebras' Advancements
- Cerebras' WSE-3 chip integrates 44GB of SRAM with an on-chip storage bandwidth of 21PB/s, achieving output speeds of over 3000 tokens/s on OpenAI's GPT-OSS 120B inference tasks, roughly 15 times faster than mainstream GPU cloud inference [3]
- OpenAI plans to launch GPT-5.3-Codex-Spark, the first model running on Cerebras Systems AI accelerators, in February 2026, supporting code-generation response speeds of over 1000 tokens/s [3]

Market Developments
- Nvidia invested $20 billion to acquire non-exclusive rights to Groq's intellectual property, including its language processing unit (LPU) and associated software libraries, and has integrated Groq's core engineering team [4]
- Cerebras completed a $1 billion Series F financing round in February 2026 at a $23 billion valuation, and signed a $10 billion contract with OpenAI to deploy up to 750 megawatts of custom AI chips [4]

Investment Recommendations
- The expansion of AI memory capabilities is expected to enhance model performance and accelerate the deployment of applications like AI Agents, suggesting the growing importance of upstream infrastructure in the industry [5]
AI's Memory Moment 7: SRAM Boosts AI Inference Speed
GF SECURITIES· 2026-02-26 07:02
Investment Rating
- The report gives the industry a "Buy" rating, indicating an expectation of stock performance exceeding the market by more than 10% over the next 12 months [45]

Core Insights
- SRAM (Static Random Access Memory) is identified as a high-bandwidth on-chip storage layer that can significantly accelerate AI inference by reducing latency and jitter relative to external HBM (High Bandwidth Memory) [3][11]
- SRAM architecture is gaining mainstream attention, with significant investments and partnerships such as Nvidia's $20 billion acquisition of Groq's intellectual property and OpenAI's $10 billion contract with Cerebras [3][32]
- The report emphasizes the growing importance of AI-memory-related upstream infrastructure, suggesting that investors focus on key beneficiaries within the industry chain [3][39]

Summary by Sections
SRAM as a High-Bandwidth Storage Layer
- SRAM is positioned as an essential tier in the multi-level storage architecture, providing high bandwidth but with limited capacity and higher cost [3][11]

SRAM Enhancing AI Inference Speed
- Groq's LPU chip achieves a bandwidth of 80 TB/s and maintains stable inference speeds of 275-276 tokens/s, outperforming other platforms [3][15][21]
- Cerebras' WSE-3 chip integrates 44GB of SRAM, achieving over 3000 tokens/s in inference tasks, significantly faster than mainstream GPU cloud inference [3][23][39]

SRAM Architecture Gaining Mainstream Attention
- Major companies are investing in SRAM technology; the report highlights Groq's partnership with Nvidia and Cerebras' funding round valuing the company at $23 billion [3][32][39]

Investment Recommendations
- The ongoing expansion of AI memory capabilities will enhance model performance and accelerate the deployment of AI applications; the report recommends focusing on core beneficiaries in the industry chain [3][39]
Tencent Research Institute AI Express 20251226
Tencent Research Institute · 2025-12-25 16:57
Group 1
- Nvidia has reached a non-exclusive licensing agreement with AI chip startup Groq, reportedly worth $20 billion, bringing over Groq founder Jonathan Ross and the engineering team [1]
- Groq focuses on LPU chips for inference, achieving an output speed of 500 tokens per second per card, ten times faster than Nvidia's GPUs, and uses a temporal instruction-set architecture to sidestep HBM shortages and reduce costs [1]
- The transaction represents a "technology licensing + talent acquisition" model: Groq continues its cloud business independently, while Nvidia aims to strengthen its inference computing capabilities against the Google TPU market [1]

Group 2
- Tsinghua's TSAIL Laboratory and Shengshu Technology have jointly open-sourced the TurboDiffusion video-generation acceleration framework, cutting the runtime of a 1.3B-480P model on a single RTX 5090 from 184 seconds to 1.9 seconds, a 97-fold acceleration [2]
- The framework integrates four core technologies (SageAttention2++ quantization, SLA sparse linear attention, rCM step distillation, and W8A8 quantization), reducing end-to-end latency from 900 seconds to 8 seconds [2]
- SageAttention has been integrated into NVIDIA TensorRT and deployed on platforms such as Huawei Ascend and Moole Technology, with major companies like Tencent, ByteDance, and Alibaba already applying it [2]

Group 3
- The Shanghai Municipal Planning and Resources Bureau and SenseTime have launched "Yunyu Xingkong," the first 600-billion-parameter foundational model for the national planning and resources field; it can answer questions, adjust maps, run statistics, recognize images, and generate reports [3]
- The model is trained on the Kunyu Jinglue corpus and integrated with the government intranet's professional version and core business systems, achieving 98% accuracy on specialized terms and a 95% approval rate in human Q&A evaluation [3]
- It employs a "1+6" (base + verticals) model system and an intelligent scheduling engine, supports natural-language calls to 2D and 3D spatial data, and explores a new paradigm for data productization and service-oriented government models [3]

Group 4
- Tencent Cloud and Anhui Yilu Weixing have launched "Assistant Agent," the first AI assistant in the ETC field, built on Tencent's Hunyuan model; it has served over one million users since internal testing began in April [4]
- The assistant integrates multimodal interaction, supporting both text and voice input, with a 95% Q&A accuracy rate and a 90% problem-resolution rate, and handles complex requests such as device inquiries, traffic-record checks, and invoicing [4]
- It deploys 105 state-monitoring algorithms to collect real-time device operation data, enabling voice interaction and key-status reporting for a "service finds the person" capability, and allowing users to control devices via voice commands [4]

Group 5
- Dexmal has proposed the GeoVLA framework, which uses a dual-stream architecture to retain VLM semantic understanding while giving robots 3D geometric perception through point-cloud embedding networks and spatially aware action experts [6]
- In the LIBERO-90 long-horizon multi-task test it achieved a 97.7% success rate, surpassing OpenVLA-OFT; it reached an average success rate of 77% on ManiSkill2 and an overall average of 86.3% on real-world tasks [6]
- It performed strongly in out-of-distribution robustness tests, maintaining a 60% success rate under varying basket heights and a 70% success rate under a 45° viewpoint shift, demonstrating understanding of true 3D spatial structure [6]

Group 6
- The SciMaster team, composed of Shanghai Jiao Tong University's TSAIL Laboratory, the Shanghai Algorithm Innovation Research Institute, and DeepSense Technology, has launched ML-Master 2.0, achieving a 56.44% medal rate on MLE-bench and topping the leaderboard [7]
- The system is designed for real machine-learning engineering, introducing a hierarchical cognitive caching mechanism that models context as Experience, Knowledge, and Wisdom [7]
- It employs a "generate-validate" protocol to achieve ultra-long-horizon autonomy, with applications already in theoretical computational physics and embodied intelligence; a waiting list is open on the SciMaster platform [7]

Group 7
- Jim Fan, head of embodied intelligence at Nvidia, said Tesla's FSD v14 is the first AI to pass the physical Turing test; Elon Musk noted that "perception is maturing," and the software has launched in seven countries including the US [9]
- Tesla has established 14 technical barriers, including a sensor-freeze scheme of 4-6 years to accumulate data, an instant value-judgment engine for intelligent data filtering, and a Neural Codec for processing raw Bayer data [9]
- An end-to-end transformer carries the pipeline from photon input to motor-torque output, with hardware-in-the-loop quantization training on the Cortex supercomputer's vehicle chips; 12 versions shipped within 77 days, though issues remain with lane switching and lane-change decisions [9]
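The acceleration factors quoted for TurboDiffusion in Group 2 are plain before/after ratios and can be sanity-checked with the digest's own numbers:

```python
# Sanity-check the speedup ratios quoted for TurboDiffusion (Group 2).

def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return before_s / after_s

single_5090 = speedup(184, 1.9)  # 1.3B-480P model on one RTX 5090
end_to_end = speedup(900, 8)     # end-to-end latency

print(f"single-GPU: {single_5090:.0f}x")  # ~97x, matching the digest
print(f"end-to-end: {end_to_end:.1f}x")   # 112.5x
```

The 184 s → 1.9 s figure reproduces the claimed 97-fold acceleration; the 900 s → 8 s end-to-end figure works out to about 112x.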