Cerebras
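Four approaches to inference chips, by David Patterson
Semiconductor Industry Observation (半导体行业观察) · 2026-01-19 01:54
Editor's note: The article "Challenges and Research Directions for Large Language Model Inference Hardware," co-authored by Xiaoyu Ma and David Patterson, was recently published and has drawn wide attention. In it, the authors offer recommendations on the challenges facing LLM inference chips and on possible solutions. The article text follows: Large language model (LLM) inference is hard. The autoregressive decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Given recent AI trends, the main challenges lie in memory and interconnect rather than compute. To address these challenges, we highlight four architecture research directions: high-bandwidth flash, which offers 10 times the memory capacity with bandwidth comparable to HBM; processing-near-memory and 3D memory-logic stacking, which deliver high memory bandwidth; and low-latency interconnect, which speeds up communication. Although our focus is datacenter AI, we also discuss how these approaches apply to mobile devices. Introduction: When one of the authors began his career in 1976, about 40% of the papers at computer architecture conferences came from industry. By ISCA 2025, this ...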
Qianwen launches; AI enters the era of getting things done
Soochow Securities· 2026-01-18 05:21
Key Insights
- The report highlights that the global AI industry is making significant progress in enhancing AI computing power, diversifying application scenarios, and realizing monetization, indicating a shift from technological breakthroughs to scalable commercial value [2][3]
- Major companies like OpenAI and TSMC are strengthening their full-stack capabilities through capital integration, with OpenAI signing a procurement agreement worth over $10 billion to build the world's largest high-speed AI inference cluster [3]
- The report notes that AI applications are increasingly being integrated into retail and consumer services, with Google planning to develop Gemini as a virtual shopping assistant that lets users browse and purchase products within the chat interface [5][6]
Industry Developments
- The AI industry is seeing breakthroughs on multiple fronts, particularly in consumer-facing applications, as companies focus on practical implementations [2]
- OpenAI's collaboration with Cerebras to deploy a 750 MW system aims to create a significant AI inference platform, emphasizing the importance of inference speed in the competitive landscape [3]
- The report mentions advancements in AI models, particularly in healthcare and multi-modal generation, with companies like Google and Zhizhu making strides in open-source model iterations [4]
Market Trends
- The report indicates a surge in short-term capital inflow driven by profit-making effects, leading to a recent boom in AI application markets, in line with previous bullish predictions [6]
- Regulatory support is highlighted as a stabilizing factor for the market, with the China Securities Regulatory Commission emphasizing a steady approach to market operations [6]
- The report suggests that the favorable conditions driving market strength remain unchanged, with expectations for a stable transition into the next phase of market activity [6]
Recommended Stocks
- The report recommends several stocks, including WeRide-W (00800.HK) for its leadership in commercializing RoboX and Kyland Technology (300353) for its potential benefits from the integration of industrial internet and AI [7][14]
AI Weekly | ChatGPT ads arrive; TSMC's latest quarterly net profit hits a record high
Di Yi Cai Jing Zi Xun· 2026-01-18 00:59
Group 1: TSMC Financial Performance
- TSMC reported record net profit of NT$505.7 billion (approximately US$16 billion) for Q4 2025, a year-on-year increase of 35%, marking the seventh consecutive quarter of double-digit growth [1]
- The company's revenue for the quarter reached NT$1.046 trillion (approximately US$33.73 billion), up 20.5% year-on-year, with 77% of total revenue coming from advanced processes of 7nm and below [1]
- TSMC's growth is driven largely by strong AI demand, with Q1 2026 revenue projected between US$34.6 billion and US$35.8 billion [1]
Group 2: OpenAI Advertising Initiative
- OpenAI announced plans to test advertisements in ChatGPT for free and entry-level subscription users, while Plus, Pro, Business, and Enterprise subscribers will remain ad-free [2]
- The initiative aims to diversify revenue streams amid pressure for sustainable growth, as previous monetization attempts have not yielded significant results [2]
- User reactions to the ad integration have been mixed, with some expressing discomfort at the idea of advertisements in a conversational AI context [2]
Group 3: Nvidia Copper Usage Controversy
- Nvidia's blog initially claimed that a 1GW data center using traditional 54V DC power systems could require up to 500,000 tons of copper, a statement later corrected to 200 tons following scrutiny [3]
- The initial claim had been used to argue that AI data centers would significantly increase global copper demand, but analysts believe this narrative may be overstated [3]
- Goldman Sachs noted that the current copper market does not show signs of significant supply tightness, predicting a slight surplus by 2026 [3]
Group 4: Apple and Google Collaboration
- Apple announced a partnership with Google to use Google's Gemini model architecture for the next generation of Apple Foundation Models, which will support an upgrade to Siri [4]
- Reports suggest Apple will pay Google approximately US$1 billion annually for technology licensing, indicating a strategic shift from potential collaboration with OpenAI [4]
- This partnership raises concerns about the concentration of power among a few tech giants in the AI space, as highlighted by industry figures [4]
Group 5: DeepSeek's New Research
- DeepSeek published a new paper on conditional memory modules for large models, proposing that these will be a core component of the next generation of sparse large models [5][6]
- The research aims to optimize resource allocation by separating tasks between specialized modules, enhancing efficiency and performance [6]
- DeepSeek is expected to release its flagship model, DeepSeek V4, in February, which reportedly surpasses competitors in programming capabilities [6]
Group 6: Alibaba's Qianwen App Upgrade
- Alibaba's Qianwen app has integrated various services from its ecosystem, including Taobao and Alipay, significantly expanding its functionality [7]
- The app has seen rapid user growth, surpassing 100 million monthly active users within two months of launch, indicating strong market reception [7]
- The upgrade positions Qianwen as a competitive AI assistant, differentiating it from other AI tools in the market [7]
Group 7: UBS on AI Bubble in China
- UBS analysts believe the probability of an AI bubble forming in China is low compared to the US, citing the lack of excessive financing among leading model firms [8]
- Chinese AI companies are reportedly more prudent in capital expenditure, with a total of approximately 400 billion yuan spent last year, significantly less than their US counterparts [8]
- The report suggests that by 2026, the development paths of AI in China and the US will diverge, affecting foreign investment strategies [8]
Group 8: US Tariffs on Semiconductor Imports
- The US government announced a 25% tariff on certain imported semiconductors and related products, including Nvidia's AI chips [9]
- This move aligns with the US push for domestic semiconductor manufacturing, although companies like Nvidia still rely on overseas supply chains [9]
- The tariffs apply to a limited range of products, with some that are essential to US technology supply chains exempted [9]
Group 9: OpenAI's Compute Purchase Agreement
- OpenAI plans to purchase up to 750 megawatts of computing power from Cerebras over three years, integrating Cerebras chips into OpenAI's solutions [10]
- The contract is valued at over US$10 billion, indicating a significant investment in faster AI responses [10]
- Cerebras, a competitor to Nvidia, aims to diversify its revenue sources through this partnership, which could help it compete more effectively in the market [10]
Group 10: ChatGPT's Entry into Translation Market
- OpenAI has launched a standalone translation tool, ChatGPT Translate, which is currently free for all users [12]
- The tool aims to compete directly with established services like Google Translate, although it currently supports fewer languages and lacks advanced features [12]
- The launch appears rushed, with some functionality still under development, indicating that ChatGPT's translation capabilities are at an early stage [12]
OpenAI "buys" a pile of chips
Semiconductor Industry Observation (半导体行业观察) · 2026-01-17 02:57
Core Insights
- Nvidia maintains a dominant position in the AI chip market, but competition is intensifying as OpenAI pursues aggressive expansion plans and diversifies its partnerships [1][3]
- OpenAI has signed a $10 billion deal with Cerebras for AI chips, part of a broader strategy to secure processing power for its AI technologies [1][8]
- OpenAI has committed over $1.4 trillion in infrastructure deals with various chip manufacturers and has a private market valuation of $500 billion [1]
Nvidia
- Nvidia's CEO Jensen Huang highlighted the company's leadership in AI following a strong earnings report, emphasizing that OpenAI's operations rely on Nvidia's platform [1]
- In September, Nvidia announced a $100 billion investment to support OpenAI in building and deploying at least 10 gigawatts of Nvidia systems, equivalent to the annual electricity consumption of approximately 8 million U.S. households [3]
AMD
- OpenAI plans to deploy 6 gigawatts of AMD GPUs over the next few years, with AMD granting OpenAI warrants for up to 160 million shares, representing about 10% of AMD's stock [5]
- The first 1 gigawatt of chips from this partnership is expected to launch in the second half of 2026 [5]
Broadcom
- OpenAI and Broadcom announced a collaboration to deploy 10 gigawatts of custom AI accelerators, with the project expected to be completed by the end of 2029 [7]
- Broadcom's CEO indicated that the partnership is not expected to generate much revenue in 2026, highlighting the long-term nature of the agreement [7]
Cerebras
- OpenAI's recent agreement with Cerebras involves deploying 750 megawatts of AI chips, with the deal valued at over $10 billion [8]
- Cerebras claims its chips are 15 times faster than GPU-based systems, which could significantly enhance OpenAI's processing capabilities [8]
Potential Partners
- OpenAI signed a $38 billion cloud services agreement with Amazon Web Services (AWS), which includes plans for additional infrastructure development [10]
- Discussions are ongoing for potential investments from Amazon exceeding $10 billion, with OpenAI considering the use of AWS's AI chips [10]
- Google Cloud has also engaged with OpenAI for computing capabilities, although OpenAI has no plans to use Google's Tensor Processing Units [10]
Intel
- Intel has lagged in the AI chip sector and recently launched a new data center GPU aimed at AI inference workloads, with samples expected by mid-2026 [12]
- The company previously had an opportunity to invest in OpenAI but ultimately decided against it, which may have contributed to its current position in the market [12]
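The Nvidia item above equates 10 gigawatts with the annual electricity consumption of roughly 8 million U.S. households. A quick back-of-envelope sketch shows how such a figure can arise. The per-household consumption and the assumption of continuous full-power draw are illustrative inputs that do not come from the article.

```python
# Back-of-envelope check of "10 GW is about 8 million U.S. households".
# ASSUMPTIONS (not from the article): the systems draw their full rated
# power around the clock, and an average U.S. household uses roughly
# 10,500 kWh of electricity per year (an illustrative round figure).

HOURS_PER_YEAR = 24 * 365
ASSUMED_KWH_PER_HOUSEHOLD_PER_YEAR = 10_500


def households_equivalent(gigawatts: float) -> float:
    """Households whose combined yearly use equals `gigawatts` of continuous draw."""
    kwh_per_year = gigawatts * 1_000_000 * HOURS_PER_YEAR  # 1 GW = 1,000,000 kW
    return kwh_per_year / ASSUMED_KWH_PER_HOUSEHOLD_PER_YEAR


if __name__ == "__main__":
    for gw in (10, 6, 0.75):  # Nvidia, AMD, and Cerebras deployment sizes
        print(f"{gw} GW -> {households_equivalent(gw) / 1e6:.2f} million households")
```

Under these assumptions 10 GW works out to a little over 8 million households, consistent with the figure quoted above. A different household average or a lower utilization would shift the result proportionally.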
OpenAI has committed billions to recent chip deals. Some big names have been left out
CNBC· 2026-01-16 20:00
Core Insights
- OpenAI is aggressively expanding its partnerships with chipmakers to secure processing power for its AI technology, with a recent $10 billion deal with Cerebras marking a significant step in this direction [2][17]
- The company has committed over $1.4 trillion to infrastructure deals with major players like Nvidia, AMD, and Broadcom, aiming for a $500 billion private market valuation [3]
- Nvidia remains a key partner, having committed $100 billion to support OpenAI's infrastructure, which includes a project to deploy 10 gigawatts of Nvidia systems [5][6]
Nvidia
- OpenAI has relied on Nvidia's GPUs since its inception, and the partnership has deepened with Nvidia's commitment of $100 billion to support OpenAI's infrastructure [4][5]
- The first phase of the Nvidia project is expected to come online in the second half of 2026, although it is uncertain how the agreement will progress [7]
- Nvidia's investment will be deployed upon the completion of the first gigawatt of power [8]
AMD
- OpenAI plans to deploy six gigawatts of AMD's GPUs over multiple years, with AMD issuing a warrant for up to 160 million shares, potentially giving OpenAI a 10% stake in AMD [10]
- The first gigawatt of AMD chips is expected to roll out in the second half of 2026, with the deal valued in the billions [11]
Broadcom
- OpenAI and Broadcom have agreed to deploy 10 gigawatts of custom AI accelerators, with the project expected to be completed by the end of 2029 [14]
- Broadcom's CEO has indicated that significant revenue from this partnership is not anticipated in 2026, framing it as a long-term collaboration [15]
Cerebras
- OpenAI's recent agreement with Cerebras involves deploying 750 megawatts of AI chips, with the deal valued at over $10 billion [16][17]
- Cerebras' chips are designed to deliver responses up to 15 times faster than traditional GPU systems, positioning the company for potential public market entry [17]
Potential Partners
- OpenAI has signed a $38 billion cloud deal with Amazon Web Services, which includes plans for additional infrastructure development [20]
- Discussions are ongoing for Amazon to potentially invest over $10 billion in OpenAI, although no official decisions have been made [21]
- Google Cloud provides computing capacity to OpenAI, but OpenAI has no plans to use Google's in-house chips [22]
- Intel, which has lagged in AI chip development, is working on a new data center GPU designed for AI workloads, with customer sampling expected in late 2026 [24]
X @Sam Altman
Sam Altman· 2026-01-16 19:21
Very fast Codex coming! Cerebras (@cerebras): OpenAI🤝Cerebras https://t.co/zvVIdIsw2u https://t.co/cKUL5ZSTE3 ...
Nvidia GPU vs. Google TPU: Which supply chains face fierce competition? (Media)
Huafu Securities· 2026-01-16 13:25
Investment Rating
- The industry rating is "Outperform the Market," indicating that the overall industry return is expected to exceed the market benchmark index by more than 5% in the next 6 months [15].
Core Insights
- The competition between NVIDIA and Google in the AI chip market is heavily reliant on TSMC's CoWoS advanced packaging, which is currently a critical bottleneck in the AI chip supply chain [3].
- TSMC's capital expenditure for 2026 is projected to be between $52 billion and $56 billion, reflecting a year-on-year growth of 27% to 37% due to strong AI demand [3].
- NVIDIA is collaborating with Amkor to expand its production capacity in the U.S. from 2026 to 2029, as TSMC reallocates some advanced packaging orders to OSAT manufacturers [3].
- Samsung and Intel are actively enhancing their advanced process capabilities, with Samsung aiming to increase its global 2nm monthly capacity to 21,000 wafers by the end of 2026 [4].
- HBM is identified as a key battleground in the competition between NVIDIA's GPUs and Google's TPUs, influencing both performance limits and the actual deliverable quantities of chips [4].
- NAND and SSD demand is significantly amplified in AI data centers, with NVIDIA's Rubin platform enhancing data sharing and reuse, potentially increasing SSD demand [5].
- There is rising demand for inference cards as large model vendors seek alternatives to NVIDIA's chips to reduce dependency and costs [6].
Summary by Sections
Advanced Process and Packaging
- TSMC leads in advanced packaging, with CoWoS capacity constraints limiting NVIDIA's and Google's AI chip output [3].
- Amkor and ASE are being used to alleviate TSMC's capacity pressure, with Amkor investing $5 billion in advanced packaging facilities in Arizona [3][4].
Storage Side
- HBM is crucial for the competition between NVIDIA and Google, while on-chip SRAM is emerging as a new direction for inference storage [4].
- The collaboration between NVIDIA and Groq focuses on inference technology utilizing on-chip SRAM [4].
Client Side
- Major AI model vendors are diversifying their computational resources, with Anthropic planning to deploy up to 1 million TPUs by 2026 and OpenAI partnering with Cerebras for a large-scale AI inference platform [6].
Investment Recommendations
- The report suggests focusing on sectors within the semiconductor supply chain, including foundries, advanced packaging, storage, and AI model applications, amidst the competitive landscape between NVIDIA and Google [7].
Lujiazui Financial Breakfast, Thursday, January 15, 2026
Sou Hu Cai Jing· 2026-01-16 04:50
Group 1
- The China Securities Regulatory Commission has approved an adjustment to the financing margin ratio for investors, raising the minimum margin from 80% to 100% for new financing contracts, with the aim of reducing leverage and protecting investor rights [1]
- The policy for tax refunds on housing transactions for residents has been extended until the end of 2027, allowing taxpayers to receive tax refunds on capital gains from selling their homes if they purchase a new home within one year [1]
- A potential IPO boom is anticipated in 2026, with several top global tech companies, including OpenAI and SpaceX, preparing for their public offerings [1]
Group 2
- The State Council Information Office will hold a press conference on January 15 to discuss the effectiveness of monetary and financial policies in supporting high-quality economic development [2]
- China's foreign trade reached 45.47 trillion yuan in 2025, a 3.8% year-on-year increase, with December exports of rare earths surging by 32% [2]
- The People's Bank of China will conduct a 900 billion yuan reverse repurchase operation on January 15, continuing a trend of increasing liquidity in the market [2]
Group 3
- The A-share market saw trading volume near 4 trillion yuan, with the Shanghai Composite Index closing down 0.31% while the Shenzhen Component Index rose by 0.56% [3]
- The Hong Kong Hang Seng Index increased by 0.56%, with significant net buying from southbound funds, particularly in Tencent Holdings [3]
- Recent regulatory updates have imposed stricter requirements on fund dividends to ensure compliance and prevent manipulation [3]
Group 4
- The insurance fund investment reform pilot has received an additional 40 billion yuan in approved funds, indicating a growing trend in long-term investments [4]
- The A-share GEO (Generative Engine Optimization) concept has gained market attention, with several stocks experiencing significant price increases [4]
- The Zhejiang Securities Regulatory Bureau has opened an investigation into Sunflower over misleading statements in its restructuring plan [5]
Group 5
- The China Association of Automobile Manufacturers reported that automobile production and sales in 2025 both exceeded 34 million units, maintaining China's position as the world's largest automotive market [9]
- The 2026 work meeting emphasized enhancing the self-sufficiency of the supply chain in the new energy vehicle sector and promoting the application of new energy heavy trucks [9]
Group 6
- The results of the sixth batch of high-value medical consumables procurement are expected to be implemented by May, covering various medical devices [10]
- Shanghai has launched an action plan for autonomous driving, aiming to test L3-level vehicles and scale up L4-level technology applications [10]
Group 7
- Visa has partnered with BVNK to accelerate the adoption of digital assets in daily transactions, integrating stablecoin financing into its payment network [11]
- OpenAI has signed a three-year agreement with Cerebras for a significant procurement of computing power, valued at over $10 billion [11]
Group 8
- The U.S. Federal Reserve's Beige Book indicates modest to moderate economic growth across most districts, with consumer spending showing slight improvements [13]
- The U.S. Treasury Department has issued warnings for citizens to leave Iran amid rising geopolitical tensions [13]
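Over $10 billion! OpenAI signs a major AI chip order
Xinhuanet Finance (新华网财经) · 2026-01-16 03:34
Founded in 2015, Cerebras aims to build the world's fastest AI inference and training platform. Its CS-2 and CS-3 systems are already used in medical research, cryptography, energy, and AI agents, and the company also offers cloud services to developers and enterprises. What reportedly sets Cerebras systems apart is that they integrate massive compute, memory, and bandwidth onto a single giant chip, removing the bottlenecks that limit inference speed on conventional hardware. On coding and voice chat tasks, large language models running on Cerebras respond up to 15 times faster than GPU-based systems. On January 14 local time, OpenAI and U.S. AI chip startup Cerebras announced that they will deploy 750 megawatts of Cerebras wafer-scale systems. The rollout will proceed in phases starting in 2026 and finish in 2028, and once complete it will be the world's largest high-speed AI inference platform. According to CNBC, the deal is worth more than $10 billion. Andrew Feldman, Cerebras co-founder and CEO, said that working with OpenAI means bringing the world's leading AI models to the world's fastest AI processor, and that real-time inference will transform AI by opening entirely new ways to build and interact with AI models. Some analysts believe that this deal between OpenAI and C ...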
Gelin Dahua Futures Morning Briefing: Global Economy - 20260116
Ge Lin Qi Huo· 2026-01-16 01:04
Report Industry Investment Rating
- The macro and financial sector of the global economy is rated as "downward" [1]
Core Viewpoints
- The global political order has entered a dark period governed by the law of the jungle, creating huge uncertainties for the global economy. The global economy has passed its peak and started to decline [2][4]
- Uncertainty around the Fed is expected to peak from July to November 2026, and the market may see a trend of "fleeing US assets" [2]
- The construction boom of AI data centers in the next five years will require at least $5 trillion [2]
Summary by Relevant Catalogs
Global Economic Logic
- The US has taken actions such as attempting to control Venezuelan oil and purchase Greenland, which have disrupted the global political order [2]
- US prosecutors have launched a criminal investigation into Fed Chairman Powell, and the Fed has restarted the expansion of its balance sheet [2]
- Goldman Sachs warns that the decline in Las Vegas gambling revenue resembles the early warning signal seen before the 2008 financial crisis [2]
- The US is adjusting its economic relationship with China and aiming to revive its economic autonomy [2]
- The K-shaped divergence among US consumers is intensifying [2]
- The Bank of Japan has raised interest rates, and the yield on Japanese 10-year treasury bonds has risen [2]
- Google plans to double its AI computing power every six months and increase it by 1000 times in the next 4-5 years [2]
- NVIDIA's CEO believes that China will win the AI competition [2]
Morning Session Notice
- Trump has launched fiscal, monetary, and credit stimulus, but it may lead to future debt crises and market crashes [1]
- Citigroup's report indicates that the commodity market is at a turning point, with different price outlooks for various commodities [1]
- US retail sales in November exceeded expectations, and the PPI rebounded [1]
- The cost gap between building space data centers and ground data centers is narrowing [1]
- The demand for global AI chips is constrained by TSMC's production capacity [1]
- NVIDIA's new architecture is expected to increase the demand for NAND flash memory [1]
- OpenAI has signed a deal with Cerebras worth over $10 billion [1]
Other
- The US's return to the Monroe Doctrine will have a profound impact on major asset classes [3]
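The Google item under Global Economic Logic pairs "double every six months" with "1000 times in the next 4-5 years." The short sketch below checks whether the two statements are consistent. It assumes a clean doubling at the end of every six-month period, which is a simplification and not a detail from the report.

```python
# Check: does doubling every six months give roughly 1000x in 4-5 years?
# ASSUMPTION (not from the report): growth is an exact doubling per
# six-month period, compounding with no interruptions.

def growth_factor(years: float, months_per_doubling: float = 6.0) -> float:
    """Total multiplier after `years` of doubling every `months_per_doubling` months."""
    doublings = years * 12 / months_per_doubling
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (4, 4.5, 5):
        print(f"{years} years -> {growth_factor(years):.0f}x")
```

Under this assumption the multiplier is 256x after four years and 1024x after five, so the "1000 times" figure lines up with the five-year end of the stated range.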