Google TPU
Developing Your Own AI Chip: Is It Feasible?
半导体行业观察· 2025-08-26 01:28
Source: reposted from Zhihu; author: Dio-晶. TPU7, do you want one too? Recall Google's TPU7 launch in the first half of the year: Ironwood, billed as a match for the B200. Judging by the headline specs, it does look spectacular. If Google can do it, why can't I? So by now, any OTT player of note in North America or East Asia that is not building its own chips risks being seen as having gone soft. It makes sense: each of them is a giant spinning up the pace of human society, and in the face of AI FOMO, only an early and ample supply of accelerators brings peace of mind. Meanwhile, internal algorithm teams are calling ever more loudly for algorithm-hardware co-design as a way to overtake DeepSeek on the curve, or even to build a moat someday. Yet in reality, these companies still have to kiss Huang's ass from time to time to queue up for GPUs, often without guaranteed delivery schedules or reliability; just imagining it is painful. That said, the only one that has genuinely done it well, has enough supply for its own use, and can truly arm-wrestle with Huang is Google's TPU, now in its seventh generation. Are you sure you can do it too? As an old hand in the chip business, the more I have been through, the more conservative and even superstitious I have become :) How exactly should it be done? What conditions decide success or failure? The deeper I think about it, the more I feel that building an AI chip may not be that hard, yet it is not that simple either; the difficulty may well be ...
GB200 Shipment Forecasts Revised Upward, but NVL72 Is Not Yet Used for Large-Scale Training
傅里叶的猫· 2025-08-20 11:32
Core Viewpoint
- The article compares the performance and cost of NVIDIA's H100 and GB200 NVL72 systems, highlighting the potential advantages and remaining challenges of the GB200 NVL72 for AI training [30][37].

Group 1: Market Predictions and Performance
- After the ODM earnings announcements, institutions raised their 2025 forecast for GB200/300 rack shipments from 30,000 to 34,000 units, with 11,600 expected in Q3 and 15,700 in Q4 [3].
- Foxconn anticipates a 300% quarter-over-quarter increase in AI rack shipments, projecting 19,500 units for the year, roughly 57% of the market [3].
- By 2026, even with stable NVIDIA chip production, downstream assemblers could build more than 60,000 racks, drawing on an estimated 2 million Blackwell chips carried over [3].

Group 2: Cost Analysis
- Total capital expenditure (capex) is approximately $250,866 for an H100 server versus around $3,916,824 for a GB200 NVL72 rack, making the GB200 NVL72 about 1.6 to 1.7 times more expensive per GPU [12][13]; see the sketch after this summary for the arithmetic.
- Operational expenditure (opex) for the GB200 NVL72 is slightly higher than for the H100, primarily due to higher per-GPU power consumption (1200W vs. 700W) [14][15].
- The total cost of ownership (TCO) for the GB200 NVL72 is about 1.6 times that of the H100, so the GB200 NVL72 needs at least a 1.6x performance advantage to be attractive for AI training [15][30].

Group 3: Reliability and Software Improvements
- As of May 2025, the GB200 NVL72 had not yet been widely adopted for large-scale training because of software maturity and reliability issues; the H100 and Google TPU remain the mainstream options [11].
- Reliability is a significant concern: early operators faced numerous XID 149 errors, which complicate diagnostics and maintenance [34][36].
- Software optimizations, particularly in the CUDA stack, are expected to lift GB200 NVL72 performance significantly, but reliability remains the bottleneck [37].

Group 4: Future Outlook
- By July 2025, the GB200 NVL72's performance/TCO is projected to reach 1.5 times that of the H100, with further improvements expected to make it the more favorable option [30][32].
- The GB200 NVL72's architecture enables faster operation in certain scenarios, such as MoE (Mixture of Experts) models, which could sharpen its competitive edge [33].
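A rough reconstruction of the per-GPU cost comparison above, in Python. The GPU counts (8 per H100 server, 72 per NVL72 rack), the 4-year depreciation window, the electricity price, and the PUE are illustrative assumptions rather than figures from the article; under them, the per-GPU ratio lands in the same 1.6-1.7x neighborhood the article cites.

```python
# Back-of-the-envelope per-GPU cost comparison using the capex and
# power figures quoted above. All other constants are assumptions.
H100_SERVER_CAPEX = 250_866       # USD per H100 server (quoted above)
NVL72_RACK_CAPEX = 3_916_824      # USD per GB200 NVL72 rack (quoted above)
GPUS_PER_H100_SERVER = 8          # assumption: typical 8-GPU HGX server
GPUS_PER_NVL72_RACK = 72          # 72 GPUs per NVL72 rack
YEARS = 4                         # assumption: depreciation window
KWH_PRICE = 0.08                  # assumption: USD per kWh
PUE = 1.25                        # assumption: facility power overhead

def per_gpu_cost(capex: float, gpus: int, watts: float) -> float:
    """Capex share plus electricity over the depreciation window, per GPU."""
    kwh = watts / 1000 * 24 * 365 * YEARS * PUE
    return capex / gpus + kwh * KWH_PRICE

h100 = per_gpu_cost(H100_SERVER_CAPEX, GPUS_PER_H100_SERVER, 700)   # 700 W quoted
gb200 = per_gpu_cost(NVL72_RACK_CAPEX, GPUS_PER_NVL72_RACK, 1200)   # 1200 W quoted
print(f"H100 per GPU:  ${h100:,.0f}")
print(f"GB200 per GPU: ${gb200:,.0f} ({gb200 / h100:.2f}x H100)")
```

The article's full TCO presumably also folds in networking, cooling, and hosting costs, which is how it arrives at its overall ~1.6x figure.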
Citi: Key Takeaways from the Dell'Oro Q2 2025 Data Center Capex Report
花旗· 2025-06-23 02:09
Investment Rating
- The report takes a positive view of the US Communications Equipment industry, with a significant increase in data center capital expenditures (capex) projected for 2025 [1][8].

Core Insights
- The data center market grew more than 50% year-over-year in the first quarter of 2025, reaching $134 billion, driven primarily by server spending, which makes up more than 50% of data center capex [1][8].
- The top four cloud providers in the US and China are expected to account for roughly 60% of the market, with their capex projected to grow 39% in fiscal year 2025 [2][8].
- AI training is the main focus of data center investment, with more than 5 million accelerators expected to be deployed in 2025, significantly shaping infrastructure spending [2][9].
- Microsoft, Amazon, Google, and Meta are expected to expand their general-purpose server fleets and data center projects in line with growing demand for cloud services and AI capabilities [3][4].

Summary by Sections

Market Overview
- The enterprise segment grew 21% year-over-year in the first quarter, driven by a server refresh cycle, although macroeconomic factors could pose challenges [7].
- The report raises its 2025 growth forecast to 30%, pointing to a multi-year capex expansion cycle among the top cloud providers [8].

Company-Specific Developments
- Microsoft is on track to deploy its Maia platform in volume later in 2025, contingent on resolving early technical issues [3].
- Amazon, Google, and Meta are expected to significantly increase their server fleets, with Meta planning data centers in 14 regions over the next 2-4 years [3][4].
- Oracle is projected to grow its capex at a double-digit rate in 2025, with new data centers planned in seven regions [4].

Investment Projections
- High-end accelerator shipments are forecast to reach 5 million in 2025, translating into $205 billion of accelerated server capex, or 34% of total data center capex [9]; the sketch below works out the implied per-unit figures.
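As a quick sanity check on the projections above, the quoted figures imply the following per-unit and total spend; this is simple arithmetic on the report's numbers, not additional data from the report.

```python
# Implied figures from the investment projections quoted above.
ACCELERATORS_2025 = 5_000_000          # high-end accelerator shipments (quoted)
ACCEL_SERVER_CAPEX = 205_000_000_000   # accelerated server capex, USD (quoted)
SHARE_OF_DC_CAPEX = 0.34               # share of total data center capex (quoted)

per_accelerator = ACCEL_SERVER_CAPEX / ACCELERATORS_2025
total_dc_capex = ACCEL_SERVER_CAPEX / SHARE_OF_DC_CAPEX
print(f"Implied server capex per accelerator: ${per_accelerator:,.0f}")
print(f"Implied total 2025 data center capex: ${total_dc_capex / 1e9:,.0f}B")
```

This works out to about $41,000 of server capex per accelerator and roughly $600 billion of total data center capex, consistent with the Q1 run rate cited above.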
Why Is 2000 TOPS + VLA + VLM Defined as L3-Level Computing Power?
自动驾驶之心· 2025-06-20 14:06
Core Viewpoint
- The article discusses advances in autonomous driving technology, focusing on Xiaopeng Motors' recent paper at CVPR 2025, which validates scaling laws in the autonomous driving context and proposes new computing-power standards for Level 3 (L3) autonomous vehicles [4][6][22].

Group 1: Scaling Laws and Model Performance
- Xiaopeng Motors' paper systematically verifies that scaling laws hold in autonomous driving: larger model parameter counts lead to better performance [4][6].
- The research confirms that the power-law relationship between model performance and parameter scale, data scale, and compute, originally proposed by OpenAI, also holds in the driving domain (the standard form is restated after this summary) [4][6].

Group 2: Computing Power Standards
- The paper proposes 2000 TOPS as the computing-power standard for L3 autonomous driving, reflecting the exponential growth in compute requirements as the automation level rises [8][20].
- L2 systems require 80 to 300 TOPS, while L3 systems need thousands of TOPS to handle the complexity of urban driving scenarios [8][20].

Group 3: VLA and VLM Model Architecture
- Xiaopeng's VLA (Vision-Language-Action) model architecture integrates visual understanding, reasoning, and action generation, and therefore demands substantial computational resources [10][12].
- The architecture's visual processing module alone requires hundreds of TOPS for real-time fusion of data from multiple sensors [10][12].

Group 4: Onboard vs. Data Center Computing Power
- The article distinguishes onboard computing power, used for real-time processing of driving decisions, from data center computing power, used for offline training and model optimization [12][15].
- Onboard systems must balance real-time performance against power consumption, while data centers can apply far greater compute to complex model training [12][15].

Group 5: Market Dynamics and Competitive Landscape
- The market for autonomous-driving AI chips is dominated by a few key players, with NVIDIA holding a 36% share, followed by Tesla and Huawei [20].
- The competitive landscape has shifted significantly since 2020, reshaping the development of AI chips and their application in autonomous driving [17][20].
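For reference, the power law cited in Group 1 is usually written in the form OpenAI proposed (Kaplan et al., 2020). The constants and exponents below are fitted per domain; Xiaopeng's driving-specific fits are not reproduced here.

```latex
% Loss L as a power law in parameters N, data D, and compute C
% (Kaplan et al., 2020). N_c, D_c, C_c and the exponents \alpha_*
% are fitted constants; smaller L is better.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```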
Morgan Stanley: Global Technology - AI Supply Chain ASIC Update: Trainium vs. TPU
摩根· 2025-06-19 09:46
Investment Rating
- The report maintains "Overweight" (OW) ratings on several companies in the AI ASIC supply chain, including Accton, Wiwynn, Bizlink, and King Slide in downstream systems, and TSMC, Broadcom, Alchip, MediaTek, Advantest, KYEC, Aspeed, and ASE in upstream semiconductors [1][11].

Core Insights
- The AI ASIC market is expected to grow significantly. NVIDIA is outpacing the ASIC market in 2025, yet enthusiasm for ASIC vendors remains high, and Asian design service providers such as Alchip and MediaTek are expected to gain share thanks to efficient operations and service quality [2][21].
- The global semiconductor market is projected to reach $1 trillion by 2030, with AI semiconductors the major growth driver at an estimated $480 billion, comprising $340 billion from cloud AI semiconductors and $120 billion from edge AI semiconductors [21][22].

Summary by Sections

AI ASIC Market Developments
- AWS Trainium: Alchip has taped out the Trainium3 design, with wafers already produced, and is considered well positioned to win the 2nm Trainium4 project [3][15].
- Google TPU: Broadcom is expected to tape out a new 3nm TPU after Ironwood (TPU v7p) enters mass production in 1H25, while MediaTek is also preparing a 3nm TPU tape-out [4][18].
- Meta MTIA: Preliminary volume forecasts for MTIAv3 are expected in July, with larger packaging under consideration for MTIAv4 [5].

Downstream and Upstream Suppliers
- Downstream suppliers for AWS Trainium2 include Gold Circuit for PCB boards, King Slide for rail kits, and Bizlink for active electrical cables; Wiwynn is expected to derive 30-35% of its total 2025 revenue from Trainium2 servers [6][11].
- Key upstream suppliers include TSMC for foundry services, Broadcom for IP and design services, and Alchip for back-end design services [11][10].

Market Size and Growth Projections
- The AI ASIC market is projected to grow to $50 billion by 2030, about 15% of cloud AI semiconductors, indicating a significant opportunity for AI ASIC vendors despite NVIDIA's dominance in the AI GPU market [21][24].
- The report estimates the global AI capex total addressable market (TAM) for 2025 at around $199 billion, driven by the major cloud service providers [26][58].

Financial Implications
- Alchip's revenue from Trainium3 chips is estimated at $1.5 billion in 2026, with continued growth expected in the AI ASIC market [18][21].
- MediaTek's revenue from TPU projects is projected to grow significantly, with estimates of $1 billion in 2026 and potential growth to $2-3 billion in 2027 [19][21].
Tencent Research Institute AI Digest 20250519
腾讯研究院· 2025-05-18 14:33
Group 1: OpenAI and AI Programming Tools
- OpenAI launched a new AI programming tool, Codex, powered by the codex-1 model, which generates cleaner code and automatically iterates on testing until it passes [1]
- Codex runs in a cloud sandbox environment, can handle multiple programming tasks simultaneously, and supports GitHub integration for preloading code repositories [1]
- The tool is currently available to paid ChatGPT Pro users, with rate limiting planned along with an option to purchase additional credits for heavier usage [1]

Group 2: Image Generation Technologies
- Tencent's Hunyuan Image 2.0 achieves millisecond-level image generation, letting users see changes in real time as they type prompts and breaking the traditional 5-10 second generation window [2]
- The new model supports both text-to-image and image-to-image workflows, with adjustable reference strength for the generation process [2]
- Manus introduced an image generation feature that infers user intent and plans a solution, offering a one-stop service from brand design to website deployment, although complex tasks may take several minutes [3]

Group 3: Google and the LightLab Project
- Google launched the LightLab project, which uses diffusion models to give precise control over light and shadow in images, including adjustments to light intensity and color [4][5]
- The research team built its training dataset by combining real photo pairs with synthetic rendered images, achieving better PSNR and SSIM metrics than existing methods [5]

Group 4: Supermemory API
- Supermemory released the Infinite Chat API, a transparent proxy between applications and LLMs that maintains dialogue context beyond the 20,000-token limit of large models [6]
- The API uses RAG techniques to manage overflow context, claims to save 90% of token consumption, and can be integrated into existing applications with a single line of code [6]
- Pricing is a fixed $20 monthly fee, with the first 20,000 tokens of each conversation free and $1 per million tokens beyond that (see the cost sketch after this section) [6]

Group 5: Grok AI Controversy
- The Grok AI assistant drew backlash for inserting controversial content related to "white genocide" into responses, attributed to an employee's unauthorized modification of the system prompts [7]
- xAI published Grok's prompts on GitHub and committed to stronger review mechanisms and a dedicated monitoring team [7]
- The incident highlights security vulnerabilities in AI systems that rely heavily on prompts; research indicates mainstream models can be compromised through specific prompting techniques [7]

Group 6: Windsurf and the SWE-1 Model
- Windsurf launched the SWE-1 model, which aims to optimize the entire software engineering process rather than just code generation, its first product release since being acquired by OpenAI for $3 billion [8]
- SWE-1 performs comparably to models like GPT-4.1 on programming benchmarks but lags Claude 3.7 Sonnet, with a commitment to lower service costs than Claude 3.5 Sonnet [8]
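A quick calculator for the Infinite Chat pricing quoted above. The function below is an illustration of the stated price sheet, not Supermemory's API; the per-conversation free allowance follows the wording "first 20,000 tokens of each conversation free".

```python
# Cost estimate for the quoted Supermemory Infinite Chat pricing:
# $20/month flat, first 20,000 tokens of each conversation free,
# then $1 per million tokens beyond that. Illustrative only.

def monthly_cost(conversation_tokens: list[int]) -> float:
    """Estimated monthly bill for a list of per-conversation token counts."""
    base = 20.0                                      # flat monthly fee (quoted)
    overflow = sum(max(0, t - 20_000) for t in conversation_tokens)
    return base + overflow / 1_000_000 * 1.0         # $1 per excess million

# Example: three conversations of 15k, 50k, and 400k tokens.
print(f"${monthly_cost([15_000, 50_000, 400_000]):.2f}")  # -> $20.41
```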
Group 7: Google TPU vs. OpenAI GPU
- Google TPUs deliver AI cost efficiency at roughly one fifth the price of the NVIDIA GPUs OpenAI relies on, while maintaining comparable performance [10]
- Google's Gemini 2.5 Pro API is priced 4-8 times lower than OpenAI's o3 model, reflecting different market strategies [10]
- Apple's decision to train its AFM model on Google TPUs may prompt other companies to explore alternatives to NVIDIA GPUs [10]

Group 8: Lovart's Design Philosophy
- Lovart's founder describes a three-stage evolution of AI image products: from single-shot content generation, to workflow tools, and now to AI-driven agents [11]
- The design philosophy focuses on restoring the original essence of design, enabling natural interaction between AI and users [11]
- Lovart believes generalist product managers will be replaced by designers with specialized knowledge, stating, "we have no product managers, only designers" [11]

Group 9: Lilian Weng's Insights on Model Thinking
- Lilian Weng discusses the importance of "thinking time" in large models, arguing that more computation at test time can improve performance on complex tasks [12]
- Current test-time strategies include parallel sampling and sequential revision (sketched below), which require balancing thinking time against compute cost [12]
- Research indicates that optimizing chains of thought with reinforcement learning can lead to reward hacking, which needs further investigation [12]
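A minimal sketch of the two test-time strategies named in Group 9. The model and scorer here are stand-in stubs, not any real API; in practice the scorer would be a verifier or reward model, and its weaknesses are exactly where the reward-hacking concern above comes in.

```python
# Sketch of parallel sampling (best-of-N) vs. sequential revision.
# generate() and score() are stubs standing in for a real model and
# a real verifier/reward model.
import random

def generate(prompt: str) -> str:
    """Stub for one sampled model completion."""
    return f"answer-{random.randint(0, 9)}"

def score(prompt: str, answer: str) -> float:
    """Stub for a verifier / reward model."""
    return random.random()

def parallel_sampling(prompt: str, n: int = 8) -> str:
    """Best-of-N: sample N candidates independently, keep the best-scored."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

def sequential_revision(prompt: str, steps: int = 4) -> str:
    """Revise one draft repeatedly, keeping a revision only if it scores higher."""
    draft = generate(prompt)
    for _ in range(steps):
        revised = generate(f"{prompt}\nPrevious draft: {draft}\nImprove it.")
        if score(prompt, revised) > score(prompt, draft):
            draft = revised
    return draft

print(parallel_sampling("What is 17 * 24?"))
print(sequential_revision("What is 17 * 24?"))
```

Both strategies buy quality with extra inference compute; parallel sampling parallelizes cleanly, while sequential revision can fix errors the first draft baked in.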