OpenAI and Anthropic Both Miss Gross Margin Targets; Non-Paying Users and Compute Become the Main Burden
Hua Er Jie Jian Wen · 2026-02-25 15:47
Core Insights
- The profitability paths of leading AI companies OpenAI and Anthropic are under severe pressure from unexpected increases in inference costs, raising doubts about their ability to reach gross margins above 60% by the end of the decade [1]

Group 1: Financial Performance
- OpenAI's gross margin fell from 40% to 33% last year, significantly below its forecast of 46% [1]
- Anthropic's gross margin improved from a projected negative 94% in 2024 to an expected 40% in 2025, still 10 percentage points below its previous target [1]

Group 2: Inference Costs
- Inference costs, the payments to cloud service providers for running AI model responses, are the main factor pressuring both companies' gross margins [2]
- OpenAI's inference costs surged nearly fourfold year-over-year to $8.4 billion, exceeding its earlier projection of $6.6 billion [2]
- Anthropic's inference costs are projected to triple to $2.7 billion in 2025, also higher than previous forecasts [2]

Group 3: User Base and Product Structure
- OpenAI's large non-paying user base, roughly 910 million weekly active users of whom only about 5% pay, contributes to its gross margin pressure [3]
- Nearly half of OpenAI's total inference costs (around $3.9 billion) went to serving non-paying users, while costs for paying users amounted to $4.5 billion [3]
- The video generation tool Sora consumes far more server resources than text-based queries, further straining profitability [3]

Group 4: Long-Term Goals and Efficiency
- Despite overall margin pressure, OpenAI has become more efficient at serving paying users: the margin on paid-user services rose to about 70% in October last year, up from 52% at the end of the previous year [4]
- OpenAI plans to better monetize non-paying users through advertising, e-commerce, and subscriptions, and has launched an ad-supported ChatGPT subscription tier priced between $5 and $8 per month [4]
- The company expects 66% of its $14.1 billion in inference costs this year to go toward serving paying users, aiming for 94% by 2030 with a gross margin target of approximately 67% [4]
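As a back-of-envelope check, the cost split reported for OpenAI can be reproduced with simple arithmetic. The dollar figures are taken from the article; the calculation itself is only a sanity check, not an official breakdown.

```python
# Figures from the article (USD): $8.4B total inference cost,
# split ~$3.9B for non-paying users and $4.5B for paying users.
total_inference = 8.4e9
non_paying = 3.9e9
paying = 4.5e9

non_paying_share = non_paying / total_inference
print(f"non-paying share: {non_paying_share:.1%}")  # → 46.4%, i.e. "nearly half"

# Forward-looking target: 66% of $14.1B goes to paying users this year,
# rising to 94% by 2030.
this_year_paying = 0.66 * 14.1e9
print(f"paying-user inference spend this year: ${this_year_paying / 1e9:.1f}B")  # → $9.3B
```

This confirms the article's "nearly half" characterization (about 46%) and makes the scale of the planned shift toward paid-user spending concrete.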
Training and Inference Cost Comparison: GPUs vs. ASICs
傅里叶的猫 · 2025-07-10 15:10
Core Insights
- The article discusses advances in AI GPU and ASIC technologies, highlighting performance improvements and the cost differences involved in training large models like Llama-3 [1][5][10]

Group 1: Chip Development and Performance
- NVIDIA leads AI GPU development with multiple upcoming models, including the H100, B200, and GB200, which show increasing memory capacity and performance [2]
- AMD and Intel are also developing competitive AI GPUs and ASICs, with notable models such as the MI300X and Gaudi 3, respectively [2]
- AI chip performance continues to improve, with higher configurations and better power efficiency across successive generations [2][7]

Group 2: Cost Analysis of Training Models
- The total cost of training the Llama-3 400B model varies significantly between GPUs and ASICs, with GPUs being the most expensive option [5][7]
- Hardware costs for training on NVIDIA GPUs are notably high, while ASICs like the TPU v7 cost less thanks to process advances and reduced power consumption [7][10]
- The article provides a detailed cost breakdown covering hardware investment, power consumption, and total cost of ownership (TCO) for the different chip types [12]

Group 3: Power Consumption and Efficiency
- AI ASICs hold a significant advantage in inference, at roughly one-tenth the cost of high-end GPUs like the GB200 [10][11]
- While GPUs carry high thermal design power (TDP), ASICs are more efficient, leading to lower operational costs [12]
- Performance-per-watt figures show that ASICs generally outperform GPUs in energy efficiency [12]

Group 4: Market Trends and Future Outlook
- New models such as the B300 are becoming increasingly available, indicating growing demand for advanced AI chips [13]
- Ongoing industry news and investment data are shared on dedicated platforms, reflecting the dynamic nature of the AI chip market [15]
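The TCO comparison summarized above can be sketched as a simple model: hardware purchase cost plus electricity cost over a training run, with performance-per-watt as the efficiency metric. All numeric inputs below are illustrative placeholders, not figures from the article, and the model omits real-world factors such as data-center PUE and cooling.

```python
# Minimal TCO sketch in the spirit of the article's GPU-vs-ASIC comparison.
# All numbers here are hypothetical, NOT taken from the article.

def total_cost_of_ownership(hw_cost, tdp_watts, utilization, hours, price_per_kwh):
    """Hardware purchase cost plus electricity cost over a training run."""
    energy_kwh = (tdp_watts / 1000) * utilization * hours
    return hw_cost + energy_kwh * price_per_kwh

def perf_per_watt(tflops, tdp_watts):
    """Energy-efficiency metric used to compare chips with different TDPs."""
    return tflops / tdp_watts

hours = 24 * 90  # hypothetical 90-day training run

# A high-TDP GPU vs. a cheaper, lower-TDP ASIC (placeholder values).
gpu_tco = total_cost_of_ownership(hw_cost=30_000, tdp_watts=1000,
                                  utilization=0.9, hours=hours, price_per_kwh=0.10)
asic_tco = total_cost_of_ownership(hw_cost=10_000, tdp_watts=400,
                                   utilization=0.9, hours=hours, price_per_kwh=0.10)

print(f"GPU TCO:  ${gpu_tco:,.0f}")
print(f"ASIC TCO: ${asic_tco:,.0f}")
```

Even this toy model shows the pattern the article describes: at per-chip scale the hardware price dominates, while the TDP gap compounds into the power and cooling advantage that makes ASICs cheaper to operate fleet-wide.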