Computing Power Costs

26-Day Countdown: OpenAI to Shut Down the GPT-4.5 Preview API
36Ke · 2025-06-18 07:34
Core Insights
- OpenAI announced the removal of the GPT-4.5 Preview API effective July 14, which will impact developers who have integrated it into their products [2][3]
- The removal has been planned since the release of GPT-4.1 in April; GPT-4.5 was always positioned as an experimental product [5]
- OpenAI is focusing on promoting more scalable and cost-effective models, as evidenced by the recent 80% price cut for the o3 API [8]

Pricing and Cost Considerations
- GPT-4.5 API pricing was exceptionally high at $75 per million input tokens and $150 per million output tokens, making it commercially unviable for most integrations (a cost sketch follows this summary) [6]
- The cost of NVIDIA H100 GPUs, roughly $25,000 apiece, and their high power consumption further undercut the financial feasibility of keeping such models in service [6]

Strategic Implications
- GPT-4.5's rapid exit highlights model iteration speed and external computing costs as critical pressures on OpenAI's business model [11]
- OpenAI's strategy appears to be consolidating resources into models that offer better scalability and cost control, while discontinuing less successful or ambiguously positioned products [8]
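To make the quoted pricing concrete, here is a minimal cost-arithmetic sketch in Python. The per-token rates come from the article; the token counts and request volume are hypothetical assumptions chosen only to illustrate the scale of the expense.

```python
# Cost arithmetic at the GPT-4.5 Preview API rates quoted above:
# $75 per 1M input tokens, $150 per 1M output tokens.
INPUT_RATE = 75.0 / 1_000_000    # USD per input token
OUTPUT_RATE = 150.0 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single API call at the rates above."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical workload (not from the article): 2,000 input tokens
# and 500 output tokens per request, 10,000 requests per day.
per_request = request_cost(2_000, 500)
print(f"per request: ${per_request:.4f}")            # $0.2250
print(f"per day:     ${per_request * 10_000:,.2f}")  # $2,250.00
```

Even this modest assumed workload runs to thousands of dollars per day at those rates, which is consistent with the article's point about commercial unviability.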
Interview with Red Hat Global Vice President Cao Hengkang (曹衡康): As AI Costs Fall, Chip Volumes Will Inevitably Rise
Mei Ri Jing Ji Xin Wen · 2025-06-14 09:02
Core Viewpoint
- The industry consensus is that the cost of computing power will eventually fall, but no unified path has emerged among data centers, all-in-one appliances, and inference servers [1]

Group 1: AI Inference Year
- 2025 is considered the year of AI inference, marking the point at which AI applications begin generating business revenue and driving internal cost control for enterprises [1]
- Red Hat has adopted vLLM, a high-performance large language model inference framework that has become a de facto standard in the open-source community (see the sketch after this summary) [1]

Group 2: Contribution and Market Potential
- Contributors from China account for 35% of contributions to the vLLM community, signaling strong potential for inference technology to deliver enterprise value in China [1]
- The company identifies two technical challenges in inference: achieving high-performance inference with minimal hardware and cost, and distributing inference workloads across multiple servers [1]

Group 3: Future of Computing Power Costs
- Red Hat plans to launch inference servers in 2025, with reduced computing costs for enterprises as the main selling point [2]
- The company does not produce hardware; it focuses on software solutions aimed at lowering the barriers to AI adoption for businesses [2]
- As computing costs fall, demand for GPU cards is expected to rise sharply, potentially expanding the number of enterprises using AI from 1,000 to 100,000 or even 1 million [2]
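Since the article centers on Red Hat's adoption of vLLM, a minimal offline-inference sketch may help show what the framework looks like in practice. vLLM is the real open-source library named above; the model id and prompt below are illustrative placeholders, following vLLM's standard offline-inference pattern.

```python
# Minimal offline inference with vLLM, the open-source LLM serving
# framework the article says Red Hat has adopted.
from vllm import LLM, SamplingParams

# Any HuggingFace-compatible model id works here; this small model
# is an illustrative placeholder, not one named in the article.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What drives down AI inference cost?"], params)

for out in outputs:
    print(out.outputs[0].text)
```

On the workload-distribution challenge noted above: the `LLM` constructor also accepts a `tensor_parallel_size` argument that shards a model across multiple GPUs, while distribution across whole servers is typically handled at the serving layer rather than through this offline API.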