Computing Power Cost Control
OpenAI's Achilles' Heel Will Decide the Future of Large Model Companies
36Kr · 2025-09-03 07:12
Core Insights
- The article argues that computing power cost control is fundamental to the development and commercialization of large models, with the Scaling Law remaining the key lever for improving model capabilities [1][19].
- OpenAI's "routing" feature, introduced with GPT-5, matches user queries to appropriate models to improve user experience and computational efficiency, though it has drawn criticism for falling short of user expectations [1][3][4].

Group 1: Cost Control and Model Efficiency
- DeepSeek has cut model inference and training costs to below 10% of prior levels, a key driver of its popularity in the open-source community [1].
- The MoE architecture gained traction after GPT-4 and has become the default choice for many large model developers because of its effectiveness in lowering inference costs [1].
- OpenAI's routing feature is designed to identify simpler queries that can be handled by less resource-intensive models; if 10% of queries are matched correctly, computational costs could fall by roughly 8% [10][23].

Group 2: Challenges and User Experience
- OpenAI's push for routing was driven by the need to help users, especially those unfamiliar with large models, choose among more than five model options [6][8].
- The routing function's failure to align user expectations with model capabilities has been a major factor in the criticism of GPT-5 [3][4].
- Routing efficiency matters because the computational cost gap between reasoning and non-reasoning models can reach 5-6x, with complex queries consuming thousands of tokens [8][10].

Group 3: Infrastructure and Market Expansion
- OpenAI is expanding its infrastructure, announcing in July 2025 a plan with Oracle to add 4.5 GW of data center capacity [19].
- The company is also exploring partnerships in India to build a data center with at least 1 GW of capacity, aiming to tie local user growth to computational resources [20].
- The "AI cost paradox" is driving demand for efficient routing: total computational demand keeps rising even as token prices fall [19][23].
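The routing economics described above can be checked with a toy calculation. The model names and cost figures are illustrative assumptions drawn from the article's claims (a 5-6x cost gap between reasoning and non-reasoning models, ~10% of traffic safely reroutable), not OpenAI's actual numbers:

```python
# Toy estimate of fleet-wide savings from query routing. Assumes every
# query initially runs on the expensive reasoning model, and a fraction
# is rerouted to a cheaper non-reasoning model. Numbers are illustrative.

def routing_savings(cost_ratio: float, rerouted_share: float) -> float:
    """Fractional cost reduction when `rerouted_share` of queries move
    from the expensive model (relative cost `cost_ratio`) to the cheap
    model (relative cost 1)."""
    baseline = cost_ratio                                   # all queries on the big model
    routed = (1 - rerouted_share) * cost_ratio + rerouted_share * 1.0
    return 1 - routed / baseline

# With a 6x cost gap, rerouting 10% of queries saves ~8.3% of total
# compute; with a 5x gap, exactly 8% -- consistent with the article's figure.
print(f"{routing_savings(cost_ratio=6, rerouted_share=0.10):.1%}")
print(f"{routing_savings(cost_ratio=5, rerouted_share=0.10):.1%}")
```

The savings scale with both the cost gap and how much traffic the router can confidently divert, which is why misrouting (and the resulting user backlash) is costly in both directions.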
OpenAI's Achilles' Heel Will Decide the Future of Large Model Companies
Huxiu · 2025-09-03 06:26
Core Insights
- The article emphasizes that computing power cost control is fundamental to the development and commercialization of large models, with DeepSeek's recent advances cutting inference and training costs to below 10% of prior levels [1].
- OpenAI's "routing" feature, introduced with GPT-5, aims to improve user experience by sending simple queries to low-consumption models and complex queries to high-capacity models, though it has been criticized for falling short of user expectations [1][3][5].

Group 1: Model Development and Performance
- DeepSeek's MoE architecture is becoming the default choice among large model developers because of its effectiveness in reducing inference costs [1].
- Despite claims of improved performance, GPT-5 has been criticized for mishandling even simple queries, leading to user dissatisfaction [3][5].
- The routing function's failure to align user expectations with model capabilities has been identified as a direct cause of the problems at GPT-5's launch [5][6].

Group 2: Computational Efficiency and Cost
- Routing is essential for OpenAI to manage its growing model lineup and help users select the appropriate model for each task [8][10].
- Research indicates that the computational cost gap between reasoning and non-reasoning models can reach 5 to 6 times, with complex queries consuming significantly more tokens [11].
- OpenAI's routing function could cut computational costs by about 8% if it identifies the 10% of queries suitable for non-reasoning models [15].

Group 3: Industry Trends and Future Outlook
- An "AI cost paradox" is emerging: falling token prices do not reduce overall costs, because the volume and complexity of tasks models can handle keep growing [25][29].
- OpenAI is expanding its infrastructure, with a plan announced in July 2025 to add 4.5 GW of data center capacity, indicating strong demand for computational resources [26].
- Efficient "compute-to-intelligence" conversion is crucial for large model companies to sustain competitive advantages in system efficiency and user experience [29].
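The MoE cost argument above can be illustrated with a minimal top-k gating sketch: a router scores all experts, but only the top-k actually run, so per-token compute scales with k rather than the total expert count. The layer sizes and expert count are arbitrary toy values, and real MoE systems (expert parallelism, load-balancing losses, capacity limits) are far more involved:

```python
import numpy as np

# Minimal mixture-of-experts forward pass. Only k of n_experts weight
# matrices are touched per token, so dense-layer FLOPs drop by ~k/n.
rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                    # router score for every expert
    top = np.argsort(logits)[-k:]            # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Weighted sum of the k active experts' outputs; the other n-k are skipped.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(d_model))
print(y.shape, f"active experts: {k}/{n_experts}")
```

The capacity/cost trade-off is the same one the article attributes to routing, just applied inside a single model rather than across a model lineup.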