Inference cost pushed down to 1 yuan per million tokens: Inspur Information pries open the "last mile" of large-scale agent deployment
Inspur Information (SZ:000977) | QbitAI · 2025-12-26 04:24

Core Viewpoint
- The global AI industry has shifted from a competition over model performance to a "life-and-death race" to deploy intelligent agents at scale, where cost reduction is no longer optional but a decisive factor for profitability and industry breakthroughs [1]

Group 1: Cost Reduction Breakthrough
- Inspur Information has launched the Yuan Brain HC1000 ultra-scalable AI server, bringing inference cost down to 1 yuan per million tokens for the first time [2][3]
- This breakthrough is expected to remove the cost barrier to industrializing intelligent agents and to reshape the underlying logic of competition in the AI industry [3]

Group 2: Future Cost Dynamics
- Liu Jun, Chief AI Strategist at Inspur Information, stressed that 1 yuan per million tokens is only a staged victory: token consumption and the demand for complex tasks will grow exponentially, so today's cost levels are still not low enough for AI to be deployed everywhere [4][5]
- For AI to become a basic resource like water and electricity, token costs must fall dramatically further, and low cost will evolve from a "core competitive advantage" into the "ticket to survival" of the intelligent-agent era [5]

Group 3: Historical Context and Current Trends
- The AI industry stands at an inflection point similar to the early internet, when sharp drops in communication costs gave rise to new application ecosystems [7]
- As the technology matures and token prices fall, companies can apply AI to more complex, more energy-intensive tasks, driving exponential growth in token demand [8]

Group 4: Token Consumption Data
- Token consumption is already rising steeply: ByteDance's Doubao model now processes more than 50 trillion tokens per day, a tenfold increase over the previous year [13]
- Google's platforms process about 1.3 quadrillion tokens per month, a daily average of roughly 43.3 trillion, up from 9.7 trillion a year earlier [13] (a back-of-envelope check of these figures appears after this summary)

Group 5: Cost Structure Challenges
- More than 80% of current token cost comes from computing expenses; the core problem is the mismatch between inference and training workloads, which leaves compute resources poorly utilized [12]
- The computing architecture must be fundamentally restructured to raise the output efficiency of each unit of computing power, tackling low utilization during inference and the "memory wall" bottleneck [14][16] (a sketch of how utilization drives per-token cost also appears after this summary)

Group 6: Innovations in Architecture
- The Yuan Brain HC1000 adopts a new DirectCom architecture that efficiently aggregates large numbers of domestic AI chips, enabling the inference-cost breakthrough [23]
- The architecture supports lossless scaling to ultra-large clusters, lifts inference performance by 1.75 times, and can raise per-card model FLOPs utilization (MFU) by up to 5.7 times [27]

Group 7: Future Directions
- Liu Jun stated that a sustained, substantial reduction in token cost requires fundamental innovation in computing architecture, shifting the focus from scale to efficiency [29]
- The AI industry must innovate in product technology, build computing architectures dedicated to AI, and explore specialized computing chips to co-optimize software and hardware [29]
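
As a quick sanity check on the consumption figures in Group 4, the minimal sketch below converts a monthly token volume into a daily average and into a daily spend at the 1-yuan-per-million-tokens price cited for the HC1000. The 30-day month and the idea of pricing Google-scale traffic at that rate are illustrative assumptions, not figures from the article.

```python
# Back-of-envelope check of the token figures cited above.
# Assumptions (not from the article): a 30-day month, and pricing the
# traffic at the 1 yuan per million tokens cited for the HC1000.

MONTHLY_TOKENS = 1.3e15        # ~1.3 quadrillion tokens/month (Google figure)
DAYS_PER_MONTH = 30            # illustrative assumption
PRICE_PER_MILLION = 1.0        # yuan per million tokens (HC1000 figure)

daily_tokens = MONTHLY_TOKENS / DAYS_PER_MONTH
daily_spend = daily_tokens / 1e6 * PRICE_PER_MILLION

print(f"daily tokens: {daily_tokens:.3e}")       # ~4.33e13, i.e. ~43.3 trillion/day
print(f"daily spend:  {daily_spend:,.0f} yuan")  # ~43 million yuan/day at 1 yuan/M tokens
```

The daily average matches the roughly 43.3 trillion figure in the summary, and even at 1 yuan per million tokens, traffic at this scale would still translate into tens of millions of yuan per day, which is the point behind Liu Jun's argument that costs must keep falling.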
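
To illustrate the utilization argument in Groups 5 and 6, here is a hedged sketch of the inverse relationship between MFU and cost per million tokens. The hourly hardware cost, peak throughput, and baseline MFU are placeholder values chosen only for illustration; the 5.7x factor is the article's claimed MFU improvement, not a measured result.

```python
# Illustrative only: how utilization (MFU) drives cost per million tokens.
# The hardware cost, peak throughput, and baseline MFU below are placeholder
# assumptions, not published HC1000 specifications.

def cost_per_million_tokens(hourly_hw_cost_yuan: float,
                            peak_tokens_per_hour: float,
                            mfu: float) -> float:
    """Hourly hardware cost divided by effective throughput (in millions of tokens)."""
    effective_tokens_per_hour = peak_tokens_per_hour * mfu
    return hourly_hw_cost_yuan / (effective_tokens_per_hour / 1e6)

HOURLY_HW_COST = 200.0       # yuan/hour for a hypothetical inference node
PEAK_TOKENS_PER_HOUR = 1e9   # tokens/hour at 100% utilization (hypothetical)
BASELINE_MFU = 0.10          # hypothetical baseline utilization

for mfu in (BASELINE_MFU, BASELINE_MFU * 5.7):  # 5.7x MFU gain, as claimed for the HC1000
    cost = cost_per_million_tokens(HOURLY_HW_COST, PEAK_TOKENS_PER_HOUR, mfu)
    print(f"MFU = {mfu:.0%}  ->  {cost:.2f} yuan per million tokens")
```

Because cost is inversely proportional to effective utilization, a 5.7x MFU gain cuts per-token cost by the same factor before any other architectural savings, which is why the article frames utilization, rather than raw scale, as the lever for further cost reduction.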