Doubao 1.8 Multimodal Surpasses Google Gemini 3! ByteDance Rolls Out "Inference Outsourcing": Is It Aiming to Be the Intel of the Model World?

Core Insights
- Huoshan Engine (Volcano Engine) has launched Doubao Model 1.8, which is optimized for multi-modal agent scenarios and offers a 256k context window with separate limits on input and output tokens [2][3].
Model Performance
- Doubao 1.8 handles 5000k tokens per minute (TPM) and 30k requests per minute (RPM). It also improves significantly on a range of benchmarks, surpassing competitors such as Gemini 3 [3][4].
- On specific benchmarks, Doubao 1.8 scores 94.6 on AIME-25 (mathematics) and 85.7 on GPQA-Diamond (reasoning), indicating strong performance across multiple tasks [4].
Multi-modal Capabilities
- The model's multi-modal understanding has been enhanced. It excels at visual judgment, spatial understanding, document parsing, and video motion recognition, which places it among the global leaders in these areas [3][7].
- Doubao 1.8 can process long videos efficiently and quickly locate critical moments, with applications in sectors such as online education and safety inspection [5][7].
Business Applications
- These capabilities support the construction of complex agents that can create significant value across industries. Reported daily token usage exceeds 50 trillion, a 417-fold increase since launch [6][16].
- Huoshan Engine introduced the "Doubao Assistant API," which lets businesses use core agent capabilities easily, with plans to expand its functionality [16][17].
Cost Efficiency Initiatives
- The "AI Savings Plan" offers unified pricing for enterprises using large models, with cost savings of up to 47% depending on usage [17].
- The "Inference Outsourcing" service lets businesses upload encrypted model parameters without managing GPU infrastructure, potentially halving hardware and operational costs [18][19].
Creative Tools
- Doubao's image and video generation capabilities have also advanced, with the new Seedream and Seedance models improving creative workflows across a range of applications [8][9].
- Seedance 1.5 Pro adds synchronized audio-visual output and multi-language support, significantly improving content creation efficiency [9][13].
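
As a rough sanity check on the figures reported above, the short Python sketch below derives two numbers the article does not state directly: the daily token volume at launch implied by the 417-fold growth figure, and the average request size if the TPM and RPM figures are both fully used. The function names are illustrative, and reading TPM and RPM as per-minute ceilings that can be saturated together is an assumption, not something the article confirms.

```python
# Figures as reported in the article; everything derived from them is an estimate.
DAILY_TOKENS_NOW = 50e12   # "exceeding 50 trillion" tokens per day, so a lower bound
GROWTH_FACTOR = 417        # reported 417-fold increase since launch
TPM = 5_000_000            # 5000k tokens per minute
RPM = 30_000               # 30k requests per minute


def implied_launch_daily_tokens(current: float, growth: float) -> float:
    """Daily token volume at launch implied by the current volume and growth factor."""
    return current / growth


def tokens_per_request_at_saturation(tpm: int, rpm: int) -> float:
    """Average tokens per request if both per-minute figures are reached together.

    Assumption: TPM and RPM act as ceilings that can be saturated simultaneously.
    """
    return tpm / rpm


if __name__ == "__main__":
    launch = implied_launch_daily_tokens(DAILY_TOKENS_NOW, GROWTH_FACTOR)
    avg = tokens_per_request_at_saturation(TPM, RPM)
    print(f"Implied launch-day volume: about {launch / 1e9:.0f} billion tokens/day")
    print(f"Average request size at full load: about {avg:.0f} tokens")
```

Under these assumptions the implied launch-day volume comes to roughly 120 billion tokens per day, and the two throughput figures balance at about 167 tokens per request. Workloads with longer prompts would reach the token figure well before the request figure.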