Google Releases Its Most Cost-Effective Gemini Model: 1.8 RMB to Read the Entire "Three-Body" Trilogy
QbitAI (量子位) · 2026-03-04 11:30
Core Viewpoint
- Google has officially launched Gemini 3.1 Flash-Lite, which is positioned as the most cost-effective model in the Gemini 3 series, emphasizing lightweight and fast performance [1][3][9].

Pricing and Performance
- The cost of Gemini 3.1 Flash-Lite is notably low, with input tokens priced at $0.25 per million and output tokens at $1.50 per million, allowing for significant savings in AI applications [5][10].
- For example, it costs approximately 1.8 RMB to process the entire "Three-Body" trilogy [6].
- The model boasts a response time that is 2.5 times faster and an output speed that is 45% higher compared to its predecessor, Gemini 2.5 Flash [7][10].

Target Applications
- Designed for large-scale intelligent applications, Gemini 3.1 Flash-Lite enables low-cost and efficient batch deployment of models [8][26].
- It supports adjustable thinking levels, allowing developers to choose the model's depth of thought based on task complexity, which is crucial for handling high-frequency requests [23][24].

Benchmarking and Comparisons
- In benchmark tests, Gemini 3.1 Flash-Lite achieved a score of 1432 in Arena evaluations, performing well in creative writing and long queries, and leading in the low-cost model segment [18].
- It outperformed previous larger Gemini models in various benchmarks, scoring 86.9% in GPQA Diamond and 76.8% in MMMU Pro [21].
- Compared to other lightweight models like GPT-5 mini and Claude 4.5 Haiku, Gemini 3.1 Flash-Lite shows significant advantages in both speed and cost [16].

Competitive Landscape
- Following the launch of Gemini 3.1 Flash-Lite, OpenAI quickly released GPT-5.3 Instant, which focuses on user interaction experience and provides more contextually relevant responses [27][29].
- A comparison showed that while Gemini 3.1 Flash-Lite offers straightforward outputs, GPT-5.3 Instant provides more complete and engineering-oriented solutions [31][32].

Conclusion
- Gemini 3.1 Flash-Lite stands out for its high performance and cost efficiency, making it a competitive option for enterprises and developers needing real-time responses and large-scale processing capabilities [26][41].
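
The per-token prices quoted under Pricing and Performance can be turned into a rough cost estimate. The sketch below is illustrative only: the two prices come from the article, while the exchange rate of 7.2 RMB per USD and all function names are assumptions introduced here. Under that assumed rate, the article's 1.8 RMB figure corresponds to roughly one million input tokens.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.25
OUTPUT_USD_PER_MTOK = 1.50

# Assumption: an illustrative exchange rate, not a figure from the article.
ASSUMED_RMB_PER_USD = 7.2


def estimate_cost_usd(input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the USD cost of a request (or batch) from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def usd_to_rmb(usd: float, rmb_per_usd: float = ASSUMED_RMB_PER_USD) -> float:
    """Convert USD to RMB using the assumed exchange rate."""
    return usd * rmb_per_usd


def input_tokens_for_budget_rmb(
    budget_rmb: float, rmb_per_usd: float = ASSUMED_RMB_PER_USD
) -> int:
    """Return how many input tokens a given RMB budget buys (output ignored)."""
    budget_usd = budget_rmb / rmb_per_usd
    return round(budget_usd / INPUT_USD_PER_MTOK * 1_000_000)


if __name__ == "__main__":
    # Input-only cost of one million tokens, in both currencies.
    cost = estimate_cost_usd(1_000_000)
    print(f"1M input tokens: ${cost:.2f}, about {usd_to_rmb(cost):.2f} RMB")
    # Token volume implied by the article's 1.8 RMB example under the assumed rate.
    print(f"1.8 RMB buys about {input_tokens_for_budget_rmb(1.8):,} input tokens")
```

Because output tokens cost six times as much as input tokens at these prices, read-heavy workloads such as summarizing long documents benefit most; the estimate shifts noticeably once responses become long.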