Multi-Agent Models
Google Releases Its IMO Gold-Medal Model Late at Night, Beating Grok 4 and OpenAI o3 on Multiple Tests! Online Reactions Are Polarized
AI前线 · 2025-08-04 06:43
Core Viewpoint
- Google has launched Gemini 2.5 Deep Think, the model that achieved a gold-medal score at the International Mathematical Olympiad (IMO), as a showcase of its advanced AI reasoning capabilities [2][3][4].

Group 1: Model Features and Capabilities
- Gemini 2.5 Deep Think is Google's first publicly available multi-agent model. It spawns multiple AI agents that work on a problem in parallel, which yields better answers at the cost of more compute [5][6].
- The model can spend hours reasoning over a problem, whereas most consumer AI models respond in seconds or minutes. Google aims to use it to support research and to gather feedback from academic users [6].
- Deep Think uses parallel thinking techniques: it explores a problem from several angles and refines its answer over time, much as a human problem solver would [8][9].

Group 2: Performance Metrics
- On Humanity's Last Exam (HLE), Gemini 2.5 Deep Think scored 34.8%, ahead of xAI's Grok 4 at 25.4% and OpenAI's o3 at 20.3% [18].
- On LiveCodeBench V6, the model scored 87.6%, ahead of Grok 4 (79%) and OpenAI's o3 (72%) [18].

Group 3: User Reactions and Market Position
- The launch has sparked extensive discussion on social media and tech forums, with mixed reviews of the model's performance and pricing [19][22].
- Some users were enthusiastic about the model's capabilities and said they were considering the Ultra plan; others argued that it underperforms competitors and questioned whether it is worth $250 per month [26][27].
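
The multi-agent, parallel-thinking idea summarized in Group 1 can be illustrated with a small sketch. The article does not describe how Deep Think actually combines its agents' work, so the majority vote below is a simple stand-in technique and not Google's method. `toy_agent` and `parallel_think` are hypothetical names, and the "agent" is a deterministic toy that gets a simple addition problem wrong on some seeds.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def toy_agent(problem: str, seed: int) -> str:
    """Hypothetical stand-in for one reasoning agent (not a real model call).

    Solves "a+b" problems, but deliberately errs on every fourth seed
    to mimic an agent that followed a wrong line of reasoning.
    """
    a, b = (int(part) for part in problem.split("+"))
    correct = a + b
    return str(correct if seed % 4 != 0 else correct + 1)


def parallel_think(problem: str, n_agents: int = 8):
    """Run n_agents attempts concurrently and keep the most common answer.

    Assumption: majority voting is used here only as an illustrative
    aggregation rule. Total compute grows linearly with n_agents.
    """
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        candidates = list(
            pool.map(lambda seed: toy_agent(problem, seed), range(n_agents))
        )
    answer, votes = Counter(candidates).most_common(1)[0]
    return answer, votes, candidates


if __name__ == "__main__":
    answer, votes, candidates = parallel_think("2+3", n_agents=8)
    print(f"candidates: {candidates}")
    print(f"chosen answer: {answer} ({votes} of {len(candidates)} votes)")
```

Running eight agents costs roughly eight times the compute of a single attempt, which mirrors the trade-off the article notes: better answers in exchange for higher computational cost.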