Google's Latest Gemini Agent Crushes GPT-5.2? Humanity's Last Exam Scores Settle It! Netizens: Altman Should Issue Another "Code Red"

Core Insights
- The article discusses the intense competition between Google and OpenAI in the AI sector, focusing on the near-simultaneous release of Google's Gemini Deep Research and OpenAI's GPT-5.2 and on the strategic timing of the two updates [2][3].

Group 1: Google's Gemini Deep Research
- Google has launched the new Gemini Deep Research tool, an intelligent agent that can integrate large volumes of information and handle complex contextual data for tasks such as due diligence and drug toxicity research [5].
- The Deep Research Agent is built on the Gemini 3 Pro model, which is considered Google's most reliable model and its best suited for long-chain reasoning; the upgrade is presented as a qualitative leap in the agent's reliability [6][7].
- The new agent improves in three areas: the underlying model, reasoning stability, and interaction. Together these let it handle complex research tasks that traditional LLMs could not manage [6][7].

Group 2: Performance Metrics
- The Deep Research Agent scored 46.4% on Humanity's Last Exam (HLE), ahead of OpenAI's GPT-5.2 at 45% [13][20].
- On the DeepSearchQA benchmark, the agent scored 66.1%, slightly ahead of GPT-5.2's 65.2%, suggesting an edge in complex multi-step information retrieval tasks [13][20].
- The agent keeps its decisions consistent over long tasks and provides a traceable citation for every conclusion, which the article describes as a significant advance in AI research capabilities [28].

Group 3: Competitive Landscape
- The competition between Google and OpenAI is characterized by rapid releases and strategic positioning, with both companies working to strengthen their foundation models and agent capabilities [21][22].
- Google's Gemini 3 Pro emphasizes retrieval enhancement and large-scale context processing, while OpenAI's GPT-5.2 focuses on logical consistency and stable tool invocation. The result is a close contest in which the differences are often task-specific [22][23].
- Google's new Interactions API gives developers more direct control over the agent's behavior and task execution, marking a shift toward a more structured approach to AI agent development [15][25].