谷歌最新 Gemini Agent 爆击GPT-5.2?人类最后考试得分见分晓,网友:Altman又该发“红色警报”了
3 6 Ke·2025-12-12 10:02

Core Insights - Google has launched a new version of its Gemini Deep Research tool, which integrates a research agent API for embedded research capabilities [1][4] - OpenAI has simultaneously released the highly anticipated GPT-5.2, indicating an intense competition between the two AI giants over agent capabilities and application ecosystems [2] Group 1: Google Gemini Deep Research Agent - The new Gemini Deep Research tool is designed to integrate vast amounts of information and handle extensive contextual data, with applications ranging from due diligence to drug toxicity safety research [4] - The Deep Research Agent is built on the Gemini 3 Pro model, which is touted as Google's most reliable and suitable model for long-chain reasoning tasks [5] - Key enhancements in the Deep Research Agent include model upgrades, breakthroughs in reasoning stability, and comprehensive improvements in interaction capabilities [4][5] Group 2: Model Upgrades and Capabilities - The Deep Research Agent utilizes multi-step reinforcement learning to maintain stable reasoning paths over complex research tasks, significantly reducing the probability of errors [5][6] - It can now handle extensive context processing, including academic papers and official reports, and provides traceable citations for every conclusion, ensuring output credibility [6][13] - The new agent has achieved advanced results in benchmark tests, outperforming GPT-5 Pro in the "Human Last Exam" (HLE) with a score of 46.4% compared to GPT-5 Pro's 38.9% [10][13] Group 3: Benchmark Testing and Open Source - Google has introduced a new benchmark test called DeepSearchQA, which evaluates agents on complex multi-step information retrieval tasks, and has made it open source [7][8] - The DeepSearchQA includes 900 carefully designed tasks across 17 domains, focusing on the comprehensiveness of answers generated by the agents [8][10] Group 4: Industry Reactions and Comparisons - The technical community has responded positively to Google's emphasis on verifiable citations and multi-step reasoning stability, marking a significant advancement in AI agent development [15][16] - Comparisons between Google's Deep Research Agent and OpenAI's GPT-5.2 have emerged, with some users noting that while the two serve different purposes, GPT-5.2 is perceived as superior [18][19] - The competition between Google and OpenAI is characterized as a "release war," with both companies striving to dominate the AI agent landscape [19][22] Group 5: Future Implications - The competition is not just about model capabilities but also about who can establish the standard for agent frameworks, which will be central to future software development [22]