Gemini Deep Think
Tencent Research Institute AI Digest 20260309
Tencent Research Institute · 2026-03-08 16:01
Group 1: Generative AI Developments
- OpenAI released the GPT-5.4 series, integrating Computer Use capabilities and combining code, reasoning, and desktop control into a unified model [1]
- The model scored 75.0% on the OSWorld desktop-control evaluation, surpassing the human baseline of 72.4%, and reached 83.0% on the GDPval professional-work evaluation [1]
- Standard API pricing is $2.50 per million input tokens and $15 per million output tokens, with a Pro version priced at a 12x premium targeting complex agent scenarios [1]

Group 2: OpenAI Initiatives
- Peter Steinberger, founder of OpenClaw, joined OpenAI and launched the "Codex for Open Source" project, offering free API credits and 6 months of ChatGPT Pro access to open-source maintainers [2]
- The application criteria target core maintainers and operators of widely used public projects, with non-standard projects eligible if they play a significant role in the ecosystem [2]
- Steinberger says he will balance responsibilities between OpenAI and OpenClaw, aiming to support as many open-source contributors as possible [2]

Group 3: Tencent Innovations
- Tencent's Hunyuan team introduced the HY-WU paradigm, which generates personalized LoRA parameters in real time during inference, replacing traditional static fine-tuning [3]
- The approach was applied to an 800-billion-parameter image-editing model, outperforming closed-source models on multiple metrics, with only a 0.11-point gap from GPT Image 1.5 [3]
- The paradigm is designed for cross-modal applicability, with plans to extend functional memory to video generation, multi-modal alignment, and edge deployment [3]

Group 4: Xiaomi's AI Agent
- Xiaomi launched miclaw, a mobile AI agent product built on the MiMo model, encapsulating over 50 system-level tools for autonomous task orchestration [4]
- The agent can interact with the entire home IoT ecosystem and supports third-party applications through an SDK [4]
- It features self-evolution capabilities, allowing it to create sub-agents and continuously adapt based on user preferences and experience [4]

Group 5: Karpathy's autoresearch
- Karpathy released the autoresearch project: in only 630 lines of code, an AI agent autonomously executes code editing, model training, evaluation, and iteration without human intervention [5]
- Each training session lasts 5 minutes, val_bpb serves as the unified evaluation metric, and the agent submits improvements via Git [6]
- Karpathy is running an enhanced version on eight H100 GPUs, positioning the project as a proof of concept for self-evolving LLMs, with potential to expand into other research fields [6]

Group 6: Security Innovations
- Illia Polosukhin, co-author of the Transformer paper, rewrote OpenClaw in Rust, launching IronClaw, a secure version with a four-layer defense architecture [7]
- Key security features include WASM sandbox isolation, AES-256-GCM encrypted credential vaults, and a trusted execution environment (TEE) [7]
- The project aligns with NEAR Protocol's "user-owned AI" strategy, establishing an AI cloud platform and a marketplace for intelligent agents [7]

Group 7: Multiplayer Video World Model
- The team led by Saining Xie introduced Solaris, the first multiplayer video world model capable of generating consistent first-person perspectives for multiple players, validated in Minecraft [8]
- They developed SolarisEngine for data collection, producing a 12.64-million-frame dataset, the first annotated dataset for training multiplayer world models [8]
- The model incorporates a multi-player self-attention layer to facilitate information exchange among players, significantly outperforming previous approaches [8]

Group 8: AI in Theoretical Physics
- Google Research used Gemini Deep Think, tree search, and automatic numerical feedback to solve the open problem of the cosmic-string gravitational-radiation power spectrum [9]
- The AI explored roughly 600 candidate paths, 80% of which were pruned by an automatic verifier, ultimately identifying six solutions, with the Gegenbauer-function approach judged the most elegant [9]
- The final closed-form solution was achieved through human-AI collaboration, demonstrating a reusable AI-driven research paradigm [9]

Group 9: Labor-Market Impact of AI
- Anthropic's labor-market report indicates that AI is quietly reshaping young people's first jobs, with a 14% decrease in the share of 22-25-year-olds entering occupations with high AI exposure [10]
- AI task coverage for computer programmers reached 74.5%, but actual coverage across industries is only about one-third of the theoretical value, indicating significant untapped potential [10]
- Companies are shifting investment from "future human assets" to "immediate computational assets," eroding entry-level positions and elevating decision-making, aesthetic engineering, and AI-collaboration skills as core competencies [11]

Group 10: OpenClaw's Global Impact
- OpenClaw's global popularity surged, with over 1,300 attendees at a New York gathering, where Jensen Huang described it as "the most important software release in history" [12]
- Observations from the event indicated users spending $1,000 to $2,000 per month on model costs, with some burning 1 billion tokens daily [12]
- Security emerged as the primary concern, with no one believing the system is 100% secure, underscoring genuine demand for personal intelligent agents and marking the onset of the consumer AI-agent era [12]
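The autoresearch loop summarized in Group 5 (edit code, run a short training session, score it on val_bpb, commit only improvements via Git) can be sketched as a simple accept-if-better agent loop. This is an illustrative toy, not the actual 630-line project: `propose_edit` and `train_and_eval` are invented stand-ins, and a fake loss surface replaces real model training.

```python
import math
import random

def propose_edit(params):
    """Hypothetical stand-in for the agent's code edit:
    perturb a single hyperparameter."""
    new = dict(params)
    new["lr"] = params["lr"] * random.choice([0.5, 1.0, 2.0])
    return new

def train_and_eval(params):
    """Stand-in for a 5-minute training run, returning val_bpb
    (bits per byte, lower is better). A fake loss surface with
    its optimum at lr = 1e-3 replaces real training."""
    return 1.0 + abs(math.log10(params["lr"]) + 3.0)

def autoresearch(steps=20, seed=0):
    """Accept-if-better loop: each accepted edit stands in for a Git commit."""
    random.seed(seed)
    best = {"lr": 1e-2}
    best_bpb = train_and_eval(best)
    commits = []
    for step in range(steps):
        candidate = propose_edit(best)
        bpb = train_and_eval(candidate)
        if bpb < best_bpb:  # keep only edits that improve val_bpb
            best, best_bpb = candidate, bpb
            commits.append((step, bpb))
    return best_bpb, commits

if __name__ == "__main__":
    bpb, commits = autoresearch()
    print(f"best val_bpb = {bpb:.3f} after {len(commits)} accepted edits")
```

The accept-if-better rule guarantees the metric never regresses, which is why a short, fixed-length training budget (5 minutes per session in the real project) is enough for unattended iteration.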
X @Demis Hassabis
Demis Hassabis · 2026-02-11 23:54
Gemini Deep Think is helping advance scientific progress! Very cool to see how experts are using it to accelerate solutions to longstanding problems in maths, physics, and computer science.

Google DeepMind (@GoogleDeepMind): How could AI act as a better research collaborator? 🧑‍🔬 In two new papers with @GoogleResearch, we show how Gemini Deep Think uses agentic workflows to help solve research-level problems in mathematics, physics, and computer science. More → https://t.co/ISWedYjpgD https://t.co/LWDIwLzXNw ...
Not Just a "Test-Taking Machine": DeepSeek's Latest Model Breaks Through Mathematical Reasoning Limits, Partly Outperforming Gemini Deep Think
TMTPost APP · 2025-11-28 05:45
Core Insights
- DeepSeek has released its latest mathematical model, DeepSeekMath-V2, which has generated significant excitement in the AI community due to its self-verifying capabilities in deep reasoning, particularly in mathematics [1][2]

Model Performance
- Math-V2 demonstrates strong theorem-proving abilities, distinguishing itself from previous models that merely solved problems without rigorous reasoning [2]
- The model achieved gold-medal-level results at IMO 2025 and CMO 2024, and scored 118 out of 120 on Putnam 2024, showcasing its superior performance [2]

Benchmarking Results
- On the IMO-ProofBench basic evaluation, Math-V2 scored 99%, outperforming Google's Gemini Deep Think (89%) and GPT-5 (59%) [3]
- On the advanced evaluation, Math-V2 scored 61.9%, just behind Gemini Deep Think's 65.7% [3]

Community Impact
- The release of Math-V2 has sparked discussion across social media platforms and communities, highlighting its potential to automate verification-heavy tasks in programming languages [5][8]
- Experts in the AI field have praised DeepSeek's return and the significance of Math-V2, signaling a shift from the "chatbot" era to the "reasoner" era in AI development [8][9]
GPT-5 in Trouble: DeepSeek Open-Sources the World's First Olympiad Gold-Medal AI, Taking On Google Head-On
36Kr · 2025-11-28 01:55
Core Insights
- DeepSeek has launched its new model, DeepSeekMath-V2, which won an IMO 2025 gold medal, showcasing capabilities that rival or even surpass Google's IMO gold-medal model [1][3][22]
- This is the first open-source IMO gold-medal model, marking a significant advancement in AI [1][24]

Model Performance
- DeepSeekMath-V2 demonstrated strong theorem-proving abilities, solving 5 of 6 problems at IMO 2025 for a gold-medal-level result [3][4]
- It also reached gold-medal status at CMO 2024, and at Putnam 2024 it scored 118 out of 120, surpassing the highest human score of 90 [3][4]

Comparison with Competitors
- DeepSeekMath-V2 outperformed Google's Gemini Deep Think on the ProofBench-Basic tests and closely trailed it on ProofBench-Advanced [5][22]
- Its performance marks a significant leap over existing models such as OpenAI's GPT-5 and Gemini 2.5-Pro [26][28]

Self-Verification Mechanism
- A key breakthrough of DeepSeekMath-V2 is its self-verification capability, allowing it to assess and improve its own proofs [12][36]
- The model employs a unique "three-in-one" system consisting of a Generator, a Verifier, and a Meta-Verifier to enhance proof quality [15][16]

Training Methodology
- Training used a high-compute search strategy, generating numerous candidate proofs and validating them rigorously [32][35]
- The model's ability to self-correct and refine its proofs over multiple iterations significantly improved its performance [38]

Implications for AI Development
- The success of DeepSeekMath-V2 suggests a shift in AI from merely mimicking human responses to emulating human thought processes, emphasizing the importance of self-reflection in achieving advanced AI [36][37]
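The Generator/Verifier/Meta-Verifier loop described above can be illustrated schematically. In DeepSeekMath-V2 each component is a large model; here all three are invented numeric stand-ins (the scoring rules, 0.9 threshold, and feedback bonus below are assumptions for illustration only): the generator drafts a proof, the verifier critiques it, the meta-verifier sanity-checks the critique, and accepted critiques feed the next refinement round.

```python
import random

def generator(problem, feedback, rng):
    """Stand-in for the proof generator: draft quality improves
    when the previous critique's score is fed back in."""
    base = 0.5 if feedback is None else min(1.0, feedback["score"] + 0.2)
    return {"problem": problem, "quality": base + rng.uniform(-0.1, 0.1)}

def verifier(proof):
    """Stand-in for the verifier: scores a proof in [0, 1] and
    marks it complete above an assumed threshold of 0.9."""
    score = max(0.0, min(1.0, proof["quality"]))
    return {"score": score, "complete": score >= 0.9}

def meta_verifier(proof, critique):
    """Stand-in for the meta-verifier: accept the critique only if it is
    internally consistent (here, a trivial range check)."""
    return 0.0 <= critique["score"] <= 1.0

def prove(problem, max_rounds=8, seed=0):
    """Generate -> verify -> meta-verify -> refine, until the proof is complete."""
    rng = random.Random(seed)
    feedback = None
    proof, critique = None, None
    for _ in range(max_rounds):
        proof = generator(problem, feedback, rng)
        critique = verifier(proof)
        if not meta_verifier(proof, critique):
            continue  # discard unreliable critiques
        if critique["complete"]:
            break     # verified proof found
        feedback = critique  # refine using the critique next round
    return proof, critique
```

The key design idea the sketch preserves is that verification output is not just a pass/fail gate but the training signal for the next generation attempt, which is what lets the system self-correct over iterations.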
DeepMind Releases CodeMender, a Code-Repair AI Agent Unifying "Reactive Response" and "Proactive Defense"
Synced (Jiqizhixin) · 2025-10-07 07:00
Core Viewpoint
- The article discusses CodeMender, an AI agent developed by DeepMind to automatically repair critical software vulnerabilities while ensuring its fixes do not introduce new issues, emphasizing the importance of rigorous validation in AI-driven code security [2][10]

Group 1: CodeMender Overview
- CodeMender takes a comprehensive approach to software vulnerabilities, balancing passive response and proactive defense: it immediately patches new vulnerabilities and rewrites existing code to eliminate systemic flaws [4]
- Over the past six months, DeepMind has upstreamed 72 security patches to open-source projects, with some patched projects spanning up to 4.5 million lines of code [5]
- By automating the creation and application of high-quality security patches, CodeMender lets developers focus on building quality software rather than chasing vulnerabilities [6]

Group 2: Developer Reactions
- The release has sparked discussion among developers, with some highlighting its ability to ensure fixes do not break other functionality, a significant advance in automation [8]
- Others worry that CodeMender could disrupt income streams tied to quality assurance, security audits, and bug-bounty programs [8]

Group 3: AI Vulnerability Reward Program
- Google recently launched a reward program specifically targeting vulnerabilities in AI products; bug hunters have earned over $430,000 since the initiative began two years ago [9]

Group 4: CodeMender's Mechanism
- CodeMender runs on the latest Gemini Deep Think model, enabling it to automatically debug and repair complex vulnerabilities while ensuring modifications are logically sound and cause no additional problems [12]
- The agent uses a variety of tools, including debuggers and source-code browsers, to accurately identify root causes and design patches [14]
- Advanced program-analysis techniques, including static and dynamic analysis, are employed to systematically examine code patterns and identify vulnerabilities [18]

Group 5: Case Studies
- In one case, CodeMender traced a root cause to stack management in XML parsing, producing a patch that modified only a few lines of code [15]
- In another, it created a non-trivial patch addressing complex object-lifecycle issues, demonstrating its ability to enhance security by rewriting existing code [17]

Group 6: Future Developments
- All patches generated by CodeMender undergo human review before submission to upstream projects, ensuring reliability and quality [24]
- DeepMind plans to publish further technical papers and reports in the coming months, with the goal of eventually offering CodeMender as a tool for all developers [24]
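The workflow summarized above (localize the root cause, generate a candidate patch, validate that the bug no longer reproduces and that existing tests still pass, then queue the patch for human review) follows the general shape of an automated program-repair pipeline. The sketch below is a generic illustration under those assumptions, not DeepMind's implementation; every function name in it is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    target_file: str
    diff: str
    validated: bool = False

def localize_root_cause(crash_report: dict) -> str:
    """Stand-in for debugger / static-analysis root-cause localization."""
    return crash_report["suspect_file"]

def generate_patch(target_file: str, crash_report: dict) -> Patch:
    """Stand-in for the LLM-based patch generator."""
    return Patch(target_file, f"fix {crash_report['bug_id']} in {target_file}")

def validate(patch: Patch, tests_pass: bool, bug_reproduced: bool) -> Patch:
    """A patch is valid only if the bug no longer reproduces AND the
    existing test suite still passes (no new regressions)."""
    patch.validated = tests_pass and not bug_reproduced
    return patch

def repair_pipeline(crash_report: dict, run_tests, reproduce_bug):
    """Localize -> patch -> validate; only validated patches are
    queued for human review before being sent upstream."""
    target = localize_root_cause(crash_report)
    patch = generate_patch(target, crash_report)
    patch = validate(patch, run_tests(patch), reproduce_bug(patch))
    return patch if patch.validated else None
```

The double-sided validation check captures the article's central point: a fix that silences the crash but breaks the test suite is rejected just as firmly as one that leaves the bug reproducible.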
AI Takes IMO Gold, but Mathematics Has Yet to See Its AlphaGo Moment
36Kr · 2025-08-01 02:40
Group 1
- The headline event of the 2025 International Mathematical Olympiad (IMO) was AI reaching gold-medal standard, with OpenAI and DeepMind both announcing scores of 35 out of 42, a significant leap in AI's mathematical reasoning capabilities [1][4][8]
- Competition between OpenAI and DeepMind intensified, highlighted by DeepMind's criticism of OpenAI for prematurely announcing results, and by Meta's subsequent poaching of key DeepMind researchers [3][9][12]
- The gold-medal results, while impressive, do not yet mean AI has surpassed human capabilities in mathematics: 72 high-school students also reached gold standard, five of them with perfect 42s [12][30]

Group 2
- AI's IMO achievement serves as a benchmark for evaluating reasoning ability; previous models such as AlphaGeometry and AlphaProof reached only silver standard [13][16]
- DeepMind's Gemini Deep Think marked a significant advance by solving problems in natural language without relying on formal proof systems, challenging prior assumptions about AI reasoning [18][20]
- OpenAI and DeepMind took different approaches, with OpenAI leaning on heavier computation while DeepMind's method aligned more closely with human problem-solving techniques [22][23]

Group 3
- The implications of AI's IMO performance are debated within the academic community, with some experts believing AI can assist mathematicians by generating insightful prompts and ideas [34][40]
- Skeptics worry that AI may reduce the discipline to a mere technical product, undermining the creative and exploratory nature of mathematical research [36][39]
- The ongoing discourse highlights a divide in the mathematical community over AI's benefits and drawbacks, emphasizing the need for deeper discussion of the purpose and implications of AI in mathematics [36][40]
Unafraid of Poaching: Google Shows Off Its IMO Gold-Medal Team Photo, Tagging Everyone's Contact Details
QbitAI · 2025-07-25 07:59
Core Viewpoint
- Google DeepMind is actively responding to competitive pressure, particularly from Meta, as it celebrates its IMO 2025 result, showcasing its team and achievements despite recent talent losses to competitors [2][3][4]

Group 1: Team Dynamics and Competitor Actions
- Google recently won an IMO gold medal, but Meta quickly recruited three core members of the DeepMind team [2][3]
- The DeepMind team, led by Thang Luong, publicly shared a group photo, both a response to Meta's actions and a display of confidence [3][4]
- Notably, the three individuals recruited by Meta were absent from the photo, suggesting a rift or shift in team dynamics [8][17]

Group 2: Preparation for IMO
- In the lead-up to IMO 2025, DeepMind scientists gathered from Mountain View, New York, Singapore, and elsewhere to finalize preparations [11]
- Thang Luong emphasized that the week before the competition was crucial for achieving significant breakthroughs [11]
- The team integrated their previous research and methodologies into an intensive training push described as a "legendary" effort [10][11]

Group 3: Technical Achievements
- The team completed the final training of the Gemini Deep Think model just two days before the IMO, achieving peak performance [13]
- The model demonstrated impressive capabilities not only in mathematical reasoning but also in code generation and other complex reasoning tasks [14]

Group 4: Key Team Members
- The announced IMO gold-medal team consists of 16 members, including four Chinese members; the three who left for Meta are not included [17]
- Yi Tay, co-lead of the Deep Think IMO team, has a strong background in major Google models; he previously left to start a company but returned for personal reasons [21][25]
- Other notable members include Quoc Le, a co-founder of Google Brain, and several researchers with prestigious academic backgrounds from institutions such as MIT and Stanford [27][29][41]
The World's First IMO Gold-Medal AI Is Born: Google's Gemini Shatters the Olympiad Myth, Scoring 35 and Stunning the Judges
Chief Business Review · 2025-07-23 04:02
Core Viewpoint
- Google DeepMind has officially announced that its Gemini Deep Think model won a gold medal at the International Mathematical Olympiad (IMO), scoring 35 out of a possible 42 points and meeting the gold-medal standard within the 4.5-hour limit [1][3][4][22]

Group 1: Achievement Details
- Gemini Deep Think is a general model that successfully solved the first five IMO problems, earning a score of 35 [3][22]
- The model completed the tasks using pure natural language (English), a significant advance over previous AI systems [5][25]
- The achievement is officially recognized by the IMO organizing committee, making Gemini the first AI system to receive such acknowledgment [6][7]

Group 2: Competition Context
- The IMO, held annually since 1959, is a prestigious competition that attracts top students globally, with only the top 8% of participants earning gold medals [10][12]
- Participants must solve six complex mathematical problems within a 4.5-hour timeframe, testing not only logical reasoning but also creative thinking and rigor [11][15]

Group 3: Technical Innovations
- Gemini Deep Think used an advanced reasoning mode with parallel thinking, allowing the model to explore multiple problem-solving paths simultaneously [29][30]
- The model was trained using novel reinforcement-learning techniques, enhancing its capabilities in multi-step reasoning and theorem proving [33][94]
- The combination of training, knowledge base, and strategic approaches contributed to Gemini's outstanding IMO performance [33]

Group 4: Future Implications
- Google DeepMind aims to develop AI that can tackle even more complex mathematical problems, believing AI will become an indispensable tool for mathematicians, scientists, engineers, and researchers [76][78]
- The success of Gemini Deep Think at the IMO highlights AI's potential to contribute significantly to mathematics [76][78]
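The "parallel thinking" mode described above amounts to launching several independent reasoning attempts concurrently and keeping the best-scored one. Below is a minimal sketch using a thread pool; `solve_one_path` and its scoring are invented placeholders, since the mechanism's internals have not been published.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def solve_one_path(problem: str, seed: int) -> dict:
    """Placeholder for one independent reasoning attempt; the 'score'
    stands in for whatever internal ranking the real system uses."""
    rng = random.Random(seed)
    return {"seed": seed, "answer": f"candidate-proof-{seed}", "score": rng.random()}

def parallel_think(problem: str, n_paths: int = 8) -> dict:
    """Explore n_paths reasoning paths concurrently, keep the best-scored one."""
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        attempts = list(pool.map(lambda s: solve_one_path(problem, s), range(n_paths)))
    return max(attempts, key=lambda a: a["score"])
```

The design trades extra compute for robustness: a single reasoning path can dead-end, but sampling many paths and selecting among them raises the chance that at least one reaches a complete proof.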
AI Wins Math Olympiad Gold for the First Time: Google's Gemini Shines at the IMO While OpenAI "Takes Gold" in Parallel
Zhitong Finance · 2025-07-22 13:28
Group 1
- Alphabet's Google announced that its AI model, Gemini Deep Think, achieved a gold medal at the International Mathematical Olympiad (IMO) by solving five of six problems, scoring 35 out of a possible 42 points [1][2]
- The model demonstrated end-to-end reasoning in natural language, providing rigorous mathematical proofs within the 4.5-hour competition time limit [1]
- Last year, DeepMind's AlphaProof and AlphaGeometry 2 system achieved only a silver medal, solving four problems for 28 points [1]

Group 2
- OpenAI also claimed that its experimental reasoning model reached gold-medal status at the IMO, solving five of six problems for 35 points [2][3]
- The evaluation was conducted under the same conditions as for human participants: two 4.5-hour exams, without tools or internet access [2]
- This marks the first time an AI system has crossed the gold-medal scoring threshold in the competition, which is aimed at high-school students [3]
Altman's New-Model Demo "Flops" and Google Wins Without Lifting a Finger; a Former OpenAI Employee Grinds for Three Days and Beats His Old Employer's Model at Coding Again
AI Frontline · 2025-07-22 09:32
Core Viewpoint
- OpenAI recently announced new AI models that achieved significant milestones in competitive mathematics, sparking debate over the legitimacy of its claims relative to competitors such as Google DeepMind [1][4]

Group 1: OpenAI's Achievements
- OpenAI claims one of its new AI models achieved gold-medal level at the International Mathematical Olympiad (IMO), a feat accomplished by fewer than 9% of human participants [2][3]
- The model adhered to the same constraints as human competitors, completing six proof-based problems within the 4.5-hour time limit without internet access or calculators [3]
- OpenAI announced its achievements before the official results were released, drawing criticism and questions about the validity of its claims [4][12]

Group 2: Competitor Responses
- Google DeepMind's model, Gemini Deep Think, reportedly solved five of six IMO problems, having claimed a silver medal in a prior competition [2]
- DeepMind's CEO criticized OpenAI for prematurely announcing its results, emphasizing the importance of honoring the IMO's confidentiality agreements [4][12]
- The IMO organizers' official scoring standards have not been publicly disclosed, raising concerns about the legitimacy of OpenAI's self-assessment [4]

Group 3: New Model Developments
- OpenAI is testing a new model named "o3 Alpha," which has shown promising capabilities in web-development tasks [5][8]
- The model was briefly available for testing and is expected to be officially released in the coming weeks, with indications it may be a precursor to the anticipated GPT-5 [8]
- OpenAI's CEO hinted at a highly capable programming model that could rank among the top 50 programmers globally, suggesting significant advances in AI capability [8]

Group 4: Competitive Programming Context
- In a recent programming competition, an OpenAI model named "OpenAIAHC" secured second place, demonstrating AI's growing competitiveness in programming contests [10][13]
- The competition format allowed AI and human participants to compete directly, highlighting the challenges human programmers may face as AI continues to evolve [13]