Gemini 2.5 Deep Think
Terence Tao tests it himself: GPT-5 Pro cracks a 3-year-old problem in 40 minutes and tops the hardest math exam
36Kr · 2025-10-13 00:31
Core Insights - The article discusses the capabilities and limitations of AI, specifically GPT-5 Pro, in solving complex mathematical problems, highlighting the distinction between computational ability and true understanding [1][2][34].
Group 1: AI Performance in Mathematics
- GPT-5 Pro scored 13% on the challenging FrontierMath test set, indicating strong computational skills but a limited grasp of deeper mathematical concepts [2][32].
- The model handled structured and symbolic problems well but struggled with geometric constructions and problems requiring intuition [40][41].
Group 2: Real-World Testing by a Mathematician
- Mathematician Terence Tao tested GPT-5 Pro on an unsolved problem in differential geometry to explore whether the AI could generate new ideas in unfamiliar areas [5][6][7].
- The AI produced a sound chain of reasoning for simpler cases but lost accuracy when the problem was slightly altered, revealing a tendency to reinforce incorrect reasoning paths [14][15].
Group 3: Insights Gained from AI Interaction
- Tao noted that the exercise helped him understand the problem better, not because the AI solved it, but because it showed why its attempts failed [16][17].
- The experiment highlighted the importance of human intuition and situational awareness in research: AI can assist with calculations but cannot grasp the broader context [44][45].
Group 4: Implications for Future Research
- The article emphasizes the need to balance automation with human oversight in research, since excessive reliance on AI could erode critical thinking and understanding [38][39].
- The contrast between AI's "linear" intelligence and humans' "topological" understanding suggests a new division of labor in mathematics, with AI serving as a computational engine while humans focus on structural design and meaning [45][46].
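Google and OpenAI both win ICPC 2025 gold! GPT-5 takes the title with a perfect score, Gemini cracks a problem no human team could solve
AI科技大本营 (AI Tech Base Camp) · 2025-09-19 10:36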
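GPT-5 and Gemini 2.5 Deep Think took part as competing models under official ICPC rules and organizer supervision, going through the same problem-solving session as the human contestants. Although they did not compete head-to-head in the same arena as the student teams, they turned in stunning results:
● GPT-5 achieved a perfect score, solving all 12 problems, equivalent to "gold medal" level.
● Gemini 2.5 Deep Think solved 10 of the 12 problems in 677 minutes, also reaching gold-medal level. According to Google, this result would rank second in the world in the human standings.
Compiled by | Zheng Liyuan  Produced by | CSDN (ID: CSDNnews)
For decades, the International Collegiate Programming Contest (ICPC) has been regarded as the "Olympics" of computer programming. This year, however, the spotlight was stolen by two "non-human" contestants: OpenAI's GPT-5 and Google DeepMind's Gemini 2.5 DeepThink. This year's human gold-medal teams came from St. Petersburg State University, the University of Tokyo, Beijing Jiaotong University, and Tsinghua University. Yet even these strong teams from top universities did not solve every problem (the best result was 11/12). In other words, this is the first time AI has "overtaken" humans in this kind of algorithmic competition. ICP ...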
OpenAI tops the ICPC 2025 programming contest with a perfect score; Gemini also reaches gold-medal level
36Kr · 2025-09-18 09:50
Core Insights
- OpenAI and Gemini both achieved gold-medal level at ICPC 2025; OpenAI solved all 12 problems within the 5-hour contest, outperforming every human team [1][6]
- Gemini solved 10 of 12 problems with a total time of 677 minutes, which would rank second among human teams [3][20]
Group 1: Competition Overview
- The ICPC World Finals took place on September 4 in Baku, Azerbaijan, featuring the top teams from earlier competition stages [6]
- A total of 139 teams participated; only the top four received gold medals, with teams ranked by the number of problems solved and then by total time [6]
Group 2: Performance Comparison
- The top human team, from St. Petersburg State University, solved 11 problems with a total (penalty) time of 1,478 minutes, while OpenAI solved all 12 within the 300-minute contest window [5][7]
- Gemini solved 8 problems in the first 45 minutes and the remaining 2 over the following 3 hours [20]
Group 3: AI Capabilities
- OpenAI's system, built on general-purpose reasoning models, solved 11 problems correctly on the first submission, while the final problem took 9 attempts [12][7]
- Gemini used advanced data structures and algorithms to solve problems, demonstrating its capability in complex reasoning tasks [20][28]
Group 4: Implications for AI
- The success of AI at ICPC highlights its potential to propose novel solutions and assist with complex reasoning, marking a shift from mere information processing toward genuine problem solving [35]
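The summaries above rank teams by problems solved and then by total time. The following is a minimal sketch of that ordering, assuming the commonly cited ICPC scheme in which each solved problem contributes its acceptance minute plus 20 penalty minutes per rejected attempt, and unsolved problems contribute nothing. The team names and results are hypothetical; Team A's two cells reuse the "2/255" and "2/53" entries from the St. Petersburg row in the scoreboard excerpts, read as attempts/minute, which is itself an assumption about that notation. Official tie-breakers beyond total time are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed ICPC-style rule: 20 minutes per rejected attempt on a problem that is eventually solved.
PENALTY_PER_REJECTION = 20


@dataclass
class ProblemResult:
    attempts: int             # total submissions, including the accepted one
    solved_at: Optional[int]  # contest minute of the accepted submission; None if never solved


@dataclass
class Team:
    name: str  # hypothetical team name
    results: Dict[str, ProblemResult] = field(default_factory=dict)

    def solved(self) -> int:
        return sum(1 for r in self.results.values() if r.solved_at is not None)

    def total_time(self) -> int:
        # Only solved problems count; rejected attempts on unsolved problems cost nothing.
        return sum(
            r.solved_at + PENALTY_PER_REJECTION * (r.attempts - 1)
            for r in self.results.values()
            if r.solved_at is not None
        )


def rank(teams: List[Team]) -> List[Team]:
    # More problems solved ranks higher; ties are broken by lower total time.
    return sorted(teams, key=lambda t: (-t.solved(), t.total_time()))


# Hypothetical example data.
team_a = Team("Team A", {"A": ProblemResult(2, 255), "B": ProblemResult(2, 53)})
team_b = Team("Team B", {
    "A": ProblemResult(1, 100),
    "B": ProblemResult(1, 200),
    "C": ProblemResult(3, None),  # three rejected attempts, never solved: no penalty
})

for t in rank([team_a, team_b]):
    print(t.name, t.solved(), t.total_time())
```

Under these assumptions both teams solve two problems, but Team B (300 minutes) ranks ahead of Team A (275 + 73 = 348 minutes), which is why a team's "Time" column can far exceed the 300-minute contest length.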
Just in: OpenAI tops the ICPC 2025 programming contest with a perfect score, and Gemini also reaches gold-medal level
Synced (机器之心) · 2025-09-18 04:32
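Synced report. Editors: Yang Wen, +0
After the IMO, OpenAI and Gemini have now both been crowned with ICPC 2025 gold. Just now, OpenAI and Gemini each announced that they had reached ICPC gold-medal level.
OpenAI solved all 12 problems within 5 hours, equivalent to first place in the human ranking and ahead of every participating university team. Gemini solved 10 of the 12 problems with a total time of 677 minutes, reaching gold-medal level; compared with human teams, it would rank second.
Among human teams, the team from Russia's St. Petersburg State University ranked first with 11 problems solved. Teams from Beijing Jiaotong University, Tsinghua University, Peking University, and the University of Science and Technology of China ranked 2nd, 4th, 5th, and 9th, respectively.
| Rank | Name | Solved | Time | A | B | C | D | E | F | G | H | I | J | K | L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | St. Petersburg State Univ ... |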
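AI dominates the ICPC World Finals: GPT-5 ensemble system gets all 12 problems right and takes first, leaving humans to fight it out for third
36Kr · 2025-09-18 01:56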
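This year's college students have it rough: after fighting their way into the finals of a programming contest, they still get upstaged by AI.
At the just-concluded 2025 International Collegiate Programming Contest (ICPC) World Finals, OpenAI's system solved all 12 problems perfectly and would rank first if included in the standings. Google's Gemini 2.5 Deep Think model solved 10 problems, reaching gold-medal level and placing second.
This top-tier event brought together 139 elite teams from nearly 3,000 universities in 103 countries. In a separate "AI experimental track" supervised by ICPC, the AI systems faced the same problems and judging criteria as the human contestants and performed remarkably well.
One of the harder problems, "Problem C," was not solved by any university team, but both Gemini and OpenAI's model ensemble solved it.
| Rank | Name | Solved | Time | A | B | C | D | E | F | G | H | I | J |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | St. Petersburg State University | 11 | 1478 | 2/255 | 2/53 |
...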
Just in: OpenAI and Gemini both take ICPC 2025 gold, with OpenAI sweeping the field on a perfect score
36Kr · 2025-09-18 01:55
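This is crazy! Just now, Google and OpenAI both won ICPC gold medals, and OpenAI did it with a perfect score!
ICPC, the International Collegiate Programming Contest, is one of the most prestigious programming competitions in the world. The rules: solve a dozen or so extremely complex programming and algorithm problems within five hours.
In the end, Gemini correctly solved 10 of the 12 problems and won a gold medal. OpenAI solved every problem correctly, earning a perfect score and a gold medal!
And the humans? Of the 139 human teams, only 3 matched Gemini's 10/12 result, and no human team achieved a perfect score. The only Chinese team to tie Gemini was Beijing Jiaotong University; we covered it in depth the moment the ICPC World Finals results were posted, analyzing how China's strongest team was built. (Related: "Beating Harvard and MIT! Beijing Jiaotong University and Tsinghua win gold at the 2025 ICPC")
Notably, Google specifically pointed out that no human team solved Problem C, while Gemini solved it within half an hour. OpenAI solved every problem and took a perfect score!
A truly stunning moment and a historic night: AI has decisively surpassed humans in the top programming competition! ...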
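AI dominates the ICPC World Finals! GPT-5 ensemble system gets all 12 problems right and takes first, leaving humans to fight it out for third
QbitAI (量子位) · 2025-09-18 00:51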
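Mengchen, reporting from Aofeisi (凹非寺) | QbitAI (WeChat official account: QbitAI)
This year's college students have it rough: after fighting their way into the finals of a programming contest, they still get upstaged by AI.
At the just-concluded 2025 International Collegiate Programming Contest (ICPC) World Finals, OpenAI's system solved all 12 problems perfectly and would rank first if included in the standings. Google's Gemini 2.5 Deep Think model solved 10 problems, reaching gold-medal level and placing second.
This top-tier event brought together 139 elite teams from nearly 3,000 universities in 103 countries. In a separate "AI experimental track" supervised by ICPC, the AI systems faced the same problems and judging criteria as the human contestants and performed remarkably well.
One of the harder problems, "Problem C," was not solved by any university team, but both Gemini and OpenAI's model ensemble solved it.
| Rank | Name | Solved | Time | A | B | C | D | E | F | G | H | I | J |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | St. Petersburg State University |
...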
X @Demis Hassabis
Demis Hassabis · 2025-09-17 17:44
RT Google DeepMind (@GoogleDeepMind): An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the ICPC 2025, one of the world's most prestigious programming contests. 🏅 Building on the model's success in math at the IMO, this marks another historic milestone for advanced AI. 🧵 ...
Tencent Research Institute AI Express 20250915
Tencent Research Institute (腾讯研究院) · 2025-09-14 16:01
Group 1
- OpenAI and Microsoft have released a non-binding memorandum of understanding covering key issues such as cloud hosting, intellectual property ownership, and AGI control, but the final cooperation agreement is still pending [1]
- OpenAI plans to establish a public benefit corporation (PBC) with a valuation exceeding $100 billion, in which the non-profit will hold equity and retain control, making it one of the best-resourced charitable organizations in the world [1]
- OpenAI faces significant cost pressure: it expects to burn through $115 billion before 2029 and to need $100 billion for server leasing in 2030, leaving little room for error in the coming years [1]
Group 2
- Utopai, the world's first AI-native film studio, founded by a former Google X team, has generated $110 million in revenue from two film projects and secured a spot at the Cannes Film Festival [2]
- Utopai has overcome three major challenges in AI video generation (consistency, controllability, and narrative continuity), achieving millisecond-level lip-sync precision by training on 3D data [2]
- The company positions itself as a "content + AI" provider rather than a pure tool supplier and is backed by top Hollywood talent, including an Oscar-nominated screenwriter for the film "Cortes" [2]
Group 3
- MiniMax has launched its new music generation model, Music 1.5, which can create complete songs up to 4 minutes long with strong controllability, natural-sounding vocals, rich arrangements, and clear song structure [3]
- The model supports customization across "16 styles × 11 emotions × 10 scenes," can generate different vocal timbres, and can incorporate traditional Chinese instruments [3]
- MiniMax's in-house multimodal capabilities are now available to developers worldwide via API, for scenarios such as professional music creation, film and game scoring, and brand-specific audio content [3]
Group 4
- Meituan's first AI Agent product, "Xiao Mei," has entered public testing; users can order coffee, find restaurants, and plan breakfast menus through natural-language commands, significantly simplifying the ordering process [4]
- "Xiao Mei" is built on Meituan's self-developed LongCat model (560 billion total parameters) and can fully automate the process from selection to payment based on user preferences and location [4]
- The agent still has limitations, such as handling complex or ambiguous requests and lacking voice responses; future work will focus on personalization and proactive service [4]
Group 5
- Xiaohongshu's audio technology team has released FireRedTTS-2, a next-generation dialogue synthesis model that addresses poor flexibility, frequent mispronunciations, unstable speaker switching, and unnatural prosody [5][6]
- The model was trained on millions of hours of speech data, supports sentence-by-sentence generation and multi-speaker voice switching, and can mimic a speaker's timbre and speaking habits from a single audio sample [6]
- FireRedTTS-2 reaches industry-leading levels in both subjective and objective evaluations, supports multiple languages including Chinese, English, and Japanese, and serves as an industrial-grade solution for AI podcasting and dialogue synthesis [6]
Group 6
- Bilibili has open-sourced IndexTTS2, a new zero-shot speech synthesis model that tackles an industry pain point by achieving millisecond-level precise duration control for AI dubbing [7]
- The model uses a "universal, compatible autoregressive architecture for speech duration control," achieving a duration error rate of 0.02%, and a two-stage training strategy that decouples emotion from speaker identity [7]
- The system consists of three core modules, T2S (text to semantics), S2M (semantics to mel-spectrogram), and a BigVGANv2 vocoder, allowing straightforward emotional control, with significant implications for cross-language industry applications [7]
Group 7
- Meta AI has released the MobileLLM-R1 series of small, parameter-efficient models in 140M, 360M, and 950M sizes, optimized for mathematics, programming, and scientific questions [8]
- The largest, 950M model was pre-trained on approximately 2 trillion high-quality tokens (total training volume under 5 trillion) and performs comparably to or better than Qwen3 0.6B, which was trained on 36 trillion tokens [8]
- On the MATH benchmark, the model scores about five times higher than Olmo 1.24B and about twice as high as SmolLM2 1.7B, demonstrating high token efficiency and cost-effectiveness and setting a new bar among fully open-source models [8]
Group 8
- An AI agent named "Gauss" completed, in just three weeks, a mathematical challenge that took Terence Tao's team 18 months: formalizing the strong prime number theorem (PNT) in Lean [9]
- Gauss was developed by a company founded by Christian Szegedy, an author of the paper that received the ICML 2025 Test of Time Award; it generated approximately 25,000 lines of Lean code, including thousands of theorems and definitions [9]
- Gauss can assist top mathematicians with formal verification and broke through core challenges in complex analysis; the team plans to increase the total amount of formalized code by 100 to 1,000 times over the next 12 months [9]
Group 9
- Sequoia Capital US has interpreted the new AI landscape following OpenAI's release of GPT-5, which enables more natural interaction resembling a conversation with a PhD-level expert, incorporating "thinking" capabilities and a unified model to reduce hallucinations [10][11]
- Other players launched strategic products ahead of the release, including Anthropic's Claude Opus 4.1, aimed at high-stakes enterprise scenarios, and Google's Gemini 2.5 Deep Think and Genie 3, which enhance reasoning and simulation capabilities [10][11]
- The AI landscape has been reshaped: OpenAI dominates both open and closed AI ecosystems, Anthropic focuses on enterprise-grade precision and stability, and Google emphasizes long-term foundational research [11]
Group 10
- DeepMind's science lead, Pushmeet Kohli, revealed that the team targets three types of problems: transformative challenges, problems considered unsolvable within 5 to 10 years, and problems DeepMind is confident it can tackle quickly [12]
- The team has transferred capabilities from specialized models such as AlphaProof into the general-purpose Gemini model, reaching International Mathematical Olympiad gold-medal level with Deep Think [12]
- The long-term goal is a "scientific API" that lets scientists worldwide share AI capabilities, lowering research barriers and enabling ordinary individuals to contribute to Nobel-level achievements [12]
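Group 8 mentions the strong form of the prime number theorem. For orientation, one standard statement, with de la Vallée Poussin's error term, is given below; whether it matches exactly the variant Gauss formalized in Lean is an assumption, not something the summary states. Here $\psi$ is the Chebyshev function, $\Lambda$ the von Mangoldt function, $\pi$ the prime-counting function, and $c > 0$ an unspecified constant (which may differ between the two estimates).

```latex
\psi(x) = \sum_{n \le x} \Lambda(n) = x + O\!\left(x \exp\!\left(-c\sqrt{\log x}\right)\right),
\qquad
\pi(x) = \operatorname{Li}(x) + O\!\left(x \exp\!\left(-c\sqrt{\log x}\right)\right),
\qquad
\operatorname{Li}(x) = \int_{2}^{x} \frac{dt}{\log t}.
```

The "strong" qualifier refers to this explicit error term; the ordinary PNT only asserts $\pi(x) \sim x / \log x$.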
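喝点VC (Have Some VC) | Sequoia US on the new shape of the AI industry after GPT-5: a new AI interaction paradigm has emerged, and the AI era has reached its inflection point of acceleration
Z Potentials · 2025-09-14 06:14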
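Z Highlights: Sequoia Capital's Inference column is written by its venture team together with AI tools, following an "AI + human editor" production model. In AI, the team closely tracks the latest model trends, analyzes how the industry is developing, offers in-depth insights, and explores breakthrough paths for AI at the application layer and in its future development. This article was published on August 8, 2025.
August is usually quiet. In 2025, however, within a single week, the world's top AI labs (OpenAI, Google, and Anthropic) almost simultaneously set off an intense, frenzied wave of releases, unveiling a series of major models that together redrew the map of the entire AI industry. While every release mattered, one clearly stood above the rest: not just an iterative technical upgrade, but a genuine inflection point for the industry.
The big moment: GPT-5 officially released to everyone
OpenAI confidently claimed that GPT-5 is the "best in the world" at coding, writing, and medicine. The launch showcased so-called "vibe coding": through a single natural conversation, the model generated a complete, working French-learning web app within minutes and visualized a teaching example of the Bernoulli effect in real time. This unprecedented release week at O ...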