Gemini 1.5 Pro

X @Demis Hassabis
Demis Hassabis· 2025-08-14 01:17
RT Google Gemini App (@GeminiApp) We’re introducing a new setting that allows Gemini to learn from your past conversations over time. When this setting is on, Gemini remembers key details and preferences you've shared, leading to more natural and relevant conversations, as if you're collaborating with a partner who's already up to speed. Rolling out to 2.5 Pro users today and will expand to 2.5 Flash soon. ...
"I'm Not Wrong": GPT-4o Digs In and Stumbles as AI Models Collectively Crash on Black Swan Events
36Kr· 2025-07-16 11:19
Core Insights
- A joint research team from Columbia University, Vector AI Research Institute, and Nanyang Technological University found significant deficiencies in AI models' reasoning capabilities when handling unexpected events, with top models like GPT-4o and Gemini 1.5 Pro performing up to 32% worse than humans [2][14][15]

Group 1: Research Findings
- The study titled "Black Swan" highlights a fundamental issue in current AI evaluation methods, which primarily focus on predictable and clear visual scenarios, neglecting the unpredictable nature of real-world events [4][14]
- Two core reasoning abilities essential for humans to handle unexpected situations are identified: abductive reasoning (inferring the most likely explanation from limited observations) and defeasible reasoning (revising initial conclusions based on new evidence) [5][14]

Group 2: New Benchmark Testing
- To accurately assess AI's reasoning capabilities in unexpected situations, the research team developed a new benchmark test called "BlackSwanSuite," consisting of 1,655 videos depicting various unconventional real-life scenarios [8][11]
- The benchmark includes three core tasks: "Forecaster," where models predict future events from the beginning of a video; "Detective," where models infer missing information from the start and end of a video; and "Reporter," where models describe the entire event and reassess previous judgments based on complete information [11][12]

Group 3: Performance Comparison
- Top AI models, including GPT-4o and Gemini 1.5 Pro, significantly lag behind humans across all three tasks, with the best models trailing humans by up to 25% in multiple-choice questions and 32% in true/false judgments [14][15]
- In the "Detective" task, GPT-4o's accuracy was 24.9% lower than that of humans, while in the "Reporter" task, the gap reached 32%, indicating that AI struggles to correct initial misjudgments [14][15][16]

Group 4: Implications of Findings
- The research indicates that AI models often "lock in" their initial judgments and fail to update their reasoning based on new evidence, which poses significant risks in critical applications like autonomous driving [17][20]
- An experiment showed that when AI models were provided with human-written descriptions of video content, their reasoning accuracy improved by up to 10%, suggesting that the core shortcoming lies not only in advanced reasoning but also in basic perception and understanding capabilities [25][26]
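The three benchmark tasks differ mainly in how much of each video the model is allowed to see. A minimal sketch of that setup follows, assuming each video can be split into three segments; the segment names and the helper functions are hypothetical illustrations and are not taken from the BlackSwanSuite code.

```python
# Hypothetical segment names: the summary only speaks of the "beginning",
# the "start and end", and the "entire event" of a video.
SEGMENTS = ("beginning", "middle", "end")

# Which segments each task shows to the model, as described in the summary.
TASK_VISIBILITY = {
    "forecaster": ("beginning",),                # predict what happens next
    "detective": ("beginning", "end"),           # infer the missing middle
    "reporter": ("beginning", "middle", "end"),  # describe and reassess
}


def visible_segments(task: str) -> tuple:
    """Return the video segments a model may see for the given task."""
    return TASK_VISIBILITY[task.lower()]


def hidden_segments(task: str) -> tuple:
    """Return the segments withheld from the model for the given task."""
    shown = visible_segments(task)
    return tuple(segment for segment in SEGMENTS if segment not in shown)


if __name__ == "__main__":
    for task in TASK_VISIBILITY:
        print(task, "sees", visible_segments(task), "hidden", hidden_segments(task))
```

Under this reading, "Forecaster" and "Detective" exercise abductive reasoning over partial evidence, while "Reporter" is where a model has everything it needs to revise an earlier judgment.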
Overhyped and Falsely Packaged? Gartner Predicts More Than 40% of Agentic AI Projects Will Fail by 2027
36Kr· 2025-07-04 10:47
Core Insights
- "Agentic AI" is gaining attention in the tech industry, with predictions that 2025 will be the "Year of AI Agents" [1][9]
- Concerns have been raised about the actual capabilities and applicability of Agentic AI, with many projects at risk of capitalizing on the concept rather than delivering real value [1][2]

Group 1: Current State of Agentic AI
- Gartner predicts that by the end of 2027, over 40% of Agentic AI projects will be canceled due to rising costs, unclear business value, or insufficient risk control [1][10]
- A Gartner survey found that 19% of organizations have made significant investments in Agentic AI, 42% have made conservative investments, and 31% are uncertain or waiting [2]

Group 2: Misrepresentation and Challenges
- There is a trend of "agent washing," where existing AI tools are rebranded as Agentic AI without providing true agent capabilities; only about 130 out of thousands of vendors offer genuine agent functions [2][3]
- Most current Agentic AI solutions lack clear business value or return on investment (ROI), as they are not mature enough to achieve complex business goals [3][4]

Group 3: Performance Evaluation
- Research from Carnegie Mellon University indicates that AI agents remain far from able to replace human workers in real-world tasks, with the best-performing model, Gemini 2.5 Pro, achieving only a 30.3% success rate in task completion [6][7]
- In a separate evaluation for customer relationship management (CRM) scenarios, leading models showed limited performance, with single-turn interactions averaging a 58% success rate, dropping to around 35% in multi-turn interactions [8]

Group 4: Industry Reactions and Future Outlook
- Companies like Klarna have experienced setbacks with AI tools, leading to a return to human employees for customer service due to quality issues [9]
- Despite current challenges, Gartner remains optimistic about the long-term potential of Agentic AI, forecasting that by 2028, at least 15% of daily work decisions will be made by AI agents [10]
Stanford's Head-to-Head Evaluation of Clinical Medical AI: DeepSeek Beats Both Google and OpenAI
QbitAI· 2025-06-03 06:21
Core Insights
- The article discusses a comprehensive evaluation of large language models (LLMs) on medical tasks, highlighting that DeepSeek R1 achieved a 66% win rate, outperforming the other models in a clinical context [1][7][24].

Evaluation Framework
- A comprehensive assessment framework named MedHELM was developed, consisting of 35 benchmark tests covering 22 subcategories of medical tasks [12][20].
- The classification system was validated by 29 practicing clinicians from 14 medical specialties, ensuring its relevance to real-world clinical activities [4][17].

Model Performance
- DeepSeek R1 led the evaluation with a 66% win rate and a macro average score of 0.75 across the benchmark tests [7][24].
- Other notable models included o3-mini and Claude 3.7 Sonnet, each with a 64% win rate, while Gemini 1.5 Pro ranked lowest with a 24% win rate [26][27].

Benchmark Testing
- The evaluation included 17 existing benchmarks and 13 newly developed tests, with 12 of the new tests based on real electronic health record data [21][20].
- The models showed varying performance across task categories, scoring higher on clinical case generation and patient communication tasks than on structured reasoning tasks [32].

Cost-Effectiveness Analysis
- A cost analysis based on token consumption during the evaluation found that non-reasoning models like GPT-4o mini cost less to run than reasoning models like DeepSeek R1 [38][39].
- The analysis indicated that models like Claude 3.5 Sonnet and Claude 3.7 Sonnet provided good value for their performance at lower costs [39].
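The summary does not define "win rate" or "macro average." A minimal sketch follows, assuming that win rate is the share of head-to-head benchmark comparisons a model wins against the other models and that macro average is the unweighted mean of a model's per-benchmark scores. The model names, benchmark names, and scores are hypothetical and are not MedHELM results.

```python
def win_rates(scores):
    """Share of pairwise comparisons each model wins.

    scores maps model name -> {benchmark name -> score}.
    Assumption: a tie does not count as a win for either side.
    """
    rates = {}
    for model, own in scores.items():
        wins = 0
        comparisons = 0
        for rival, theirs in scores.items():
            if rival == model:
                continue
            # Compare only on benchmarks that both models were scored on.
            for bench in own.keys() & theirs.keys():
                comparisons += 1
                if own[bench] > theirs[bench]:
                    wins += 1
        rates[model] = wins / comparisons if comparisons else 0.0
    return rates


def macro_average(benchmark_scores):
    """Unweighted mean over benchmarks, so each benchmark counts equally."""
    return sum(benchmark_scores.values()) / len(benchmark_scores)


# Hypothetical toy data for illustration only.
example_scores = {
    "model_a": {"bench_1": 0.75, "bench_2": 0.5},
    "model_b": {"bench_1": 0.5, "bench_2": 0.25},
    "model_c": {"bench_1": 0.25, "bench_2": 0.75},
}

if __name__ == "__main__":
    print(win_rates(example_scores))
    print({name: macro_average(s) for name, s in example_scores.items()})
```

The two numbers can disagree: win rate rewards beating other models often, even by small margins, while macro average rewards high absolute scores.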
Hu Yong: Super Agency and How to Raise Human Potential to New Heights
36Kr· 2025-05-28 11:54
Group 1
- The core idea is that AI, like the internet decades ago, is at the beginning of a transformative phase that could significantly enhance human productivity and creativity through human-machine collaboration [2][3][4]
- AI is seen as a "super-empowerment" tool that can amplify human capabilities, enabling individuals to achieve unprecedented levels of creativity and productivity [4][5]
- The historical context of transformative technologies suggests that while initial reactions may be pessimistic, the long-term impacts can be overwhelmingly positive [3][4]

Group 2
- AI is evolving beyond mere task automation to include cognitive functions such as reasoning, planning, and decision-making, which could reshape human interactions with technology [6][8]
- Recent advancements in AI, particularly in large language models (LLMs), have shown significant improvements in reasoning capabilities, allowing them to perform well on standardized tests [7][8]
- The emergence of agentic AI, which can autonomously take actions and make decisions, represents a significant leap in AI's capabilities, potentially transforming it into a digital workforce [9][10]

Group 3
- Multi-modal AI is advancing, integrating various data types (text, audio, video) to enhance understanding and interaction, which could lead to broader applications across industries [11][13]
- Hardware innovations, such as specialized chips, are driving AI performance improvements, enabling faster and more efficient processing of complex tasks [14][15]
- Transparency and interpretability in AI are becoming increasingly important for safe deployment, with ongoing improvements in model transparency scores [16][17]

Group 4
- The potential for AI to drive revenue growth is significant, with nearly 90% of business leaders anticipating positive impacts from AI deployment, although many transformations face challenges [18][19]
- Key challenges in AI transformation include leadership alignment, cost uncertainty, workforce planning, supply chain management, and the need for greater interpretability [19][20][21]
- Companies are encouraged to adopt a strategic approach to AI, focusing on human agency and iterative deployment to foster innovation and address potential risks [22][24]
Hu Yong: Super Agency and How to Raise Human Potential to New Heights
Tencent Research Institute· 2025-05-28 08:34
Core Insights
- The article emphasizes that AI, like the internet decades ago, is at the beginning of a transformative phase that could redefine human productivity and creativity, leading to a state of "super agency" where humans and machines collaborate effectively [1][4][5].

Group 1: AI's Transformative Potential
- AI is seen as a powerful tool that can enhance human capabilities, acting as a "force multiplier" rather than just a tool [4][5].
- The concept of "super agency" describes how individuals can leverage AI to significantly boost their creativity, productivity, and influence [5].
- AI is expected to democratize knowledge acquisition and automate numerous tasks, provided it is developed and deployed safely and equitably [5][7].

Group 2: Historical Context and Public Perception
- Historical technological advancements often faced initial skepticism, with concerns about their negative impacts overshadowing their potential benefits [3].
- The narrative around AI is influenced by dystopian themes, yet there is a call to reframe this perspective to envision positive outcomes [3][4].

Group 3: AI's Advancements and Capabilities
- AI is evolving to automate cognitive functions, enabling it to adapt, plan, and make decisions autonomously, which could drive unprecedented economic growth and social change [7][8].
- Significant advancements in AI, such as large language models (LLMs), have shown remarkable performance in standardized tests, indicating a leap in reasoning capabilities [8][9].

Group 4: Autonomous AI and Its Implications
- Agentic AI is emerging, capable of independent action and complex task execution, marking a shift from passive tools to proactive digital partners [11][12].
- Companies are integrating agentic AI into their core products, enhancing collaboration between humans and automated systems [13].

Group 5: Multi-modal AI Development
- Current AI models are advancing towards multi-modal capabilities, processing various data types (text, audio, video) simultaneously, which enhances understanding and interaction [14][15].
- Self-supervised learning techniques are being utilized to improve multi-modal models, allowing them to learn from unlabelled data and perform better across tasks [16][17].

Group 6: Hardware Innovations and AI Performance
- Innovations in hardware, such as specialized chips, are driving improvements in AI performance, enabling faster and more efficient model training and execution [18][19].
- The rise of edge computing is enhancing AI's responsiveness and efficiency, particularly in real-time applications [20][21].

Group 7: Transparency and Safety in AI
- There is a growing emphasis on improving AI transparency and interpretability, which are crucial for safe deployment and reducing biases [22][23].
- Progress is being made in enhancing the transparency of AI models, with notable improvements in scores reflecting their interpretability [23].

Group 8: Challenges in AI Adoption
- Companies face significant challenges in AI transformation, including leadership alignment, cost uncertainty, workforce planning, supply chain management, and the need for greater interpretability [26][27][28].
- Successful AI deployment requires strategic transformation beyond mere technology implementation, focusing on organizational structure and mindset [28][29].

Group 9: Future Directions and Leadership
- The article advocates for an iterative deployment approach to AI, encouraging collaboration and gradual adaptation rather than excessive regulation [29].
- Leaders are urged to prioritize human agency in AI development, ensuring that technology serves to enhance human capabilities [30][31].
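Large Models in May: 30 Bustling Days and the Edge of the Chasm
LatePost· 2024-05-29 14:00
"Mayday" can be read literally as "a day in May"; it is also the international radio distress signal. When an aircraft is in danger of crashing, the pilot shouts "Mayday!" into the radio.

This May may have been the busiest month for the large-model industry since ChatGPT was released: companies in China and the United States, including OpenAI, Google, Microsoft, ByteDance, and Alibaba, held at least 13 launch events related to large models, introduced more than 10 new models, and showed off a pile of new products.

The risk and disappointment amid the bustle: many practitioners believe the technology has not made major progress. GPT-4o, newly released by OpenAI this month, remains at GPT-4 level in language ability, and the long-awaited GPT-5 has yet to appear.

Multimodality has become the technical focus of the top AI companies: OpenAI, Google, and Microsoft all released models that can process speech and images at the same time and even understand the real world. But the products and applications built on these capabilities are still at the demo stage, and before any official release they have already raised problems such as infringement and privacy risks.

Zhu Xiaohu, managing partner at GSR Ventures and a skeptic of startup opportunities in large models, holds that if the evolution of language ability slows down, "this wave of hype is over."

"Nothing exciting," said a person who leads large-model development at a major Chinese company. The series of launch events made him more convinced that building more capable small models is the future.

An AI entrepreneur said GPT-4 ...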