Gemini 1.5 Pro
What exactly did Fei-Fei Li say a year ago, and why is it trending again?
量子位· 2025-09-11 01:58
Core Viewpoint - The limitations of large language models (LLMs) in understanding the physical world are highlighted, emphasizing that language is a generated signal dependent on human input, while the physical world is an objective reality governed by its own laws [1][5][19].
Group 1: Language Models and Their Limitations
- Language models operate on a one-dimensional representation of discrete tokens, making them adept at handling written text but inadequate for representing the three-dimensional nature of the physical world [12][14].
- The challenge of spatial intelligence lies in extracting, representing, and generating information from the real world, which is fundamentally different from language processing [17][19].
- Experiments show that LLMs struggle with physical tasks, performing poorly compared to human children and specialized robots [22][28].
Group 2: Experimental Findings
- In a test using the Animal-AI environment, LLMs could only complete simple tasks, failing at more complex ones even with additional teaching examples [26][27].
- A tool named ABench-Physics was developed to assess LLMs' physical reasoning abilities, revealing that even the best models achieved only a 43% accuracy rate on basic physics problems [30][34].
- Visual tasks further demonstrated the limitations of LLMs, with human accuracy at 95.7% compared to a maximum of 51% for the models [37][41].
Group 3: Philosophical and Future Considerations
- The discussion includes perspectives on whether language can sometimes describe reality better than perception and the potential for AI to develop its own language for understanding the physical world [46][47].
- The ongoing development of models based on physical and multimodal understanding indicates a shift towards addressing these limitations [44].
X @Demis Hassabis
Demis Hassabis· 2025-08-14 01:17
RT Google Gemini App (@GeminiApp) We’re introducing a new setting that allows Gemini to learn from your past conversations over time. When this setting is on, Gemini remembers key details and preferences you've shared, leading to more natural and relevant conversations, as if you're collaborating with a partner who's already up to speed. Rolling out to 2.5 Pro users today and will expand to 2.5 Flash soon. ...
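a16z partner: AI is "downgrading" 10x engineers to 2x, the application layer no longer has a technical moat, and the future lies in infrastructure and deep business focus
36Kr· 2025-08-13 08:32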
Group 1
- The current AI landscape is reminiscent of the past cloud computing wars, with a few major players likely to dominate the market, leading to a "winner-takes-all" scenario [3][11]
- Venture capitalists are investing in AI not for immediate profits but to secure future market access, often sacrificing short-term gains for long-term distribution advantages [3][27]
- AI is transforming the nature of work for developers, allowing them to focus more on creativity rather than mundane tasks, although productivity gains may not translate into faster output [3][39]
Group 2
- The investment landscape in AI is characterized by a mix of excitement and uncertainty, with historical patterns suggesting that many business models previously deemed unviable eventually succeed [6][7]
- The emergence of new AI models, such as Claude 4, creates a dynamic environment where market leaders may not maintain their dominance for long due to the rapid development of competing technologies [9][10]
- The potential for either monopoly or oligopoly in the AI model market is significant, with historical precedents from cloud services indicating that major players can subsidize their operations to capture market share [11][12]
Group 3
- The AI market is currently experiencing rapid growth, with leading brands benefiting from brand recognition and distribution advantages, which may not last once market growth slows [17][18]
- The distinction between application development and infrastructure is crucial, as AI tools can enhance development processes but do not fundamentally change the market dynamics of core infrastructure [42][43]
- The future of AI models may lead to fragmentation, with various companies carving out niches in specific markets, despite the overarching trend towards consolidation among major players [25][26]
Group 4
- The discussion around AI safety and regulation is evolving, with a notable shift towards supporting open-source initiatives, despite concerns about security [29][32]
- The rapid advancement of code models has surprised many, indicating a significant shift in how programming is approached, making it more enjoyable and efficient for developers [34][36]
- The societal implications of AI, particularly regarding job displacement, are complex, with many roles transforming rather than disappearing, necessitating a thoughtful approach to workforce adaptation [48][49]
"I'm not wrong": GPT-4o stubbornly doubles down and flops, as AI models collectively crash in the face of black swan events
36Kr· 2025-07-16 11:19
Core Insights - A joint research team from Columbia University, Vector AI Research Institute, and Nanyang Technological University found significant deficiencies in AI models' reasoning capabilities when handling unexpected events, with top models like GPT-4o and Gemini 1.5 Pro performing up to 32% worse than humans [2][14][15]
Group 1: Research Findings
- The study titled "Black Swan" highlights a fundamental issue in current AI evaluation methods, which primarily focus on predictable and clear visual scenarios, neglecting the unpredictable nature of real-world events [4][14]
- Two core reasoning abilities essential for humans to handle unexpected situations are identified: abductive reasoning (inferring the most likely explanation from limited observations) and defeasible reasoning (revising initial conclusions based on new evidence) [5][14]
Group 2: New Benchmark Testing
- To accurately assess AI's reasoning capabilities in unexpected situations, the research team developed a new benchmark test called "BlackSwanSuite," consisting of 1,655 videos depicting various unconventional real-life scenarios [8][11]
- The benchmark includes three core tasks: "Forecaster," where models predict future events from the beginning of a video; "Detective," where models infer missing information from the start and end of a video; and "Reporter," where models describe the entire event and reassess previous judgments based on complete information [11][12]
Group 3: Performance Comparison
- Top AI models, including GPT-4o and Gemini 1.5 Pro, significantly lag behind humans across all three tasks, with the best models trailing humans by up to 25% in multiple-choice questions and 32% in true/false judgments [14][15]
- In the "Detective" task, GPT-4o's accuracy was 24.9% lower than that of humans, while in the "Reporter" task, the gap reached 32%, indicating that AI struggles to correct initial misjudgments [14][15][16]
Group 4: Implications of Findings
- The research indicates that AI models often "lock in" their initial judgments and fail to update their reasoning based on new evidence, which poses significant risks in critical applications like autonomous driving [17][20]
- An experiment showed that when AI models were provided with human-written descriptions of video content, their reasoning accuracy improved by up to 10%, suggesting that the core shortcoming lies not only in advanced reasoning but also in basic perception and understanding capabilities [25][26]
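The three BlackSwanSuite tasks described above differ mainly in how much of each video the model is allowed to see. A minimal sketch of that setup follows, assuming each video is split into start, middle, and end segments; the summary does not state the exact split, and `VideoClip`, `TASK_VIEWS`, and `visible_segments` are hypothetical names used only for illustration, not the benchmark's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoClip:
    """Hypothetical three-part split of one benchmark video."""
    start: str
    middle: str
    end: str


# Which segments each task exposes to the model (assumed from the summary).
TASK_VIEWS = {
    "forecaster": ("start",),                 # predict what happens next
    "detective": ("start", "end"),            # infer the hidden middle
    "reporter": ("start", "middle", "end"),   # describe and reassess everything
}


def visible_segments(clip: VideoClip, task: str) -> dict:
    """Return only the segments the given task is allowed to see."""
    return {name: getattr(clip, name) for name in TASK_VIEWS[task]}


clip = VideoClip(start="intro.mp4", middle="event.mp4", end="outcome.mp4")
print(visible_segments(clip, "detective"))
```

Under this reading, the "Reporter" task is the one that tests defeasible reasoning, because the model sees the previously hidden middle segment and has to revise an earlier judgment.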
Overhyped and falsely packaged? Gartner predicts more than 40% of agentic AI projects will fail by 2027
36Kr· 2025-07-04 10:47
Core Insights
- The emergence of "Agentic AI" is gaining attention in the tech industry, with predictions that 2025 will be the "Year of AI Agents" [1][9]
- Concerns have been raised about the actual capabilities and applicability of Agentic AI, with many projects potentially falling into the trap of concept capitalization rather than delivering real value [1][2]
Group 1: Current State of Agentic AI
- Gartner predicts that by the end of 2027, over 40% of Agentic AI projects will be canceled due to rising costs, unclear business value, or insufficient risk control [1][10]
- A survey by Gartner revealed that 19% of organizations have made significant investments in Agentic AI, while 42% have made conservative investments, and 31% are uncertain or waiting [2]
Group 2: Misrepresentation and Challenges
- There is a trend of "agent washing," where existing AI tools are rebranded as Agentic AI without providing true agent capabilities; only about 130 out of thousands of vendors actually offer genuine agent functions [2][3]
- Most current Agentic AI solutions lack clear business value or return on investment (ROI), as they are not mature enough to achieve complex business goals [3][4]
Group 3: Performance Evaluation
- Research from Carnegie Mellon University indicates that AI agents have significant gaps in their ability to replace human workers in real-world tasks, with the best-performing model, Gemini 2.5 Pro, achieving only a 30.3% success rate in task completion [6][7]
- In a separate evaluation for customer relationship management (CRM) scenarios, leading models showed limited performance, with single-turn interactions averaging a 58% success rate, dropping to around 35% in multi-turn interactions [8]
Group 4: Industry Reactions and Future Outlook
- Companies like Klarna have experienced setbacks with AI tools, leading to a return to human employees for customer service due to quality issues [9]
- Despite current challenges, Gartner remains optimistic about the long-term potential of Agentic AI, forecasting that by 2028, at least 15% of daily work decisions will be made by AI agents [10]
Stanford's head-to-head evaluation of clinical medical AI: DeepSeek beats both Google and OpenAI
量子位· 2025-06-03 06:21
Core Insights - The article discusses the comprehensive evaluation of large language models (LLMs) for medical tasks, highlighting that DeepSeek R1 achieved a 66% win rate, outperforming other models in a clinical context [1][7][24].
Evaluation Framework
- A comprehensive assessment framework named MedHELM was developed, consisting of 35 benchmark tests covering 22 subcategories of medical tasks [12][20].
- The classification system was validated by 29 practicing clinicians from 14 medical specialties, ensuring its relevance to real-world clinical activities [4][17].
Model Performance
- DeepSeek R1 led the evaluation with a 66% win rate and a macro average score of 0.75, indicating its superior performance across the benchmark tests [7][24].
- Other notable models included o3-mini with a 64% win rate and Claude 3.7 Sonnet with a 64% win rate, while models like Gemini 1.5 Pro ranked lowest with a 24% win rate [26][27].
Benchmark Testing
- The evaluation included 17 existing benchmarks and 13 newly developed tests, with 12 of the new tests based on real electronic health record data [21][20].
- The models showed varying performance across different task categories, with higher scores in clinical case generation and patient communication tasks compared to structured reasoning tasks [32].
Cost-Effectiveness Analysis
- A cost analysis was conducted based on the token consumption during the evaluation, revealing that non-reasoning models like GPT-4o mini had lower costs compared to reasoning models like DeepSeek R1 [38][39].
- The analysis indicated that models like Claude 3.5 Sonnet and Claude 3.7 Sonnet provided good value for their performance at lower costs [39].
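The summary reports a "win rate" and a "macro average score" without defining either. One plausible reading, sketched below, treats win rate as the fraction of head-to-head comparisons against other models that a model wins across shared benchmarks, and macro average as the unweighted mean of its per-benchmark scores. That definition is an assumption rather than something the article confirms, and the model names and scores are made-up toy values, not MedHELM results.

```python
def mean_win_rate(scores: dict) -> dict:
    """scores maps model -> {benchmark: score}; higher scores are better."""
    rates = {}
    for a, a_scores in scores.items():
        wins = total = 0
        for b, b_scores in scores.items():
            if a == b:
                continue
            # Compare only on benchmarks both models were scored on.
            for bench in a_scores.keys() & b_scores.keys():
                total += 1
                wins += a_scores[bench] > b_scores[bench]
        rates[a] = wins / total if total else 0.0
    return rates


def macro_average(scores: dict) -> dict:
    """Unweighted mean of each model's per-benchmark scores."""
    return {m: sum(s.values()) / len(s) for m, s in scores.items()}


# Toy, made-up numbers purely for illustration.
toy = {
    "model_a": {"bench_1": 0.8, "bench_2": 0.6},
    "model_b": {"bench_1": 0.7, "bench_2": 0.9},
    "model_c": {"bench_1": 0.5, "bench_2": 0.5},
}
print(mean_win_rate(toy))
print(macro_average(toy))
```

In the toy data, `model_a` and `model_b` tie on win rate while `model_b` has the higher macro average, which shows why a leaderboard might report both numbers.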
Hu Yong: Superagency, or how to raise human potential to new heights
36Kr· 2025-05-28 11:54
Group 1
- The core idea is that AI, like the internet decades ago, is at the beginning of a transformative phase that could significantly enhance human productivity and creativity through human-machine collaboration [2][3][4]
- AI is seen as a "super-empowerment" tool that can amplify human capabilities, enabling individuals to achieve unprecedented levels of creativity and productivity [4][5]
- The historical context of transformative technologies suggests that while initial reactions may be pessimistic, the long-term impacts can be overwhelmingly positive [3][4]
Group 2
- AI is evolving beyond mere task automation to include cognitive functions such as reasoning, planning, and decision-making, which could reshape human interactions with technology [6][8]
- Recent advancements in AI, particularly in large language models (LLMs), have shown significant improvements in reasoning capabilities, allowing them to perform well on standardized tests [7][8]
- The emergence of agentic AI, which can autonomously take actions and make decisions, represents a significant leap in AI's capabilities, potentially transforming it into a digital workforce [9][10]
Group 3
- Multi-modal AI is advancing, integrating various data types (text, audio, video) to enhance understanding and interaction, which could lead to broader applications across industries [11][13]
- Hardware innovations, such as specialized chips, are driving AI performance improvements, enabling faster and more efficient processing of complex tasks [14][15]
- Transparency and interpretability in AI are becoming increasingly important for safe deployment, with ongoing improvements in model transparency scores [16][17]
Group 4
- The potential for AI to drive revenue growth is significant, with nearly 90% of business leaders anticipating positive impacts from AI deployment, although many transformations face challenges [18][19]
- Key challenges in AI transformation include leadership alignment, cost uncertainty, workforce planning, supply chain management, and the need for greater interpretability [19][20][21]
- Companies are encouraged to adopt a strategic approach to AI, focusing on human agency and iterative deployment to foster innovation and address potential risks [22][24]
Hu Yong: Superagency, or how to raise human potential to new heights
Tencent Research Institute· 2025-05-28 08:34
Core Insights - The article emphasizes that AI, like the internet decades ago, is at the beginning of a transformative phase that could redefine human productivity and creativity, leading to a state of "super agency" where humans and machines collaborate effectively [1][4][5].
Group 1: AI's Transformative Potential
- AI is seen as a powerful tool that can enhance human capabilities, acting as a "force multiplier" rather than just a tool [4][5].
- The concept of "super agency" describes how individuals can leverage AI to significantly boost their creativity, productivity, and influence [5].
- AI is expected to democratize knowledge acquisition and automate numerous tasks, provided it is developed and deployed safely and equitably [5][7].
Group 2: Historical Context and Public Perception
- Historical technological advancements often faced initial skepticism, with concerns about their negative impacts overshadowing their potential benefits [3].
- The narrative around AI is influenced by dystopian themes, yet there is a call to reframe this perspective to envision positive outcomes [3][4].
Group 3: AI's Advancements and Capabilities
- AI is evolving to automate cognitive functions, enabling it to adapt, plan, and make decisions autonomously, which could drive unprecedented economic growth and social change [7][8].
- Significant advancements in AI, such as large language models (LLMs), have shown remarkable performance in standardized tests, indicating a leap in reasoning capabilities [8][9].
Group 4: Autonomous AI and Its Implications
- Agentic AI is emerging, capable of independent action and complex task execution, marking a shift from passive tools to proactive digital partners [11][12].
- Companies are integrating agentic AI into their core products, enhancing collaboration between humans and automated systems [13].
Group 5: Multi-modal AI Development
- Current AI models are advancing towards multi-modal capabilities, processing various data types (text, audio, video) simultaneously, which enhances understanding and interaction [14][15].
- Self-supervised learning techniques are being utilized to improve multi-modal models, allowing them to learn from unlabelled data and perform better across tasks [16][17].
Group 6: Hardware Innovations and AI Performance
- Innovations in hardware, such as specialized chips, are driving improvements in AI performance, enabling faster and more efficient model training and execution [18][19].
- The rise of edge computing is enhancing AI's responsiveness and efficiency, particularly in real-time applications [20][21].
Group 7: Transparency and Safety in AI
- There is a growing emphasis on improving AI transparency and interpretability, which are crucial for safe deployment and reducing biases [22][23].
- Progress is being made in enhancing the transparency of AI models, with notable improvements in scores reflecting their interpretability [23].
Group 8: Challenges in AI Adoption
- Companies face significant challenges in AI transformation, including leadership alignment, cost uncertainty, workforce planning, supply chain management, and the need for greater interpretability [26][27][28].
- Successful AI deployment requires strategic transformation beyond mere technology implementation, focusing on organizational structure and mindset [28][29].
Group 9: Future Directions and Leadership
- The article advocates for an iterative deployment approach to AI, encouraging collaboration and gradual adaptation rather than excessive regulation [29].
- Leaders are urged to prioritize human agency in AI development, ensuring that technology serves to enhance human capabilities [30][31].
Large models in May: 30 lively days and the edge of the chasm
晚点LatePost· 2024-05-29 14:00
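"Mayday" can be read literally as "a day in May"; it is also the international radio distress signal. When an aircraft is in danger of crashing, the pilot shouts "Mayday!" into the radio.
This May may be the liveliest month for the large-model industry since ChatGPT was released: Chinese and US companies including OpenAI, Google, Microsoft, ByteDance, and Alibaba held at least 13 launch events related to large models, introduced more than 10 new models, and showed off a pile of new products.
The risk and disappointment amid the bustle is that many practitioners believe the technology has made no major progress.
GPT-4o, which OpenAI released this month, remains at GPT-4 level in language ability, and the long-awaited GPT-5 has still not appeared.
Multimodality has become the technical focus of the top AI companies: OpenAI, Google, and Microsoft all released models that can process speech and images at the same time and even understand the real world. But the products and applications built on these capabilities are still at the demo stage, and even before any official release they have raised problems such as infringement and privacy risks.
Zhu Xiaohu, managing partner of GSR Ventures (金沙江创投), who has been bearish on large-model startup opportunities, holds that if the evolution of language ability slows down, "this wave of hype is over."
"Nothing exciting," said a person who leads large-model R&D at a major Chinese company. The series of launch events made him more convinced that developing more capable small models is the future.
One AI entrepreneur said GPT-4 ...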