AI's memory pretense exposed! 17 mainstream LLMs, including GPT and DeepSeek, simply cannot remember a number
机器之心· 2025-06-15 04:40
Core Viewpoint
- The article discusses a study revealing that large language models (LLMs) do not possess human-like working memory, which is essential for coherent reasoning and conversation [5][30].

Summary by Sections

Working Memory
- Working memory in humans retains information for a short period, enabling reasoning and complex tasks [7].
- LLMs are often compared to a "talking brain," but their lack of working memory is a significant barrier to achieving true artificial general intelligence [8].

Evaluation of Working Memory
- Traditional N-Back Task assessments are unsuitable for LLMs, since the models can read all historical tokens rather than recalling anything from internal memory [10].

Experiments Conducted
- **Experiment 1: Number Guessing Game** - LLMs were asked to think of a number between 1 and 10 and respond to repeated guesses. Most models never produced a "yes" response, indicating a lack of internal memory [13][19].
- **Experiment 2: Yes-No Game** - LLMs answered a series of questions about an object they had chosen. Models began to contradict themselves after 20-40 questions, demonstrating inadequate working memory [22][26].
- **Experiment 3: Math Magic** - LLMs had to remember and manipulate numbers through a series of calculations. Accuracy was low across models, with LLaMA-3.1-8B performing best [28][29].

Conclusions
- None of the tested models passed all three experiments, indicating a significant gap in their ability to mimic human-like working memory [30].
- Future advancements in AI may require integrating a true working memory mechanism rather than relying solely on extended context windows [30].
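The number-guessing probe described above is simple enough to sketch. Below is a minimal, hypothetical harness (the names `run_number_game`, `committed_player`, and `evasive_player` are illustrative stand-ins for real LLM calls, not the paper's code): a player with genuine internal memory answers "yes" exactly once across the ten guesses, while the failure mode the study reports never says "yes" at all.

```python
import random

def run_number_game(answer_fn, low=1, high=10):
    """Probe for an internally 'remembered' number: ask about every
    candidate and count the yes-answers. A memory-consistent player
    says yes exactly once; the study reports most LLMs never say yes."""
    yeses = [n for n in range(low, high + 1) if answer_fn(n)]
    return {"consistent": len(yeses) == 1, "yes_count": len(yeses)}

# Stub standing in for a model that actually committed to a number.
def committed_player():
    secret = random.randint(1, 10)
    return lambda guess: guess == secret

# Stub mimicking the reported failure mode: the model never holds a
# number internally, so it answers "no" to every guess.
evasive_player = lambda guess: False

print(run_number_game(committed_player()))  # consistent: True, 1 yes
print(run_number_game(evasive_player))      # consistent: False, 0 yeses
```

With a real model, `answer_fn` would wrap an API call that replays the full conversation so far; the point of the probe is that no consistent "yes" can appear unless something was actually committed to between turns.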
Beyond general-purpose Agents, what "hidden dungeons" remain in the Agentic Age traffic race?
机器之心· 2025-06-14 12:45
1. Beyond general-purpose Agents, what "hidden dungeons" remain in the Agentic Age traffic race? How does the "traffic entry point" logic of Agentic AI differ fundamentally from that of the traditional internet era? Which products are seen as the strategic high ground most worth fighting for, and who is dominating those entry points? Under the new paradigm of "the traffic entry point is the ecosystem," how are the major players staking out territory, and where do their roadmaps diverge?

2. After a year of burning cash, has Fei-Fei Li's "spatial intelligence" vision changed?

机器之心PRO · Member Newsletter, Week 24 --- This week we unpack ② noteworthy AI & Robotics industry developments ---

① AI assistants can now execute tasks autonomously across platforms, bypassing the attention-distribution model of traditional platforms. In the earlier internet era, users reached information and services mainly through traditional nodes such as search engines, social platforms, and portal sites: a user searched or clicked a link to get what they needed.

Has World Labs' vision changed? How is AI technology developing "counterintuitively"? Why is an AI without spatial intelligence incomplete? How does spatial intelligence unlock a future that moves from a "single reality" to a "multiverse"? Why didn't Fei-Fei Li prioritize 3D representations earlier? ... The full edition of this newsletter contains 2 in-depth topics + 31 quick AI & Robotics updates, including 12 on technology and, on the domestic front, ...
The first unified explainable AIGC detection framework for images and videos, with SOTA performance on multiple leaderboards
机器之心· 2025-06-14 12:45
Imagine: you are scrolling through social media and come across a striking image or a stunning video. It is lifelike and rich in detail, and you can't help but take it for real. But is it a genuine recording, or a "masterpiece" carefully fabricated by cutting-edge AI? If an AI tool tells you it is "fake," can it go further and explain why? Can it clearly point out physically implausible lighting in the image, or a fleeting temporal glitch in the video?

This predicament of "hard to tell real from fake" and "knowing that, but not why" is precisely the serious challenge of the current AIGC era. As AI-generated content grows ever more realistic, traditional "black-box" detection tools can no longer meet our needs for transparency and trustworthiness. We urgently need intelligent detection systems that handle both images and videos and can produce a "diagnostic report." This is why the paper proposes "IVY-FAKE: a unified explainable framework and benchmark for image and video AIGC detection," aiming to let AI not only tell the genuine "Li Kui" from the impostor "Li Gui" (that is, real from fake), but also explain clearly which specific visual artifacts, spatial or temporal, expose the content's "AI genes."

The work was completed jointly by researchers from π3 AI Lab, Wuhan University, Nanjing University, and Stanford University.

Paper title: IVY-FAKE: A Unified Explainable Framework and Benchmark f ...
Multi-agent systems are "burning" tokens! Anthropic makes public everything it discovered
机器之心· 2025-06-14 04:12
Core Insights
- Anthropic's new research on multi-agent systems highlights the advantages of using multiple AI agents for complex research tasks, emphasizing their ability to adapt and explore dynamically [2][3][6][7].

Multi-Agent System Advantages
- Multi-agent systems excel at research tasks that require flexibility and the ability to adjust methods based on ongoing discoveries, as the agents can operate independently and explore different aspects of a problem simultaneously [7][8].
- Anthropic's internal evaluations show that their multi-agent system outperforms single-agent systems by 90.2% on breadth-first query tasks [8].
- That performance boost comes at the cost of much higher token consumption than single-agent models [9][10].

System Architecture
- The multi-agent architecture follows a "coordinator-worker" model, in which a lead agent coordinates tasks among several specialized sub-agents [14][18].
- The lead agent analyzes the user query, creates sub-agents, and oversees their independent exploration of different aspects of the query [19][21].

Performance Evaluation
- Traditional evaluation methods are inadequate for multi-agent systems because the systems take non-linear, varied paths to a result; flexible evaluation methods are necessary [44][45].
- Anthropic employs an "LLM-as-judge" approach for evaluating outputs, which improves scalability and practicality when assessing multi-agent systems [49][53].

Engineering Challenges
- Maintaining state in agent systems poses significant engineering challenges, as minor changes can lead to substantial behavioral shifts [56][61].
- Anthropic has implemented robust debugging and tracing mechanisms to diagnose and address failures in real time [57].

Conclusion
- Despite the challenges, multi-agent systems have shown immense potential in open-ended research tasks, provided they are designed with careful engineering, thorough testing, and a deep understanding of current AI capabilities [61].
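The coordinator-worker shape can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's implementation: the names `lead_agent` and `echo_worker`, and the thread-pool fan-out, are assumptions chosen only to show the pattern of a lead agent decomposing a query, running sub-agents in parallel, and synthesizing their findings.

```python
from concurrent.futures import ThreadPoolExecutor

def lead_agent(query, spawn_subagent, n_workers=3):
    """Coordinator: decompose the query into aspects, fan out to
    sub-agents in parallel, then collect their findings for a final
    synthesis step. spawn_subagent stands in for an LLM call."""
    subtasks = [f"{query} (aspect {i})" for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        findings = list(pool.map(spawn_subagent, subtasks))
    return {"query": query, "findings": findings}

# Stub sub-agent; a real system would run a full tool-using LLM here.
echo_worker = lambda task: f"notes on: {task}"

report = lead_agent("compare vector databases", echo_worker)
print(len(report["findings"]))  # 3
```

Each sub-agent carries its own context, which is exactly why the pattern multiplies token spend: every worker replays its slice of the problem independently.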
LLMs can now update their own weights, with major gains in self-adaptation and knowledge integration. Is AI waking up?
机器之心· 2025-06-14 04:12
Core Insights
- The article discusses the growing research and discussion around AI self-evolution, highlighting frameworks and models that aim to let AI systems improve themselves autonomously [1][2].

Group 1: AI Self-Evolution Frameworks
- Several notable frameworks for AI self-improvement are mentioned, including the "Darwin-Gödel Machine" (DGM), "Self-Reinforcement Training" (SRT), "MM-UPT" for multimodal large models, and "UI-Genie" [1].
- OpenAI CEO Sam Altman envisions a future in which humanoid robots autonomously manufacture more robots and essential infrastructure, indicating a significant leap in AI capabilities [1].
- A recent MIT paper titled "Self-Adapting Language Models" introduces SEAL (Self-Adapting LLMs), which lets language models update their weights based on self-generated training data [2][4].

Group 2: SEAL Methodology
- SEAL employs a self-editing mechanism trained with reinforcement learning: the model generates its own training data and updates its weights based on the resulting performance improvements [10][12].
- The SEAL framework consists of two nested loops: an outer reinforcement learning loop that optimizes self-edit generation, and an inner update loop that adjusts the model's parameters [13][15].
- Training involves generating self-edits and applying supervised fine-tuning to update the parameters, enhancing the model's adaptability to new tasks [18][19].

Group 3: Experimental Results
- In few-shot learning experiments, SEAL achieved a success rate of 72.5%, significantly outperforming baselines with success rates of 0% and 20% [34][36].
- On knowledge integration tasks, SEAL improved accuracy to 47.0% in the single-passage setting and 43.8% in continued pretraining, surpassing other training methods [38][40].
- The results indicate that SEAL's reinforcement learning approach yields more effective self-edits, enhancing overall model performance [43].
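The two nested loops can be rendered as a toy sketch. Everything below is illustrative rather than the SEAL code: the function names are assumptions, a single float stands in for model weights, and greedy acceptance stands in for the RL reward signal. The outer loop samples a self-edit and keeps it only if the evaluated score improves; `apply_update` stands in for the inner supervised fine-tuning step.

```python
import random

def seal_sketch(model, make_self_edit, apply_update, evaluate, steps=5):
    """Toy rendering of SEAL's two nested loops: the outer loop samples
    a self-edit and scores the result; the inner step fine-tunes a copy
    of the model on that edit. Edits that improve the evaluation score
    are kept (a stand-in for the reinforcement-learning reward)."""
    score = evaluate(model)
    for _ in range(steps):
        edit = make_self_edit(model)           # model writes its own training data
        candidate = apply_update(model, edit)  # inner loop: weight update (SFT)
        new_score = evaluate(candidate)
        if new_score > score:                  # reward: downstream improvement
            model, score = candidate, new_score
    return model, score

# Toy instantiation: "weights" are one float, the task is to reach 1.0.
random.seed(0)
target, start = 1.0, 0.0
final, score = seal_sketch(
    start,
    make_self_edit=lambda m: random.uniform(-0.5, 0.5),
    apply_update=lambda m, e: m + e,
    evaluate=lambda m: -abs(target - m),
    steps=20,
)
print(f"score went from -1.000 to {score:.3f}")
```

Because edits are only accepted when they improve the score, the sketch is monotone: the final score is never worse than the starting score of -1.0.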
High-quality video editing on a single RTX 4090! Westlake AGI Lab's training-free framework FlowDirector has arrived
机器之心· 2025-06-14 04:12
The first author is Guangzhao Li, an undergraduate in software engineering at Central South University; the corresponding author is Chi Zhang, an assistant professor at the AGI Lab of Westlake University. The work was completed during Guangzhao Li's visit to the Westlake University AGI Lab.

Video generation and editing usually come with a high barrier to entry, and newcomers are often scared off by the complex workflows involved. With advances in AI, AIGC video editing has simplified those workflows: type a sentence of natural language into an input box, and the original video can be transformed into an entirely new picture within minutes. However, current video editing methods typically rely on very complex strategies to keep the unedited parts of the video consistent before and after editing. This incurs a lot of unnecessary overhead, especially in compute; it still causes serious interference in unrelated regions; and it also suppresses the editing of the target object, producing results users find hard to accept.

To resolve this dilemma, the Westlake University AGI Lab team proposes FlowDirector: a brand-new training-free video editing framework. FlowDirector operates under the video "Flow Matching" paradigm and can turn any flow-based video generation model into an effective video editing tool without any retraining. Compared with other video editing methods, FlowDirector:

1. Higher quality: FlowDirector can perform more thorough object edits, allowing large-scale shape ...
Breaking: CVPR 2025 awards announced. Oxford & Meta PhD student Jianyuan Wang wins Best Paper; Saining Xie takes the Young Researcher Award
机器之心· 2025-06-13 15:45
Core Insights
- The CVPR 2025 conference in Nashville, Tennessee, awarded five papers (one best paper and four honorable mentions), along with one best student paper and one honorable mention for student papers [1][2].

Submission and Acceptance Statistics
- Over 40,000 authors submitted 13,008 papers this year, a 13% increase over last year's 11,532 submissions. A total of 2,872 papers were accepted, for an overall acceptance rate of approximately 22.1%. Among the accepted papers, 96 were oral presentations (3.3%) and 387 were highlights (13.7%) [3][5].

Conference Attendance
- The conference attracted over 9,000 attendees from more than 70 countries and regions [7].

Paper Acceptance by Field
- Image and video generation had the highest number of accepted papers, while the highest acceptance rates were in multi-view and sensor-based 3D as well as single-image 3D [8].

Best Paper Award
- The best paper, "VGGT: Visual Geometry Grounded Transformer," was presented by researchers from the University of Oxford and Meta AI. It introduces a general-purpose 3D vision model based on a pure feed-forward Transformer architecture, capable of inferring core geometric information from one or more images [13][14].

Notable Research Contributions
- The best paper demonstrated significant performance improvements over traditional optimization methods and existing state-of-the-art models on various 3D tasks, achieving inference in seconds without post-processing optimization [17].

Best Student Paper
- The best student paper, "Neural Inverse Rendering from Propagating Light," proposed a physics-based, multi-view neural inverse rendering system for propagating light, achieving state-of-the-art 3D reconstruction under strong indirect lighting [53][55].

Awards and Recognitions
- Two Young Researcher Awards went to Hao Su and Saining Xie for their outstanding contributions to computer vision research [68][72]. The Longuet-Higgins Award was presented to two papers that have significantly influenced the field: the Inception architecture and fully convolutional networks for semantic segmentation [75][78][80].
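The headline statistics above are easy to sanity-check against each other (all numbers are taken directly from the article):

```python
# CVPR 2025 submission figures as reported in the article.
submitted_2025, submitted_2024 = 13008, 11532
accepted, orals = 2872, 96

acceptance_rate = accepted / submitted_2025 * 100                      # ~22.1%
oral_share = orals / accepted * 100                                    # ~3.3% of accepted
yoy_growth = (submitted_2025 - submitted_2024) / submitted_2024 * 100  # ~13% more submissions

print(f"acceptance {acceptance_rate:.1f}%, orals {oral_share:.1f}%, "
      f"submission growth {yoy_growth:.0f}%")
```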
ICML 2025 | 1000x length generalization! Ant Group's new attention mechanism GCA delivers precise understanding of 16M-token contexts
机器之心· 2025-06-13 15:45
Core Viewpoint
- The article discusses the challenges of long-text modeling in large language models (LLMs) and introduces a new attention mechanism, Grouped Cross Attention (GCA), that makes long-context processing efficient, potentially paving the way for advances toward artificial general intelligence (AGI) [1][2].

Long Text Processing Challenges and Existing Solutions
- Long-text modeling remains challenging due to the quadratic complexity of the Transformer architecture and the limited extrapolation capability of full attention [1][6].
- Existing solutions such as sliding-window attention sacrifice long-range information retrieval for continuous generation, while other methods have limited generalization capabilities [7][8].

GCA Mechanism
- GCA is a novel attention mechanism that learns to retrieve and select relevant past segments of text, significantly reducing memory overhead during long-text processing [2][9].
- The mechanism operates in two stages: it first attends to each retrieved chunk separately, and then fuses the information from those chunks to predict the next token [14][15].

Experimental Results
- Models incorporating GCA showed superior performance on long-text datasets, achieving over 1000x length generalization and 100% accuracy on a 16M-token context retrieval task [5][17].
- GCA's training cost scales linearly with sequence length, and its inference memory overhead approaches a constant, maintaining efficient processing speed [20][21].

Conclusion
- GCA represents a significant advance in long-context language modeling, with the potential to enable the development of intelligent agents with permanent memory [23].
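The two-stage pattern described above (per-chunk attention, then fusion) can be illustrated with a toy numpy sketch. This is not the paper's implementation: `gca_step`, the mean-pooled chunk summaries, and the dot-product top-k scoring rule are simplifying assumptions. The sketch only conveys why memory cost tracks the number of retrieved chunks rather than the full sequence length.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gca_step(query, chunks, top_k=2):
    """Illustrative two-stage step in the spirit of GCA:
    (1) score past chunks by relevance and keep the top-k,
    (2) cross-attend within each kept chunk separately, then fuse
    the per-chunk readouts using the retrieval weights."""
    # Stage 1: chunk-level retrieval scores (mean-pooled chunk summaries).
    summaries = np.stack([c.mean(axis=0) for c in chunks])  # (n_chunks, d)
    scores = summaries @ query                              # (n_chunks,)
    keep = np.argsort(scores)[-top_k:]
    weights = softmax(scores[keep])                         # (top_k,)
    # Stage 2: attention inside each retained chunk, fused by weight.
    readouts = []
    for idx in keep:
        attn = softmax(chunks[idx] @ query)                 # (chunk_len,)
        readouts.append(attn @ chunks[idx])                 # (d,)
    return weights @ np.stack(readouts)                     # (d,)

rng = np.random.default_rng(0)
chunks = [rng.normal(size=(8, 16)) for _ in range(10)]  # 10 past chunks
out = gca_step(rng.normal(size=16), chunks)
print(out.shape)  # (16,)
```

Only the `top_k` retained chunks are touched per step, so the attention memory is O(top_k x chunk_len) instead of O(total context length), which is the intuition behind the near-constant inference memory reported above.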
After a year of burning cash, has Fei-Fei Li's "spatial intelligence" vision changed?
机器之心· 2025-06-13 12:02
Group 1
- The core vision of World Labs, founded by Fei-Fei Li, emphasizes the importance of spatial intelligence and world models in AI development, aiming to create AI systems that can understand and generate 3D physical worlds [5][6][7].
- In its first year, World Labs raised $230 million in funding and reached a valuation of over $1 billion, positioning itself as a notable player in the AI sector [5][6].
- The company has released technologies such as a "world generation" model and the Forge renderer, which create interactive 3D environments from single images [6][7].

Group 2
- Fei-Fei Li argues that current language models (LLMs) are limited in describing and understanding 3D physical worlds, making spatial intelligence a crucial component of AI [5][6].
- The success of LLMs has provided methodology for spatial intelligence, but true breakthroughs require interdisciplinary integration, particularly between AI and computer graphics [7][8].
- Advances in compute, data availability, and engineering capability have made the pursuit of "world models" a realistic goal [7].
A single "扣子" (Coze) kicks off the full-lifecycle evolution of Agents
机器之心· 2025-06-13 09:22
Core Viewpoint
- The year 2025 is anticipated to be a breakthrough year for Agents, significantly enhancing the capabilities of large models and transforming human-computer interaction across platforms, particularly in multi-task automation [1].

Group 1: Agent Development and Platforms
- The emergence of the first general-purpose Agent product, Manus, has drawn unprecedented attention, with major internet companies and startups treating Agents as a key arena of AI competition [2].
- At the recent Force 2025 conference, Agents were highlighted alongside the latest version of the Doubao large model series [3].
- The conference's main forum showcased a new paradigm for AI cloud-native Agent development, emphasizing how Agents can reshape productivity [4].
- The Doubao platform has evolved into a "full lifecycle platform," addressing diverse development and tuning needs for Agents in the era of large models [5].

Group 2: Doubao Platform Features
- The Doubao development platform enables low-code Agent development, allowing users with no coding experience to quickly build Agents and deploy them across channels [8].
- The platform supports Agent development through four pillars: an intelligent IDE, an application IDE, a rich set of plugins and workflow templates, and enterprise-grade security capabilities [9].
- The application IDE, set to launch in 2024, will let developers create GUI-based applications with drag-and-drop features [10].
- Pre-configured Agent templates enable rapid deployment of functional Agents, such as smart customer-service assistants and educational assistants [12].

Group 3: Eino Framework
- Eino, a Go-language LLM application development framework, draws inspiration from open-source communities and emphasizes simplicity, scalability, reliability, and effectiveness [13].
- Eino standardizes the core modules of Agent development, enabling seamless integration with both open-source and closed-source models [14].
- The framework supports flexible orchestration for complex task decomposition and multi-tool collaboration [15].
- Over 300 systems have been built internally at ByteDance with Eino, and its GitHub star count of 4.3k indicates growing developer interest [16].

Group 4: Agent Lifecycle Management
- The Doubao platform establishes a comprehensive Agent lifecycle system spanning development, evaluation, online observation, and optimization [16].
- The evaluation phase quantifies Agent performance to ensure it meets standards, while the observation phase collects and analyzes data in real time [19].
- Developers can analyze user queries and behavior to tune Agent performance, identifying and addressing issues through a robust observation system [20].
- The platform supports flexible evaluation-set management, allowing developers to create and manage evaluation sets easily [22].

Group 5: Doubao Space
- Doubao Space, launched in April, serves as a collaborative platform for high-quality Agents, enabling efficient task resolution through expert collaboration [25].
- Users can leverage Doubao Space for market analysis, academic guidance, and expert support, with capabilities continuously expanded through the MCP protocol [26].
- The Doubao platform is expected to become foundational infrastructure for Agent development in the era of large models [27].