机器之心
Long Context Windows and the Rise of Agents: Is RAG Dead?
机器之心· 2025-10-19 09:17
Core Viewpoint
- The article discusses the evolving landscape of Retrieval-Augmented Generation (RAG) and its potential obsolescence due to advances in context engineering and agent capabilities, arguing that RAG is not dead but is transforming into a more sophisticated retrieval paradigm [2][5][21].

Group 1: RAG's Evolution and Current Status
- RAG has been a standard solution for the limited input length of LLMs since 2022, acting as an external knowledge base [3][4].
- The emergence of long context windows and agent capabilities is challenging RAG's traditional role, leading to debates about its relevance [5][6].
- RAG is evolving into "agentic retrieval," where AI agents play a central role in advanced retrieval systems, moving beyond basic chunk retrieval [8][21].

Group 2: Stages of RAG Development
- The first stage of RAG is basic "Top-k" retrieval: documents are split into chunks, and the most relevant chunks are retrieved for each user query [10][11] (a minimal sketch of the first two stages follows after this summary).
- The second stage introduces lightweight agents for automatic routing, allowing the system to select the appropriate retrieval method for each query [15].
- The third stage expands to composite retrieval APIs, enabling the system to handle multiple document formats efficiently [17][19].

Group 3: RAG's Future and Integration with Agents
- The ultimate goal is a fully agent-driven knowledge system that can make intelligent decisions at every stage of the retrieval process [18][21].
- RAG is being redefined as a powerful component within an agent toolbox rather than the default architecture for every application [54].
- The future landscape will likely combine multiple technologies tailored to specific application scenarios, which makes understanding the strengths and weaknesses of each paradigm essential [52][54].
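The staged retrieval described in Group 2 can be made concrete with a short sketch. This is not the article's implementation; the embedding, chunking, and routing functions below are hypothetical stand-ins (a production system would call an embedding model and an LLM-based router), shown only to illustrate "Top-k chunks plus lightweight routing."

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for an embedding-model call; pseudo-random
    # vectors keep the sketch self-contained and runnable.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def chunk(document: str, size: int = 500) -> list[str]:
    # Stage 1 preprocessing: naive fixed-size chunking of a document.
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_k_retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Stage 1: rank chunks by cosine similarity to the query, keep the top k.
    q = embed(query)
    def cos(v: np.ndarray) -> float:
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(chunks, key=lambda c: cos(embed(c)), reverse=True)[:k]

def route_query(query: str) -> str:
    # Stage 2: a lightweight router that picks a retrieval method per query.
    # Keyword rules stand in for the LLM-based routing agent described above.
    if any(w in query.lower() for w in ("table", "figure", "revenue")):
        return "structured_lookup"
    if "compare" in query.lower():
        return "multi_document_retrieval"
    return "top_k_chunks"

doc = "RAG splits documents into chunks and retrieves the most relevant ones. " * 30
print(route_query("How does chunk retrieval work?"))
print(top_k_retrieve("How does chunk retrieval work?", chunk(doc))[0][:60])
```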
Meta Spent 400,000 GPU Hours on One Experiment, Just to Pin Down the Scaling Law of Reinforcement Learning
机器之心· 2025-10-19 09:17
Core Insights
- The article discusses advances in Reinforcement Learning (RL) scaling, emphasizing the need for a systematic approach to understanding how to scale RL algorithms and their computational requirements effectively [2][3][4].

Group 1: Research Background
- Recent progress in RL has largely come from isolated studies of specific algorithms or models; the lack of a comprehensive scaling theory limits broader research participation [3].
- The study aims to establish a scientific foundation for RL scaling by borrowing concepts from the well-developed scaling laws of pre-training [3][4].

Group 2: Proposed Framework
- A predictive framework is introduced to characterize the relationship between RL performance and compute, using a sigmoid-like saturation curve to link expected reward with training compute [5][7] (a toy curve-fit illustrating this follows after this summary).
- The framework lets researchers extrapolate performance at larger scales from smaller experiments, making it possible to evaluate the scalability of RL methods without exhausting the computational budget [7].

Group 3: ScaleRL Development
- ScaleRL is designed based on a systematic empirical study covering over 400,000 GPU hours, exploring various design choices on an 8B-parameter model [8].
- Three key principles were identified: performance ceilings vary by method; methods that perform well at small scale may underperform at larger scale; and many techniques thought to raise peak performance primarily affect computational efficiency [10][11].

Group 4: Algorithmic Choices
- ScaleRL integrates existing methods rather than introducing new algorithms, combining an asynchronous Pipeline-RL structure, a length-interruption mechanism, and specific loss functions to achieve predictable scaling [11][36].
- The study validates these design choices through leave-one-out experiments, demonstrating that ScaleRL consistently outperforms existing RL configurations in both performance and efficiency [38].

Group 5: Predictive Performance Insights
- The research investigates which scaling dimensions (context length, batch size, generations per prompt, or model size) yield the most reliable performance improvements under fixed or growing compute budgets [39].
- Results indicate that larger batch sizes stabilize the performance ceiling and avoid premature stagnation, while longer generation lengths can raise the ceiling [42][47].

Group 6: Conclusion and Recommendations
- The findings establish a rigorous, quantifiable methodology for predicting the scalability of new RL algorithms, a significant contribution to RL for large language models [11][50].
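The sigmoid-like saturation curve in Group 2 can be illustrated with a toy fit. The parameterization and the data points below are assumptions for illustration, not the paper's exact formula or measurements; the point is only that a saturating reward-vs-compute curve fitted on small runs can be extrapolated to larger budgets.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_reward(compute, r_max, slope, c_mid):
    # A sigmoid-like saturation curve in log-compute: expected reward rises
    # with training compute and asymptotes at a ceiling r_max. Illustrative
    # form only; the paper's exact parameterization may differ.
    return r_max / (1.0 + np.exp(-slope * (np.log(compute) - np.log(c_mid))))

# Hypothetical small-scale measurements: (GPU-hours, mean benchmark reward).
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4])
reward = np.array([0.12, 0.21, 0.33, 0.45, 0.52, 0.56])

# Fit on the small runs, then extrapolate to a much larger budget.
params, _ = curve_fit(saturating_reward, compute, reward, p0=[0.6, 1.0, 1e3])
r_max, slope, c_mid = params
print(f"estimated ceiling r_max ~ {r_max:.2f}")
print(f"predicted reward at 4e5 GPU-hours ~ {saturating_reward(4e5, *params):.2f}")
```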
OpenAI "Solves" 10 Hard Math Problems? Hassabis Calls It "Embarrassing," LeCun Weighs In Sharply
机器之心· 2025-10-19 03:48
Core Viewpoint
- The article discusses the controversy around OpenAI's claims about GPT-5's ability to solve open mathematical problems; the claims turned out to be exaggerated, as the "solutions" came from existing literature rather than original work [1][14][17].

Group 1: Events Leading to the Controversy
- OpenAI researcher Sebastien Bubeck tweeted that GPT-5 had "solved" Erdős Problem 339, which had been incorrectly listed as open in the official database [4][5].
- Other OpenAI researchers then claimed to have found solutions to 10 problems and made progress on 11 others, triggering widespread media excitement about GPT-5's mathematical reasoning abilities [8][14].
- The initial excitement was quickly countered by criticism from Google DeepMind CEO Demis Hassabis, who pointed out that the results had been misinterpreted [16][17].

Group 2: Clarifications and Apologies
- Thomas Bloom, the maintainer of the problem database, clarified that the problems had been marked as open because he was unaware of existing solutions, not because they were actually unsolved [17].
- Bubeck later deleted his tweet and apologized for the misunderstanding, emphasizing AI's value for literature search rather than for solving hard mathematical problems [18][19].

Group 3: Broader Implications and Perspectives
- The incident highlights the tension between scientific rigor and the pressure for hype in the AI community, especially where funding and public perception are concerned [38][39].
- Terence Tao suggested that AI's most productive applications in mathematics may lie in accelerating mundane tasks such as literature review rather than in cracking the hardest problems [33][36].
An Algorithm Has Been Running for 80 Years, and Only Now Do We Truly Understand It?
机器之心· 2025-10-19 03:48
From Quanta Magazine. By Steve Nadis. Translated by 机器之心.

From how your online order reaches you as quickly as possible to how airlines route thousands of aircraft to save fuel, a nearly 80-year-old mathematical method has been quietly at work behind the scenes. It is hailed as a cornerstone of optimization, efficient and trusted. Yet a curious fact remains: for decades, no one could fully explain in theory why it works so well. Now, the last piece of that puzzle has finally been found.

In 1939, George Dantzig, then a first-year graduate student at the University of California, Berkeley, arrived late to a statistics class. He copied two problems off the blackboard, assuming they were homework. He later recalled finding the assignment "much harder than usual" and apologized to his professor for taking several extra days to finish it. A few weeks later, his professor told him he had solved two famous open problems in statistics. This work laid the foundation for Dantzig's doctoral dissertation and, decades later, became an inspiration for the film Good Will Hunting.

George Dantzig (1914-2005), a renowned American mathematician, proposed the simplex method in 1947 and is known as the father of linear programming.

Dantzig received his doctorate in 1946, shortly after the end of World War II, and soon ...
ACMMM 2025 | Peking University Team Proposes InteractMove: A New Framework for Generating Human Interactions with Movable Objects in 3D Scenes
机器之心· 2025-10-19 03:48
Both the first author and the corresponding author of this paper are from the Wangxuan Institute of Computer Technology, Peking University; the first author is PhD student Cai Xinhao, and the corresponding author is doctoral advisor Liu Yang. In recent years the team has published multiple representative works at top venues such as TPAMI, IJCV, CVPR, and ICML, has repeatedly won domestic and international multimodal understanding and generation competitions, and collaborates widely with well-known universities and research institutes at home and abroad.

This article introduces the team's latest paper, InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects. The study is the first to propose the task of text-driven human-object interaction generation in 3D scenes that contain movable objects; it builds a large-scale dataset and a novel method framework, and achieves leading results on multiple evaluation metrics.

Existing human-scene interaction datasets cover too few interaction categories and usually consider only interactions with static objects. Introducing movable objects makes the task more challenging: the model must not only accurately identify the target interaction object, but also learn to interact with objects of different categories and sizes while avoiding collisions between objects and the scene.

To address these challenges, the study proposes a new method framework: a 3D visual grounding model is first used to identify the target interaction object; then hand-object joint reachability map learning is proposed to predict ...
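To make the staged design above concrete, here is a minimal pipeline skeleton. The class and function names are hypothetical placeholders, not the paper's modules: the real grounding and reachability prediction are learned neural networks, whereas the stand-ins below only show how the stages hand data to each other.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneObject:
    name: str
    category: str
    points: np.ndarray  # (N, 3) object point cloud in scene coordinates

def ground_target_object(instruction: str, objects: list[SceneObject]) -> SceneObject:
    # Stage 1: 3D visual grounding - select the object the text refers to.
    # A toy keyword match stands in for the learned grounding model.
    for obj in objects:
        if obj.name in instruction or obj.category in instruction:
            return obj
    raise ValueError("no scene object matches the instruction")

def hand_object_reachability(target: SceneObject) -> np.ndarray:
    # Stage 2: predict a per-point reachability score over the target object,
    # indicating where a hand could plausibly grasp or move it.
    # Placeholder: uniform scores; the real model learns this from data.
    return np.full(len(target.points), 0.5)

# Toy usage: two scene objects and a text instruction.
scene = [
    SceneObject("mug", "container", np.random.rand(128, 3)),
    SceneObject("chair", "furniture", np.random.rand(256, 3)),
]
target = ground_target_object("pick up the mug and place it on the table", scene)
scores = hand_object_reachability(target)
print(target.name, scores.shape)
```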
In the "Attention Economy," Can AI Life Assistants Unlock "New" Essential Demand for Local Services?
机器之心· 2025-10-19 01:30
Group 1
- The article discusses the potential of AI life assistants in the context of the "attention economy," questioning whether they can unlock new consumer needs amid challenges like TC-PMF [5][6]
- Major domestic internet companies are increasingly investing in the AI life assistant sector, targeting a broader consumer market [6][7]
- Tencent's AI assistant "Yuanbao" integrates with WeChat, offering features like article parsing and interactive engagement, but still lacks complex functionality [7][8]
- Alibaba is developing AI assistants tailored to its e-commerce needs, with products like "AI Help Me Choose" and "AI Universal Search" aimed at enhancing the user experience [8][9]
- Meituan's AI assistant "Xiao Mei" focuses on local services, emphasizing its ability to understand user needs and complete service transactions [9][10]
- JD.com has introduced several AI products aimed at personal users, including "Jingxi," which aims to integrate AI throughout the shopping process [10][11]
- Didi has launched an AI travel assistant, "Xiao Di," allowing users to customize their ride requests through natural language [12][13]

Group 2
- Data from QuestMobile indicate a significant gap between average monthly usage time for AI applications (132.8 minutes) and overall internet usage (171.7 hours), highlighting growth opportunities for AI life assistants [13][14]
- Analysts suggest that as information overload becomes common, AI life assistants can serve as proactive tools for information filtering and task execution, potentially reducing users' decision-making time [14]
Self-Forcing++: Pushing Autoregressive Video Generation Models Past the 4-Minute Duration Limit
机器之心· 2025-10-18 08:30
This work was carried out jointly by teams from UCLA and ByteDance Seed, among others.

While diffusion models continue to lead the wave of visual generation and image generation has long since reached a very high level, video generation is still held back by one key bottleneck: duration. Most current models can only generate clips a few seconds long. Self-Forcing++ takes video generation into the era of 4-minute, high-quality long videos for the first time, with no retraining on long-video data. The article first shows a 100-second generated video.

Paper title: Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Paper: https://arxiv.org/abs/2510.02283
Project page: https://self-forcing-plus-plus.github.io
Code: https://github.com/justincui03/Self-Forcing-Plus-Plus

Research background: Why is long-video generation so hard?
In the field of diffusion-driven visual generation, from Sora, Wan, and Hunyuan-Video to Veo, video models keep getting closer to the real world. Yet almost all mainstream models share one limitation: they can only generate short clips of a few seconds. The reason behind this ...
Those Animal Videos That Make You Laugh Until You Cry Are Actually AI-Generated
机器之心· 2025-10-18 08:30
Core Viewpoint
- The article discusses the rise of AI-generated videos that deceive viewers, highlighting the potential for misinformation and emotional manipulation through realistic AI content [24].

Group 1: AI-generated Videos
- Recent AI-generated videos feature animals in humorous scenarios, such as a panda on a swing and a raccoon interacting with Halloween decorations, which have gained significant attention online [6][9].
- The creation of these videos relies on sophisticated prompt engineering, producing highly realistic and engaging content that can easily mislead viewers [11][12].
- Some videos have achieved high view counts, with one Halloween-themed video reaching up to 1.1 million views on YouTube, indicating strong audience engagement [12].

Group 2: Emotional Manipulation
- A viral incident involving an AI-generated cat named Pound Cake drew emotional responses from viewers who believed the cat was real, showcasing the potential for AI to create false narratives that resonate with audiences [14][19].
- The revelation that Pound Cake was not a real cat but an AI creation caused significant backlash among followers, highlighting the ethical implications of using AI to fabricate emotional stories [19][21].

Group 3: Ethical Concerns
- The article emphasizes the ethical dilemmas posed by AI technology, particularly regarding the authenticity of information and the potential for AI to create misleading content that shapes public perception [24].
- There is growing concern that the proliferation of AI-generated content could lead to a general distrust of media and communications, as individuals struggle to tell real from fabricated information [24].
Stable Training and High Data Efficiency: Tsinghua University Proposes SAC Flow, a New Reinforcement Learning Method for "Flow Policies"
机器之心· 2025-10-18 05:44
Core Insights
- The article introduces a new scheme for training flow-based policies with SAC, a highly data-efficient reinforcement learning algorithm, optimizing real flow policies end-to-end without surrogate objectives or policy distillation [2][10].

Group 1: Research Background
- Flow-based policies have gained popularity in robot learning because they model multi-modal action distributions and are simpler than diffusion policies, which has led to their wide use in advanced VLA models [4].
- Previous work has trained flow policies with on-policy RL algorithms, but data-efficient off-policy methods such as SAC have remained problematic: backpropagating through multi-step sampling often causes gradient explosion and instability [4][5].

Group 2: Methodology
- The proposed approach treats the multi-step sampling of a flow policy as equivalent to unrolling a recurrent neural network (RNN), which allows modern recurrent structures such as GRU and Transformer to be used to stabilize training [7][11].
- SAC Flow injects Gaussian noise and a drift correction at each rollout step so that the final action distribution is unchanged, and the SAC actor/critic losses can then be written in terms of the log-likelihood of the flow policy's multi-step samples [15]; a minimal sketch of this serialized rollout follows after this summary.

Group 3: Training Paradigms
- Two training paradigms are supported:
  - From-scratch training for dense-reward tasks, where SAC Flow can be trained directly [16].
  - Offline-to-online training for sparse-reward tasks, where pre-training on a dataset is followed by online fine-tuning [19].

Group 4: Experimental Results
- In experiments, both Flow-G and Flow-T achieved state-of-the-art performance in the MuJoCo environments, demonstrating stability and high sample efficiency [22][24].
- The results indicate that SAC Flow is robust to the number of sampling steps K, maintaining stable training across a range of K values, with Flow-T showing particularly strong robustness [30].

Group 5: Comparison with Similar Works
- Unlike FQL/QC-FQL, which distill flow policies into single-step models before off-policy RL training, SAC Flow retains the full modeling capacity of flow policies without distillation [33].
- SAC Flow-T and Flow-G converged faster and reached higher final returns across environments than diffusion-policy baselines and other flow-based methods [34][35].

Group 6: Conclusion
- The key attributes of SAC Flow are serialization, stable training, and data efficiency, leveraging the experience of GRU and Transformer structures to stabilize gradient backpropagation [37].
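The "flow policy as a recurrent rollout" idea from Group 2 can be sketched as below. This is a minimal illustration under assumed names, sizes, and noise scale, not the paper's architecture (which also offers GRU- and Transformer-style velocity parameterizations), and the drift correction that keeps the final action distribution unchanged is omitted for brevity: K Euler steps of a learned velocity field with Gaussian noise injected at each step, so every step is a Gaussian transition whose log-density can be accumulated for the SAC objectives.

```python
import math
import torch
import torch.nn as nn

class NoisyFlowPolicy(nn.Module):
    """Minimal sketch: a flow-based policy unrolled like a recurrent model.

    An action is produced by K Euler steps of a learned velocity field with
    Gaussian noise injected at every step, so each step defines a Gaussian
    transition whose log-density can be accumulated, as SAC's entropy term
    requires. Sizes and the noise scale are illustrative assumptions.
    """

    def __init__(self, obs_dim: int, act_dim: int, steps: int = 8, sigma: float = 0.1):
        super().__init__()
        self.act_dim, self.steps, self.sigma = act_dim, steps, sigma
        self.velocity = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, act_dim),
        )

    def sample(self, obs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        batch = obs.shape[0]
        a = torch.randn(batch, self.act_dim, device=obs.device)  # start from noise
        log_prob = torch.zeros(batch, device=obs.device)
        dt = 1.0 / self.steps
        for k in range(self.steps):
            t = torch.full((batch, 1), k * dt, device=obs.device)
            drift = self.velocity(torch.cat([obs, a, t], dim=-1))
            mean = a + drift * dt                        # deterministic Euler update
            a = mean + self.sigma * torch.randn_like(a)  # injected Gaussian noise
            # Accumulate the per-step Gaussian log-density of the rollout.
            log_prob += (-0.5 * ((a - mean) / self.sigma) ** 2).sum(-1) \
                        - self.act_dim * (math.log(self.sigma) + 0.5 * math.log(2 * math.pi))
        return a, log_prob  # log_prob feeds the SAC actor/critic objectives

# Toy usage with made-up dimensions.
policy = NoisyFlowPolicy(obs_dim=17, act_dim=6)
actions, logp = policy.sample(torch.randn(32, 17))
print(actions.shape, logp.shape)
```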
Andrej Karpathy Fires Away: Agents Are Just Going Through the Motions, Reinforcement Learning Is Terrible, and AGI Is Still a Decade Away
机器之心· 2025-10-18 05:44
Core Viewpoint
- AI is projected to contribute an annual GDP increase of 2%, but the current state of the industry is criticized as overly optimistic and disconnected from reality [2][5].

Group 1: AGI and Learning
- AGI is expected to take about ten years to arrive, because current AI agents lack the necessary cognitive abilities and continuous-learning capabilities [9][11].
- Current AI models, particularly large language models (LLMs), exhibit cognitive deficiencies that hinder their performance [34][36].
- Reinforcement learning is deemed inadequate for replicating human learning, as it oversimplifies the complexity of human decision-making [44][46].

Group 2: AI Development and Challenges
- The industry is in a phase of rapid development, but there is skepticism about the actual capabilities of AI models, which are often overhyped [5][41].
- Current AI agents struggle to understand and integrate a codebase's unique implementation choices, leading to inefficiencies and misunderstandings in code generation [36][41].
- The reliance on pre-trained models and the limitations of current AI tools highlight the need for further advances in AI technology [20][42].

Group 3: Future of AI
- The future of AI is expected to involve more sophisticated attention mechanisms and possibly a shift toward more efficient learning algorithms [29][30].
- While AI will continue to evolve, it is expected to still rely on foundational principles such as gradient descent for training large neural networks [29][30].
- Ongoing improvements in AI tools and models suggest a continuous integration of new techniques and methodologies to enhance performance [42][43].