You heard "agents" nonstop at WAIC; it's time to study them systematically
机器之心· 2025-08-04 07:05
Excerpted from Deep (Learning) Focus. Author: Cameron R. Wolfe. Compiled by 机器之心.

At this year's World Artificial Intelligence Conference (WAIC), agents were the undisputed stars. From consumer products to enterprise applications, nearly every exhibiting AI vendor made a point of mentioning its agent plans. This reflects an important shift: people no longer treat large models as mere chatbots; they expect them to think proactively, make plans, and use tools to complete tasks the way a person would. That is a key direction for bringing large models into real applications. For AI practitioners, it is time to gain a systematic understanding of "agents".

Conveniently, we found a very comprehensive blog post on the topic. Its author, Cameron R. Wolfe, is a senior research scientist at Netflix and holds a PhD from Rice University. Starting from the basic LLM, he progressively introduces tools, reasoning, and autonomous planning, and analyzes the underlying logic of AI agents in depth.

Blog post: https://cameronrwolfe.substack.com/p/ai-agents

Below is the detailed content of the blog.

LLMs and their capabilities

Input-output behavior of a standard LLM. The functionality of a standard LLM is shown above: given a text prompt, the LLM generates a text response. In many ways, the LLM's generality is its greatest ...
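To make the text-in, text-out interface and the tool-use idea concrete, here is a minimal sketch in Python. The `call_llm` stand-in and the `TOOLS` registry are hypothetical placeholders for illustration, not Wolfe's code or any particular vendor's API.

```python
# Minimal sketch of the text-in/text-out LLM interface plus a toy tool loop.
# `call_llm` and the tool registry are hypothetical placeholders.
import json
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Stand-in for any LLM API: text prompt in, text response out."""
    raise NotImplementedError("plug in a real model or API client here")

# A hypothetical tool registry: the agent may request one of these by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def agent_step(task: str, max_turns: int = 5) -> str:
    """Let the model either answer directly or request a tool call each turn."""
    context = task
    reply = ""
    for _ in range(max_turns):
        reply = call_llm(
            "Answer the task, or reply with JSON "
            '{"tool": <name>, "input": <string>} to use a tool.\n' + context
        )
        try:
            request = json.loads(reply)              # model asked for a tool
            result = TOOLS[request["tool"]](request["input"])
            context += f"\nTool {request['tool']} returned: {result}"
        except (ValueError, KeyError, TypeError):
            return reply                             # plain text = final answer
    return reply
```

A real agent layers reasoning traces and multi-step planning on top of a loop like this, which is what the rest of the post builds toward.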
Lossless math reasoning with only 10% of the KV cache! This open-source method tackles the "memory overload" problem of reasoning models
量子位· 2025-06-16 04:50
Contributed by the R-KV team. 量子位 | 公众号 QbitAI

Reasoning models are great, but a simple arithmetic problem can produce three full pages of reasoning, most of it repetitive filler with no clear focus. Chain-of-Thought (CoT) makes an LLM's solution steps plainly visible, but it also causes reasoning length to explode. Take DeepSeek-R1-Llama-8B as an example: a single AIME math problem can generate 32,000 tokens. The model weights occupy 15.5 GB, the KV cache eats another 4.1 GB, and GPU memory is exhausted almost instantly.

Existing KV compression methods (SnapKV, StreamingLLM, H2O, etc.) are designed mainly for long inputs. Once the model starts "rambling" on the output side, similar sentences assign high attention scores to one another, which breaks the usual "evict the low-attention entries" strategy: key steps get deleted by mistake, repeated content gets kept, and accuracy falls off a cliff.

R-KV instead compresses the KV cache in real time during decoding, handling redundant key/value (KV) tokens through the steps below and retaining only tokens that are both important and non-redundant (a toy sketch follows at the end of this summary), so long-horizon reasoning is no longer a luxury. Project details are linked at the end of the article.

R-KV in three steps: redundancy identification + importance estimation + dynamic eviction

An efficient compression method that turns a large model's "rambling" into controllable memory entries has arrived. R-KV is now open source: memory ↓90%, throughput ×6.6, accuracy = 10 ...
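The following is a minimal sketch of the three-step idea described above (redundancy identification, importance estimation, dynamic eviction) for one attention head during decoding. The cosine-similarity redundancy score, the attention-mass importance score, the `alpha` weighting, and the `keep_budget` parameter are illustrative assumptions, not the released R-KV implementation.

```python
# Illustrative decode-time KV-cache pruning in the spirit of the three steps
# described above. Scoring details and `keep_budget` are assumptions for
# illustration, not the released R-KV code.
import torch

def prune_kv_cache(keys, values, attn_scores, keep_budget, alpha=0.5):
    """
    keys, values : [seq_len, head_dim]  cached K/V for one head
    attn_scores  : [seq_len]            recent attention mass per cached token
    keep_budget  : int                  number of tokens to retain
    Returns pruned (keys, values) keeping important, non-redundant tokens.
    """
    # Step 1: redundancy identification -- cosine similarity between keys;
    # a token highly similar to another is likely repeated reasoning.
    k_norm = torch.nn.functional.normalize(keys, dim=-1)
    sim = k_norm @ k_norm.T                          # [seq_len, seq_len]
    sim.fill_diagonal_(0.0)
    redundancy = sim.max(dim=-1).values              # high = redundant

    # Step 2: importance estimation -- reuse attention mass as importance.
    importance = attn_scores / (attn_scores.sum() + 1e-8)

    # Step 3: dynamic eviction -- combine both signals, keep the top tokens.
    score = alpha * importance - (1 - alpha) * redundancy
    keep = torch.topk(score, k=min(keep_budget, keys.shape[0])).indices
    keep = keep.sort().values                        # preserve token order
    return keys[keep], values[keep]
```

Subtracting the redundancy term is what separates this from attention-only eviction (SnapKV/H2O-style): a repeated sentence may receive high attention from its near-duplicates yet still be evicted.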
Google DeepMind: large models are willful too; they know the optimal path but insist on running into the wall
机器之心· 2025-05-05 03:40
Core Insights
- The article investigates the common failure modes of Large Language Models (LLMs) in decision-making scenarios, specifically greediness, frequency bias, and the knowing-doing gap [2][15].
- It proposes a reinforcement learning fine-tuning method (RLFT) to enhance the decision-making capabilities of LLMs by addressing these shortcomings [2][8].

Group 1: Failure Modes
- LLMs exhibit suboptimal exploration and a knowing-doing gap, which prevents effective translation of knowledge into action [2][15].
- The three identified failure modes are:
  1. Greediness, where LLMs overly favor actions that have previously shown the best performance [15].
  2. Frequency bias, where LLMs tend to repeat high-frequency actions regardless of their reward differences [5][18].
  3. Knowing-doing gap, where LLMs understand task requirements but fail to execute optimal actions due to a preference for greedy choices [7][20].

Group 2: Model Performance
- Small-scale LLMs (2B) are significantly affected by frequency bias, leading to a lack of exploration, with up to 55% of actions remaining unexplored [4][18].
- Large-scale LLMs (27B) show reduced frequency bias but still exhibit greedy behavior, limiting their overall performance [6][18].
- The average action coverage for the largest models was only 45%, indicating a substantial gap compared to optimal strategies [17].

Group 3: Reinforcement Learning Fine-Tuning
- The RLFT method adjusts the reasoning process of LLMs based on rewards obtained from environmental interactions, promoting the selection of actions that yield higher rewards [8][22] (see the sketch after this list).
- Results indicate that RLFT significantly reduces regret in various environments, improving LLM performance compared to random baselines [22].
- RLFT effectively mitigates greediness by encouraging exploration, thus enhancing decision-making capabilities [22].
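As a rough illustration of reward-based fine-tuning, the sketch below runs a REINFORCE-style update on a toy multi-armed bandit, with a softmax policy standing in for the LLM's action distribution. The bandit, hyperparameters, and update rule are assumptions for illustration, not DeepMind's RLFT recipe; the point is only that sampling from the policy and reinforcing above-average rewards counteracts purely greedy action selection.

```python
# Toy illustration of reward-based fine-tuning on a multi-armed bandit.
# A softmax policy stands in for the LLM's action distribution; the bandit
# and hyperparameters are hypothetical, not DeepMind's RLFT setup.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical arm reward probabilities
logits = np.zeros(3)                     # policy parameters ("the model")
lr, avg_reward = 0.5, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)                 # sample, don't act greedily
    reward = float(rng.random() < true_means[action])
    avg_reward += 0.05 * (reward - avg_reward)      # running baseline
    # REINFORCE update: raise log-prob of actions with above-average reward.
    grad = -probs
    grad[action] += 1.0
    logits += lr * (reward - avg_reward) * grad

print("learned policy:", softmax(logits).round(3))  # mass concentrates on the best arm
```

In the paper's terms, sampling from the policy rather than always taking the arg-max is what pushes back against greediness and frequency bias while the reward signal steers the policy toward higher-value actions.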