Language Models
Microsoft Officially Open-Sources UFO², Bringing the Windows Desktop into the "AgentOS Era"
机器之心· 2025-05-06 08:04
In recent years, graphical user interface (GUI) automation has been steadily reshaping human-computer interaction and office automation. However, traditional automation tools such as Robotic Process Automation (RPA) typically rely on fixed scripts, which makes them sensitive to interface changes, costly to maintain, and unsatisfying to use. Meanwhile, LLM-based Computer-Using Agents (CUA) have shown flexible automation potential, but most remain at the proof-of-concept or prototype stage and lack deep integration with the operating system, limiting their large-scale deployment in real work environments. To address these pain points, Microsoft researchers recently open-sourced UFO² AgentOS, the industry's first desktop agent platform deeply integrated with the Windows operating system and a full upgrade of UFO, the previous-generation, purely GUI-based desktop agent. The platform inherits UFO's strong GUI manipulation capabilities and adds system-level optimizations that markedly improve the agent's efficiency and stability on Windows. The paper's first author is Chaoyun Zhang of Microsoft's DKI team, the core developer of UFO, the first agent system for the Windows platform, which has been open-sourced on GitHub and has received ...
ICML 2025 | Massive Values in Attention: The Key to Contextual Knowledge Understanding in Large Language Models
机器之心· 2025-05-06 04:11
Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. A new ICML 2025 study, "Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding," reveals an important phenomenon: highly concentrated massive values exist in the query (Q) and key (K) representations of the attention mechanism, while no such pattern appears in the value (V) representations. This phenomenon is widespread in modern Transformer models that use rotary position embeddings (RoPE), and it carries important implications for understanding how LLMs work internally.

Research highlight: how massive values affect model performance. When we talk about an LLM's understanding ability, its knowledge is usually split into two kinds: parametric knowledge (facts stored in the model weights) and contextual knowledge (information obtained from the current input text). Through a series of carefully designed experiments, this study reveals a key link between the presence of massive values in self-attention modules and contextual knowledge understanding.

The study was conducted by Professor Yongfeng Zhang's group at Rutgers University. The first author is Mingyu Jin, a PhD student at Rutgers, who has published at ACL, ICML, AAAI, NAACL, COLM, ICLR, EMNLP, COLIN ...
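For readers who want to see what such a probe might look like, here is a minimal sketch (not the paper's code) that registers forward hooks on the Q/K/V projection layers of a RoPE-based, Llama-style model and compares how extreme the largest activation magnitudes are in each projection; the model name and module paths are assumptions about a typical Hugging Face checkpoint layout.

```python
# Probing sketch (not the paper's code): hook the Q/K/V projections of a Llama-style
# RoPE model and compare the largest activation magnitudes in each projection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed; any RoPE-based causal LM should do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().float()  # (batch, seq_len, proj_dim)
    return hook

attn = model.model.layers[0].self_attn  # probe the first layer as an example
attn.q_proj.register_forward_hook(make_hook("Q"))
attn.k_proj.register_forward_hook(make_hook("K"))
attn.v_proj.register_forward_hook(make_hook("V"))

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

# If concentrated massive values exist in Q and K but not V, the top magnitudes in
# Q/K should dwarf their medians far more than in V.
for name, act in captured.items():
    flat = act.abs().flatten()
    print(f"{name}: top-5 |activation| = {flat.topk(5).values.tolist()}, "
          f"median = {flat.median().item():.4f}")
```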
Google DeepMind: Large Models Can Be Stubborn, Knowing the Optimal Path yet Running into the Wall Anyway
机器之心· 2025-05-05 03:40
Core Insights
- The article investigates the common failure modes of Large Language Models (LLMs) in decision-making scenarios, specifically greediness, frequency bias, and the knowing-doing gap [2][15].
- It proposes a reinforcement learning fine-tuning method (RLFT) to enhance the decision-making capabilities of LLMs by addressing these shortcomings [2][8].

Group 1: Failure Modes
- LLMs exhibit suboptimal exploration and a knowing-doing gap, which prevents effective translation of knowledge into action [2][15].
- The three identified failure modes are:
  1. Greediness, where LLMs overly favor actions that have previously shown the best performance [15].
  2. Frequency bias, where LLMs tend to repeat high-frequency actions regardless of their reward differences [5][18].
  3. Knowing-doing gap, where LLMs understand task requirements but fail to execute optimal actions due to a preference for greedy choices [7][20].

Group 2: Model Performance
- Small-scale LLMs (2B) are significantly affected by frequency bias, leading to a lack of exploration, with up to 55% of actions remaining unexplored [4][18].
- Large-scale LLMs (27B) show reduced frequency bias but still exhibit greedy behavior, limiting their overall performance [6][18].
- The average action coverage for the largest models was only 45%, a substantial gap compared to optimal strategies [17].

Group 3: Reinforcement Learning Fine-Tuning
- The RLFT method adjusts the reasoning process of LLMs based on rewards obtained from environmental interactions, promoting the selection of actions that yield higher rewards [8][22].
- Results indicate that RLFT significantly reduces regret values in various environments, improving LLM performance compared to random baselines [22].
- RLFT effectively mitigates greediness by encouraging exploration, thus enhancing decision-making capabilities [22]. (A toy sketch of the coverage and regret metrics follows this list.)
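As a concrete illustration of the action-coverage and regret statistics cited above, here is a toy multi-armed-bandit sketch; it is not DeepMind's setup, and the environment, the hand-written greedy policy, and all numbers are stand-ins meant only to show how greediness depresses coverage while inflating regret.

```python
# Toy illustration (not DeepMind's code): action coverage and cumulative regret for
# a greedy policy versus a uniform explorer in a 10-armed bandit.
import random

def run_bandit(policy, n_arms=10, steps=100, seed=0):
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(n_arms)]
    best = max(true_means)
    counts, sums = [0] * n_arms, [0.0] * n_arms
    regret, explored = 0.0, set()
    for _ in range(steps):
        arm = policy(counts, sums, rng)
        reward = true_means[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
        sums[arm] += reward
        explored.add(arm)
        regret += best - true_means[arm]
    coverage = len(explored) / n_arms  # fraction of actions ever tried
    return round(coverage, 2), round(regret, 1)

def greedy(counts, sums, rng):
    # Mirrors the "greediness" failure mode: try a couple of arms, then lock in on
    # whichever looked best, never revisiting the untried ones.
    if sum(counts) < 3:
        return sum(counts)  # arms 0, 1, 2 in order
    return max(range(len(counts)),
               key=lambda i: sums[i] / counts[i] if counts[i] else float("-inf"))

def uniform(counts, sums, rng):
    return rng.randrange(len(counts))  # full exploration, no exploitation

print("greedy :", run_bandit(greedy))   # low coverage, higher regret
print("uniform:", run_bandit(uniform))  # full coverage, but pays for exploring
```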
When Answers Become Cheap, Good Questions Are the New Scarcity
36Kr · 2025-05-04 00:03
Group 1
- The core argument of the article is that in an era where answers are easily accessible, the value lies in asking the right questions, which can reshape understanding and drive creativity [1][4][19].
- The invention of photography in the 1830s challenged traditional artistic standards, leading artists to focus on subjective experiences rather than mere replication of reality [3][10][11].
- The emergence of large language models (LLMs) has made obtaining answers cheaper, but this has led to a decline in the quality of inquiry and an increase in the cost of asking good questions [15][17][26].

Group 2
- The article emphasizes that the value of information is proportional to the uncertainty it eliminates, as illustrated by Claude Shannon's information theory [21][22][23] (a small numeric illustration follows this list).
- It argues that in a world of information overload, the challenge is not the lack of facts but the misalignment of attention, leading to a focus on quantity over quality in answers [31][32][46].
- The piece highlights the importance of redefining problems and frameworks to navigate structural uncertainties effectively, suggesting that good questions can expand the boundaries of understanding [37][38][39].
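The Shannon point above can be made concrete with a small, invented numeric example: the "value" of an answer can be read as the entropy it removes from a distribution over hypotheses, so a good question is one whose answer removes many bits.

```python
# Invented numbers, illustrating the Shannon-style claim: information gained by an
# answer = entropy before asking minus entropy after.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

prior = [0.25, 0.25, 0.25, 0.25]           # four equally likely hypotheses: 2 bits
posterior_good = [0.5, 0.5, 0.0, 0.0]       # a good question rules out two of them
posterior_poor = [0.3, 0.3, 0.2, 0.2]       # a poor question barely shifts anything

for name, post in [("good question", posterior_good), ("poor question", posterior_poor)]:
    gain = entropy(prior) - entropy(post)
    print(f"{name}: information gained = {gain:.2f} bits")
```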
Building a Coding Assistant in 315 Lines of Code: A Go Expert Demystifies Agents
机器之心· 2025-05-03 04:18
Selected from ampcode.com. Author: Thorsten Ball. Compiled by 机器之心.

Well-known Go expert Thorsten Ball recently built a coding agent in 315 lines of code, saying "it works remarkably well" and that it has "no moat" (that is, it is not hard to replicate). Ball is known in the programming world for his deep work on systems programming and programming languages, especially interpreters, compilers, and virtual machines; his books "Writing a Compiler in Go" and "Writing an Interpreter in Go" are widely treated as approachable introductions to compiler fundamentals.

Although this coding agent cannot rival the coding features shipped by Claude, Gemini, and the like, it offers beginners a good learning example for exploring agents, in keeping with Ball's consistent philosophy of demystifying technology through hands-on practice and open-source projects. He walks through the concrete steps in his blog post. (Note: the code screenshots in this article may be incomplete; see the original post for details.)

Blog post: https://ampcode.com/how-to-build-an-agent

First, get our "stationery" ready. Pencils out! Let's dive right in and set up a new Go project with four simple commands. At first glance, an agent that edits files, runs commands, and fixes its own errors may seem complicated, but ...
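The core idea the post builds up to is simply a model called in a loop with access to a few tools. Below is a minimal, language-agnostic sketch of that loop written in Python rather than Go; it is not Thorsten Ball's code, and `call_llm` is a hypothetical, scripted stand-in for whatever chat-completion client you would actually plug in.

```python
# Minimal agent-loop sketch (NOT Thorsten Ball's Go code): call an LLM in a loop,
# and whenever it asks for a tool, run the tool locally and feed the output back.
import json
import pathlib
import subprocess

TOOLS = {
    "read_file": lambda args: pathlib.Path(args["path"]).read_text(),
    "run_command": lambda args: subprocess.run(
        args["cmd"], shell=True, capture_output=True, text=True
    ).stdout,
}

def call_llm(messages):
    """Scripted fake so the loop runs without an API key: first request a command,
    then give a final answer once a tool result is visible in the conversation."""
    if not any(m["content"].startswith("tool result:") for m in messages):
        return {"tool": "run_command", "args": {"cmd": "echo hello from the agent"}}
    return {"text": "The command ran; task complete."}

def agent_loop(user_task, max_steps=10):
    messages = [{"role": "user", "content": user_task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "tool" in reply:                      # the model asked to use a tool
            result = TOOLS[reply["tool"]](reply["args"])
            messages.append({"role": "assistant", "content": json.dumps(reply)})
            messages.append({"role": "user", "content": f"tool result:\n{result}"})
        else:                                    # plain answer: we're done
            return reply["text"]
    return "step limit reached"

print(agent_loop("Say hello by running a shell command."))
```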
Alibaba Cloud Tongyi DianJin Releases DianJin-R1, a Financial Reasoning LLM; the 32B Model Tops the Leaderboard
机器之心· 2025-05-03 04:18
This work was completed jointly by Alibaba Cloud's Tongyi DianJin team and Soochow University. The two teams recently introduced a breakthrough in financial large language models: DianJin-R1, a reasoning-enhanced financial LLM designed specifically for financial tasks and backed by advanced techniques and comprehensive data support.

A fully open-source reasoning dataset: one of DianJin-R1's distinctive highlights is its fully open-source reasoning dataset, DianJin-R1-Data. The dataset is a comprehensive upgrade of the CFLUE Benchmark the Tongyi DianJin team published at ACL 2024 and integrates the FinQA and Chinese Compliance Check (CCC) datasets, providing a strong foundation for financial reasoning tasks. It is already open-sourced, with the goal of supporting and advancing research and applications in finance.

Despite these improvements, recent evaluations on financial benchmarks show that reasoning in finance remains particularly challenging because it requires domain-specific knowledge, precise numerical reasoning, and strict adherence to regulatory requirements. Meeting these challenges calls for dedicated reasoning strategies that can handle structured financial information and open-ended problem solving. To this end, we present DianJin-R1, an LLM that combines reasoning-augmented supervision with reinforcement learning to improve performance on financial reasoning tasks.

A fully open-source Financial ...
Surpassing YOLOv3 with a Multimodal LLM! Reinforcement Learning Pushes the Limits of Multimodal Perception | Open-Source
量子位· 2025-05-03 04:05
Contributed by 于恩. 量子位 | WeChat official account QbitAI.

Surpassing YOLOv3 and Faster-RCNN, here is the first purely multimodal open-source LLM to break 30 AP on the COCO2017 val set! Perception-R1 (PR1), released jointly by research teams from Huazhong University of Science and Technology, Beijing University of Posts and Telecommunications, and other universities, investigates, at perception, the most fundamental level of visual reasoning, how much gain rule-based RL can bring to a model's perception patterns. PR1 focuses on today's mainstream pure-vision tasks (counting, generic object detection) and vision-language tasks (grounding, OCR), and the experimental results show great potential in the model's perception strategies.

There is, however, a subtle gap between recognizing objects and truly perceiving the visual world with fine-grained understanding and logic. While MLLMs keep getting better at general visual question answering, they often struggle on tasks that require precise object localization, accurately counting multiple objects, reading text perfectly in complex layouts, or performing complex visual reasoning. It is the difference between knowing there is a cat in a picture and being able to point precisely to its ears, count its whiskers, or understand how it interacts with other objects.

The rise of reinforcement learning and the birth of Perception-R1: reinforcement learning (RL) has triggered a paradigm shift in language models. Methods such as RLHF (from hu ...
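To make "rule-based RL" concrete, here is a minimal sketch (not the Perception-R1 code) of what a rule-based reward for an object-detection rollout could look like: predicted boxes are matched to ground truth by IoU and converted into a scalar reward. The matching scheme, threshold, and false-positive penalty are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a rule-based detection reward: greedy IoU matching of predictions to
# ground truth, with partial credit for hits and a penalty for spurious boxes.
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_reward(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Reward = fraction of ground-truth boxes recovered, minus a small penalty
    for each unmatched prediction (false positive)."""
    matched, hits = set(), 0
    for p in pred_boxes:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gt_boxes):
            if j not in matched and iou(p, g) > best_iou:
                best_j, best_iou = j, iou(p, g)
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            hits += 1
    false_pos = len(pred_boxes) - hits
    return hits / max(len(gt_boxes), 1) - 0.1 * false_pos

# Example: two ground-truth objects; the model finds one well and hallucinates one.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (40, 40, 50, 50)]
print(detection_reward(pred, gt))  # partial credit: one hit, one false positive
```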
Tangxing Capital: Insightful and Decisive, Discerning the Enormous Hidden Value in Investment Projects
Sohu Caijing · 2025-05-02 02:58
Group 1
- The emergence of DeepSeek, a large model comparable to ChatGPT, has created significant waves in the global technology and capital markets, igniting enthusiasm for innovation and investment opportunities in the tech sector [3].
- Tangxing Capital focuses on discovering and nurturing high-growth-potential hard-tech companies, aiming to drive industrial upgrades and regional economic development through a comprehensive support system [3][4].
- The investment team at Tangxing Capital possesses deep industry backgrounds and professional investment capabilities, allowing them to accurately grasp technology development trends and identify quality projects [3][4].

Group 2
- Young entrepreneurs like Liang Wenfeng and Wang Xingxing exemplify the characteristics of contemporary tech leaders, showcasing strong learning abilities and rapid application of new technologies [4][5].
- These entrepreneurs break traditional thinking and industry boundaries, integrating resources across sectors to create new application scenarios and business models [5][6].
- Key traits admired in successful entrepreneurs include innovation spirit, cross-disciplinary integration ability, strategic vision, and focus on core business areas [6].

Group 3
- The investment style of Tangxing Capital is characterized by "insightful decisiveness," emphasizing the ability to quickly identify and act on investment opportunities [7].
- A notable investment decision involved a significant investment in Plater, a key player in the 3D printing industry, despite market uncertainties, which later yielded a tenfold return [9].
- Plater's technology addresses complex manufacturing needs in the aerospace, automotive, and medical sectors, significantly contributing to China's manufacturing transformation [8][9].

Group 4
- The current bull market is driven by a combination of macroeconomic stability, loose monetary policy, and positive market sentiment, creating a conducive environment for investment [10][11].
- The bull market enhances the financing environment for primary markets, encouraging entrepreneurship and accelerating company growth through increased funding [12][13].
- The interaction between primary and secondary markets fosters a cycle of investment and exit opportunities, optimizing resource allocation and enhancing economic vitality [14].
Apple CEO Tim Cook: Still excited about the company's artificial intelligence (AI) and large language model (LLM) roadmap.
news flash· 2025-05-01 21:53