机器之心
OpenClaw Racks Up 160,000 Stars: Time to Talk About the Two Sides of Agent Tools
机器之心· 2026-02-06 03:57
Core Insights
- OpenClaw has gained significant popularity, achieving over 160,000 stars on GitHub, functioning as a 24/7 AI assistant capable of handling various tasks through messaging platforms [2]
- Despite its innovative capabilities, OpenClaw faces challenges such as complex deployment, compliance issues, and frequent security vulnerabilities [3][4]
- The need for secure, controllable, and scalable enterprise-level agents is highlighted, with Volcano Engine's AgentKit positioned to address these challenges [7]

Group 1: OpenClaw Overview
- OpenClaw operates through a unified Gateway that coordinates various local and remote tools, but the lack of governance increases security risks [6]
- The platform is not yet suitable for enterprise production environments due to its security vulnerabilities [7]

Group 2: AgentKit Solutions
- Volcano Engine's AgentKit addresses the core pain points of enterprise agents, including tool fragmentation, inefficient calls, and security risks [7]
- AgentKit has demonstrated its effectiveness in real-world applications, such as reducing query times from minutes to seconds at a retail chain and compressing compliance response times from weeks to hours at a fintech company [8][9]

Group 3: Challenges in Agent Tools
- The adoption of Agent Tools in enterprises has been slow due to issues like tool fragmentation, complex connections, and black-box governance [11]
- Enterprises face challenges with numerous outdated services and APIs, leading to inefficiencies in integrating new tools [12]

Group 4: Methodology and Design Principles
- Volcano Engine outlines key design considerations for Agent Tools, emphasizing the need for understandable, secure, and fault-tolerant interfaces [14]
- The development phase should utilize Python's type system for parameter validation, while interface design must focus on clear documentation that guides model interactions [15][16]

Group 5: AgentKit Gateway Features
- AgentKit Gateway serves as a central hub, managing high concurrency and ensuring that agents can understand legacy interfaces [18]
- The platform significantly reduces the cost of AI integration by automating the generation of tool definitions from existing APIs, achieving a 90% automation rate [21]

Group 6: Efficiency and Management
- AgentKit enhances efficiency by optimizing token consumption and improving response times through advanced caching techniques [25]
- The AgentKit Registry allows for unified management of various resources, addressing issues of version control and cross-team sharing [26]

Group 7: Security Measures
- AgentKit Identity redefines identity and permissions for agent operations, implementing a zero-trust approach to ensure accountability and traceability [30][31]
- The system utilizes dynamic temporary credentials and an end-to-end delegation chain to enforce the principle of least privilege [32]

Conclusion
- The rise of OpenClaw illustrates the potential of AI agents in enterprise applications, but significant security concerns remain [33]
- Volcano Engine's AgentKit aims to provide a comprehensive infrastructure for safely developing and deploying AI agents in business environments, facilitating the transition from personal assistants to accountable enterprise-level digital employees [34][35]
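The zero-trust pattern described for AgentKit Identity — dynamic short-lived credentials plus an end-to-end delegation chain — can be sketched generically. Everything below is a hypothetical illustration of that general practice; the names, fields, and functions are ours, not AgentKit's actual API:

```python
import time
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class TempCredential:
    token: str        # opaque, single-use bearer token
    scope: str        # the one tool this credential may call (least privilege)
    chain: tuple      # delegation path, e.g. user -> agent, kept for audit
    expires_at: float # wall-clock expiry; credentials are short-lived

def issue(scope, chain, ttl_seconds=60):
    """Mint a short-lived credential scoped to exactly one tool."""
    return TempCredential(uuid.uuid4().hex, scope, tuple(chain),
                          time.time() + ttl_seconds)

def authorize(cred, tool):
    """Allow the call only if the scope matches and the token is unexpired."""
    return cred.scope == tool and time.time() < cred.expires_at

cred = issue("crm.lookup", ["user:alice", "agent:support-bot"])
print(authorize(cred, "crm.lookup"))       # True: in scope, unexpired
print(authorize(cred, "payments.refund"))  # False: outside the delegated scope
```

Because every credential records its full delegation chain, any tool call can be traced back to the human who initiated it, which is the accountability property the zero-trust approach is after.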
AgentDoG: Putting a "Diagnostic Collar" on AI Agents
机器之心· 2026-02-06 03:57
As AI agents grow ever more capable, the safety risks posed by their autonomous behavior are becoming increasingly complex. Existing safety tools can often only deliver a simple "safe / unsafe" verdict and cannot tell us the root cause of a risk. To address this, the Shanghai AI Laboratory has officially open-sourced AgentDoG (Agent Diagnostic Guardrail), a diagnostic safety-guardrail framework designed specifically for AI agents. It not only judges the safety of agent behavior accurately, but also diagnoses the source of risk, traces failure modes, and explains the rationale behind its decisions, safeguarding the safe development of AI agents.

When an AI agent "goes off the rails", how do we keep it safe?

AI agents are moving from the lab into the real world. They can plan autonomously, call tools, and interact with their environment, showing enormous potential in research, finance, software engineering, and beyond. The other side of this coin, however, is an unprecedented set of safety challenges.

For an agent that can manipulate files, call APIs, and access the network, behavioral risk is no longer just "saying the wrong thing". It might leak your private files because of a malicious instruction hidden in a web page, cause financial losses by misinterpreting a tool's parameters, or even quietly drift off course over a multi-step operation and execute a dangerous action.

In the face of these "agentic risks", existing guard mode ...
Agentic Memory Is Already Heating Up This Year? A Chinese Team's MemBrain Just Took Multiple SOTAs
机器之心· 2026-02-06 01:05
Core Insights
- The article discusses the rapid evolution of Agentic Memory in AI, emphasizing that without memory, AI agents are merely advanced autocomplete tools. To handle complex projects or long-term tasks, AI must possess a structured long-term memory mechanism [1][3]

Industry Trends
- There is a significant shift in the AI industry towards persistent memory as a critical capability for agents, with major players like OpenAI and Anthropic pushing the limits of context windows [1][3]
- Sequoia Capital highlights that one of the core challenges for future agents is achieving persistent identity, which involves remembering user interactions over time while maintaining consistent understanding and context [3]

Company Highlights
- Feeling AI, a newly established startup, has recently launched MemBrain1.0, achieving state-of-the-art (SOTA) results in several memory benchmarks, surpassing existing systems like MemOS and EverMemOS [4][5]
- The team behind Feeling AI is led by Dr. Dai Bo, a young scientist in generative AI who has previously worked at NTU and Shanghai AI Lab. The team has completed two rounds of funding exceeding 100 million yuan, focusing on world models and 3D dynamic interaction [4][19]

MemBrain1.0 Performance
- MemBrain1.0 achieved new SOTA with accuracy rates of 93.25% and 84.6% on the LoCoMo and LongMemEval benchmarks, respectively, thanks to its refined entity-time context management design [9]
- In the PersonaMem-v2 benchmark, MemBrain1.0 outperformed existing methods with an accuracy of 51.50%, demonstrating deep insight into user preferences [10]

Technical Innovations
- MemBrain's architecture allows for flexible deployment and asynchronous memory updates, enhancing the system's adaptability [15]
- The system's design focuses on precise extraction of entities and timestamps, which is crucial for high-level tasks like associative analysis and logical reasoning [16]
- MemBrain organizes information into semantic units that can be loaded on demand, allowing for deeper participation of large language models (LLMs) in reasoning tasks [17]

Future Outlook
- The article suggests that the ability to solve the "forgetting syndrome" of agents will be key to advancing towards Artificial General Intelligence (AGI) [27]
- The evolution of memory capabilities is seen as a transition from "stateless" single calls to "conscious" continuous improvement, marking a new beginning for AI in user interaction and co-creation [27]
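As an illustration of the entity-and-timestamp indexing and on-demand semantic units described above, here is a minimal toy memory store in Python. It is a generic sketch of the idea, not MemBrain's actual design; all class and method names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryUnit:
    """One semantic unit, keyed by the entities and time it refers to."""
    text: str
    entities: frozenset
    timestamp: datetime

class MemoryStore:
    def __init__(self):
        self.units = []

    def add(self, text, entities, timestamp):
        self.units.append(MemoryUnit(text, frozenset(entities), timestamp))

    def recall(self, entities, since=None):
        """On-demand loading: return only units that share an entity with
        the query (and fall in the time range), oldest first, rather than
        dumping the whole history into the LLM's context."""
        hits = [u for u in self.units
                if u.entities & frozenset(entities)
                and (since is None or u.timestamp >= since)]
        return sorted(hits, key=lambda u: u.timestamp)

store = MemoryStore()
store.add("Alice adopted a cat", {"Alice", "cat"}, datetime(2026, 1, 3))
store.add("Bob started a new job", {"Bob"}, datetime(2026, 1, 10))
store.add("Alice moved to Berlin", {"Alice", "Berlin"}, datetime(2026, 1, 20))

recent = store.recall({"Alice"}, since=datetime(2026, 1, 10))
print([u.text for u in recent])  # ['Alice moved to Berlin']
```

Filtering by entity and timestamp before anything reaches the model is what keeps the loaded context small, which is the point of organizing memory into on-demand units.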
Stable-DiffCoder Surpasses Autoregressive Models! Diffusion Models Achieve a New Breakthrough in Code Generation
机器之心· 2026-02-05 23:45
Core Insights
- The article discusses the launch of Stable-DiffCoder, a new diffusion language model developed by Huazhong University of Science and Technology and ByteDance, which aims to explore whether diffusion training can enhance model capabilities beyond traditional autoregressive (AR) models [1]

Group 1: Model Performance
- Stable-DiffCoder outperformed its AR counterparts and several strong open-source models like Qwen2.5-Coder and DeepSeek-Coder on multiple mainstream code benchmarks, demonstrating the effectiveness of the diffusion training paradigm as a powerful data augmentation method [1]
- In the 8B model category, Stable-DiffCoder achieved a score of 79.3 on HumanEval and 83.6 on MBPP, surpassing many existing models [23][24]

Group 2: Training Methodology
- The model utilizes a continuous pre-training (CPT) approach with Block Diffusion and various stability optimization strategies to enhance performance [1]
- The training process is designed to first compress knowledge using AR methods before transitioning to diffusion techniques, which helps in efficiently learning a diffusion language model [15][16]

Group 3: Knowledge Learning Challenges
- The article highlights challenges in the diffusion process, such as the introduction of noise and incorrect knowledge mapping, which can hinder effective learning [5][11]
- It emphasizes the importance of maintaining a clean sample distribution during training to ensure effective knowledge transfer [11][20]

Group 4: Future Implications
- The release of Stable-DiffCoder suggests a new path for the evolution of large models, indicating that AR models can be used as efficient knowledge compressors while diffusion methods can act as enhancers to elevate model intelligence [31]
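The summary names Block Diffusion but does not specify it, so as a rough illustration of the general block-masked denoising idea — mask a contiguous span of tokens and train the model to reconstruct it jointly with bidirectional context, unlike strict left-to-right AR prediction — here is a toy corruption step. All names are ours, and this is not Stable-DiffCoder's actual training code:

```python
import random

MASK = "<mask>"

def corrupt_block(tokens, block_size, rng):
    """Mask one contiguous block of tokens. A diffusion-style denoiser
    would be trained to reconstruct the whole block at once, conditioning
    on the clean context on both sides of it."""
    start = rng.randrange(0, len(tokens) - block_size + 1)
    noisy = list(tokens)
    noisy[start:start + block_size] = [MASK] * block_size
    return noisy, (start, start + block_size)

rng = random.Random(0)
code = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
noisy, span = corrupt_block(code, block_size=3, rng=rng)
print(noisy, span)
```

The training pair here is (noisy, original block): the model sees the corrupted sequence and must recover the masked span, which is what makes diffusion-style training a different supervision signal from next-token prediction over the same data.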
Head to Head! Claude Opus 4.6 and GPT-5.3-Codex Released at the Same Time
机器之心· 2026-02-05 23:45
Core Insights
- The article discusses the recent releases of advanced AI models by Anthropic and OpenAI, specifically Claude Opus 4.6 and GPT-5.3-Codex, highlighting their significant improvements in performance and capabilities [2][15]

Summary of Claude Opus 4.6
- Claude Opus 4.6 represents a major upgrade for Anthropic's flagship AI model, featuring a more cautious planning approach and the ability to maintain longer autonomous workflows [5]
- The model introduces a context window of 1 million tokens, allowing it to process and reason with significantly more information than previous versions [6]
- It includes a "smart agent team" feature, enabling multiple AI agents to work on different aspects of coding projects simultaneously [6]
- Opus 4.6 outperformed competitors in various assessments, achieving the highest scores in Terminal-Bench 2.0 and leading in "Humanity's Last Exam" [7]
- In GDPval-AA, Opus 4.6 scored approximately 144 Elo points higher than OpenAI's GPT-5.2 and 190 points higher than its predecessor, Claude Opus 4.5 [7]
- The model's performance in MRCR v2 testing showed a score of 76%, significantly higher than Sonnet 4.5's 18.5%, indicating a qualitative leap in context utilization [9]

Summary of GPT-5.3-Codex
- OpenAI's GPT-5.3-Codex claims to have the best coding performance to date, achieving record scores in multiple benchmarks, including 56.8% on SWE-Bench Pro and 77.3% on Terminal-Bench 2.0 [16][19]
- The model integrates the advanced coding capabilities of GPT-5.2-Codex with enhanced reasoning and expertise from GPT-5.2, resulting in a 25% speed improvement [19][20]
- GPT-5.3-Codex is designed to function as a comprehensive work assistant, capable of handling tasks across the software lifecycle, including debugging, deployment, and user research [25]
- The model allows for real-time interaction, enabling users to guide and supervise multiple working agents without losing context [27]
- OpenAI emphasizes that the advancements in GPT-5.3-Codex have fundamentally changed the workflow of their research and engineering teams, enhancing productivity and interaction quality [28][29]

Conclusion
- The article concludes that the competitive landscape of AI models is intensifying, with both Anthropic and OpenAI making significant strides in capabilities and performance, setting the stage for further developments in the industry [31]
ICLR 2026 Workshop Opens Second Call for Papers: Focusing on Learning, Alignment, and Evolution for Lifelong Agents
机器之心· 2026-02-05 07:52
Core Insights
- Artificial Intelligence is at a new turning point, with AI Agents based on Large Language Models (LLM), Reinforcement Learning (RL), and Embodied AI rapidly emerging, showcasing multi-dimensional capabilities such as planning, reasoning, tool usage, and autonomous decision-making [2]
- The current mainstream paradigm faces critical bottlenecks, necessitating a shift towards Lifelong Agents that can continuously learn, align over the long term, evolve autonomously, perceive resources, and be sustainably deployed [2]

Workshop Overview
- The Lifelong Agent Workshop, initiated by institutions like UIUC, Edinburgh, Oxford, and Princeton during the ICLR 2026 conference, aims to create a cross-disciplinary forum to systematically advance the Lifelong Agent research paradigm [3]
- The workshop will address key issues related to Lifelong Agents, including language intelligence, reinforcement learning, embodied systems, multi-agent collaboration, and AI for science, defining the next technological milestone for Agent development [3]

Challenges in Lifelong Learning
- The phenomenon of catastrophic forgetting remains a significant challenge when models face dynamic and out-of-distribution (OOD) tasks, leading to decreased alignment consistency as user goals, environmental feedback, and contextual constraints evolve over time [4]
- Real-world operational constraints such as computational power, token, energy, and interaction costs hinder the sustainability of these systems [4]

Workshop Details
- The workshop is scheduled for April 26-27, 2026, in Rio de Janeiro, featuring a hybrid format for participation [8]
- The expected attendance is between 200-400 in-person participants and 500-600 online attendees [8]

Submission Topics
- The workshop encourages cross-disciplinary research focused on long-term operational Agents, particularly in areas such as Lifelong Learning, Lifelong Alignment, Self-Evolving Agents, and Embodied & Real-World Lifelong Agents [7]
- Specific topics include memory-augmented RL, continual exploration, user goal change modeling, and multi-agent lifelong collaboration ecosystems [9][10]

Future Directions
- Lifelong Agents represent an upgrade in intelligent paradigms, aiming to create stable, autonomous, and sustainably growing systems that can contribute to scientific discovery and cross-modal interaction [11]
- The workshop seeks to push Lifelong Agents towards becoming the next significant advancement in the field, addressing challenges related to resource-constrained learning and reasoning [12]
Reinforcement Learning Is Far From Optimal: CMU Just Proposed Maximum Likelihood Reinforcement Learning
机器之心· 2026-02-05 07:52
机器之心 Editorial Team

In the era of large models, from code generation to mathematical reasoning to autonomously planning agent systems, reinforcement learning has become the standard "last mile" of the pipeline.

Intuitively, what developers actually want is simple: make the model more likely to generate "correct trajectories". From a probabilistic standpoint, this is equivalent to maximizing the probability of correct outputs, i.e., the classic maximum likelihood objective.

However, recent work from CMU, Tsinghua University, Zhejiang University, and other institutions points out a rather subversive fact: the reinforcement learning widely used in practice is not actually performing maximum likelihood optimization. Rigorous theoretical analysis shows that RL only optimizes a first-order approximation of the maximum likelihood objective, still far from the training objective we believed to be optimal.

Based on this observation, the research team re-examined RL's objective function and proposed Maximum Likelihood Reinforcement Learning: recasting correctness-based RL as a maximum likelihood problem over latent-variable generation, and further introducing a family of objectives indexed by compute, so that the training objective can progressively approach true maximum likelihood optimization.

Paper title: Maximum Likelihood Reinforcement Learning
Paper link: https: ...
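One illustrative way to see the gap between correctness-based RL and maximum likelihood (our sketch from standard definitions, not the paper's derivation; the notation is ours): compare the two objectives over a prompt distribution $\mathcal{D}$, writing the per-prompt success probability under the policy $\pi_\theta$.

```latex
% Per-prompt probability of a correct trajectory, R(x, y) \in \{0, 1\}:
J_x(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\left[ R(x, y) \right]

% Maximum likelihood of correctness over the prompt distribution:
\mathcal{L}_{\mathrm{ML}}(\theta)
  = \mathbb{E}_{x \sim \mathcal{D}}\left[ \log J_x(\theta) \right],
\qquad
\nabla_\theta \mathcal{L}_{\mathrm{ML}}
  = \mathbb{E}_{x \sim \mathcal{D}}\left[ \frac{\nabla_\theta J_x(\theta)}{J_x(\theta)} \right]

% Standard correctness-based RL:
J(\theta) = \mathbb{E}_{x \sim \mathcal{D}}\left[ J_x(\theta) \right],
\qquad
\nabla_\theta J
  = \mathbb{E}_{x \sim \mathcal{D}}\left[ \nabla_\theta J_x(\theta) \right]
```

The RL gradient drops the $1/J_x(\theta)$ reweighting, i.e., it treats the log to first order, so prompts the policy rarely solves (small $J_x$) receive no extra gradient weight; under this reading, a family of compute-indexed objectives would interpolate between the plain expectation and the full log-likelihood.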
ICLR 2026 | Does This Problem Need Thinking With Images? Let the Model Tell You! Adaptive Thinking-Mode Switching Boosts General Visual Reasoning
机器之心· 2026-02-05 04:35
This work comes from Fudan University and Alibaba's Future Life Lab, and has been accepted to ICLR 2026.

Current visual reasoning methods have developed multiple thinking modes, chiefly a pure-text thinking mode consistent with LLMs and thinking with images, which stays closer to the picture. The two reasoning modes each excel in different domains, but existing work focuses on a single thinking mode and cannot fully exploit the complementarity between the two.

This paper therefore proposes mixture-of-visual-thoughts, an adaptive reasoning paradigm whose goal is to integrate different reasoning modes inside a single model and guide it to select modes adaptively. To teach the model this paradigm, the researchers introduce a two-stage learning framework, AdaVaR: it learns the different reasoning modes via SFT, and a dedicated AdaGRPO algorithm is designed to guide the model, under a reinforcement learning setup, to learn how to choose the appropriate reasoning mode for each problem.

Background: different thinking modes for visual reasoning

There has already been extensive exploration of visual reasoning methods for LVLMs (large vision-language models); mainstream reasoning paradigms include the following two:

Paper title: Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for Ge ...
Must Intelligence Be Based on World Models? We Talked With Ant Group's 灵波 (Robbyant) Team
机器之心· 2026-02-05 04:35
Core Viewpoint
- The article discusses the transition from large language models (LLMs) to a new era of physical AI, emphasizing the need for AI to understand and interact with the real world rather than just processing language [1][2]

Group 1: Physical AI Development
- Yann LeCun argues that true intelligence requires the ability to predict and plan, which current LLMs lack [2]
- Ant Group's Robbyant has made significant strides in physical AI by releasing four embodied intelligence models in a short span, showcasing a unique approach to AI development [2][5]
- The company aims to build intelligence from physical interactions, moving beyond the digital realm [3][4]

Group 2: Technological Approach
- Ant Group's strategy focuses on using real-world data and internet data for training models, rejecting the prevalent "Sim-to-Real" approach in favor of direct learning from real-world interactions [7][9]
- The LingBot-VLA model, trained on over 20,000 hours of high-quality real machine data, has surpassed several international benchmarks, indicating a significant advancement in robotics technology [9]
- The LingBot-VA model represents a breakthrough in general robot control, utilizing causal video-action world models to predict and act in real environments [10][12]

Group 3: Future Aspirations and Ecosystem
- Ant Group envisions creating an open-source ecosystem for robotics, akin to an "Android system" for robots, emphasizing collaboration with data providers to enhance model training diversity [18][19]
- The company is focused on providing efficient post-training tools to help hardware manufacturers adapt their robots to the intelligence developed by Robbyant [19]
- Ant Group's long-term goal is to integrate embodied intelligence into various service sectors, leveraging its strengths in connecting people with services [22][24]
Google Made a Paper-Dedicated Version of nano banana! Top-Conference-Grade Figures, Straight Out
机器之心· 2026-02-05 04:35
Editor | SIA

You write the method; the AI draws the Figure.

Researchers have finally gotten their "figure-drawing liberation day".

Still staying up late building PPT slides, dragging arrows, and aligning fonts for the method diagram in your paper? A single Figure 2 can take hours, or in severe cases days; the researcher's "hidden side quest" isn't the experiments, it's the figures.

A figure has to stay faithful to the paper's intent while quietly conforming to the tacit "academic aesthetic" of top conferences: the colors can't be tacky, the layout can't be messy, and the arrows absolutely cannot be connected wrong. It looks like just one figure; in reality it is a triple ordeal of aesthetics, logic, and patience.

So here is the question: today's large models can already write papers, run experiments, and modify code, so why can't they handle these academic illustrations? Some may ask: wouldn't DALL·E or a base VLM do? The answer is: genuinely no. The figures they produce typically have modules and labels that don't match, garbled fonts, and arrows with broken logic. The figures may look "nice", but they are useless.

Then a serious contender appeared: PaperBanana. Built by a team from Peking University and Google Cloud AI Research, its goal is as simple as it is ambitious: you write the method, the AI draws the Figure, at a quality fit for direct submission to a top conference.

Let's look at the results. PaperBanana demonstrates the ability to handle two types of academic illustration: ...