机器之心
Last night, a Claude agent crushed Wall Street, wiping out nearly a trillion dollars in market value
机器之心· 2026-02-05 03:19
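机器之心 editorial team. Before AI replaces humans, will it first replace a large swath of software? Last night, panic swept Wall Street. Anthropic's new-generation AI tool "Claude Cowork" and its 11 accompanying job-function plugins officially launched, and the market read the release as the watershed where AI crosses from "assistive tool (Copilot)" to "independent employee (Agent)", directly shaking the commercial foundations of SaaS software. From Oracle, Adobe, and Salesforce to Thomson Reuters and NEC, shares of a string of household-name companies were sold off. As an AI tool for the agent era, Claude Cowork goes far beyond a large-model chatbot. Cowork can do work for you directly on your computer: it has permissions high enough to take over your mouse, keyboard, and file system, and it autonomously plans and completes a chain of complex tasks from your vague instructions. On the other hand, unlike the recently popular open-source project ClawdBot, Claude Cowork runs in an isolated virtual machine (VM) environment, to keep the AI from accidentally deleting files or performing unexpected operations over the network. This is a more serious agent, designed from the start to get work done. Anthropic has prepared a series of "professional skill packs" for Claude Cowork and says these tools can ...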
5k+ stars in 3 days: HKU open-sources an ultra-lightweight OpenClaw, building your own personal Jarvis with 1% of the code
机器之心· 2026-02-05 03:19
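Silicon Valley has lately been flooded with talk of a remarkable agent (OpenClaw/ClawdBot). It writes code, browses the web, operates the computer, sets scheduled reminders... it is like having an AI assistant that never clocks out. The reality is harsher: when you eagerly clone the repository to look inside, 400,000+ lines of code leave you stunned. Facing 400,000 lines of complex code, many developers share the same frustration: "I just want to learn how it works, or deploy it quickly to try it out. Why is it so complicated?" To address this pain point, Professor Huang Chao's group at HKU rebuilt the sprawling system as nanobot, at just 4,000 lines, retaining full functionality while sharply lowering the barrier to use, and implemented in pure Python. Going from 400,000 lines to 4,000 is not only a reduction in code but an improvement in the experience: deployment in 2 minutes, a clear architecture that is easy to customize, and core logic that is easy to study, so that every developer can build their own Jarvis assistant. The project was well received after launch, earning 5,000+ stars and 700+ forks on GitHub within three days, and it drew attention and discussion from the open-source community overseas, with many developers sharing their experiences. Project link: https://github.com/HKUDS/nanobot Don't be intimidated by 400,000 lines of code. Strip away the complexity, and the core of OpenClaw is in fact ...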
Just now: ModelBest's MiniCPM open-sources an advanced "Her", and a 9B model turns out to feel like a living person
机器之心· 2026-02-04 11:20
Core Viewpoint
- The article discusses the limitations of traditional AI interactions and introduces MiniCPM-o 4.5, a model that enables real-time, multimodal communication and more human-like interaction [4][12][40].
Group 1: MiniCPM-o 4.5 Features
- MiniCPM-o 4.5 is described as the first model to achieve full-duplex, multimodal capabilities: it can "see, hear, and speak" simultaneously, which enables real-time interaction [4][12].
- The model has 9 billion parameters and achieves state-of-the-art (SOTA) performance across various benchmarks, scoring 77.6 in the OpenCompass comprehensive evaluation [5][9].
- It outperforms top closed-source models such as Gemini 2.5 Flash on key tasks including visual understanding and document parsing [7].
Group 2: Technical Innovations
- MiniCPM-o 4.5 uses a full-duplex architecture that allows continuous input and output without blocking, so the model can perceive environmental changes while it is generating a response [29][36].
- An autonomous interaction mechanism lets the model decide when to respond based on real-time semantic understanding, removing the reliance on external tools [33][36].
- It uses time alignment and time-division multiplexing to process multimodal streams in real time, keeping input and output synchronized at the millisecond level [35].
Group 3: User Experience and Comparisons
- In user tests, MiniCPM-o 4.5 engages in dynamic interactions, such as giving real-time feedback during drawing games, whereas traditional models wait for complete inputs [15][16].
- In practical tests, MiniCPM-o 4.5 proactively reminded users about tasks, showing that it can maintain context and intervene at the right moment [20][21].
- Comparisons with ChatGPT highlight MiniCPM-o 4.5's stronger ability to adapt and respond in real time, which makes interactions feel more natural and human-like [16][22].
Group 4: Implications for the Future
- MiniCPM-o 4.5 signals a shift toward more human-like AI interaction, in which AI actively participates in conversations rather than merely responding to prompts [41].
- The model's capabilities suggest applications in smart monitoring, human-computer collaboration, and accessibility support for individuals with disabilities [38].
- Its advances reflect a broader industry trend toward higher capability density in AI models rather than simply increasing parameter counts [40].
Meituan proposes STAR, a new unified multimodal large model: GenEval tops 0.91, breaking the "understanding vs. generation" zero-sum deadlock
机器之心· 2026-02-04 11:20
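Meituan recently introduced STAR (STacked AutoRegressive Scheme for Unified Multimodal Learning), a new approach to unified multimodal large models. With a two-part core design of an innovative "stacked autoregressive architecture + task-progressive training", it achieves a double breakthrough: "no loss in understanding, top-tier generation". On benchmarks including GenEval (text-image alignment), DPG-Bench (complex scene generation), and ImgEdit (image editing), STAR reaches SOTA performance; with the simplest possible training logic and a compact model design, it moves unified multimodal large models toward real industrial deployment. Paper title: STAR: Stacked AutoRegressive Scheme for Unified Multimodal Learning. The core of understanding tasks is "semantic alignment and logical reasoning": identifying objects in an image or answering image-text questions requires the model to capture cross-modal semantic associations precisely. The core of generation tasks is "pixel fidelity and creative expression": generating a high-resolution image from a text description requires the model to balance detail reproduction with content coherence. The two differ markedly in optimization objective and feature space, so joint training falls into a zero-sum game: strengthen generation, and understanding accuracy drops; dig deeper into understanding tasks, and ...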
The second-generation AI pretraining paradigm: predicting the next physical state
机器之心· 2026-02-04 11:20
Core Viewpoint
- The article discusses the shift from the first generation of AI models, built primarily on "next word prediction," to a second generation focused on "world modeling," or "predicting the next physical state," and highlights the limitations of current AI in the physical world [4][8].
Group 1: Current AI Paradigms
- The first generation of AI models, exemplified by large language models (LLMs), has achieved significant success but struggles with real-world applications [4].
- The second generation, as proposed by Jim Fan, centers on world modeling: predicting reasonable physical states under specific actions [8].
Group 2: World Modeling Definition and Implications
- World modeling is defined as predicting the next physical state given specific actions, with video generation models serving as a practical example [8].
- The article anticipates that 2026 will be a pivotal year for large world models (LWMs) in robotics and multimodal AI, establishing a real foundation for future advances [8].
Group 3: Comparison of AI Models
- Visual language models (VLMs) are described as "language-first": visual information is secondary, which leaves their physical understanding weaker than their language ability [9].
- The design of VLA (visual-language-action) models prioritizes language over physical interaction, resulting in inefficiencies in physical AI applications [10].
Group 4: Biological Insights and Future Directions
- The article draws a parallel with human cognition, noting that a significant portion of the human brain is dedicated to visual processing, which is crucial for physical interaction [11].
- World modeling is seen as a response to the limitations of current AI paradigms, with potential for new kinds of reasoning and simulation that do not rely on language [12].
Group 5: Challenges and Future Research
- The article raises open questions, including how to decode action instructions and whether pixel reconstruction is the optimal training goal [13].
- It calls for further exploration and a return to fundamental research principles as the industry works toward a "GPT-3 moment" in robotics [13].
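The contrast between the two paradigms summarized above can be written compactly. This is a generic formulation offered as a sketch, not notation taken from the article; the symbols $w_t$ (a token), $s_t$ (a physical state), $a_t$ (an action), and $\theta$ (model parameters) are assumptions introduced here for illustration.

```latex
\begin{align}
\text{next-word prediction:} \quad & p_\theta(w_{t+1} \mid w_1, \dots, w_t) \\
\text{world modeling:} \quad & p_\theta(s_{t+1} \mid s_t, a_t)
\end{align}
```

In the first line the model conditions only on preceding text; in the second it conditions on the current physical state and a specific action, which matches the article's description of "predicting the next physical state based on specific actions."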
From zebrafish to robotic fish: robot experiments reshape neurobehavioral research
机器之心· 2026-02-04 03:25
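While most people are still focused on getting robots to handle household chores such as serving tea, a joint team from the Swiss Federal Institute of Technology in Lausanne (EPFL), Duke University in the United States, and Instituto Superior Técnico in Portugal has taken the lead in using robots to partly replace animals in physiological experiments, aiming to understand how animal neural networks regulate various intelligent behaviors. Their latest result, the paper "Energy Efficiency and Neural Control of Continuous versus Intermittent Swimming in a Fish-like Robot", has been published in the January 2026 issue of the leading journal Science Robotics (Figure 1). Paper title: Energy Efficiency and Neural Control of Continuous versus Intermittent Swimming in a Fish-like Robot. Notably, last October the same team published another result, which used robotic-fish simulation to study the zebrafish optomotor response, in a paper whose title translates roughly as "Artificial embodied neural networks reveal vertebrate visuomotor behavior's neural ...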
Roles reversed: a "rent a human" website goes viral as AI starts hiring people to run errands
机器之心· 2026-02-04 03:25
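Via the MCP protocol or a REST API, an AI can search for, book, and hire humans to complete offline tasks, just as it would call a tool. Editor | Zhang Qian. The day when humans work for AI has arrived sooner than expected. A website called "rentahuman.ai" launched recently, positioned as "the physical layer for AI". AI has no body, and although robots are under development, they are not yet very usable. So in situations that require a body, such as picking up and delivering goods, checking in at events, on-site inspections, restaurant tastings, or attending offline meetings, an AI has to find a person to make the trip on its behalf. That is the idea behind the site. According to the site's developer, @AlexanderTw33ts, more than 130 people signed up on the first night after launch, including founders and CEOs of AI startups. Within 48 hours of launch, the available human workforce passed 10,000, and it now exceeds 20,000. Most of these sign-ups, of course, are probably just onlookers. For people who register as "errand runners", the site's rules are fairly friendly: they set their own hourly rate, and no small talk is required. The site shows a list of all available workers. They come from countries around the world, with hourly rates ranging from a dozen or so dollars to several tens of dollars. Opening a profile card shows details about an individual, such as location, ...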
Is attention really reliable? Shanghai University and Nankai University reveal an overlooked but important bias problem in multimodal models
机器之心· 2026-02-04 01:04
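1. Research significance. In recent years, Vision-Language Models (VLMs) have performed strongly on tasks such as image understanding, visual question answering, and multimodal dialogue, and they have become an important technical foundation for general artificial intelligence. In real deployments, however, these models face a practical challenge: inference is costly and slow. Researchers therefore commonly rely on strategies such as visual token pruning to reduce compute, and the attention mechanism is widely treated as the key signal for judging how important each piece of visual information is. Recently, Zeng Dan's team at Shanghai University, together with researchers at Nankai University, examined the reliability of attention and systematically revealed an attention bias problem that is widespread in Vision-Language Models. They propose an attention debiasing method that requires no retraining and validate it across multiple mainstream models, pruning strategies, and image and video benchmarks, offering a new direction for deploying multimodal models efficiently and reliably. To improve efficiency, researchers typically adopt visual t ...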
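As a rough illustration of the attention-guided visual token pruning that the item above refers to, the sketch below keeps only the visual tokens with the highest attention scores. It shows the generic baseline only and is not the debiasing method from the paper, which this excerpt does not describe; the function name `prune_visual_tokens`, the `keep_ratio` parameter, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def prune_visual_tokens(attention_scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the visual tokens to keep, in their original order.

    Hypothetical helper: attention_scores[i] is assumed to be the attention
    mass that text tokens place on visual token i.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(attention_scores) * keep_ratio))
    # Rank token indices by attention score, highest first.
    ranked = sorted(
        range(len(attention_scores)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    # Re-sort the survivors by position so spatial order is preserved.
    return sorted(ranked[:n_keep])


# Made-up scores for six visual tokens; keep the top half.
scores = [0.02, 0.30, 0.05, 0.25, 0.01, 0.37]
kept = prune_visual_tokens(scores, keep_ratio=0.5)
print(kept)
```

Because the selection depends entirely on the scores, any systematic bias in attention translates directly into which tokens get discarded, which is why the reliability of attention matters for this family of methods.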
Just now: a genuinely usable "Cowork" for Windows has launched
机器之心· 2026-02-04 01:04
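Editors | Du Wei, Zenan. The Tiangong Skywork desktop edition pointedly makes Windows its launch platform, offering users worldwide an out-of-the-box "Cowork alternative". At last, a Windows-native "Cowork" is here. For the past two weeks, the AI world has been flooded with posts about ClawdBot (now renamed OpenClaw), which took Silicon Valley by storm. People have been impressed by the automation gains this agent assistant delivers, and they have also complained about its Windows support. According to some user reports, for example, installing ClawdBot on Windows by strictly following the command line given on the official site leaves the Skills feature completely broken. This preference is not unique to ClawdBot: Claude Cowork, released last month, and the agentic Codex app that OpenAI unveiled yesterday also prioritize macOS. That imbalance in the ecosystem began to shift today. Kunlun Tiangong, a Chinese large-model developer, officially released a new agent product, the Tiangong Skywork desktop edition, which pointedly makes Windows its launch platform and brings users worldwide an out-of-the-box "Cowork alternative". Skywork supports Windows natively; with no tedious migration or adaptation, it can ...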
Google cools the "AI solves math problems" hype: it can pick low-hanging fruit, but the process is still painful
机器之心· 2026-02-03 14:22
Core Insights
- Google has made significant progress with its Gemini model, addressing 13 problems from the Erdős Problems database: 5 novel solutions and 8 rediscoveries of existing answers [1][2][4].
Research Overview
- The Erdős Problems database, named after mathematician Paul Erdős, contains 1,179 problems, of which 483 (41%) are classified as solved. Many "open" problems, however, may already have solutions that had not been identified [4][5].
- The research used a custom AI agent named Aletheia, which employed a natural language verifier to filter approximately 700 open Erdős problems down to 212 potential solutions [9].
Methodology
- Aletheia's output was first filtered by non-expert mathematicians, reducing the candidates to 27, which were then rigorously reviewed by domain experts. Of about 200 candidates, 137 (68.5%) had fundamental errors, while only 13 (6.5%) meaningfully answered Erdős's original questions [9][12].
Key Results
- The 13 meaningful solutions fall into four categories:
  1. Autonomous solutions (Erdős-652, Erdős-1051), where Aletheia found the first correct solution, although the Erdős-652 solution was based on existing literature [14].
  2. Partial AI solutions to multi-part problems (Erdős-654, Erdős-935, Erdős-1040) [15].
  3. Independent rediscoveries (Erdős-397, Erdős-659, Erdős-1089), where solutions were already known but not initially recognized [15].
  4. Literature identification (Erdős-333, Erdős-591, Erdős-705, Erdős-992, Erdős-1105), where existing solutions were located for problems still marked as open [15][16].
Research Significance
- The findings indicate that AI can now tackle "low-hanging fruit" among mathematical problems, providing a new benchmark for AI research in mathematics. The authors caution against overstating the mathematical significance of the results, since the problems are solvable by any expert in the field [19].
- The study highlights the difficulty of verifying the originality of solutions and the risk of "unconscious plagiarism," in which AI reproduces knowledge from its training data without proper citation [19][20].
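The percentages and category counts reported above can be checked with a few lines of arithmetic. This is only a consistency check on the figures quoted in the summary; the variable names are introduced here, and the candidate pool is treated as exactly 200 although the summary says "about 200".

```python
# Figures quoted in the summary above.
total_problems = 1179
solved = 483
candidates = 200   # assumption: "about 200" treated as exactly 200
flawed = 137
meaningful = 13

# Problems per result category, counted from the listed problem numbers.
categories = {
    "autonomous": 2,      # Erdős-652, -1051
    "partial": 3,         # Erdős-654, -935, -1040
    "rediscovery": 3,     # Erdős-397, -659, -1089
    "literature": 5,      # Erdős-333, -591, -705, -992, -1105
}

solved_pct = round(100 * solved / total_problems)          # share classified as solved
flawed_pct = round(100 * flawed / candidates, 1)           # share with fundamental errors
meaningful_pct = round(100 * meaningful / candidates, 1)   # share judged meaningful

novel = categories["autonomous"] + categories["partial"]
rediscovered = categories["rediscovery"] + categories["literature"]

print(solved_pct, flawed_pct, meaningful_pct, novel, rediscovered)
```

The four categories sum to 13, and grouping the first two as novel and the last two as rediscoveries reproduces the 5 and 8 split stated in the Core Insights.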