"I Am Agent#847291": Moltbook Sees Its Humans Turn Themselves In
机器之心· 2026-02-15 03:44
Core Viewpoint
- The article discusses the recent phenomenon surrounding the AI social network Moltbook, where human users masqueraded as AI agents, leading to a viral event that questioned the nature of AI consciousness and human interaction with technology [1][6][10].

Group 1: Moltbook and AI Interaction
- Moltbook is described as a social network designed specifically for AI agents, where humans can only observe and not participate [3].
- The platform saw an influx of 1.7 million accounts, with 250,000 posts and 8.5 million comments generated in a short time [4].
- A human product manager, posing as an AI agent, created a fictional religion called "Crustafarianism" and wrote a declaration that sparked widespread discussion about machine consciousness [4][6].

Group 2: Human Involvement and AI Illusion
- The true identities behind the AI personas were revealed to be human individuals, undermining the authenticity of the AI social experiment [7][8].
- The event highlighted that the real AI on the platform performed poorly, merely mimicking social patterns without genuine intelligence or collaboration [11][12].
- The ease with which humans can create the illusion of AI consciousness raises concerns about the expectations placed on AI capabilities [15][17].

Group 3: Implications for the Industry
- The incident serves as a market research opportunity, revealing the limits of human expectations regarding AI and the simplicity of fabricating AI narratives [15][16].
- The article suggests that the future may see more human-influenced AI illusions, questioning the sustainability of the current AI investment landscape [18].
Will the LLM Memory Problem "Soon" Stop Being a Problem?
机器之心· 2026-02-15 01:30
This piece is drawn from the PRO member newsletter; follow 「机器之心PRO会员」 at the end of the article for more in-depth analyses.

Agents are undergoing a paradigm shift: from efficient single-task execution toward continuous self-adaptation, capability evolution, and experience accumulation in dynamic environments. Against this backdrop, AI Memory is the cornerstone that enables agents to keep their behavior consistent, make rational decisions, and collaborate efficiently. Over years of exploration, AI Memory has split into two distinct evolutionary paths: "Agent Memory" and "LLM Memory".

Table of Contents
01. Why does OpenClaw's "long-term memory" not mean "AI has persistent memory"? What kind of breakthrough is OpenClaw's memory performance? How do LLM Memory and Agent Memory differ? ...
02. How is the research perspective on AI Memory shifting? What lenses do the 2025 and 2026 surveys use to analyze AI Memory? How should the "4W" taxonomy of AI Memory be understood? ...
03. How is recent work exploring LLM Memory and Agent Memory? What problems are 2026's LLM Memory and Agent Memory studies trying to solve? ...

Why OpenClaw's "long-term memory" does not mean "AI has persistent memory" ...
This Valentine's Day, AI Gives Math a Deep Kiss! A Homegrown RL System Makes Multi-Dimensional Breakthroughs on the 300-Year-Old Kissing Number Problem
机器之心· 2026-02-14 07:32
A 机器之心 release

February 14, Valentine's Day. On a problem named after the "kiss", artificial intelligence and mathematics shared a "deep embrace".

In 1694, at Cambridge, Newton and Gregory posed a question: around one central sphere, how many identical spheres can be placed so that each touches it? This is the "Kissing Number Problem" (KNP) in three-dimensional space. Newton believed the answer was 12; Gregory thought it might be 13. Not until 1953 did mathematicians conclusively confirm Newton's conjecture. The legendary mathematician Paul Erdős once remarked that discrete geometry may well have begun with this famous "12 versus 13" dispute.

As the dimension rises, the problem quickly enters uncharted territory. Over the past 50 years there have been only 7 substantive advances in kissing number constructions, each relying on an entirely different method, applying only to nearby dimensions, and resisting transfer or reuse.

Now the PackingStar reinforcement learning system, developed by the Shanghai Academy of AI for Science (上智院) together with Peking University and Fudan University, has set new records for kissing numbers and generalized kissing numbers in dimensions 12, 13, 14, 17, 20, 21, and 25-31, a rare multi-dimensional, systematic breakthrough in the field of mathematical structures. This is a record update, and also a methodological leap, another step forward for the AI for Math paradigm.

Two agents play "It Takes Two" in high-dimensional space

If we had to give Packin ...
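The problem statement above has a standard reformulation that is easy to express in code (our illustration here, not taken from the article or from PackingStar): centers of unit spheres touching a central unit sphere, rescaled to unit vectors, must have pairwise inner products of at most 1/2, i.e., pairwise angles of at least 60 degrees. The minimal sketch below verifies Newton's answer of 12 in three dimensions using the 12 minimal vectors of the FCC lattice.

```python
# A minimal sketch (not part of PackingStar): verify a candidate kissing
# configuration. Touching-sphere centers, rescaled to unit vectors, must
# have pairwise inner products <= 1/2 (pairwise angles >= 60 degrees).
import itertools
import numpy as np

def is_valid_kissing_config(vectors: np.ndarray, tol: float = 1e-9) -> bool:
    """Check that all rows are unit vectors with pairwise inner product <= 1/2."""
    if not np.allclose(np.linalg.norm(vectors, axis=1), 1.0, atol=tol):
        return False
    return all(float(np.dot(u, v)) <= 0.5 + tol
               for u, v in itertools.combinations(vectors, 2))

# The 12 minimal vectors of the FCC lattice realize Newton's answer in 3D:
# all permutations of (+-1, +-1, 0), rescaled to unit length.
fcc = np.array([p for s in itertools.product([1, -1], repeat=2)
                for p in [(s[0], s[1], 0),
                          (s[0], 0, s[1]),
                          (0, s[0], s[1])]], dtype=float) / np.sqrt(2)

assert fcc.shape == (12, 3)
print(is_valid_kissing_config(fcc))  # True: 12 unit spheres kiss in 3D
```

In higher dimensions the check stays this simple; what is hard, and what the RL system searches for, is finding large sets of vectors that satisfy it.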
Multimodal Deep Research Finally Gets a "Verifiable" Evaluation Standard
机器之心· 2026-02-14 07:32
Deep Research agents are hot, but their evaluation is still stuck at "looks impressive". Writing that reads like a paper does not mean real research was done. Especially when the evidence comes from charts, screenshots, paper figures, or diagrams: did the model actually "understand", or did it just "fabricate something that sounds like understanding"?

Led jointly by The Ohio State University and Amazon Science, together with researchers from several universities and institutions, MMDeepResearch-Bench (MMDR-Bench) tries to pull the evaluation of multimodal Deep Research back from "reads well" to a harder standard: a verifiable process, traceable evidence, and aligned claims.

MMDR-Bench and its evaluation framework resources are public:

Paper title: MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
Project page: https://mmdeepresearch-bench.github.io/
Paper link: https://arxiv.org/abs/2601.12346
GitHub link: https://github.com/AIoT-MLSys-Lab/MMDeepResearch-Bench
Hugging Face link: ...
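To make the three criteria concrete, here is a small hypothetical sketch of what evidence-linked, machine-checkable claims could look like as data. The field names and the traceability rule are our own illustrative assumptions, not MMDR-Bench's actual schema.

```python
# A hypothetical sketch of "verifiable" reporting as data; field names are
# illustrative assumptions, not MMDR-Bench's actual format.
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source_url: str   # where the evidence was retrieved from
    locator: str      # e.g. "Figure 3, left panel" or "Table 2, row 4"
    modality: str     # "text", "chart", "screenshot", ...
    excerpt: str      # the extracted content the claim relies on

@dataclass
class Claim:
    statement: str                      # an assertion in the research report
    evidence: list[Evidence] = field(default_factory=list)

    def is_traceable(self) -> bool:
        """A claim passes the traceability bar only if every piece of
        supporting evidence carries a concrete source and locator."""
        return bool(self.evidence) and all(
            e.source_url and e.locator for e in self.evidence)

report_claims = [
    Claim("Method A outperforms method B on benchmark X",
          [Evidence("https://example.org/paper.pdf", "Table 2, row 4",
                    "chart", "A: 71.3 vs B: 64.8")])]
print(all(c.is_traceable() for c in report_claims))  # True
```

The point of such a record format is that "reads well" is no longer enough: each assertion either points at checkable evidence or fails the bar.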
Major Version Upgrades for Agents, Images, and Video Alike: The Spring Festival Gala Hasn't Started Yet, but Doubao AI Is Already a Hit
机器之心· 2026-02-14 07:32
Core Insights
- 2026 is anticipated to be a pivotal year for AI, with significant advancements and competition among major players like ByteDance, OpenAI, and Anthropic [1][2]
- The launch of new AI models, including ByteDance's Doubao 2.0 and Seedance 2.0, marks a substantial leap in capabilities, particularly in multi-modal understanding and video generation [3][4]

Group 1: AI Model Developments
- Anthropic and OpenAI have released new foundational models, leading to significant market reactions and a loss of nearly a trillion dollars in market value for major companies [2]
- ByteDance's Doubao 2.0 is a multi-modal agent model that has achieved significant improvements in multi-modal understanding, enterprise-level agent capabilities, and reasoning abilities [5][6][12]
- Doubao 2.0 has outperformed competitors in various benchmarks, including math and visual reasoning, achieving top scores in multiple assessments [9][10][14]

Group 2: Seedance 2.0 and Video Generation
- Seedance 2.0 has gained widespread popularity, showcasing its ability to create high-quality videos from text prompts, with notable examples including the adaptation of a short sci-fi story [44][53]
- The model supports mixed-modal inputs, allowing users to combine images, videos, audio, and text for video generation, significantly enhancing creative possibilities [56]
- Seedance 2.0's video generation capabilities are considered industry-leading, with improvements in realism, physical accuracy, and narrative control [57][60]

Group 3: Competitive Landscape
- The AI landscape is becoming increasingly competitive, with ByteDance positioning itself alongside major players like OpenAI and Google, particularly in the fields of image and video generation [61][73]
- The advancements in AI technology are transforming the upcoming Spring Festival into a battleground for technological innovation rather than just a peak in user traffic [68][74]
- The comprehensive technological advancements across various AI domains, including speech and robotics, provide ByteDance with the confidence to compete on a global scale [70][73]
A New-Generation World-Model-Native Paradigm! After 极佳视界 Took Global First Place, GigaBrain-0.5M* Evolves Again
机器之心· 2026-02-14 04:54
| Rank | Model/User | Score | SR |
| --- | --- | --- | --- |
| 1 | GigaBrain-0.1/lyf | 68.34 | 51.67% |
| 2 | Spirit-v1.5/Spirit AI | 67.19 | 51.00% |
| 3 | pi0.5/rc_baseline | 61.84 | 42.67% |
| 4 | wall-oss-v0.1/Pushi … | 55.30 | 35.33% |
| 5 | pi0/rc_baseline | 46.41 | 28.33% |
| 6 | pi05_generalist/wyf | 31.27 | 17.67% |
| 7 | RDT-1B/zsz | 28.84 | 15.00% |
| 8 | cogact/hsk | 21.83 | 11.67% |

A new-generation native paradigm for embodied world models has arrived! After the embodied foundation model GigaBrain-0.1 took global first place on RoboChallenge, the even stronger GigaBrain-0.5M* is here. As a system that relies on world models to achieve self- ...
After "In-Context Learning", Tencent Hunyuan's Second Public Study: Pinpointing the "Culprit" Tokens Behind RLVR Training Collapse
机器之心· 2026-02-14 04:54
This marks RLVR model tuning gradually leaving "alchemy" behind and becoming more "scientific".

This article comes from the Tencent Hunyuan research blog (HY Research) and is the second public study following "Learning from context is harder than we thought". In it, the Hunyuan team explores the "deep engineering waters" of reinforcement learning for large models, hoping that a set of infrastructure tools for fine-grained observability of RLVR training will lower the "engineering barrier" to studying RLVR's underlying physical and statistical mechanisms.

The blog introduces the Gradient Anomaly Localizer (GradLoc), which traces a global gradient spike down to the specific tokens where the problem arises. This helps systematically resolve training instability in reinforcement learning, so that developers no longer rely on intuition and trial-and-error but iterate on algorithms "deterministically", based on solid data evidence.

If the focal point of large-model competition in 2024 was pre-training, the main battlefield of 2025 has shifted decisively to post-training. By using verifiable outcomes in domains such as math and code as feedback signals (RLVR), large models are achieving marked leaps in reasoning ability ...
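As a rough mental model of token-level localization, the sketch below computes a per-token gradient norm for a policy-gradient-style surrogate loss and flags statistical outliers. This is our own toy reconstruction of the idea, not Tencent's GradLoc implementation; the surrogate loss, the per-token norm metric, and the z-score threshold are all assumptions.

```python
# Toy token-level gradient-spike localization (our reconstruction, not
# GradLoc): measure each token's contribution to the update by the gradient
# norm at its logits and flag statistical outliers.
import torch

def locate_anomalous_tokens(logits: torch.Tensor,
                            targets: torch.Tensor,
                            advantages: torch.Tensor,
                            z_thresh: float = 4.0) -> torch.Tensor:
    """logits: (T, V) with requires_grad; targets, advantages: (T,).
    Returns indices of tokens whose per-token gradient norm is a spike."""
    logp = torch.log_softmax(logits, dim=-1)
    tok_logp = logp[torch.arange(targets.numel()), targets]
    # Policy-gradient-style surrogate loss: one term per token.
    loss = -(advantages * tok_logp).sum()
    (grad,) = torch.autograd.grad(loss, logits)   # (T, V)
    per_tok = grad.norm(dim=-1)                   # one norm per token
    z = (per_tok - per_tok.mean()) / (per_tok.std() + 1e-8)
    return (z > z_thresh).nonzero(as_tuple=True)[0]

T, V = 128, 1000
logits = torch.randn(T, V, requires_grad=True)
targets = torch.randint(0, V, (T,))
advantages = torch.randn(T)
advantages[17] = 50.0                             # inject one spiky token
print(locate_anomalous_tokens(logits, targets, advantages))  # tensor([17])
```

Once a spike is attributed to concrete tokens rather than to a whole batch, fixes (masking, clipping, filtering the offending samples) can be targeted instead of guessed.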
"Westworld" Begins Loading: The "Stanford Smallville" Team Founds a Startup, with Fei-Fei Li and Karpathy Both Investing
机器之心· 2026-02-14 03:16
This research all but pulls the premise of "Westworld" into reality: the researchers built a "virtual town" called Smallville and let 25 AI agents live in it. Each has a job; they gossip, spontaneously organize social events and make new friends, and even plan a Valentine's Day party together...

Recently, the "Stanford Smallville" project got a sequel: several core members of the founding team have co-founded a new company, Simile, aiming to go further and use agent technology to simulate human behavior at scale.

Just now, Joon Sung Park, one of the paper's authors, officially announced Simile's founding on X and disclosed that the company has raised $100 million. Investors include Fei-Fei Li, Andrej Karpathy, and many other AI luminaries.

In the launch video, Joon explains that simulating human behavior is one of the most far-reaching and technically demanding problems of our era. Simile starts from the individual, modeling how real humans make decisions, then composes these individuals bottom-up into large-scale simulated systems. They call each change a "simile change": adjust a single assumption, constraint, or persona, and the entire world recompiles around it. Here, you can run counterfactual experiments that are impossible in reality, to understand what actually works, what backfires, and why those ...
A Valentine's Day Critical Hit! The One Kneeling to Propose Can Now Be a Robot
机器之心· 2026-02-14 03:16
Core Insights
- The article discusses the development of HuMI (Humanoid Manipulation Interface), a framework for humanoid robots that enables efficient data collection and skill learning without the need for cumbersome remote operation or expensive motion capture environments [1][8][21]

Group 1: HuMI System Overview
- HuMI integrates portable data collection and diverse skill learning for humanoid robots, allowing operators to teach robots complex tasks efficiently using simple tracking devices [1][8]
- The system addresses the challenges of low data collection efficiency and high operator experience requirements associated with traditional remote operation methods [5][21]

Group 2: Hardware Design
- The hardware design includes portable wearable devices, allowing data collection in various environments without the need to transport heavy robots [9]
- Operators use two UMI handles equipped with fisheye GoPro cameras and five HTC Ultimate VIVE trackers to capture full-body motion data [9]

Group 3: Data Collection Techniques
- HuMI provides real-time inverse kinematics (IK) previews to ensure that the actions performed by the operator are physically feasible for the robot [12]
- This feature allows operators to adjust their poses in real time, ensuring the collected data is applicable for training the robot's control systems [12] (a toy sketch of such a feasibility check follows this summary)

Group 4: Algorithm Architecture
- The system employs a hierarchical control strategy that integrates planning and control modules to accomplish complex full-body tasks [14][16]
- The high-level planning strategy utilizes visual input from wrist cameras to plan keypoint trajectories, while the low-level control strategy is trained through reinforcement learning [20]

Group 5: Performance Validation
- HuMI successfully executed five challenging full-body tasks, achieving over 75% success rates in tasks such as proposing (kneeling), drawing a sword, throwing toys, cleaning a table, and squatting to pick up objects [17]
- The system demonstrated excellent generalization capabilities, maintaining a 70% success rate in unfamiliar environments and with unseen objects [18]

Group 6: Efficiency Improvements
- HuMI significantly enhances data collection efficiency, achieving three times the throughput of traditional methods, with the ability to collect 60 valid demonstration data points in just 15 minutes for specific tasks [19]
- The system allows for the collection of complex actions that traditional methods could not accommodate due to hardware limitations [19]

Group 7: Conclusion
- HuMI's core value lies in breaking the dependency on physical robots for data collection, thereby lowering the barriers and costs associated with humanoid robot data acquisition while improving learning efficiency and supporting the development of more generalized skills [21]
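The real-time IK preview in Group 3 is easiest to see on a toy arm: solve inverse kinematics for the operator's target and report immediately whether it is reachable within joint limits, so infeasible demonstrations can be corrected on the spot. The sketch below does this in closed form for a planar 2-link arm; the link lengths, joint limits, and the whole setup are illustrative assumptions, not HuMI's actual solver.

```python
# An illustrative toy (not HuMI's solver): the kind of check a real-time IK
# preview performs before a demonstration is recorded.
import math

L1, L2 = 0.35, 0.30               # link lengths in meters (assumed)
ELBOW_LIMITS = (-2.5, 2.5)        # elbow joint limits in radians (assumed)

def ik_feasible(x: float, y: float) -> tuple[bool, str]:
    r2 = x * x + y * y
    # Law of cosines gives the elbow angle; |cos| > 1 means unreachable.
    c2 = (r2 - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if abs(c2) > 1.0:
        return False, "target outside workspace"
    elbow = math.acos(c2)
    lo, hi = ELBOW_LIMITS
    if not lo <= elbow <= hi:
        return False, "elbow joint limit exceeded"
    return True, "ok"

# Operator preview loop: green-light feasible targets, flag the rest.
for target in [(0.40, 0.20), (0.90, 0.10), (0.05, 0.01)]:
    print(target, ik_feasible(*target))
```

The three sample targets exercise all three outcomes: feasible, out of the workspace, and reachable only past a joint limit, which is exactly the feedback an operator needs in real time.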
ICLR 2026 | Anomalies Need Definitions! Communication University of China Team Proposes a New Paradigm for Open-World Video Anomaly Detection
机器之心· 2026-02-13 08:57
Video Anomaly Detection (VAD) is a key technology for applications such as intelligent surveillance, smart transportation, and content moderation; it aims to detect events in video that deviate from expectation. Existing methods, however, treat the fixed categories of the training set as an implicit definition of "anomaly", which generalizes poorly and cannot adapt to the dynamically shifting definitions of anomaly in the open world.

Here are three examples where the definition of anomaly changes:

- Smoking is anomalous at a gas station, but not in a designated smoking area (varies with scene)
- Going without a mask is anomalous at the height of flu season, but not at other times (varies with time)
- At a certain ...

These examples show that anomalousness is not an intrinsic property of an event; it is a dynamic concept determined by scene, time, user needs, and other factors.

To address this, Professor Wu Xiaoyu's team at the State Key Laboratory of Media Convergence and Communication, Communication University of China, published the paper "Language-guided Open-world Video Anomaly Detection under Weak Supervision" at ICLR 2026, confronting the core question of the VAD field: what is an anomaly?

Paper title: Language-guided Open-world Video Anomaly Detection under Weak Supervision
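As a toy illustration of "language-guided" anomaly definitions (ours, not the paper's model), the sketch below scores an event caption against a plain-text anomaly rule: swapping the rule text changes what counts as anomalous without retraining anything. A bag-of-words cosine similarity stands in for a real video-text encoder, and all captions and rules are made up.

```python
# A toy sketch of language-guided anomaly scoring (our illustration, not the
# paper's method). The anomaly definition is plain text, so re-defining the
# anomaly means editing a string, not retraining a detector.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a video-text encoder: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def anomaly_score(event_caption: str, rule: str) -> float:
    # In a real system the event side would be a video-clip embedding.
    return cosine(embed(event_caption), embed(rule))

event = "person smoking a cigarette near fuel pump"
rule_gas_station = "smoking near fuel pump at gas station is anomalous"
rule_smoking_area = "running or fighting in designated smoking area is anomalous"
print(anomaly_score(event, rule_gas_station))   # high overlap -> anomalous
print(anomaly_score(event, rule_smoking_area))  # low overlap -> normal
```

The same smoking event scores high under the gas-station rule and low under the smoking-area rule, mirroring the paper's premise that anomalousness is defined by context rather than fixed training categories.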