机器之心
New from Dr. Li Hang of ByteDance: A General Framework for AI Agents
机器之心· 2026-01-28 13:08
Core Viewpoint
- The article discusses a general framework for AI agents proposed by Dr. Li Hang of ByteDance, which encompasses both software and hardware agents, emphasizing their task-oriented nature and their reliance on large language models (LLMs) for reasoning and on reinforcement learning for construction [3][4].

Group 1: Characteristics of AI Agents
- AI agents are defined as "rational action machines" that interact with their environment, including humans, to achieve specific tasks with evaluative standards for success [6].
- They take text and multimodal data (including images, videos, and audio) as inputs and can produce text, multimodal data, or action data as outputs [7][8].
- The core of the framework is the LLM, which drives reasoning and decision-making; the framework aligns with human brain information-processing mechanisms [8][19].

Group 2: Framework Components
- The proposed framework consists of multimodal large language models (MLLMs), tools, memory (both long-term and working memory), multimodal encoders, decoders, and action decoders [11][12].
- Hardware agents (robots) require both an MLLM and a multimodal-language-action model (MLAM), for high-level task planning and low-level action planning respectively [12].
- The framework has a two-layer structure: the lower layer comprises the individual components, while the upper layer manages overall information processing [12].

Group 3: Comparison with the Human Brain
- The framework shows functional similarities to human brain information processing, exhibiting a dual-layer structure with both serial and parallel processing [19].
- Both systems use symbolic and neural representations for information processing, indicating a shared approach to handling complex tasks [19][28].

Group 4: Future Research Directions
- Key areas for future work include expanding data scale, enabling autonomous and continual learning, and improving the safety and controllability of AI agents [30][31][32][34].
- The lack of sufficient training data is identified as a significant bottleneck, necessitating innovative data-collection methods [31].
- Development should ensure that reinforcement-learning reward functions align with human values to mitigate risks [34].
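The two-layer structure summarized above can be sketched as a minimal control loop. Everything below (the class names, and the trivial rule standing in for the MLLM) is our own illustration, not an API from the paper: the lower layer holds components such as memory and the reasoning model, while the upper layer runs the serial perceive-reason-act cycle.

```python
# Minimal sketch of a two-layer agent loop, loosely following the
# framework described above. All names here are illustrative.

class Memory:
    """Lower-layer component: working + long-term memory as lists."""
    def __init__(self):
        self.working = []    # short-lived context for the current task
        self.long_term = []  # persisted across tasks

    def remember(self, item):
        self.working.append(item)
        self.long_term.append(item)

def llm_reason(observation, memory):
    """Stand-in for the multimodal LLM: decide the next action.
    A real agent would call an MLLM here; we use a trivial rule."""
    if "goal reached" in observation:
        return ("stop", None)
    return ("act", f"respond to: {observation}")

def run_agent(observations):
    """Upper layer: serial loop over perceive -> reason -> act."""
    memory = Memory()
    actions = []
    for obs in observations:
        memory.remember(obs)          # perceive: store the observation
        decision, payload = llm_reason(obs, memory)
        if decision == "stop":
            break
        actions.append(payload)       # act: emit text or action data
    return actions

print(run_agent(["user asks question", "goal reached"]))
# -> ['respond to: user asks question']
```

Tools, encoders, and decoders from the framework would slot in as further lower-layer components called from the same loop.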
2x Faster Than Human Experts: Stanford and NVIDIA Release TTT-Discover, Cracking Hard Scientific Problems with "Test-Time Reinforcement Learning"
机器之心· 2026-01-28 04:59
机器之心 Editorial Team

As AI technology develops at full speed, the industry keeps returning to one question: how can AI be used to discover new best solutions to scientific problems?

A common answer is "test-time search": prompting a frozen large language model (LLM), one whose parameters are not updated, to make many attempts, much like a student "guessing" at solutions to a programming assignment. Evolutionary search methods in particular (such as AlphaEvolve) store past attempts in a buffer and generate new prompts using hand-designed, domain-specific heuristics.

Yet although these prompts help the LLM improve on earlier solutions, the LLM itself never truly gets better, just as a student who never internalizes the new ideas behind the homework.

In fact, the most direct way to make an LLM genuinely improve is learning. Although both learning and search scale well with compute, across the history of AI, learning has ultimately overtaken search on hard problems such as Go and protein folding. The reason: scientific discovery is, at its core, an out-of-distribution problem that goes beyond the training data and existing human knowledge.

To this end, Stanford University, NVIDIA, and other institutions jointly propose a new method: reinforcement learning (RL) at test time, letting the LLM continually train itself while working on a specific test problem.

Paper link: https://w ...
AAAI 2026 Oral | SplatSSC: Decoupled Depth-Guided Gaussian Splatting, an Efficient New Paradigm for Monocular Semantic Scene Completion
机器之心· 2026-01-28 04:59
Core Viewpoint
- The article discusses SplatSSC, a novel framework for Semantic Scene Completion (SSC) that addresses the limitations of traditional dense grid representations by using a depth-guided approach and a decoupled aggregation mechanism to improve both performance and efficiency [3][4].

Group 1: Challenges in Traditional Methods
- Traditional dense grid representations in SSC suffer from two main issues: low utilization of randomly initialized Gaussian primitives (approximately 3.9%) and erroneous semantic fragments known as "floaters" caused by isolated outliers [3][4].
- Existing methods often rely on large-scale random distributions of Gaussian primitives, leading to significant computational redundancy and wasted model capacity [6].

Group 2: SplatSSC Framework
- SplatSSC introduces a depth-guided initialization strategy and a decoupled aggregation mechanism, yielding a significant leap in performance and efficiency [4].
- The framework employs a parallel-branch design, combining a learnable image encoder for multi-scale semantic extraction with a pre-trained Depth-Anything model for stable depth features [10].

Group 3: Core Technologies
- The Group-wise Multi-scale Fusion (GMF) module replaces random initialization with precise guidance from geometric priors, requiring only 1,200 Gaussian primitives (about 7% of previous methods) to cover the spatial distribution effectively [11][13].
- The Decoupled Gaussian Aggregator (DGA) combats the "floaters" issue by decoupling occupancy probability from semantic contributions, ensuring clean scene boundaries [15][19].

Group 4: Experimental Validation
- SplatSSC achieved state-of-the-art (SOTA) performance on the Occ-ScanNet dataset, with an Intersection over Union (IoU) of 62.83% and a mean IoU (mIoU) of 51.83%, surpassing the previous SOTA by 6.35% and 4.16% respectively [22][23].
- The model demonstrated superior fine-grained perception, particularly in recognizing intricate structures such as chair legs and table surfaces [22].

Group 5: Efficiency and Resource Management
- SplatSSC reduces inference latency by approximately 9.3% (to 115.63 ms) and memory consumption by approximately 9.6%, while keeping the parameter count essentially stable (a 0.19% increase) [34].
- Its ability to reconstruct high-quality scenes with far fewer Gaussian primitives demonstrates that the "quality" of primitives matters more than their "quantity" [32][33].
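The decoupling idea behind the DGA module can be illustrated with a small sketch. The exact aggregation rules and shapes below are our assumptions, not the paper's equations: each primitive carries an occupancy probability and a class distribution, and the two are aggregated separately so that a low-occupancy outlier (a "floater") cannot dominate a voxel's label.

```python
# Hedged sketch of occupancy/semantics decoupling at a single voxel.
# The noisy-or occupancy rule and occupancy-gated semantic weights are
# our illustrative choices, not SplatSSC's actual formulation.

def aggregate_voxel(weights, occupancies, semantics):
    """weights: kernel weights of N primitives at one voxel;
    occupancies: per-primitive occupancy probabilities;
    semantics: per-primitive class distributions (N lists of C probs)."""
    # Occupancy is aggregated on its own (noisy-or over gated primitives).
    voxel_occ = 1.0
    for w, o in zip(weights, occupancies):
        voxel_occ *= 1.0 - w * o
    voxel_occ = 1.0 - voxel_occ

    # Semantics use occupancy-gated weights: a primitive with near-zero
    # occupancy contributes almost nothing to the voxel's label.
    gates = [w * o for w, o in zip(weights, occupancies)]
    total = sum(gates) + 1e-8
    n_classes = len(semantics[0])
    voxel_sem = [
        sum(g * sem[c] for g, sem in zip(gates, semantics)) / total
        for c in range(n_classes)
    ]
    return voxel_occ, voxel_sem

# One confident "chair" primitive plus a floater with 1% occupancy:
occ, sem = aggregate_voxel(
    weights=[0.9, 0.9],
    occupancies=[0.8, 0.01],
    semantics=[[0.95, 0.05],   # chair-dominant distribution
               [0.05, 0.95]],  # floater's (wrong) class
)
print(round(occ, 3), [round(s, 3) for s in sem])  # -> 0.723 [0.939, 0.061]
```

Despite equal kernel weights, the floater's wrong class barely registers, which is the "clean scene boundaries" effect the summary describes.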
What Is It Like When Anything Can Be a Reference? Vidu Q2 Reference-to-Video Pro: Effects, Acting, and Detail, All at Once
机器之心· 2026-01-28 04:59
Editor | +0

Recently, a then-and-now comparison video of "Will Smith eating spaghetti" went viral on social media, prompting no end of commentary. Two years ago, fledgling AI video was a byword for surreal glitches, with facial features drifting and logic collapsing; just two years later, with the same theme performed again, from the muscle movements of swallowing to the subtle play of light across the face, AI has evolved to a genuinely lifelike level.

These two years compressed a sweeping technical transformation of the AI video generation industry. Yet the industry has not stopped at competing on image quality. As vendors race for the high ground of "controllability," AI video stands at a key turning point: from solving "does it work at all" to pursuing "how well does it work."

Looking back at Vidu's evolution: in September 2025, Vidu Q2 launched globally, impressing with its image-to-video and reference-to-video capabilities; in December, the Q2 image-generation suite went live, topping 500,000 uses on its first day and confirming the market's appetite for high-quality generation.

Yesterday, Vidu Q2 Reference-to-Video Pro was officially released. Visit Vidu.cn or the Vidu API at platform.vidu.cn to try the latest features.

In just a few months, it has closed the loop from "generation" to "editing," and launched the world's first "anything-as-reference" video model, expanding reference modalities from static images to dynamic video and multi-dimensional elements. Its new slogan, 「 ...
Ant Group Enters the VLA Arena, Open-Sourcing a Foundation Model That Surpasses Pi0.5
机器之心· 2026-01-28 03:36
Editor | Zhang Qian

How "smart" does a robot need to be before you would let it into your home?

Recently, star embodied-AI company 1X began pre-selling its humanoid robot Neo. In demo videos it fetches water from the fridge, folds clothes, and loads the dishwasher, looking every bit the capable housekeeper. The problem is that, at the time, these were essentially the only tasks it could complete autonomously. More varied everyday chores, such as tidying scattered toys, wiping countertops, and putting away clutter, still mostly require remote teaching by engineers.

That gives one pause: for nearly 140,000 RMB, you get not just an "assistant" but possibly also a pair of "eyes" that you must authorize to enter your private living space. On social networks, many have voiced confusion or even mockery about this "half-finished intelligence."

This split, autonomous in demo scenarios but dependent on humans for real tasks, reflects the core challenge of deploying embodied AI today: insufficient generalization.

The industry consensus on breaking this bottleneck is that models need to be "fed" larger-scale, more diverse real-robot data to learn more fundamental task understanding and action generalization. But collecting high-quality real-robot data is extremely expensive, and data from robots with different morphologies is hard to reuse, so most models can still only be trained on limited data or in simulation, falling short of true cross-task, cross-embodiment generalization.

Against this backdrop, the first embodied-intelligence foundation model open-sourced by 蚂蚁灵波 (Ant Lingbo), LingBot-V ...
Just In: OpenAI Releases Prism, a "Scientific Writing Powerhouse"; Is Overleaf in Trouble?
机器之心· 2026-01-28 03:36
Core Viewpoint
- OpenAI has launched Prism, an AI-native collaboration platform designed specifically for scientists and powered by the advanced GPT-5.2 model, aiming to improve writing and collaboration efficiency in scientific research [1][14].

Group 1: Platform Features
- Prism is now available for free to all users with a ChatGPT personal account, with no limits on the number of projects or collaborators [3].
- The platform integrates with Zotero, extending its usefulness for researchers [4].
- Prism's interface resembles Overleaf's, raising questions about competition in the scientific-writing-tool market [6].
- Users report that Prism's features effectively cover those of Overleaf, with some considering it for their next articles [9].

Group 2: AI Integration and Functionality
- As an AI-native workspace, Prism unifies initial drafting, editing, team collaboration, and submission preparation in a single cloud-based LaTeX environment [19].
- GPT-5.2 is embedded in the project core, allowing it to understand a paper's structure, mathematical formulas, references, and overall context [19].
- Prism is built on Crixet, an existing cloud-based LaTeX platform that OpenAI acquired and extended into this unified product [19].
- Key functionalities include [19]:
  - Deep conversations with GPT-5.2 to develop ideas and sanity-check scientific hypotheses.
  - Writing and revising papers with full context, including text, formulas, references, and logical structure.
  - Pulling relevant literature from platforms such as arXiv into the current context.
  - Handling formulas, citations, and figures across chapters, understanding their interconnections.
  - Converting whiteboard formulas and figures into LaTeX, saving researchers significant time.
  - Real-time collaboration with co-authors, students, and mentors, with all edits and comments synchronized.
  - Modifying documents directly in place, without switching between editors and chat tools.

Group 3: Future Implications
- The rapid evolution of AI is expected to bring significant transformations to science by 2026, with Prism a step toward reducing the friction of daily research tasks [14].
- There is debate over whether papers written with Prism should credit AI as a co-author, reflecting AI's growing role in academic writing [20].
Accused of Infringement by Anthropic, Clawdbot Renames Itself Moltbot
机器之心· 2026-01-28 00:38
Editor | Panda

Clawdbot went viral, seriously viral: within just a few days of this round of exposure, its GitHub star count approached 70,000, a genuine case of taking off on the spot. But fame for an AI project brings trouble too. Along with the sudden popularity came a chain of unexpected consequences.

A "shell-shedding" triggered by a lawyer's letter

Yesterday afternoon, Clawdbot officially announced its renaming to Moltbot. The direct trigger was a cease-and-desist letter from AI giant Anthropic, which alleged trademark infringement on the grounds that "Clawdbot" is too similar to its own "Claude" in both spelling and pronunciation. For developer Peter Steinberger, the rename was not by choice but a reluctant concession under pressure.

He joked on X: "Same lobster soul, new shell." He chose the word "molt" to evoke the painful shedding a lobster must endure in order to grow.

The rename fallout: name-squatting, errors, and crypto harassment

Although Moltbot developer Peter Steinberger tried for a smooth transition, the renaming devolved into a technical and public-relations disaster. During the renaming process, GitHub ...
Is the "Familiar Stranger" the Best Teacher? Fudan Proposes a Simple Metric to Identify Genuinely Instructive Data for Reasoning Distillation
机器之心· 2026-01-28 00:38
What kind of chain of thought can "teach" a student to reason better?

Many of us know this learning experience: material that is too familiar yields little that is new, while material that is too unfamiliar exceeds our ability to understand and is hard to absorb.

The same phenomenon appears in reasoning distillation for large language models. Chains of thought from a much stronger teacher model may be too opaque for the student model to grasp its reasoning patterns, while the reasoning traces of a teacher whose ability is close to the student's often carry little new information and bring no real improvement.

The key to good distillation is therefore to choose, for each student model, data that strikes the right balance between "familiar" and "unfamiliar." Yet existing probability-based filtering or scoring methods (such as perplexity) struggle to capture this fine-grained fit.

Is there, then, an intuitive and easy-to-compute data-fit metric that quantifies this balance? Researchers from Fudan University and the Shanghai AI Laboratory propose a simple and effective measure, the Rank-Surprisal Ratio (RSR). RSR takes the student model's perspective and jointly accounts for a sample's informativeness and its alignment with the student, aiming to find reasoning data that is new enough yet not beyond the student's cognitive reach.

In large-scale distillation experiments, RSR correlates with the student model's post-training performance at 0.86, and it can be used directly to filter reasoning traces and to select teacher models; without any actual training, it can find ...
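The digest names RSR's two ingredients, the teacher token's rank and its surprisal under the student model, but not the exact formula. The sketch below combines them in an assumed per-trace ratio purely for illustration; it is not the paper's definition, and the student's "distribution" here is a toy dict rather than real model logits.

```python
import math

# Assumed-form sketch of a Rank-Surprisal-style score: both ingredients
# are computed from the *student* model's next-token probabilities over
# a teacher trace. The averaging and the ratio are our assumptions.

def rank_surprisal_ratio(trace):
    """trace: list of (student_next_token_distribution, teacher_token)
    pairs, where the distribution is a dict {token: probability}."""
    ranks, surprisals = [], []
    for dist, token in trace:
        p = dist[token]
        # Rank of the teacher's token under the student (1 = most likely).
        ranks.append(1 + sum(q > p for q in dist.values()))
        # Surprisal: how unexpected the teacher's token is to the student.
        surprisals.append(-math.log(p))
    return (sum(ranks) / len(ranks)) / (sum(surprisals) / len(surprisals))

# A trace the student already predicts vs. one it finds alien:
familiar = rank_surprisal_ratio([({"a": 0.9, "b": 0.1}, "a")])
alien = rank_surprisal_ratio([({"a": 0.9, "b": 0.1}, "b")])
print(familiar > alien)  # prints True
```

Whatever the paper's exact normalization, the point stands: both extremes, too familiar and too alien, are detectable from the student's own probabilities, with no training run required.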
Just In: Yang Zhilin Personally Open-Sources Kimi K2.5! A Day of Chinese LLMs Going Head-to-Head
机器之心· 2026-01-27 09:45
Editors | Panda, Zenan

What a day of Chinese large models going head-to-head! Last night Qwen shipped a new model, and today DeepSeek open-sourced OCR 2. At midday, Kimi joined in: the website, the app, the API platform, and the coding assistant Kimi Code all received a model update, and Kimi K2.5 arrived. Moonshot AI founder Yang Zhilin also appeared on camera for the first time to present the new model's capabilities.

Kimi K2.5 is a 1-trillion-parameter MoE foundation model. Compared with its predecessor, K2.5's visual understanding is greatly enhanced (it can now process video), its coding ability is clearly improved, and, importantly, K2.5 remains open source.

Kimi K2.5 achieves state-of-the-art (SOTA) results on highly challenging agent benchmarks including HLE, BrowseComp, and DeepSearchQA, scoring 50.2% on HLE (Humanity's Last Exam) and 74.9% on BrowseComp. Its coding ability is also striking: it reaches 76.8% on SWE-bench Verified, narrowing the gap with top closed-source models, and K2.5 also posts the best open-source results on several visual-understanding benchmarks.

As can be seen, on the core benchmarks, Kimi K ...
ICLR 2026 Decisions Are Out! A 28% Acceptance Rate, and 机器之心 Welcomes Your Submissions
机器之心· 2026-01-27 09:45
As a top conference in machine learning, ICLR 2026 will be held April 23-27, 2026 in Rio de Janeiro, Brazil. The organizers received roughly 19,000 valid submissions this year, for an overall acceptance rate of about 28%; that rate covers all peer-reviewed full-paper submissions, whether or not they were later withdrawn.

Netizens post their score reports

As soon as the acceptance notifications went out, social platforms were flooded with score reports. One example:

[Screenshot: ICLR 2026 Conference Submission #11894, "Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks", 4 official reviews submitted (Reviewer fnio: Rating 8 / Confidence 4; Reviewer tKtK: Rating 6 / Confidence 3; ...), Recommendation: Accept]

Taishi ...