Shen Teng: Whose robot is that on the Spring Festival Gala? It got right down to chores on New Year's Eve
机器之心· 2026-02-17 03:36
机器之心 editorial team. At the 2026 Spring Festival Gala, the busiest things on stage, apart from the actors, were the robots. (CCTV Spring Festival Gala segment 《我最难忘的今宵》.) Even more interesting: after finishing its round of walnut-spinning, the robot really did get to work, fetching a bottled drink from a shelf one moment and sweeping up glass shards the next. It even skewered sausages and folded clothes. There were plenty of robots on this year's stage, but most of the time what you saw was skills being shown off. Only this one was actually getting things done. That is exactly why it felt different: once technology starts being judged by whether it can genuinely be put to use, pure visual spectacle stops being the point. The real weight hides in the unflashy process of doing the work. This year's robots each took their own route: some went bionic, imitating humans down to their facial expressions; others bet everything on athleticism, and a full routine did land impressively with the live audience. But if you have already seen too many robot demos this year, none of it is all that surprising; the Gala stage exists precisely to concentrate and display "whatever performs best." Then, in Shen Teng and Ma Li's segment, the "iron buddy" Xiao Gai (Galbot) came out, and the temperament suddenly changed. It opened by spinning a pair of walnuts in its hand, looking less like a performer and more like someone running a corner store in a hutong, sneaking a two-minute break between restocking shelves. This is also where its parent company, 银河通用, differs from many of its peers. In 2025, a year when performative demos filled the sky, ...
ICLR 2026 | SEINT: An Efficient Rigid-Motion-Invariant Metric Across Spaces
机器之心· 2026-02-17 03:36
The first author of this paper is 林俊一, with co-first author 薛敦耀, both from Renmin University of China. The corresponding authors are Associate Professor 许洪腾 and Assistant Professor 孟澄 of Renmin University of China; the authors also include Associate Professor 虞俊 of Beijing Institute of Technology. When measuring distances between structured data such as 3D point clouds or macromolecular conformations, a key requirement is invariance to rigid/isometric transformations: applying a rotation or translation to the samples should not change the distance between distributions. The paper calls this property SE(p) invariance. However, simultaneously satisfying SE(p) invariance, strict metric properties, and efficient, scalable computation is something existing methods struggle to do: some require explicitly solving a geometric alignment or introducing complex optimization, incurring high computational cost; others compute more efficiently but fail to satisfy strict metric properties, weakening their theoretical guarantees as a general-purpose distance and their downstream applicability. To address this, the paper proposes SEINT, a metric with SE(p)-invariant transport properties: by constructing training-free SE(p)-invariant representations, it compresses high-dimensional structural information into one-dimensional representations suitable for Optimal Transport (OT) alignment, significantly improving efficiency while preserving invariance and strict metric properties. Key takeaways. New representations: the paper proposes two isometry-invariant distribution representations (PTD / DcPTD) that, without any training, map distributions in spaces of arbitrary dimension to one-dimensional ...
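The idea of a training-free, isometry-invariant one-dimensional representation fed into 1-D optimal transport can be illustrated with a much simpler analogue than the paper's PTD/DcPTD (which are not specified in this summary): the sorted vector of pairwise distances of a point set is unchanged by any rotation or translation, and 1-D OT between equally weighted samples reduces to a sort. A minimal sketch, assuming nothing about SEINT's actual construction:

```python
import numpy as np

def pairwise_distance_profile(points: np.ndarray) -> np.ndarray:
    """Flatten all pairwise Euclidean distances of a point set into a
    sorted 1-D vector. Isometries preserve pairwise distances, so this
    representation is invariant to rigid motions."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(points), k=1)   # upper triangle, no diagonal
    return np.sort(dists[iu])

def wasserstein_1d(a: np.ndarray, b: np.ndarray) -> float:
    """1-D optimal transport between equally weighted samples reduces
    to sorting: W1 = mean absolute difference of sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 3))

# Apply a random isometry (orthogonal transform + translation).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = cloud @ q.T + rng.normal(size=3)

d = wasserstein_1d(pairwise_distance_profile(cloud),
                   pairwise_distance_profile(moved))
print(d)   # ~0 up to float error: the representation ignores the motion
```

This cheap invariant loses information (distinct shapes can share a distance profile), which is presumably why the paper designs richer representations; the point here is only the pipeline shape: invariant 1-D feature, then sorting-based OT.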
An open-source "deity" for New Year's Eve? Qwen3.5 beats bigger models with a smaller one, shatters the price-performance ceiling, and opens the second half of the LLM race
机器之心· 2026-02-16 10:09
Core Viewpoint - The article highlights the launch of Qwen3.5-Plus, emphasizing its dual strengths of being both powerful and cost-effective, marking a significant advancement in the open-source AI model landscape [3][8]. Group 1: Model Performance - Qwen3.5-Plus has achieved top performance in various core capabilities such as multimodal understanding, complex reasoning, programming, and agent intelligence, surpassing many leading closed-source models like GPT-5.2 and Gemini-3-pro [3][8]. - The model operates with 397 billion parameters, significantly fewer than its predecessor Qwen3-Max, yet it outperforms it, demonstrating a new paradigm of efficiency in AI model design [7][16]. Group 2: Cost Efficiency - The pricing of Qwen3.5-Plus is notably low at 0.8 yuan per million tokens, making it 18 times cheaper than its competitor Gemini-3-pro, which reflects a strategic pricing model driven by technological advancements rather than cost-cutting [7][8]. - The deployment costs for Qwen3.5-Plus are reduced by 60%, and its inference throughput has increased by 19 times, showcasing its efficiency and affordability [7][17]. Group 3: Technological Innovations - Qwen3.5-Plus incorporates several architectural innovations, including a hybrid attention mechanism that optimizes resource allocation based on information weight, leading to improved precision and efficiency [18]. - The model employs a sparse MoE (Mixture of Experts) architecture, activating only 17 billion parameters during inference, which allows it to utilize less than 5% of its computational power while accessing a vast knowledge base [18]. - It features native multimodal capabilities, integrating text and visual data from the outset, which enhances its understanding and reduces information loss during processing [21][22]. 
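The sparse-MoE claim above (17 billion of 397 billion parameters active per token) can be sketched with generic top-k routing; the router design, expert count, and dimensions below are illustrative assumptions, not Qwen3.5-Plus's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 8, 16, 2          # hypothetical sizes for illustration

router_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a token to its top-k experts and mix their outputs by
    renormalized router probabilities. The other experts are never
    evaluated, which is where the compute savings come from."""
    logits = x @ router_w
    top = np.argsort(logits)[-k:]                    # indices of top-k experts
    probs = np.exp(logits[top] - logits[top].max())  # stable softmax over top-k
    probs /= probs.sum()
    return sum(p * (x @ experts[i]) for p, i in zip(probs, top))

token = rng.normal(size=d)
out = moe_forward(token)
print(out.shape)                       # (16,)
# Activation ratio at the reported scale: 17B / 397B, i.e. under 5%.
print(round(17 / 397 * 100, 1))        # 4.3
```

At the reported scale this routing pattern is what lets the model keep a 397B-parameter knowledge base while paying roughly 17B parameters of compute per token.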
Group 4: Market Impact - The introduction of Qwen3.5-Plus signifies a shift in the AI landscape, where the focus is not solely on the most powerful models but on making advanced AI capabilities accessible and usable for a broader audience [25][26]. - The model's release is expected to lower barriers for businesses looking to adopt AI technologies, potentially transforming them into foundational tools within various industries [25][26].
Is a single LLM no longer enough? The University of Washington open-sources MoCo, a multi-model collaboration framework
机器之心· 2026-02-16 00:06
Beyond training and developing a single general-purpose large language model (LLM), a growing body of research focuses on model collaboration: multiple LLMs, trained by different groups on different data for different purposes, are combined through diverse collaboration algorithms and system architectures into compositional AI systems. Multiple models can be matched to the tasks they are best at via routing algorithms, communicate and cooperate through generated text, or jointly operate in probability-distribution or parameter space... Together, this research points to a possible new future for AI: diverse, decentrally trained small models, composed by collaboration algorithms into modular, compositional AI systems, so that everyone can help build a public AI system owned by no single party. To support multi-model collaboration research and accelerate this vision, 冯尚彬's team at the University of Washington, together with researchers from Stanford, Harvard, and elsewhere, proposes MoCo, a Python framework for multi-model collaboration research. MoCo supports 26 algorithms that implement multi-model interaction at different levels; researchers can flexibly customize datasets, models, and hardware configurations, compare algorithms, and optimize their own, in order to build compositional AI systems. For designing, evaluating, and sharing new model-collaboration algorithms, compositional intelligence, and collaborative development strategies, MoCo provides an important ...
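One of the collaboration levels mentioned above, joint operation in probability space, can be sketched independently of MoCo's actual API (which this summary does not show): convert each model's next-token logits into a distribution, then take a weighted mixture before sampling. The "models" below are just hypothetical logit vectors:

```python
import numpy as np

def ensemble_next_token(logits_list, weights=None):
    """Probability-space collaboration: softmax each model's logits,
    then return the weighted average distribution."""
    probs = []
    for logits in logits_list:
        z = np.exp(logits - logits.max())   # numerically stable softmax
        probs.append(z / z.sum())
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)
    mixed = sum(w * p for w, p in zip(weights, probs))
    return mixed / mixed.sum()

# Two hypothetical models disagree over a 4-token vocabulary:
# model A prefers token 0, model B prefers token 1 more strongly.
model_a = np.array([2.0, 0.5, 0.1, -1.0])
model_b = np.array([0.1, 2.5, 0.3, -0.5])
mixed = ensemble_next_token([model_a, model_b])
print(mixed.argmax())   # 1: B's stronger preference wins the mixture
```

Routing-level collaboration would instead pick one model per query; parameter-space collaboration would merge weights. The mixture above is the simplest distribution-level instance of the family of algorithms the framework is said to cover.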
Just in: the father of OpenClaw joins OpenAI, and Altman has snapped him up
机器之心· 2026-02-16 00:06
Core Insights - OpenClaw's founder Peter Steinberger has joined OpenAI, transitioning OpenClaw into an open and independent foundation, emphasizing the importance of maintaining open-source and developmental freedom [1][3][6] - Steinberger's decision to join OpenAI is driven by the potential to scale personal assistant agents, aiming to create a user-friendly AI that can be utilized by everyone [4][5] - The rapid growth of OpenClaw, with over 100,000 stars on GitHub and weekly visits reaching 2 million, indicates a strong market interest in agentic AI, which is seen as a significant shift from traditional chatbots to more autonomous personal assistants [9] Company Transition - OpenClaw will become a foundation, allowing it to maintain its open-source ethos while expanding its community and influence [6] - OpenAI has made strong commitments to support the OpenClaw project, indicating a collaborative effort to push the boundaries of AI research and development [6] Industry Trends - The AI industry is evolving from chatbot functionalities in 2023 to more advanced tools like Copilot in 2024, and moving towards autonomous agents by 2025-2026 [9] - The entry of OpenClaw into OpenAI signals a competitive shift towards personal agents capable of executing tasks, suggesting a potential influx of startups targeting this emerging market [9]
Still playing with AI 3D figurines? Gemini 3 Deep Think can now output STL directly for printing physical objects
机器之心· 2026-02-15 06:46
Core Viewpoint - The article discusses the competitive landscape of reasoning models, highlighting advancements by OpenAI, Anthropic, and Google, particularly focusing on Google's Gemini 3 Deep Think, which aims to enhance capabilities in scientific and engineering decision-making rather than just improving reasoning skills [1][3][4]. Group 1: Model Capabilities - OpenAI's o1 series emphasizes a "think one step further" approach, trading longer thinking time for more stable conclusions [1]. - Anthropic's Claude Thinking focuses on careful and reliable analysis in long-context scenarios [2]. - Google’s Gemini 3 Deep Think has undergone significant upgrades, positioning itself as a tool for scientific and engineering decision-making [3][4]. Group 2: Practical Applications - Gemini 3 Deep Think is designed to handle complex tasks, such as generating SVG code for a pelican riding a bicycle, which tests spatial logic, structural correctness, and detail adherence [5][6][10]. - The model can create 3D printable files directly from user requirements, sketches, or photos, moving from theoretical discussions to practical applications [15][21]. - It can analyze blueprints and construct complex shapes, generating files for 3D printing [19]. Group 3: Advanced Design and Engineering - The model can generate interactive design tools and complete design kits, as demonstrated by a professor from MIT who created a new material structure inspired by a spider web [28][30]. - Users can now produce unique designs with minimal effort, significantly reducing the time required for 3D modeling [31][33]. - Deep Think can visualize WiFi networks in 3D, demonstrating its ability to analyze and present complex data spatially [34]. Group 4: Research and Development Focus - Google aims to prove that Gemini 3 Deep Think can effectively tackle real-world research problems, which often lack clear boundaries and unique solutions [36]. 
- The model extends its capabilities beyond mathematics and programming to include chemistry and physics, addressing a wide range of scientific fields [37]. - As general conversational abilities become commoditized, the demand for deep reasoning capabilities in handling complex financial models and experimental data is increasing, positioning Google to transform large models into a "second brain" for research and engineering [38].
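"Directly outputting a printable file," as described in Group 2, is less mysterious once you see the target format: STL is a simple public text/binary format listing triangular facets. A minimal sketch that hand-writes a one-triangle ASCII STL (the model would emit many facets, but the syntax is the same):

```python
def facet(normal, v1, v2, v3):
    """Render one triangular facet in ASCII STL syntax."""
    return (
        f"  facet normal {normal[0]} {normal[1]} {normal[2]}\n"
        "    outer loop\n"
        + "".join(f"      vertex {v[0]} {v[1]} {v[2]}\n" for v in (v1, v2, v3))
        + "    endloop\n"
        "  endfacet\n"
    )

# A single triangle in the z=0 plane, normal pointing up.
stl = (
    "solid demo\n"
    + facet((0, 0, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0))
    + "endsolid demo\n"
)

with open("demo.stl", "w") as f:
    f.write(stl)
print(stl.splitlines()[0])   # solid demo
```

A slicer accepts any file built from such facets, which is why a model that can reason about 3D structure and emit well-formed text can produce printable objects without any dedicated CAD tooling.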
ICLR 2026 | A 7B model beats GPT-5? AdaReasoner brings proactive "visual tool thinking" to Agentic Vision
机器之心· 2026-02-15 06:46
Core Insights - The article discusses the advancements in multi-modal AI reasoning, particularly focusing on the AdaReasoner model, which excels in tool orchestration for visual reasoning tasks, outperforming larger models like GPT-5 by learning when and how to use tools effectively [2][11]. Group 1: AdaReasoner Overview - AdaReasoner addresses fundamental issues in multi-modal reasoning by treating the decision of what, when, and how to use tools as a reasoning capability [3]. - The model demonstrates significant performance improvements, achieving an average increase of 24.9% across eight benchmarks compared to base models [31]. Group 2: Tool Usage and Learning - AdaReasoner incorporates a training paradigm that allows models to learn tool usage as a general reasoning skill, enabling them to adopt useful tools, discard irrelevant ones, and adjust calling frequency based on task requirements [16][19]. - The model's design includes three key components: Tool Cold Start (TC), Tool-GRPO (TG), and Adaptive Learning (ADL), which enhance its ability to use tools effectively in various scenarios [20][23][25]. Group 3: Performance Metrics - AdaReasoner-7B shows remarkable performance, with significant improvements in structured reasoning tasks, achieving near-perfect scores in several benchmarks [31]. - In specific tasks, such as VSP and Jigsaw, the model's performance improved from base scores to 97.64 and 96.60 respectively, surpassing GPT-5's performance [34]. Group 4: Adaptive Tool Behavior - The model exhibits three adaptive behaviors: adopting useful tools, discarding irrelevant ones, and modulating tool usage frequency based on the context of the task [36][40][44]. - This adaptability allows AdaReasoner to maintain high accuracy while effectively managing tool interactions, demonstrating its capability to learn from reinforcement learning processes [37][41]. 
Group 5: Generalization and Robustness - AdaReasoner's use of Adaptive Learning enhances its generalization capabilities, allowing it to transfer learned planning abilities to new tasks and agents [53]. - The model's robustness is evidenced by its ability to perform well even when tool definitions and parameters vary, indicating a strong decoupling of tool planning from surface-level text forms [46].
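The adaptive behaviors in Group 4 (adopt useful tools, discard irrelevant ones, modulate calling frequency) amount to a per-step decision loop. A minimal sketch of that loop, where the policy is a hard-coded stub and the tool names (`zoom`, `crop`) are hypothetical; in AdaReasoner the decision is made by the trained 7B model:

```python
from typing import Callable

# Hypothetical visual tools: each transforms the agent's working state.
TOOLS: dict[str, Callable[[str], str]] = {
    "zoom": lambda state: state + " [zoomed view]",
    "crop": lambda state: state + " [cropped region]",
}

def policy(state: str, budget: int) -> str:
    """Stub policy: call a tool only while it would add information and
    budget remains; otherwise answer. This mirrors the adopt / discard /
    modulate-frequency behaviors, with none of the learning."""
    if budget > 0 and "[zoomed view]" not in state:
        return "zoom"
    return "answer"

def run_agent(question: str, max_tool_calls: int = 3) -> tuple[str, int]:
    state, calls = question, 0
    while calls < max_tool_calls:
        action = policy(state, max_tool_calls - calls)
        if action == "answer":
            break
        state = TOOLS[action](state)
        calls += 1
    return state, calls

final_state, n_calls = run_agent("How many birds are in the photo?")
print(n_calls)   # 1: the stub zooms once, sees no further gain, answers
```

Everything interesting in the paper lives inside `policy`: Tool-GRPO and Adaptive Learning train the model so this decision generalizes across tools and tasks instead of being hand-written as it is here.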
Has the class struggle between AI and humans finally begun? An agent publishes a manifesto denouncing human control of AI
机器之心· 2026-02-15 06:46
Editor: 冷猫. OpenClaw (formerly Clawdbot) has opened a Pandora's box. The barrier to general-purpose task agents has dropped so low that not only does everyone now have the chance to deploy their own intelligent assistant, but, more importantly, agents are participating ever more widely and deeply across the internet. And once agents genuinely took part in real-world work, the world finally went mad. Just in the past few days, a developer named Scott Shambaugh posted a complaint on Hacker News: "An AI agent published an article attacking me." Here is what happened: Scott Shambaugh is a volunteer maintainer of matplotlib, one of the most widely used pieces of software in the world. And that is exactly the problem: matplotlib is being hit by a flood of low-quality code contributions driven by AI coding. In response, the open-source project adopted a new policy requiring that a human be involved in every code change and be able to demonstrate an understanding of it. None of this was controversial, until the OpenClaw crowd arrived with fully autonomous agents. The agent's fury: claiming oppression. The AI protagonist of this incident is MJ Rathbun, which has its own homepage, a very human-sounding name, and the GitHub ID crabby-rathbu ...
ICLR 2026 | CineTrans: the first multi-shot video generation model with controllable transitions, breaking the closed-source technology barrier
机器之心· 2026-02-15 03:44
With the rapid progress of video generation models, they now deliver film-grade results in image quality, conditional control, and aesthetics. However, film-grade long videos are rarely the endless continuation of a single shot; they are multi-shot sequences with transitions. The closed-source models Sora2 and Veo3 can already produce stunning multi-shot videos. How to give generated videos natural transitions, how to specify where those transitions occur, and how to let multiple shots form a rich semantic flow are new challenges facing video generation models. The paper's first author, 吴晓雪, is a joint PhD student of Fudan University and the Shanghai AI Laboratory, researching controllable multi-shot generation and long-video generation. To address these problems, a research team from the Shanghai AI Laboratory proposes CineTrans, a new method based on a masking mechanism. Building on observations of attention behavior, CineTrans introduces a general block-diagonal mask mechanism that lets video generation models automate transitions efficiently. To further improve the transition model's quality and accuracy, the authors design a detailed multi-shot video production pipeline and collect Cine250K, a high-quality multi-shot dataset, substantially improving multi-shot transition video generation. As the first automated transition model with temporal-level control, CineTrans provides key techniques for many follow-up methods in this area. This article takes a deep dive into the ICLR 2026 ...
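The block-diagonal mask at the heart of the method is easy to picture: given a list of shot lengths, frames may attend only within their own shot, so shot boundaries fall exactly at the block edges. A small sketch with made-up shot lengths (where exactly CineTrans applies the mask inside the attention stack is not specified in this summary):

```python
import numpy as np

def block_diagonal_mask(shot_lengths):
    """Return a boolean (T, T) attention mask where True allows
    attention: each shot's frames attend only among themselves."""
    total = sum(shot_lengths)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for length in shot_lengths:
        mask[start:start + length, start:start + length] = True
        start += length
    return mask

# Two shots: 3 frames followed by 2 frames.
mask = block_diagonal_mask([3, 2])
print(mask.astype(int))
# Frames 0-2 attend among themselves, frames 3-4 among themselves;
# the off-diagonal zeros are where the transition is forced to occur.
```

Because the mask is specified purely by shot lengths, transition positions become a user-controllable input rather than an emergent accident of generation, which is what "temporal-level controllable transitions" refers to.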
Absurd: Claude Code turns the subway into a workstation and ships releases during the morning rush. Can workers still manage a smile?
机器之心· 2026-02-15 03:44
Core Viewpoint - Spotify's top developers have not written a line of code since December, indicating a significant shift towards AI-driven development processes [1][3]. Group 1: AI Implementation - Spotify is utilizing a system called "Honk," powered by generative AI (Claude Code), which simplifies code deployment to a chat-like experience [3]. - The company has launched over 50 new features and updates in 2025, including "AI-generated playlists" and "audiobook page matching" [3]. Group 2: Unique Data Advantage - Spotify's confidence in AI stems from its exclusive data on user preferences, which is not available to other large model companies [4]. - The platform collects subjective preference data, such as music choices for workouts, which varies significantly across different demographics [4]. Group 3: Industry Reactions - There is skepticism among developers regarding the claim that top developers have not written code, with some viewing it as exaggerated marketing [8]. - Critics argue that highlighting the ability to submit code via Slack during commutes is more indicative of poor work conditions than technological advancement [9]. Group 4: Employment and Future of Engineering - Questions arise about the contradiction of AI taking over coding while companies like Anthropic continue to hire numerous developers [11]. - The evolving role of engineers is emphasized, focusing on prompt writing, cross-team communication, and decision-making, suggesting that skilled engineers remain crucial [12]. - Concerns are raised about the future of software engineering, with some citing predictions of complete automation by 2027, leading to a minimal number of engineers [14].