Ant Group Enters VLA: An Open-Source Foundation Model That Surpasses Pi0.5
机器之心· 2026-01-28 03:36
Editor: Zhang Qian

How "smart" does a robot have to be before you would welcome it into your home? A while ago, the high-profile embodied-AI company 1X began pre-selling its humanoid robot Neo. In demo videos, it fetches water from the fridge, folds clothes, and loads dishes into the dishwasher, looking every bit the competent household assistant. The problem is that, at the time, those were essentially the only tasks it could complete autonomously. More varied everyday chores, such as tidying scattered toys, wiping countertops, or putting away clutter, still mostly required remote teaching by engineers.

That gives one pause: for nearly 140,000 RMB, you get not just an "assistant" but potentially a pair of "eyes" that you must authorize to enter the private space of your home. On social media, many people voiced confusion, and some mockery, over this "half-finished intelligence."

This split, autonomous in demos but human-dependent in real tasks, reflects the core challenge facing embodied AI today: insufficient generalization. The industry consensus on breaking this bottleneck is that models must be "fed" larger-scale and more diverse real-robot data, so they learn more fundamental task understanding and action generalization. However, collecting high-quality real-robot data is extremely expensive, and data from robots of different morphologies is hard to reuse, so most models can still only be trained on limited data or in simulation, falling short of true cross-task, cross-embodiment generalization. Against this backdrop, LingBot-V, the first embodied-AI foundation model open-sourced by Ant Lingbo ...
Just Now: OpenAI Releases Prism, a "Miracle Tool" for Scientific Writing; Overleaf in Danger
机器之心· 2026-01-28 03:36
Core Viewpoint
- OpenAI has launched Prism, an AI-native collaboration platform designed specifically for scientists, powered by the advanced GPT-5.2 model, aiming to enhance writing and collaboration efficiency in scientific research [1][14]

Group 1: Platform Features
- Prism is now available for free to all users with a ChatGPT personal account, with no limits on project numbers or collaboration participants [3]
- The platform integrates with Zotero, enhancing its functionality for researchers [4]
- Prism's interface is similar to Overleaf, which has raised concerns about competition in the scientific writing tool market [6]
- Users have reported that Prism's features effectively cover those of Overleaf, with some considering using it for their next articles [9]

Group 2: AI Integration and Functionality
- As an AI-native workspace, Prism integrates initial draft writing, editing, team collaboration, and submission preparation into a unified cloud-based LaTeX environment [19]
- GPT-5.2 is embedded within the project core, allowing it to understand the structure of papers, mathematical formulas, references, and overall context [19]
- Prism is built on the existing cloud-based LaTeX platform Crixet, which OpenAI acquired and enhanced to create this unified product [19]
- Key functionalities include:
  - Engaging in deep conversations with GPT-5.2 to inspire ideas and validate scientific hypotheses [19]
  - Writing and revising papers based on the full context, including text, formulas, references, and logical structure [19]
  - Integrating relevant literature from platforms like arXiv into the current context [19]
  - Handling formulas, citations, and figures across chapters, understanding their interconnections [19]
  - Converting whiteboard formulas and figures into LaTeX, saving researchers significant time [19]
  - Real-time collaboration with co-authors, students, and mentors, with all edits and comments synchronized [19]
  - Directly modifying documents in situ without switching between editors and chat tools [19]

Group 3: Future Implications
- The rapid evolution of AI technology is expected to bring significant transformations to the scientific field by 2026, with Prism being a step towards reducing the complexities of daily research tasks [14]
- There are discussions about whether research papers created using Prism should credit AI as a co-author, reflecting the growing role of AI in academic writing [20]
Accused of Trademark Infringement by Anthropic, Clawdbot Renames to Moltbot
机器之心· 2026-01-28 00:38
Editor: Panda

Clawdbot went viral, extremely viral. Within just a few days of this round of exposure, its GitHub star count was already approaching 70,000, a genuine case of "taking off on the spot." But when an AI project gets famous, trouble follows. The sudden popularity brought not only praise but a chain of reactions that caught everyone off guard.

A "shell-shedding" triggered by a lawyer's letter

Yesterday afternoon, Clawdbot officially announced its renaming to Moltbot. The direct trigger was a cease-and-desist letter from AI giant Anthropic, which accused the project of trademark infringement on the grounds that "Clawdbot" is too similar to its own "Claude" in both spelling and pronunciation. For developer Peter Steinberger, the rename was not his intention but a reluctant move under pressure. He joked on X: "Same lobster soul, new shell." He chose the word "Molt" to evoke the painful shell-shedding a lobster must go through in order to grow.

Rename turmoil: name squatting, errors, and crypto harassment

Although Moltbot developer Peter Steinberger tried to manage a smooth transition, the renaming process turned into a technical and public-relations disaster. During the renaming, the GitHub ...
Is the "Familiar Stranger" the Best Teacher? Fudan Proposes a Simple Metric to Find the Data with Real Teaching Value in Reasoning Distillation
机器之心· 2026-01-28 00:38
What kind of chain of thought can "teach" a student to reason better?

Many people know this learning experience: content that is too familiar brings little new, while content that is too unfamiliar exceeds comprehension and is hard to digest. The same phenomenon appears in reasoning distillation for large language models. Chains of thought from a much stronger teacher model may be too opaque for the student model to grasp the reasoning patterns, while the reasoning traces of a teacher whose cognition is close to the student's often lack new information and bring little real improvement.

The key to good distillation, therefore, is to select data that fits each particular student model, striking the best balance between "familiar" and "strange." Existing probability-based filtering or measurement methods (such as perplexity) struggle to capture this fine-grained fit. Is there an intuitive, easy-to-compute data-fit metric that quantifies this balance? Researchers from Fudan University and the Shanghai AI Laboratory propose a simple and effective measure, the Rank-Surprisal Ratio (RSR). RSR takes the student model's perspective and jointly considers a sample's information content and its alignment with the student, aiming to find reasoning data that is sufficiently "new" yet still within the student's cognitive reach. In large-scale distillation experiments, the correlation between RSR and the student model's post-training performance reaches 0.86, and RSR can be used directly to filter reasoning traces and to select teacher models, finding ... without any actual training ...
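The article names the metric but does not give its formula. As a purely illustrative sketch (the combination below is an assumption, not the paper's definition), a rank/surprisal-style score can be computed from the student's next-token distribution over a teacher trace:

```python
import numpy as np

def rank_surprisal_ratio(student_logits, teacher_token_ids):
    """Hypothetical rank/surprisal-style score; not the paper's exact RSR.

    student_logits: (T, V) array of the student's next-token logits
    teacher_token_ids: (T,) token ids from the teacher's chain of thought
    """
    # Numerically stable softmax over the vocabulary at each position.
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    T = len(teacher_token_ids)
    p_tok = probs[np.arange(T), teacher_token_ids]

    # Surprisal: how much new information each teacher token carries
    # from the student's perspective.
    surprisal = -np.log(p_tok + 1e-12)

    # Rank of the teacher token in the student's predictive ordering
    # (rank 1 = the student's own top choice, i.e. fully "familiar").
    rank = (probs > p_tok[:, None]).sum(axis=-1) + 1

    # One plausible combination: mean log-rank relative to mean surprisal.
    return np.log(rank).mean() / surprisal.mean()
```

Under this sketch, a score of zero means the teacher trace is exactly what the student would have produced greedily (fully familiar), and larger values indicate tokens the student ranks lower, i.e. more surprising.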
Just Now: Yang Zhilin Personally Open-Sources Kimi K2.5! A Day of Chinese LLMs Battling It Out
机器之心· 2026-01-27 09:45
Core Viewpoint
- The article discusses the launch of Kimi K2.5, a new model that significantly enhances visual understanding and coding capabilities, positioning itself as a leading open-source model in the AI landscape [4][65]

Group 1: Model Capabilities
- Kimi K2.5 features a foundation model with 1 trillion parameters, showing substantial improvements in visual understanding and coding abilities compared to its predecessor [4]
- The model achieved state-of-the-art (SOTA) performance in challenging assessments, such as 50.2% on HLE and 74.9% on BrowseComp [4]
- Kimi K2.5's programming capabilities are notable, scoring 76.8% on SWE-bench Verified, narrowing the gap with top proprietary models [4][6]

Group 2: Cost Efficiency
- Kimi K2.5 operates at a fraction of the cost of GPT-5.2-xhigh while outperforming it in several assessments [7]

Group 3: Unified Model Features
- Kimi K2.5 is an all-in-one model that integrates visual, text, and coding capabilities, allowing users to generate code from design sketches without needing to write code or use prompt engineering [12]
- The model can interpret images and videos to produce code, enhancing user experience in design modifications [13][14]

Group 4: Agent Swarm Functionality
- Kimi K2.5 introduces the "Agent Swarm" feature, enabling it to coordinate up to 100 agents working in parallel, significantly speeding up task completion [21]
- This parallel processing capability can reduce tasks that typically take days to mere minutes [25]

Group 5: Real-World Applications
- Kimi K2.5 can handle complex office tasks, including document editing and financial modeling, with the ability to produce extensive outputs, such as 10,000-word papers or 100-page documents [29]
- The model's agent capabilities allow for sophisticated task management, such as creating a new language with consistent linguistic structures [51]

Group 6: Development and Future Outlook
- The release of Kimi K2.5 sets a new benchmark for open-source models globally, indicating a shift in the standards of AI development [65]
- The advancements in visual and agent capabilities suggest that AI is moving closer to achieving artificial general intelligence (AGI) [67]
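The article does not describe how Agent Swarm is implemented; as a hedged sketch of the general fan-out/fan-in pattern it names (the `run_agent` function and its signature are illustrative assumptions, not Kimi's API), parallel sub-agent dispatch can look like this:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(subtask: str) -> str:
    # Stand-in for a real model/agent call; the actual Kimi K2.5
    # interface is not described in the article.
    return f"result for {subtask!r}"

def agent_swarm(subtasks: list[str], max_agents: int = 100) -> list[str]:
    # Fan out subtasks to parallel worker agents (capped at max_agents),
    # then fan the results back in, preserving subtask order.
    with ThreadPoolExecutor(max_workers=min(max_agents, len(subtasks))) as pool:
        return list(pool.map(run_agent, subtasks))
```

The speedup claimed in the article comes from exactly this shape: independent subtasks run concurrently, so wall-clock time approaches the slowest subtask rather than the sum of all of them.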
ICLR 2026 Decisions Are Out! A 28% Acceptance Rate, and 机器之心 Welcomes Your Submissions
机器之心· 2026-01-27 09:45
As a top conference in machine learning, ICLR 2026 will be held April 23 to 27, 2026 in Rio de Janeiro, Brazil. The organizers received about 19,000 valid submissions this year, with an overall acceptance rate of roughly 28%; that rate covers all fully peer-reviewed paper submissions, whether or not they were later withdrawn.

Netizens post their score reports

As soon as the acceptance notifications went out, people could not sit still. Social platforms were quickly flooded with score reports, for example:

- Submission 11894, "Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks" (ICLR 2026 Conference Submission, 4 Official Reviews Submitted)
  - Reviewer fnio: Rating 8 / Confidence 4
  - Reviewer tKtK: Rating 6 / Confidence 3
  - Recommendation: Accept
- Taishi ...
Just Now: DeepSeek Explores a New Architecture Again, Open-Sources OCR 2
机器之心· 2026-01-27 06:00
机器之心 Editorial Team

Hey! DeepSeek just shipped another update! This time it is an update to the DeepSeek-OCR model released in October (see: "Incredible! DeepSeek just open-sourced a new model that compresses everything visually"). When DeepSeek-OCR first appeared, it drew wide attention and discussion around visual compression; this time, DeepSeek has taken aim at visual encoding itself. The newly released DeepSeek-OCR 2, by introducing the DeepEncoder V2 architecture, achieves a paradigm shift in visual encoding from "fixed scanning" to "semantic reasoning." And, as with nearly every DeepSeek release, both the model and a technical report are open-sourced.

This design breaks the constraint that traditional models must process images in raster order, left to right and top to bottom, giving the encoder the ability to dynamically reorder visual tokens according to image semantics. Through this two-stage cascaded 1D causal reasoning structure (encoder reordering plus decoder parsing), the model can more accurately recover the natural reading logic of complex documents, such as those with tables, formulas, and multi-column layouts. It is like equipping the machine with "human reading logic," so the AI no longer scans images mechanically. By contrast, a traditional AI is like a rigid photocopier: no matter how complex the page, it can only scan row by row, from the top-left corner to the bottom-right. While maintaining extremely high data-compression efficiency ...
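The contrast between raster scanning and semantic reordering can be made concrete with a toy sketch. This is not DeepSeek's code: the scoring function is a hypothetical stand-in for whatever reading-order signal DeepEncoder V2 actually predicts.

```python
def raster_order(n_rows: int, n_cols: int):
    # The "photocopier" baseline: fixed left-to-right, top-to-bottom scan.
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]

def semantic_order(token_scores):
    # token_scores[r][c]: hypothetical predicted reading priority for the
    # visual token at grid cell (r, c). Emit tokens in priority order,
    # independent of their raster position.
    flat = [((r, c), s) for r, row in enumerate(token_scores)
            for c, s in enumerate(row)]
    return [pos for pos, _ in sorted(flat, key=lambda x: -x[1])]
```

On a two-column page, for instance, a raster scan interleaves the columns line by line, while a score-driven ordering can emit the whole left column before the right one, matching how a human reads.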
Who Is the "Power Behind" Efficient Agents? One Survey Takes You Through Memory × Tool Learning × Planning
机器之心· 2026-01-27 06:00
Core Insights
- The article emphasizes the shift in focus within the industry from "can the model do it" to "can the agent be deployed" as large model capabilities advance [2]
- It highlights the importance of efficiency in deploying intelligent agents, as high performance at a high cost is not sustainable for large-scale production [2]

Group 1: Memory of Intelligent Agents
- Efficient memory systems are crucial for intelligent agents to handle long tasks without overwhelming token usage and degrading performance [6]
- The paper outlines a three-step memory lifecycle: construction, management, and access, focusing on transforming long dialogues into usable memory while balancing cost and fidelity [7]
- The trend of multi-agent memory is emerging, categorized into shared memory, local memory, and hybrid memory [8]

Group 2: Tool Learning
- Tools enable agents to transition from "speaking" to "doing," but costs can escalate quickly in the toolchain [9]
- The paper identifies three main strategies for improving efficiency: tool selection, tool invocation, and tool fusion reasoning [11]

Group 3: Planning of Intelligent Agents
- Planning determines how agents act in multi-step decision spaces, with efficiency issues arising from either single-agent reasoning or multi-agent collaboration [15]
- The paper discusses memory management strategies to prevent "memory explosion" and ensure efficient retrieval [12]
- It emphasizes the need for effective tool selection and invocation to minimize delays and unnecessary calls [13]

Group 4: Benchmarking and Evaluation
- Establishing a clear benchmark for efficiency is essential, as efficiency must be built on effectiveness [16]
- The paper reviews existing benchmarks that incorporate efficiency signals and summarizes common efficiency metrics used in agent methodologies [17]

Group 5: Challenges and Future Directions
- The paper outlines several challenges, including the need for a unified evaluation framework and the exploration of latent reasoning in intelligent agents [19]
- It highlights the importance of considering deployment costs in multi-agent scenarios and the need for efficiency research in multi-modal agents [20]
- Strategies for single-agent efficiency focus on adaptive budgeting and structured search, while multi-agent strategies aim to reduce communication costs without losing information [21]
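The three-step memory lifecycle named above (construction, management, access) can be sketched minimally. The class and method names here are illustrative assumptions, not the survey's API; real systems would use learned summarization and embedding retrieval where this sketch uses truncation and keyword match.

```python
class AgentMemory:
    """Toy sketch of the construction / management / access lifecycle."""

    def __init__(self, capacity: int = 100):
        self.entries = []          # (id, summary) pairs
        self.capacity = capacity

    def construct(self, dialogue: str) -> None:
        # Construction: compress raw dialogue into a compact entry.
        summary = dialogue[:50]    # stand-in for real summarization
        self.entries.append((len(self.entries), summary))

    def manage(self) -> None:
        # Management: evict oldest entries to avoid "memory explosion".
        while len(self.entries) > self.capacity:
            self.entries.pop(0)

    def access(self, query: str) -> list[str]:
        # Access: naive keyword retrieval as a stand-in for embeddings.
        return [s for _, s in self.entries if query in s]
```

The cost/fidelity trade-off the survey discusses lives in `construct` (how aggressively to compress) and `manage` (what to evict); `access` determines retrieval latency.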
Performance Rivaling Gemini 3 Pro! Last Night, Alibaba Qwen's Strongest Model Arrived
机器之心· 2026-01-27 04:59
Core Viewpoint
- The launch of Alibaba's Qwen3-Max-Thinking model marks a significant advancement in AI capabilities, positioning it among the top domestic models comparable to international leaders like GPT-5.2 and Gemini 3 Pro [1][5]

Performance Evaluation
- Qwen3-Max-Thinking has achieved impressive scores across various benchmarks, including:
  - MMLU-Pro: 85.7
  - MMLU-Redux: 92.8
  - C-Eval: 93.7
  - GPQA: 87.4
  - LiveCodeBench v6: 85.9
  - IMOAnswerBench: 83.9
- Overall, it has surpassed previous records in 19 mainstream evaluation benchmarks [4][5]

Model Specifications
- The model boasts over 1 trillion parameters and has been trained on 36 trillion tokens, making it Alibaba's largest and most powerful reasoning model to date [4][5]

Innovative Features
- Qwen3-Max-Thinking introduces a Heavy Mode for reasoning, allowing for iterative self-reflection and experience accumulation, which enhances problem-solving efficiency without significantly increasing token costs [13]
- The model integrates tool usage into the reasoning process, enabling it to perform complex tasks in a more strategic manner, thus reducing errors and improving real-world applicability [14]

Market Impact
- As of January 2026, the Qwen series has achieved over 1 billion downloads on Hugging Face, establishing itself as one of the most popular open-source AI model series [15]
- The introduction of Qwen3-Max-Thinking signifies a shift in the AI market focus from merely intelligent chatbots to powerful intelligent agents capable of executing complex tasks [15]
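The "iterative self-reflection and experience accumulation" attributed to Heavy Mode can be sketched as a generic reflect-and-retry loop. This is a hedged illustration of the pattern, not Qwen's implementation; `propose` and `critique` are hypothetical stand-ins for model calls.

```python
def solve_with_reflection(problem, propose, critique, max_rounds=3):
    # propose(problem, notes) -> candidate answer, conditioned on
    # accumulated critique notes ("experience").
    # critique(problem, answer) -> None if acceptable, else an issue string.
    answer = propose(problem, notes=[])
    notes = []
    for _ in range(max_rounds):
        issue = critique(problem, answer)
        if issue is None:           # critique passes: stop early
            break
        notes.append(issue)         # accumulate experience across rounds
        answer = propose(problem, notes=notes)
    return answer, notes
```

Stopping as soon as the critique passes is what keeps token cost bounded: extra rounds are only spent on problems the first attempt got wrong.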
Ant Group's Embodied Research Debuts! It Solves the Hard Problem of Robots "Seeing" Transparent Glass, and It's Open-Sourced
机器之心· 2026-01-27 04:59
Editor: 冷猫

As is well known, "embodied intelligence" is the bridge connecting the digital world and the physical world. True embodied intelligence, a general-purpose robot that decides and acts fully autonomously, must be built on a complete understanding of the physical world. Spatial visual perception is a foundational capability for real-world applications such as autonomous driving and robotic manipulation, with a single core goal: enabling machines to understand and participate in interactions within 3D environments.

Most such robots use RGB-D cameras to acquire real-world visual and depth information, the industry's common choice after weighing cost, accuracy, and practicality. But the physical world is extremely complex, and a simple pane of glass is enough to stump a robot executing tasks autonomously.

[Scene: household robots crashing into glass]

To a machine, glass is almost a phantom. Humans instinctively interpret reflections and refraction, but robots lack that lived experience. Transparent, reflective objects like glass happen to defeat everything an RGB-D camera captures: both depth and pixel cues become hard to read accurately. As autonomous driving and intelligent robots move closer to daily life, this has become a pain point in urgent need of a solution. Happily, the newly open-sourced embodied perception model LingBot-Depth tackles precisely this "glass problem" of robot real-world recognition. LingBot-Depth is a high-precision spatial perception mo... open-sourced by Ant Lingbo Technology ...