机器之心
A 25% Efficiency Gain: The "Arm-Hand Shared Autonomy Framework" Cracks the Data Collection Dilemma for Dexterous Manipulation
机器之心· 2025-12-11 10:00
Achieving human-like dexterous manipulation in general-purpose robots is a long-standing core challenge in robotics. In recent years, Vision-Language-Action (VLA) models have shown remarkable potential for robot skill learning, but their progress is constrained by a fundamental bottleneck: acquiring high-quality manipulation data. The latest research paper from the ByteDance Seed team, "End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy" [1], proposes a solution to this key problem.

The core contribution is a Shared Autonomy framework that sensibly divides control between the human operator and the autonomous AI system: the human teleoperates the robot arm via VR (handling high-level positioning and obstacle avoidance), while DexGrasp-VLA autonomously controls the dexterous hand (handling fine-grained grasping). This removes the need to teleoperate the arm and the dexterous hand simultaneously, sharply reduces the operator's cognitive load, and tackles the most critical cost problem in robot deployment: data collection. By pushing data collection efficiency to a scalable level, it lays the groundwork for dexterous manipulation to move from the lab to industrial applications.

(Figure: Data collection and training pipeline for DexGra ...)
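To make that division of labor concrete, the sketch below shows one control tick of such a shared-autonomy loop. Every name in it (`read_vr_pose`, `vla_policy`, and so on) is a hypothetical placeholder rather than the paper's actual API; the point is only that the human stream commands the arm while the learned policy commands the hand.

```python
def shared_autonomy_step(read_vr_pose, command_arm,
                         capture_image, vla_policy, command_hand):
    """One control tick of a shared-autonomy teleoperation loop.

    All five callables are hypothetical placeholders, not the paper's API.
    """
    # Human side: the VR controller's 6-DoF pose drives the arm end-effector,
    # so high-level positioning and obstacle avoidance stay with the operator.
    arm_target = read_vr_pose()
    command_arm(arm_target)

    # Autonomous side: the VLA policy maps the current camera image to finger
    # joint commands, handling fine-grained grasping without operator input.
    image = capture_image()
    finger_command = vla_policy(image)
    command_hand(finger_command)
```

Run at a fixed control rate, this halves what the operator must attend to: each tick takes one command stream from the human and one from the policy.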
First Principles of Large Models (Part 1): Statistical Physics
机器之心· 2025-12-11 10:00
机器之心 guest post. Author: Dr. 白铂, Director of the Theory Research Department and Chief Scientist of Information Theory at Huawei's 2012 Laboratories.

At the end of 2022, ChatGPT burst onto the scene and stunned the world with its capabilities. At the end of 2024, DeepSeek stunned the world again with extremely low training cost and extremely high performance. In just a few years, large models have iterated at a furious pace and grown ever more capable; in the United States alone, investment in AI has exceeded the annual GDP of many countries. At the end of 2025, Google forcefully launched Gemini 3: model capability leapt forward, and the TPU training paradigm mounted a disruptive challenge to NVIDIA's ecosystem.

The industry widely regards Gemini 3 as a key breakthrough toward Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI), and a stunning product of human-machine collaboration. Yet, as Ilya Sutskever noted in his November 26 interview, the scaling law of large models, like Moore's law, will sooner or later fail against physical limits. How to open up the "alchemy furnace" of large-model training, see the basic principles behind the black box, and answer whether large models are approaching their capability ceiling has therefore become a pressing question. But prior work on large mod ...
MIT's Latest Finding: Algorithmic Progress Over the Past Decade Has Been Overestimated
机器之心· 2025-12-11 02:47
Core Insights
- The article discusses the significant advancements in AI driven by increased computational budgets and algorithmic innovations over the past decade [2][6]
- It highlights that while computational growth is measurable, the quantification of algorithmic progress remains unclear, particularly regarding efficiency improvements and their scalability [2][3]

Group 1: Algorithmic Progress
- Research estimates that algorithmic advancements have contributed over 4 orders of magnitude of effective compute over the past decade, while computational scale itself has increased by 7 orders of magnitude [2]
- The overall efficiency of models has improved by approximately 22,000 times due to algorithmic innovations, allowing similar performance with significantly fewer floating-point operations (FLOPs) (see the back-of-the-envelope check after this summary) [3][4]
- Most algorithmic innovations yield only minor efficiency improvements, contributing less than a 10-fold overall efficiency gain when extrapolated to 2025's computational limits [4][11]

Group 2: Scale-Dependent Innovations
- Two major scale-dependent algorithmic innovations, the moves from LSTM to Transformer and from Kaplan to Chinchilla scaling laws, account for 91% of the total efficiency improvements [4][22]
- The efficiency gains from algorithmic improvements are significantly larger in large-scale models than in small-scale models, indicating that algorithmic progress is heavily reliant on computational scale [6][25]
- The article suggests that the perceived rapid progress in algorithms may reflect growing computational budgets more than continuous algorithmic breakthroughs [22][24]

Group 3: Experimental Findings
- The study employed various methods, including ablation studies and scaling experiments, to analyze the impact of individual algorithms and their combinations [5][8]
- The findings reveal a highly skewed distribution of efficiency improvements, with a few key innovations contributing disproportionately to overall gains [11][12]
- The scaling experiments demonstrate that improvements in neural network architectures are not scale-invariant but exhibit increasing returns to scale [20][21]
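A back-of-the-envelope check on those figures (my own arithmetic, not code from the study): effective compute is conventionally modeled as physical compute multiplied by algorithmic efficiency, so a 22,000x gain is a little over 4 orders of magnitude, consistent with the 4-versus-7 split in Group 1.

```python
import math

# Back-of-the-envelope decomposition of effective-compute growth.
# The 22,000x and 91% figures come from the article; reading the 91% as a
# share of the gain in log space is my own assumption.
efficiency_gain = 22_000        # overall algorithmic efficiency improvement
compute_oom = 7                 # physical compute growth, orders of magnitude

algo_oom = math.log10(efficiency_gain)     # ~4.34 orders of magnitude
print(f"algorithmic progress: {algo_oom:.2f} OOM")
print(f"effective compute:    {algo_oom + compute_oom:.2f} OOM total")

# If two scale-dependent innovations carry 91% of the log-space gain, all
# remaining innovations jointly contribute under a 10x improvement:
residual = efficiency_gain ** (1 - 0.91)   # ~2.5x, consistent with "<10x"
print(f"residual gain from all other innovations: ~{residual:.1f}x")
```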
Rejection ≠ Failure: These High-Impact Papers Were All Turned Down by Top Conferences
机器之心· 2025-12-11 02:47
Core Insights
- Waymo has released a deep-dive blog post detailing its AI strategy centered on its foundation model, emphasizing the use of distillation to create efficient models for onboard operation [1]
- Jeff Dean highlighted the significance of knowledge distillation in AI, reflecting on its initial rejection by NeurIPS 2014, which underestimated its potential impact [3][4]

Group 1: Historical Context of Rejected Papers
- Many foundational technologies in AI, such as optimizers for large models and computer vision techniques, were initially rejected by top conferences, revealing a systemic lag in recognizing groundbreaking innovations [6]
- Notable figures in AI, including Geoffrey Hinton and Yann LeCun, saw their pioneering work rejected, often for reasons that seem absurd in hindsight, such as claims of lacking theoretical basis or being overly simplistic [6]

Group 2: Specific Case Studies of Rejected Innovations
- LSTM, a milestone in handling sequential data, was rejected by NIPS in 1996, a period when statistical methods were favored, only to later dominate fields like speech recognition [8]
- The SIFT algorithm, which ruled computer vision for 15 years, was rejected by ICCV and CVPR for perceived complexity and lack of elegance, ultimately proving the value of robust engineering design [11]
- Dropout, a key regularization method for deep neural networks, was rejected by NIPS in 2012 as too radical, yet became crucial to the success of models like AlexNet [17]
- Word2Vec, despite its revolutionary impact on NLP, received a strong rejection at ICLR 2013 for perceived lack of scientific rigor, but quickly became a cornerstone of text representation [19][20]

Group 3: Reflection on Peer Review Limitations
- The peer review system often struggles to recognize disruptive innovations, leading to a "simplicity trap" in which reviewers equate mathematical complexity with research contribution [40]
- Reviewers tend to defend existing paradigms, which can hinder the acceptance of novel ideas that challenge traditional metrics of success [40]
- Demanding rigorous theoretical proof in an experimental field like deep learning can stifle practical breakthroughs, as seen in the initial skepticism toward methods like the Adam optimizer [40]

Group 4: Broader Implications
- The experiences of rejected papers illustrate the nonlinear nature of scientific progress, highlighting that peer review, while essential, is limited by human cognitive biases [41]
- Historical anecdotes, such as Einstein's own gravitational-wave paper being rejected, underscore that the true measure of research impact is long-term relevance rather than immediate acceptance [42][44]
A World First: 灵初智能 Releases Psi-SynEngine, a Real-World Embodied Data Collection Engine for Dexterous Hands
机器之心· 2025-12-11 00:43
Core Insights
- The article highlights the launch of Psi-SynEngine, the world's first embodied native human data collection solution developed by the company, which aims to create a leading general-purpose operational intelligence system [3][10]
- Psi-SynEngine addresses the data collection challenges in the embodied intelligence field by directly capturing operational data from frontline workers in real-world scenarios, rather than relying on high-cost, low-fidelity setups [4][6]

Data Collection Advantages
- Psi-SynEngine offers three main advantages over traditional data collection methods: low cost, high multi-modal data capture capability, and strong portability, enabling large-scale parallel data collection [5][7]
- The portable data collection devices significantly reduce deployment costs, with data acquisition costs being only 10% of remote operation solutions [7]

Data Engine and Dataset Features
- The Psi-SynNet-v0 dataset, released alongside Psi-SynEngine, features strong data diversity, comprehensive modality coverage, massive data scale, and a validated closed-loop data system, enhancing model transfer and generalization capabilities [9][12]
- The dataset aims to bridge the gap between human and robotic operations, addressing the structural and capability differences between human hands and robotic manipulators [9]

Future Prospects
- The establishment of Psi-SynEngine and Psi-SynNet-v0 marks a new paradigm for embodied AI, with plans to scale the dataset to over one million hours, positioning it as the largest dataset for dexterous operations globally [10][12]
- The company invites global research institutions and partners to collaborate in building the Psi-SynNet dataset, aiming to usher in a new era of general intelligence [10]
Diffusion Language Model Inference Too Slow? A Peking University Team Proposes ODB-dLLM to Crack the Dual Compute and Memory-Access Bottlenecks
机器之心· 2025-12-11 00:43
To address this drawback, a research team from Peking University proposes a new dLLM inference acceleration framework, ODB-dLLM (Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models). By analyzing the interleaved compute-bound and memory-bound phases in existing dLLM inference frameworks, it introduces an adaptive length prediction strategy and jump-sharing speculative decoding to optimize dLLMs' compute and memory-access behavior on hardware platforms and maximize inference efficiency.

The work was carried out by a Peking University research team. The corresponding author, 李萌, is an assistant professor and doctoral advisor at Peking University's Institute for Artificial Intelligence and School of Integrated Circuits and heads the PKU SEC Lab; his research centers on efficient and secure AI acceleration algorithms and chips, building energy-efficient, reliable, and secure compute infrastructure for AI through cross-layer algorithm-to-chip co-design and optimization. The first author, 韦临烨, is a first-year PhD student at Peking University's School of Integrated Circuits working on efficient multimodal AI systems and accelerator design.

Diffusion-based large language models (dLLMs), with their global decoding and bidirectional attention, unlock native parallel decoding and controllable generation, and have recently drawn wide attention. For example, F ...
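For readers new to the lens that the framework's name invokes: arithmetic intensity is FLOPs per byte of memory traffic, and comparing it against a device's compute-to-bandwidth ratio (the ridge point of the roofline model) indicates whether a phase is compute-bound or memory-bound. The sketch below is generic roofline arithmetic, not the paper's code; the hardware numbers are rough A100-class figures assumed purely for illustration.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic for a kernel or phase."""
    return flops / bytes_moved

def bound_regime(intensity: float, peak_flops: float, peak_bw: float) -> str:
    """Classify a phase against the roofline ridge point (peak FLOPs / peak B/s)."""
    ridge = peak_flops / peak_bw
    return "compute-bound" if intensity >= ridge else "memory-bound"

# Rough A100-class figures, for illustration only.
PEAK_FLOPS = 312e12   # ~312 TFLOP/s FP16 tensor-core throughput
PEAK_BW = 2.0e12      # ~2 TB/s HBM bandwidth -> ridge point ~156 FLOPs/byte

# A decoding phase doing 4 GFLOPs while moving 1 GB of weights/KV data sits
# far below the ridge point, so it is memory-bound; a phase that packs many
# parallel token predictions over the same bytes can cross into the
# compute-bound regime. Interleaved phases of both kinds are what ODB-dLLM's
# two techniques aim to rebalance.
print(bound_regime(arithmetic_intensity(4e9, 1e9), PEAK_FLOPS, PEAK_BW))
```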
Microsoft Releases the First Large-Scale Study of Test-Time Scaling, Complete with a Definitive Guide
机器之心· 2025-12-10 10:30
Report by 机器之心. Editor: Panda.

If large-model pre-training is an "arms race" of compute and data, then test-time scaling (TTS) is more like a "real-time strategy game" played during inference. The consensus today is that letting a model "think a little longer" before answering often produces better results. It sounds like the perfect free lunch: dynamically allocate more compute at inference time, and the model's IQ takes off on the spot.

But here is the question: how should we make an LLM "think more"? Picture a group of students working on a problem. Do you have one student revise their answer over and over (a sequential strategy)? Have a hundred students solve it simultaneously and then vote (a parallel strategy)? Or have them hold a meeting to discuss it (a hybrid strategy)? More importantly, some "students" (models) are clever but prone to overthinking themselves into dead ends, while others must deliberate at length to crack hard problems. Which TTS strategy is the chosen one?

To end this blind-men-and-the-elephant debate, Microsoft has finally stepped in.

Paper title: The Art of Scaling Test-Time Compute for Large Language Models
Paper link: https://arxiv.org/abs/2512.02008

This study not only breaks ...
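To make the student metaphor concrete, here is a minimal sketch of the two pure strategies, written around a hypothetical `generate(prompt)` function (generic pseudocode, not Microsoft's experimental setup). A hybrid strategy composes the two, for example by voting across several independently refined chains.

```python
from collections import Counter

def parallel_vote(generate, prompt: str, n: int = 16) -> str:
    """Parallel strategy: n independent attempts, return the most common answer."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def sequential_refine(generate, prompt: str, rounds: int = 4) -> str:
    """Sequential strategy: one attempt, revised repeatedly."""
    answer = generate(prompt)
    for _ in range(rounds):
        answer = generate(f"{prompt}\n\nDraft answer:\n{answer}\n\nImprove the draft.")
    return answer
```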
Is This the True Face of GPT-5.2? OpenAI Hastily Serves a Full "Afternoon Tea" Set While a Next-Generation Image Model Leaks
机器之心· 2025-12-10 10:30
Report by 机器之心. Editor: +0.

Consider the memorable model codenames of the past two years: Strawberry, Banana, Carrot... every debut seems tied to fruits and vegetables. In the past few days, netizens have spotted a new set of codenames, and this time it is not produce but afternoon tea. The latest stars are Olive Oil Cake, plus Chestnut and Hazelnut. All signs suggest this "afternoon tea" is a full counterattack menu that OpenAI is serving under an internal "Code Red" alert to fend off Google's Gemini 3.

First, the mysterious main course. Sharp-eyed users spotted a new model option on Notion bearing the OpenAI logo, with the internal codename plainly reading Olive Oil Cake. Ordinary users cannot select this model yet, but its unique identifier clearly differs from the existing GPT-5.1. The prevailing industry guess is that this dense, Mediterranean-sounding "cake" is very likely the rumored GPT-5.2.

Why call it a "Code Red"-level emergency release? Whatever the final naming, community discussion treats both models as direct successors to Image-1, with a very clear goal: fix the pain points and benchmark against Google. According to testers' feedback, this new mod ...
LLMs Are Just One Layer Short of AGI: Stanford Research Upends the "Pattern Matching" View
机器之心· 2025-12-10 10:30
Report by 机器之心. Editors: 杨文, 泽南.

The theoretical foundations of large language models may be due for a change. Stanford has published a paper that thoroughly overturns the conventional claim that "LLMs are just pattern matchers." What it proposes is not a scaling trick or a new architecture, but a "coordination layer" that gives models genuine reasoning ability.

Core claim: the bottleneck to AGI is coordination, not scale. The AI community is split by a debate over the nature of large language models. On one side, the scaling camp holds that LLMs are sufficient for AGI; on the other, influential critics argue that LLMs are "merely pattern matchers," structurally incapable of reasoning, planning, or compositional generalization, and therefore a dead end. The authors contend this debate rests on a false dichotomy and advance a strikingly contrarian core claim: LLMs fail not because they lack reasoning ability, but because we lack systems that bind their patterns to goals.

To explain this, the authors use a fishing metaphor. The ocean represents the model's vast library of patterns; a fisherman casting a net without bait hauls in only the most common fish (the generic patterns in the training data). Critics condemn these unanchored outputs, but what they are observing is merely the raw statistical baseline of unbaited fishing: not a broken system, but the system's natural behavior in its default mode. Intelligent behavior, however, is not just casting the net; it also involves baiting and filtering. If the bait is too sparse, it fails to attract the specific, rare fish, and the ocean's prior remains ...
Why the "Doubao Phone" Went Viral with Its Super Agent: Hear What AI Scholars Have to Say
机器之心· 2025-12-10 08:13
Core Viewpoint
- The article discusses the emergence of the Doubao mobile assistant, which integrates AI capabilities deeply into the smartphone operating system, transforming the way users interact with their devices and enabling complex task execution across multiple applications [3][12][26]

Group 1: Doubao Mobile Assistant Overview
- The Doubao mobile assistant is currently in a technical preview phase and represents a significant advancement in AI integration within smartphones, functioning as a "super butler" rather than a standalone app [3][6]
- It allows users to execute complex commands across different apps with simple voice instructions, showcasing a new level of AI interaction [3][12]
- The assistant can perform multi-step tasks seamlessly, such as marking restaurants on a map, finding museums, and booking tickets on travel platforms [5][12]

Group 2: Challenges in Implementing System-Level AI Agents
- Implementing system-level AI agents like Doubao involves overcoming four main challenges: perception, planning, decision-making, and system-level integration (a minimal loop sketch follows this summary) [9][10]
- The perception layer requires the agent to recognize all interactive elements on the screen quickly and accurately, even amidst dynamic distractions [9]
- The planning layer involves managing information flow across apps, maintaining logical continuity, and adapting to unexpected interruptions [10]
- The decision-making layer necessitates the agent's ability to generalize across different interfaces and execute various user interactions beyond simple clicks [10]

Group 3: Technical Innovations Behind Doubao
- Doubao leverages a system-level integration approach, gaining Android system-level permissions while ensuring user privacy through strict authorization protocols [12][13]
- The assistant utilizes a visual multi-modal capability to understand screen content and user intent, allowing it to autonomously decide the next actions [12][13]
- The underlying technology, UI-TARS, is a proprietary engine developed by ByteDance, which enhances the assistant's performance and capabilities [16][24]

Group 4: Future Implications and Industry Perspectives
- The evolution of AI capabilities in smartphones is expected to shift the interaction paradigm from "users seeking services" to "services seeking users," leading to a more intuitive user experience [26][27]
- Experts believe that system-level GUI agents will become standard features in future mobile operating systems, enhancing the autonomy and intelligence of smartphones [26][27]
- Despite the promising advancements, challenges such as computational power, coordination of system-level agents, and security mechanisms remain to be addressed [27]
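As a rough illustration of the perceive-plan-act cycle described in Group 2, here is a generic sketch of a system-level GUI agent loop. Every interface below is a hypothetical placeholder; UI-TARS's actual design is not public at this level of detail.

```python
def gui_agent_loop(goal: str, screen, actuator, model, max_steps: int = 30):
    """Generic system-level GUI agent loop; every interface here is hypothetical."""
    history = []
    for _ in range(max_steps):
        # Perception: capture the screen so a multimodal model can ground the
        # interactive elements (buttons, fields, lists) directly in pixels.
        screenshot = screen.capture()

        # Planning and decision-making: pick the next UI action given the goal,
        # the current screen, and the actions taken so far.
        action = model.next_action(goal, screenshot, history)
        if action.kind == "done":
            return action.result

        # System-level integration: with OS permissions, the agent can click,
        # type, scroll, and switch apps the way a user would.
        actuator.perform(action)
        history.append(action)
    raise TimeoutError("goal not reached within the step budget")
```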