机器之心
NeurIPS 2025 Spotlight | With a single demonstration, the DexFlyWheel framework lets robots learn to "generate their own data"
机器之心· 2025-10-09 04:43
When people talk about dexterous robotic manipulation, data scarcity has always been the sword of Damocles hanging overhead. At a time when large models and autonomous driving rely on massive data to "emerge" with powerful capabilities, dexterous manipulation remains stuck at a data bottleneck.

Project page: https://DexFlyWheel.github.io

Research background: why is data generation for dexterous hands so hard?

As embodied intelligence advances rapidly, robot datasets covering diverse scenarios and tasks keep appearing, yet manipulation datasets for five-fingered dexterous hands remain scarce. Several key reasons lie behind this:

1. Traditional methods break down. Generation pipelines designed for two-finger grippers barely transfer to dexterous hands. Heuristic planning struggles with high-dimensional action optimization, and while LLMs can provide semantic guidance, they have difficulty producing fine-grained five-finger control trajectories.

2. Human teleoperation is costly. Teleoperation rigs can collect dexterous-hand data effectively, but they demand substantial labor, time, and resources; scalability is low, making it hard to build diverse, large-scale datasets.

3. Pure reinforcement learning is inefficient. Relying on RL alone can train successful policies and iterate on successful trajectories, but it often produces unnatural hand motions and jittery arm movements, and its low exploration efficiency makes high-quality trajectories hard to generate.

Recently, Peking University and Harbin Institute of Technology, together with PsiBot (灵初智能), proposed DexFlyWheel, the first self-reinforcing data-generation framework for dexterous manipulation. The framework ...
A 7M-parameter model beats DeepSeek R1 and more: a single-author Samsung paper goes viral, using recursion to upend large-model reasoning
机器之心· 2025-10-09 04:43
Report by 机器之心. Editor: 冷猫

Training small, thinking big. Reasoning architectures for large models are being overturned almost too quickly.

In June this year, researchers at Sapient Intelligence proposed the Hierarchical Reasoning Model (HRM), which used a recurrent architecture to break free of the structural constraints of traditional chain-of-thought (CoT) and had a major impact on large-model reasoning. HRM contains only 27 million parameters (roughly 22x smaller than the smallest Qwen3 0.6B model) and, trained on just 1,000 examples, achieved outstanding performance on complex reasoning tasks. Readers interested in HRM can refer to our earlier coverage.

Only four months later, HRM's architecture already looks outdated. Alexia Jolicoeur-Martineau, a senior AI researcher at the Samsung Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursive Model (TRM). How striking is TRM? A network with only 7 million parameters (4x smaller than HRM) can outperform frontier language models such as o3-mini and Gemini 2.5 Pro on some of the hardest reasoning benchmarks, even though those models have roughly 10,000x as many parameters. ...
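To make the recursion idea concrete, here is a minimal sketch of a tiny network that repeatedly refines a latent state and an answer estimate while reusing the same weights at every step; the module names, dimensions, and step count are illustrative assumptions, not TRM's actual architecture.

```python
# Minimal sketch (assumption-laden) of recursive refinement in the spirit of TRM:
# a tiny network repeatedly updates a latent state z and an answer estimate y.
# Dimensions, step count, and module names are illustrative, not the paper's values.
import torch
import torch.nn as nn

class TinyRecursiveReasoner(nn.Module):
    def __init__(self, d_model=128, n_steps=6):
        super().__init__()
        self.n_steps = n_steps
        self.encode = nn.Linear(d_model, d_model)          # embed the problem x
        self.update_z = nn.GRUCell(d_model * 2, d_model)   # refine latent state from (x, y)
        self.update_y = nn.Linear(d_model * 2, d_model)    # refine answer from (z, y)

    def forward(self, x):                  # x: (batch, d_model)
        z = torch.zeros_like(x)            # latent "scratchpad"
        y = torch.zeros_like(x)            # current answer estimate
        h = self.encode(x)
        for _ in range(self.n_steps):      # recursion: the same weights are reused each step
            z = self.update_z(torch.cat([h, y], dim=-1), z)
            y = y + self.update_y(torch.cat([z, y], dim=-1))
        return y

model = TinyRecursiveReasoner()
print(sum(p.numel() for p in model.parameters()))  # parameter count stays tiny
```

The point of the sketch is that depth comes from iteration rather than from stacking new parameters, which is how a very small network can spend more computation on hard inputs.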
Qwen is moving into robotics: Lin Junyang announces an embodied-AI team
机器之心· 2025-10-09 04:43
Core Insights
- Qwen, a leader in open-source models, is transitioning into robotics by forming a dedicated team for embodied AI, indicating a shift from virtual to physical applications of their models [1][8]
- The establishment of this robotics team aligns with Alibaba Cloud's broader strategy to support the embodied intelligence sector, leveraging their existing AI capabilities [8][12]

Group 1: Company Developments
- Alibaba's Qwen has initiated a robotics team to enhance its models' capabilities in real-world applications, focusing on long-horizon reasoning and tool utilization through reinforcement learning [1][8]
- The recent funding of nearly 1 billion yuan for a robotics company, with Alibaba Cloud as a lead investor, marks a significant investment in the embodied intelligence space [5][8]
- Qwen's models, particularly Qwen-VL, are being widely adopted by companies in the embodied intelligence sector for their strengths in spatial understanding and long-context memory [6][8]

Group 2: Market Trends
- The global robotics market is projected to reach $7 trillion by 2050, attracting significant investment from various sectors, including government funds [12]
- Major tech companies, including NVIDIA and SoftBank, are heavily investing in robotics, indicating a competitive landscape where the integration of generative AI and robotics is expected to transform human-machine interactions [9][10][11]
Word has it everyone is going all in on post-training? The best guide is here
机器之心· 2025-10-09 02:24
Core Insights
- The article emphasizes the shift in focus from pre-training to post-training in large language models (LLMs), highlighting the diminishing returns of scaling laws as model sizes reach hundreds of billions of parameters [2][3][11]

Group 1: Importance of Post-Training
- Post-training is recognized as a crucial phase for enhancing the reasoning capabilities of models like OpenAI's series, DeepSeek R1, and Google Gemini, marking it as a necessary step towards advanced intelligence [3][11]
- The article introduces various innovative post-training methods such as Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), and Reinforcement Learning with Verifiable Rewards (RLVR) [2][3][12]

Group 2: Transition from Pre-Training to Post-Training
- The evolution from pre-training to instruction fine-tuning is discussed, where foundational models are trained on large datasets to predict the next token, but often lack practical utility in real-world applications [7][8]
- Post-training aims to align model behavior with user expectations, focusing on quality over quantity in the datasets used, which are typically smaller but more refined compared to pre-training datasets [11][24]

Group 3: Supervised Fine-Tuning (SFT)
- Supervised Fine-Tuning (SFT) is described as a process that transforms a pre-trained model into one that can follow user instructions effectively, relying on high-quality instruction-answer pairs (a minimal sketch appears after this summary) [21][24]
- The quality of the SFT dataset is critical, as even a small number of low-quality samples can negatively impact the model's performance [25][26]

Group 4: Reinforcement Learning Techniques
- Reinforcement Learning (RL) is highlighted as a complex yet effective method for model fine-tuning, with various reward mechanisms such as RLHF, RLAIF, and RLVR being employed to enhance model performance [39][41]
- The article outlines the importance of reward models in RLHF, which are trained using human preference data to guide model outputs [44][46]

Group 5: Evaluation of Post-Training Models
- The evaluation of post-training models is multifaceted, requiring a combination of automated and human assessments to capture various quality aspects [57][58]
- Automated evaluations are cost-effective and quick, while human evaluations provide a more subjective quality measure, especially for nuanced tasks [59][60]
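To make the SFT step in Group 3 concrete, below is a minimal sketch of fine-tuning a causal LM on a single instruction-answer pair, masking the prompt tokens so that only the answer contributes to the loss. The checkpoint name, example pair, and hyperparameters are placeholder assumptions, not recommendations from the guide.

```python
# Minimal sketch of supervised fine-tuning (SFT) on an instruction-answer pair.
# Model name, example pair, and hyperparameters are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # any causal LM checkpoint; chosen here as an assumption
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Instruction: Summarize why post-training matters.\nAnswer: "
answer = "It aligns a pre-trained model with user intent using small, high-quality data."

prompt_ids = tok(prompt, return_tensors="pt").input_ids
full_ids = tok(prompt + answer + tok.eos_token, return_tensors="pt").input_ids

# Mask the prompt tokens with -100 so the loss is computed only on the answer tokens
# (boundary tokenization is simplified here for brevity).
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()
optimizer.step()
```

The same masking pattern scales to a full dataset by batching many such pairs; the RLHF and RLVR stages discussed in Group 4 would then further adjust this SFT model using reward signals.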
Robots teach themselves new skills by "watching videos": NovaFlow extracts action flow from generated videos for zero-shot manipulation
机器之心· 2025-10-09 02:24
The paper's co-first authors are Li Hongyu (PhD student at Brown University) and Sun Lingfeng (researcher at the Robotics and AI Institute; PhD from UC Berkeley). The corresponding author, Fu Jiahui, is a researcher at the Robotics and AI Institute with a PhD from MIT. George Konidaris is an associate professor at Brown University.

Building general-purpose robots that can perform diverse tasks in new environments without any task-specific training has long been a holy grail of robotics. In recent years, with the rapid progress of large language models (LLMs) and vision-language models (VLMs), many researchers have pinned their hopes on vision-language-action (VLA) models, expecting them to replicate the generalization successes of LLMs and VLMs. The reality, however, has fallen short of the ideal: the end-to-end training paradigm of VLA models requires massive amounts of robot-specific "vision-language-action" data. Unlike the web-scale data readily available to LLMs and VLMs, robot data is extremely costly and difficult to collect, creating a severe "data bottleneck". Could this bottleneck be bypassed, letting robots learn new skills without relying on expensive "first-hand" data?

Recently, researchers from Brown University and the Robotics and AI Institute ...
Being-VL's visual BPE route: truly unifying "seeing" and "speaking"
机器之心· 2025-10-09 02:24
Core Insights
- The article discusses the limitations of traditional multimodal models, particularly how CLIP-style encoders prematurely align visual representations with text space, leading to potential hallucinations when detailed, non-language-dependent queries are made [2][6]
- A new method called Being-VL is proposed, which emphasizes a post-alignment approach, allowing for the discrete representation of images before aligning them with text, thereby preserving visual structure and reducing the risk of information loss [2][3]

Being-VL Implementation
- Being-VL consists of three main steps: quantifying images into discrete VQ tokens using VQ-GAN, training a visual BPE that measures both co-occurrence frequency and spatial consistency, and finally unifying visual and text tokens into a single sequence for modeling (a minimal sketch of the merge step appears after this summary) [3][10]
- The visual BPE tokenizer prioritizes both frequency and spatial consistency to create a more semantically and structurally meaningful token set, which is independent of text [8][9]

Training Strategy
- The training process is divided into three stages:
  1. Embedding Alignment: only the new visual token embeddings are trained while freezing other parameters to maintain existing language capabilities [12]
  2. Selective Fine-tuning: a portion of the LLM layers is unfrozen to facilitate cross-modal interaction at lower representation levels [12]
  3. Full Fine-tuning: all layers are unfrozen for comprehensive training on complex reasoning and instruction data [12][10]

Experimental Results
- Experiments indicate that the discrete representation of images followed by visual BPE and unified modeling with text leads to improved reliability in detail-sensitive queries and reduces hallucinations compared to traditional methods [14][16]
- The study highlights the importance of a gradual training approach, showing that a combination of progressive unfreezing and curriculum learning significantly outperforms single-stage training methods [14][10]

Visual BPE Token Activation
- Visualization of embedding weights shows that using visual BPE leads to a more balanced distribution of weights between text and visual tokens, indicating reduced modality gaps and improved cross-modal attention [16][19]

Token Size and Training Efficiency
- The research explores the impact of BPE token size on training efficiency, finding an optimal balance in resource-limited scenarios, while larger token sizes may lead to diminishing returns due to sparsity [19][20]

Development and Summary
- The evolution from Being-VL-0 to Being-VL-0.5 reflects enhancements in the unified modeling framework, incorporating priority-guided encoding and a structured training approach [20][24]
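As an illustration of the visual-BPE idea summarized above, the sketch below scores candidate merges of adjacent VQ token pairs by co-occurrence frequency weighted by how consistently the pair appears in one spatial direction. The scoring rule and helper names are illustrative assumptions, not Being-VL's exact formulation.

```python
# Minimal sketch of one visual-BPE merge step over a grid of VQ token ids.
# The scoring rule (frequency weighted by how consistently a pair appears in the
# same relative direction) is an illustrative assumption, not Being-VL's exact formula.
from collections import Counter, defaultdict
import numpy as np

def best_merge(grid: np.ndarray):
    """grid: (H, W) array of discrete VQ token ids for one image."""
    pair_count = Counter()
    direction_count = defaultdict(Counter)  # pair -> {"right"/"down": count}
    H, W = grid.shape
    for i in range(H):
        for j in range(W):
            if j + 1 < W:  # horizontally adjacent pair
                p = (int(grid[i, j]), int(grid[i, j + 1]))
                pair_count[p] += 1
                direction_count[p]["right"] += 1
            if i + 1 < H:  # vertically adjacent pair
                p = (int(grid[i, j]), int(grid[i + 1, j]))
                pair_count[p] += 1
                direction_count[p]["down"] += 1

    def score(p):
        freq = pair_count[p]
        # spatial consistency: fraction of occurrences in the pair's dominant direction
        consistency = max(direction_count[p].values()) / freq
        return freq * consistency

    return max(pair_count, key=score)

grid = np.array([[3, 7, 3, 7],
                 [3, 7, 3, 7],
                 [1, 1, 2, 2]])
print(best_merge(grid))  # the (3, 7) pair merges first: frequent and always "right"
```

Repeating such merges grows a vocabulary of larger visual tokens before any alignment with text, which is the "post-alignment" order the summary describes.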
Bigger, yet faster and more accurate: Ant open-sources the trillion-parameter language model Ling-1T, setting new SOTA on multiple benchmarks
机器之心· 2025-10-09 02:24
Core Insights
- The article discusses the launch of Ling-1T, a trillion-parameter open-source language model by Ant Group, highlighting its efficiency and performance in various benchmarks [2][5][52]

Group 1: Model Performance
- Ling-1T has achieved impressive results in multiple benchmark tests, outperforming several leading models in key areas such as knowledge understanding and reasoning [6][9][10]
- In coding and math reasoning tasks, Ling-1T consistently ranks among the top performers, demonstrating strong logical consistency and cross-domain reasoning capabilities [8][11]
- The model's performance in specific benchmarks includes a score of 92.19 in C-Eval and 87.45 in FinanceReasoning, indicating its high knowledge density and reasoning ability [9][10]

Group 2: Efficiency and Architecture
- Ling-1T utilizes a Mixture of Experts (MoE) architecture, allowing it to maintain high reasoning capabilities while significantly reducing computational costs [5][52]
- The model operates on a paradigm of "large parameter reserves + small parameter activation", enabling it to handle complex problems efficiently with a lower energy footprint (a minimal routing sketch appears after this summary) [53][54]
- It supports a context length of 128K, enhancing its ability to process long documents without losing context, which is crucial for industries like finance and law [62]

Group 3: Open Source Philosophy
- The article emphasizes the importance of open-source models in the AI landscape, suggesting that they enable faster iteration and lower costs for technology development [72][73]
- Ant Group's approach to open-sourcing Ling-1T allows for broader accessibility and collaboration, fostering an ecosystem where developers and small businesses can participate [74][75]
- The open-source model not only democratizes access to advanced AI capabilities but also enhances transparency and trust in AI applications across various sectors [72][74]
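To illustrate the "large parameter reserves + small parameter activation" pattern described in Group 2, here is a minimal sketch of a top-k mixture-of-experts layer in which every token is routed through only a few expert MLPs. The expert count, k, and dimensions are illustrative assumptions and do not reflect Ling-1T's actual configuration.

```python
# Minimal sketch of the "large parameter reserve, small activation" idea behind an
# MoE layer: all experts exist, but each token only runs through its top-k experts.
# Expert count, k, and dimensions are illustrative assumptions, not Ling-1T's config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                   # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.k, dim=-1)            # keep only top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):              # dispatch tokens routed to expert e
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot : slot + 1] * self.experts[e](x[mask])
        return out

x = torch.randn(5, 64)
print(TopKMoE()(x).shape)  # torch.Size([5, 64]); only 2 of 8 expert MLPs run per token
```

The total parameter count grows with the number of experts, while per-token compute depends only on k, which is how a trillion-parameter model can keep inference cost closer to that of a much smaller dense model.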
Breaking | Tsinghua physics legend Yao Shunyu leaves Anthropic over disagreements with the company, joins DeepMind
机器之心· 2025-10-08 04:13
Report by 机器之心, 机器之心 editorial team

Latest news: Yao Shunyu (姚顺宇), the legendary Special Scholarship winner from Tsinghua University's Department of Physics, has left Anthropic to join Google DeepMind. According to a post on his blog, he formally departed Anthropic on September 19 and joined Google DeepMind on September 29.

Yes, this is 姚顺宇, not 姚顺雨: the latter studied computer science and is the well-known author of "The Second Half of AI", while the former studied physics and made a name for himself while still an undergraduate.

Public records show that Yao Shunyu entered Tsinghua's Department of Physics in 2015 and began taking graduate-level theory courses in his second year. Working on topological field theory for periodically driven systems, he proposed a new approach to topological band theory in non-Hermitian systems and accurately predicted the related phenomena; the results were published in the top physics journal Phys. Rev. Lett. His achievements led an associate professor at a 211 university to remark: "Even the professors here do not surpass the level of physics Yao Shunyu reached as an undergraduate."

Image source: Zhihu @林晨

After finishing his undergraduate degree at Tsinghua in 2019, Yao went to Stanford for his PhD, then spent time as a postdoc at UC Berkeley, before joining Anthropic's Clau ... on October 1, 2024 ...
Google guru steps up: "Agentic Design Patterns" released for free, the ultimate playbook for AI agent development
机器之心· 2025-10-08 04:13
Report by 机器之心. Editor: Panda

The hottest wave in AI right now is unquestionably AI agents. From tech giants to startups, countless developers are building intelligent systems that can autonomously understand, plan, and execute complex tasks. Behind this "gold rush", however, developers face major challenges: how do you systematically design an agent's behavior? How do you ensure stability and reliability? How do you avoid reinventing the wheel over and over? The field urgently needs a set of battle-tested "blueprints" and methodology.

A good book can make that learning far more efficient. Recently, Antonio Gulli, a senior engineering director and distinguished engineer at Google, publicly released his new book "Agentic Design Patterns" online. The term "design pattern" is familiar to many developers: it once played a "bible"-like role in software engineering, crystallizing the best practices of countless predecessors into reusable solutions. The significance of Gulli's move is precisely that it gives the nascent field of agent development its first systematic set of design patterns, offering developers a well-charted path to building powerful, reliable agents.

Although the book is already available for pre-order on Amazon (the author says all royalties will be donated to Save the Children), interested readers ...
Open-source RL framework Verlog arrives: built for LLM agents, 400-turn episodes are no problem
机器之心· 2025-10-08 04:13
Report by 机器之心, 机器之心 editorial team

In the AI era, handling short conversations is no longer a challenge for agents. The real challenge is keeping reasoning clear and decisions robust across hundreds of steps of exploration. Traditional reinforcement learning frameworks can cope within a few dozen steps, but once a task stretches to hundreds of steps, sparse rewards, bloated histories, and policy collapse follow one after another.

To meet these challenges, researchers from Carnegie Mellon University, the University of Hong Kong, and other institutions have proposed Verlog. Building on VeRL and BALROG, and following the proven design principles of pytorch-a2c-ppo-acktr-gail, it introduces a series of targeted optimizations so that training remains stable and efficient as task horizons stretch from brief interactions to hundreds of turns.

Earlier frameworks such as VeRL and RAGEN handle tasks of roughly 10 turns well, and verl-agent scales to about 50 turns; Verlog is designed for environments exceeding 400 turns, giving it a distinctive advantage on complex long-horizon decision-making tasks. This capability has been validated in demanding domains such as BabyAI, BabaIsAI, and Crafter. In Crafter, for example, episode lengths range from 70 to 400 steps, averaging about 190; across these challenging environments, Verlog delivers strong performance out of the box.
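One concrete way a long-horizon agent framework can keep hundreds of turns tractable is to bound the prompt to a fixed window of recent turns. The sketch below illustrates that general idea only; it is an assumption about the kind of optimization involved, not Verlog's actual implementation.

```python
# Minimal sketch of bounding context in very long episodes by keeping only the most
# recent turns in the agent's prompt. This illustrates one kind of optimization a
# 400+ turn framework needs; it is an assumption, not Verlog's actual mechanism.
from collections import deque

class TurnWindowMemory:
    def __init__(self, max_turns: int = 4):
        self.turns = deque(maxlen=max_turns)   # older turns fall off automatically

    def add_turn(self, observation: str, action: str):
        self.turns.append((observation, action))

    def build_prompt(self, task: str, new_observation: str) -> str:
        history = "\n".join(f"Obs: {o}\nAction: {a}" for o, a in self.turns)
        return f"Task: {task}\n{history}\nObs: {new_observation}\nAction:"

memory = TurnWindowMemory(max_turns=4)
for step in range(400):                         # hundreds of turns, bounded prompt size
    prompt = memory.build_prompt("collect wood", f"step {step} view")
    action = "move_forward"                     # placeholder for the LLM policy's output
    memory.add_turn(f"step {step} view", action)
print(len(memory.turns))  # 4: prompt length stays constant no matter how long the episode
```

Keeping the prompt bounded in this way addresses the "bloated histories" problem mentioned above, though handling sparse rewards and policy collapse requires separate algorithmic choices.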