机器之心
The classic ReLU is back! Its major flaw, the "dying ReLU problem," has been solved
机器之心· 2025-06-03 06:26
Report by 机器之心; 机器之心 editorial team. No swapping models, no piling on parameters: with SUGAR alone, model performance jumps! In deep learning, the study of activation functions has become a research direction in its own right. Functions such as GELU, SELU, and SiLU, with their smooth gradients and excellent convergence behavior, have become popular choices. Despite this trend, the classic ReLU remains widely favored for its simplicity, inherent sparsity, and other advantageous topological properties. However, ReLU units are prone to the so-called "dying ReLU problem": once a neuron's output becomes stuck at 0 during training, its gradient is also 0 and the unit can never recover. This phenomenon ultimately limits overall performance and is a major flaw of ReLU networks. The dying ReLU problem has spawned a large family of improved linear units, including but not limited to LeakyReLU, PReLU, GELU, SELU, SiLU/Swish, and ELU, which introduce non-zero activations for negative pre-activation values and offer different trade-offs. In this work, researchers from the University of Lübeck and other institutions introduce a novel method, SUGAR (Surrogate Gradient for ReLU), which addresses ReLU's limitations without sacrificing its advantages: the forward pass still uses the standard ReLU (preserving its sparsity and simplicity), while the backward pass replaces ReLU's derivative with ...
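To make the surrogate-gradient idea concrete, here is a minimal PyTorch sketch: the forward pass is an exact ReLU, while the backward pass substitutes the derivative of a smooth function. The use of the SiLU derivative below is an assumption made for illustration; the surrogate actually used in the SUGAR paper is cut off in the excerpt above.

```python
import torch

class SUGARReLU(torch.autograd.Function):
    """Forward: standard ReLU. Backward: gradient of a smooth surrogate.

    Minimal sketch of the surrogate-gradient idea; the SiLU derivative used
    here is an assumption, not necessarily the surrogate chosen in the paper.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.relu(x)  # exact ReLU output: sparse and simple

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # Derivative of SiLU(x) = x * sigmoid(x); non-zero for x < 0,
        # so "dead" units still receive a gradient signal.
        surrogate_grad = sig * (1 + x * (1 - sig))
        return grad_output * surrogate_grad


x = torch.randn(4, requires_grad=True)
y = SUGARReLU.apply(x).sum()
y.backward()
print(x.grad)  # negative inputs now carry a non-zero gradient
```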
Does chain-of-thought "drop frames" too? A Zhejiang University team proposes CoT-Bridge, significantly boosting mathematical reasoning performance
机器之心· 2025-06-03 06:26
As large language models (LLMs) advance rapidly, Chain-of-Thought (CoT) has become a key paradigm for improving complex reasoning, with especially strong results on structured tasks such as math and logic. The co-first authors of this paper are Haolei Xu and Yuchen Yan. Haolei Xu is a first-year master's student at Zhejiang University whose research focuses on large-model reasoning and interpretability; Yuchen Yan is a third-year PhD student at Zhejiang University whose research focuses on large-model reasoning and agents. The corresponding authors are Professor Weiming Lu and Researcher Yongliang Shen of Zhejiang University. But have you noticed that even carefully constructed CoT data can contain "leaping" reasoning that omits key intermediate steps? To a human expert these steps may feel "obvious," but to a model they can be an unbridgeable gap. To address this problem, Zhejiang University, together with Microsoft Research Asia and the Chinese University of Hong Kong, proposed the Thought Leap Bridge task and developed CoT-Bridge, a chain-of-thought repair method. Experiments show that the method significantly improves reasoning accuracy on multiple math and logic tasks and can be embedded as a "plug-and-play" module into pipelines such as knowledge distillation and reinforcement learning. CoT is not Coherent-of-Thought: how do thought leaps break a reasoning chain? CoT was designed to make large models "think step by step" like humans, yet the research team ...
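As a hypothetical illustration of a thought leap (not an example drawn from the paper's data): a chain of thought for solving x^2 - 5x + 6 = 0 that jumps straight from the equation to "so x = 2 or x = 3" has leapt over the key intermediate step; a bridged chain would first insert "x^2 - 5x + 6 = (x - 2)(x - 3), and a product is zero only when one of its factors is zero" before stating the roots.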
The ByteDance 2025 Scholarship Program launches! 100,000 RMB per recipient, and even more slots this year!
机器之心· 2025-06-03 04:06
About the ByteDance Scholarship: the ByteDance Scholarship Program is an annual talent-development program launched by ByteDance in 2021, offering each recipient a 100,000 RMB scholarship*, an invitation-only track into the internal research program, and other rewards. Over the past four years, 47 outstanding students have received support from the program. In 2025, the program will increase its focus and investment in key research directions and further expand the number of slots, planning to select 20 outstanding students across China and Singapore and to support their technical research and career development.
Five rewards, with greater investment in research and learning:
- Research funding of 100,000 RMB, covering expenses including but not limited to attending international academic conferences and filing patents.
- Scholarship club: regular invitations to academic and research events, with face-to-face exchanges with senior ByteDance technical experts on frontier directions in technology and the industry.
- Talent-program "green channel": a fast track into ByteDance's dedicated talent programs (Top Seed / 筋斗云人才计划, or the research-intern track), with the chance to go straight to a desired offer.
- Senior-mentor 1-on-1 guidance: matched one-on-one with a senior ByteDance mentor based on research area, providing professional research guidance.
- Invitation-only track into the internal research program.
(* Regions outside mainland China receive the equivalent amount in local currency.)
Application requirements ...
Ten thousand frames? A single GPU! BAAI open-sources Video-XL-2, a lightweight ultra-long video understanding model
机器之心· 2025-06-03 04:06
Published by 机器之心; 机器之心 editorial team. Long video understanding is one of the key capabilities of multimodal large models. Although proprietary models such as OpenAI's GPT-4o and Google's Gemini have made notable progress in this area, current open-source models still show clear shortcomings in quality, computational overhead, and runtime efficiency. Recently, the Beijing Academy of Artificial Intelligence (BAAI), together with Shanghai Jiao Tong University and other institutions, officially released a new-generation ultra-long video understanding model, Video-XL-2. Compared with the previous Video-XL, the model comprehensively improves a multimodal model's ability to understand long video content along several dimensions:
- Better quality: Video-XL-2 performs strongly on long-video understanding tasks, reaching leading results among open-source models of the same parameter scale on mainstream benchmarks such as MLVU, Video-MME, and LVBench.
- Longer inputs: the new model significantly extends the video length it can handle, supporting efficient processing of inputs up to ten thousand frames on a single GPU.
- Faster speed: Video-XL-2 greatly improves processing efficiency, encoding a 2048-frame video in only 12 seconds, ...
The model weights of Video-XL-2 are now fully open to the community. Going forward, the model is expected to prove valuable in practical scenarios such as film and television content analysis and anomalous-behavior monitoring.
Technical overview
[Figure 1: Schematic of the Video-XL-2 model architecture]
[Figure 3: Chunk-based Prefilling]
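The excerpt only names chunk-based prefilling in a figure caption, so the following Python sketch shows the general idea rather than Video-XL-2's actual implementation: visual tokens are encoded chunk by chunk while a running cache is carried forward, keeping peak activation memory roughly constant regardless of video length. The function names, chunk size, and toy encoder are illustrative assumptions.

```python
import torch

def chunked_prefill(frame_tokens, encode_chunk, chunk_size=512):
    """Prefill a long sequence of visual tokens chunk by chunk.

    frame_tokens: (num_tokens, dim) tensor of visual tokens for the video.
    encode_chunk: callable taking (chunk, kv_cache) and returning
                  (chunk_states, updated_kv_cache).
    """
    kv_cache = None
    outputs = []
    for start in range(0, frame_tokens.shape[0], chunk_size):
        chunk = frame_tokens[start:start + chunk_size]
        states, kv_cache = encode_chunk(chunk, kv_cache)
        outputs.append(states)
    return torch.cat(outputs, dim=0), kv_cache

# Toy stand-in encoder: appends the chunk to the cache and returns it unchanged.
def toy_encode(chunk, kv_cache):
    cache = chunk if kv_cache is None else torch.cat([kv_cache, chunk], dim=0)
    return chunk, cache

tokens = torch.randn(2048, 64)            # e.g. tokens from 2048 frames
states, cache = chunked_prefill(tokens, toy_encode, chunk_size=512)
print(states.shape, cache.shape)          # torch.Size([2048, 64]) twice
```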
Will the LSTM father's 22-year-old vision come true? A burst of AI "self-evolution" papers within one week: is a new trend emerging?
机器之心· 2025-06-02 05:22
Core Insights
- The article discusses the evolution of AI systems towards self-improvement, highlighting recent advancements in self-learning models, particularly the "Darwin Gödel Machine" (DGM) and other frameworks [1][4][6].
Group 1: Darwin Gödel Machine (DGM)
- DGM utilizes foundation models and open-ended algorithms to create and evaluate new AI agents, and is capable of reading and modifying its own Python code for self-improvement [4][6].
- DGM has demonstrated significant self-improvement, with performance rising from 20.0% to 50.0% on SWE-bench and from 14.2% to 30.7% on Polyglot, surpassing manually designed agents [10].
- The system alternates between self-modification and downstream task evaluation, continuously generating and scoring new agents [10][8].
Group 2: Self-Rewarded Training (SRT)
- SRT is an online self-training reinforcement learning algorithm that lets large language models self-supervise and train without external labels, improving performance through self-generated feedback [14][16].
- Initial experiments show that SRT can match standard reinforcement learning methods that rely on gold-standard answers, although it may eventually suffer performance degradation [18][21].
- Strategies to mitigate reward hacking include early stopping, using offline-generated labels for self-training, and curriculum learning to maintain model performance [22][24][26].
Group 3: Multi-Modal Unsupervised Post-Training (MM-UPT)
- MM-UPT is a framework for continuous self-improvement of multimodal large models in completely unsupervised settings, validated across multiple benchmarks [30][32].
- The framework uses a voting mechanism to generate pseudo-labels from self-generated data, allowing models to strengthen their reasoning without external supervision [39][40].
- Experiments indicate that MM-UPT improves accuracy from 66.3% to 72.9% on the MathVista benchmark, outperforming previous unsupervised methods [39][40].
Group 4: UI-Genie Framework
- UI-Genie is designed to address challenges in GUI agents, focusing on trajectory validation and the acquisition of high-quality training data [45][47].
- The framework includes a reward model that efficiently processes historical context and unifies action-level and task-level rewards, enhancing the agent's learning [45][50].
- Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks after iterative self-improvement cycles [52].
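The voting mechanism described for MM-UPT (and the self-generated feedback in SRT) can be sketched in a few lines: sample several answers to the same question, take the majority answer as a pseudo-label, and reward responses that agree with it. The code below is a minimal illustration of that idea, not the papers' actual implementations.

```python
from collections import Counter

def majority_vote_pseudo_label(answers):
    """Return the most common answer among sampled responses.

    With no external labels, the majority answer serves as a pseudo-label
    for self-training (the self-consistency voting step).
    """
    label, _ = Counter(answers).most_common(1)[0]
    return label

def self_consistency_rewards(answers):
    """Reward 1.0 for responses that agree with the majority, else 0.0."""
    label = majority_vote_pseudo_label(answers)
    return [1.0 if a == label else 0.0 for a in answers]

# Toy usage: 5 sampled answers to the same question
sampled = ["42", "42", "41", "42", "40"]
print(majority_vote_pseudo_label(sampled))   # "42"
print(self_consistency_rewards(sampled))     # [1.0, 1.0, 0.0, 1.0, 0.0]
```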
The "second half of AI" that Shunyu Yao spoke of: product evaluation is still misunderstood
机器之心· 2025-06-02 05:22
Report by 机器之心. Editor: Zhang Qian. A while ago, OpenAI researcher Shunyu Yao published a blog post on the theme of the "second half" of AI. In it he wrote: "Next, the focus of AI will shift from solving problems to defining problems. In this new era, evaluation will become more important than training. We need to rethink how we train AI and how we measure progress, and this may require thinking more like a product manager." (See the earlier article 《清华学霸、OpenAI 姚顺雨:AI 下半场开战,评估将比训练重要》.) Because the views are highly insightful, the post drew a large audience of practitioners. Interestingly, Amazon principal applied scientist Eugene Yan recently published a blog post of his own devoted to evaluating AI products, which reads as a strong complement to Yao's piece. That post has likewise been widely praised. Below is the original blog text. Automated evals won't save your product; you have to fix your process. Product evaluation is something many people simply haven't understood. There is always someone who thinks that adding another tool, another metric, or an LLM-as-judge will solve the problem and save the product. That is just dodging the core issue and avoiding the real work. Evaluation is not a one-off fix, nor a quick win: it is the continuous practice of the scientific method, it is eval-driven development, and it is ongoing monitoring of AI outputs. Building a product evaluation system is, at its core, practicing the scientific method. This is the real ...
CVPR 2025 | Tackling the XR compute bottleneck: the FovealSeg framework achieves millisecond-level IOI segmentation
机器之心· 2025-06-02 05:22
The co-first authors of this paper are New York University graduate students Hongyi Zeng and Wenxuan Liu. Collaborating authors are Tianhua Xia, Jinhui Chen, and Ziyun Li. The corresponding author is Sai Qian Zhang, a professor in the Electrical Engineering and Computer Science departments at New York University, whose research covers efficient AI, hardware acceleration, and augmented reality. As XR steadily moves from concept to deployment, precisely understanding "compute what the user is looking at" has remained one of the core challenges in visual computing. Recently, a joint study from New York University and Meta Reality Labs has drawn industry attention: Foveated Instance Segmentation, a new method that incorporates eye-tracking information into instance segmentation, has been officially accepted to CVPR 2025.
Paper link: https://arxiv.org/pdf/2503.21854
1. Starting from the compute bottleneck. In today's mainstream AR/VR headsets, the built-in cameras often capture at 720p, 1080p, or even 1440p, but running instance segmentation on frames at such high resolution frequently pushes inference latency to hundreds of milliseconds or even seconds, far beyond the 50-100 ms latency that people find comfortable in interaction. The paper Foveated ...
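A minimal sketch of the foveated idea motivating this line of work: use the gaze point to crop a small instance-of-interest region and run the expensive segmentation model only on that crop. The crop size, names, and paste-back step below are illustrative assumptions, not the FovealSeg pipeline itself.

```python
import numpy as np

def foveated_crop(frame, gaze_xy, crop_size=256):
    """Crop a square region of interest around the gaze point.

    frame:   (H, W, C) image array.
    gaze_xy: (x, y) gaze coordinates in pixels.
    Only this crop is passed to the (expensive) instance-segmentation model,
    so the cost no longer scales with the full 1080p/1440p frame.
    """
    h, w = frame.shape[:2]
    half = crop_size // 2
    x = int(np.clip(gaze_xy[0], half, w - half))
    y = int(np.clip(gaze_xy[1], half, h - half))
    return frame[y - half:y + half, x - half:x + half], (x - half, y - half)

# Toy usage with a dummy 1080p frame and a gaze point near the center
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
crop, offset = foveated_crop(frame, gaze_xy=(960, 540))
print(crop.shape, offset)   # (256, 256, 3) (832, 412)
# segment(crop) would run here; masks are then pasted back at `offset`.
```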
Microsoft and others propose the new "Chain-of-Model" paradigm: performance on par with Transformer, with better scalability and flexibility
机器之心· 2025-06-02 05:22
Report by 机器之心. Editor: Chen Chen. With the rise of large language models (LLMs), scaling the Transformer architecture has come to be seen as a promising route to reshape the current AI landscape and reach top performance across many different tasks. Exploring how to scale Transformer models has therefore become a growing trend in both industry and academia. Against this backdrop, LLM parameter counts have grown exponentially, from billions to trillions. This explosive growth imposes extremely expensive training costs and cannot offer different inference options for different deployment environments. Given this ever-growing scaling law, how to develop and effectively use LLMs to handle user instructions across diverse scenarios has become an open and critical challenge for the whole community; scaling current LLM architectures faces several problems. In this paper, researchers from Microsoft, Fudan University, Zhejiang University, and ShanghaiTech University propose a new concept, CoR (Chain-of-Representation), which generalizes the notion of a representation paradigm to a much broader scope. Specifically, the paper observes that any representation can always be viewed as a combination of multiple sub-representations along the hidden dimension, and it defines this combination as a chain of representations, with each sub-representation corresponding to one chain. Based on this definition, by using different numbers of preceding chains, the corresponding features can ...
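The chain-of-representation definition above can be illustrated with a short sketch: a hidden vector is split along its hidden dimension into n sub-representations (chains), and taking only the first i preceding chains yields a narrower feature. Dimensions and function names are illustrative assumptions; this is not the paper's implementation.

```python
import numpy as np

def split_into_chains(hidden, n_chains):
    """View a hidden vector as n_chains sub-representations along its last dim."""
    return np.split(hidden, n_chains, axis=-1)

def feature_from_preceding_chains(hidden, n_chains, i):
    """Use only the first i chains, giving a feature of scalable width.

    Larger i -> wider feature (more capacity); smaller i -> cheaper inference.
    """
    chains = split_into_chains(hidden, n_chains)
    return np.concatenate(chains[:i], axis=-1)

hidden = np.random.randn(4096)                                   # full hidden representation
small = feature_from_preceding_chains(hidden, n_chains=4, i=1)   # first chain only
medium = feature_from_preceding_chains(hidden, n_chains=4, i=2)  # first two chains
print(small.shape, medium.shape)   # (1024,) (2048,)
```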
Terence Tao: Thanks to Lean, I've rewritten my classic textbook from 20 years ago!
机器之心· 2025-06-01 03:30
Core Viewpoint
- Terence Tao has announced a Lean companion project for his undergraduate textbook "Analysis I," aiming to offer an alternative way to learn the material through formalized mathematics with the Lean proof assistant [1][2].
Group 1: Project Overview
- The Lean project converts definitions, theorems, and exercises from "Analysis I" into Lean, allowing students to engage with the material interactively [2][4].
- The project is intended to transition towards the standard Lean library Mathlib, one of the largest and most active formal mathematics projects in the world [1][2].
Group 2: Educational Goals
- "Analysis I" focuses on foundational topics such as the construction of the natural numbers, integers, rational numbers, and real numbers, providing enough set theory and logic for rigorous proofs [2].
- The Lean project aims to enrich the learning experience by letting students complete exercises directly in Lean code, although official answers will not be provided [2][4].
Group 3: Structure and Content
- The textbook consists of 11 chapters, some of which have already been formalized in Lean [3].
- The project deliberately maintains partial independence from Mathlib, initially constructing certain mathematical structures on its own before transitioning to Mathlib's definitions [5].
Group 4: Community Engagement
- The Lean version of the textbook is now available for users, including mathematics students and researchers interested in formal verification, to work through and give feedback on [7].
- Users have expressed excitement about the project, noting its potential to bridge the gap between traditional mathematics education and programming-based rigor [9].
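As a small, hypothetical illustration of what completing an exercise "directly in Lean code" looks like (the statement below is not taken from the companion repository):

```lean
-- Hypothetical example of a formalized exercise (Lean 4, core library only):
-- state a simple fact about natural numbers and discharge it with a proof.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A student working through the companion project would replace `sorry`
-- placeholders in statements like this with their own proofs.
```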
Is SFT doing more harm than good? New research: going straight to reinforcement learning gives models a higher multimodal reasoning ceiling
机器之心· 2025-06-01 03:30
Core Insights
- The article discusses the limitations of the "Supervised Fine-Tuning (SFT) + Reinforcement Learning (RL)" paradigm for developing large vision-language models (LVLMs), suggesting that SFT may hinder learning and lead to superficial reasoning paths, while RL promotes genuine multimodal reasoning [3][11][21].
Group 1: Research Findings
- A study from the University of California, Santa Cruz, and the University of Texas at Dallas finds that SFT can obstruct learning, often producing "pseudo reasoning paths" that lack depth [3][11].
- The research team built the VLAA-Thinking dataset to systematically investigate the roles of SFT and RL in multimodal reasoning, highlighting the distinct contributions of each method [4][8].
- The findings indicate that while SFT improves performance on standard tasks, it falls short on complex reasoning, leading to a 47% relative performance decline in a 7B model [11][13].
Group 2: Data and Methodology
- The VLAA-Thinking dataset comprises 203,182 samples, with 126,413 for SFT and 25,195 for RL, designed to provide high-quality reasoning chains [5][6].
- The researchers used a six-stage data processing workflow to transfer reasoning capabilities from text-only models to LVLMs [6][8].
- A mixed reward function was designed within the GRPO framework to optimize RL in visual contexts, combining several reward types for different problem categories [8][19].
Group 3: Performance Analysis
- The study found that SFT's imitative reasoning patterns can limit the exploration space during the RL phase, suggesting that learning directly from reward signals is more effective [15][26].
- Models trained solely with GRPO outperformed those that first underwent SFT, with VLAA-Thinker-Qwen2.5-VL-3B ranking first on the Open LMM reasoning leaderboard among 4B-scale models, a 1.8% record improvement [15][31].
- The analysis found that response length and reward scores do not correlate strongly with performance, challenging previous assumptions about their relationship [24][26].
Group 4: Implications for Future Research
- The findings suggest that SFT is currently incompatible with GRPO for multimodal reasoning, potentially hurting the performance of both base and instruction-tuned LVLMs [21][22].
- The research emphasizes the need for high-quality instruction tuning to strengthen model performance in RL settings: better instruction tuning leads to stronger reasoning after RL training [31].
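A minimal sketch of a mixed, rule-based reward of the kind described for the GRPO setup: a correctness reward dispatched by problem type plus a format reward, combined into one scalar per sampled response. The weights, tag format, and problem categories here are illustrative assumptions, not the paper's exact design.

```python
import re

def format_reward(response):
    """Reward responses that wrap reasoning in <think> tags and give an answer."""
    ok = bool(re.search(r"<think>.*</think>", response, re.DOTALL)) and "Answer:" in response
    return 1.0 if ok else 0.0

def accuracy_reward(response, reference, problem_type):
    """Rule-based correctness check, dispatched by problem category."""
    match = re.search(r"Answer:\s*(.+)", response)
    if not match:
        return 0.0
    answer = match.group(1).strip()
    if problem_type == "numeric":
        try:
            return 1.0 if abs(float(answer) - float(reference)) < 1e-6 else 0.0
        except ValueError:
            return 0.0
    return 1.0 if answer.lower() == reference.lower() else 0.0  # e.g. multiple choice

def mixed_reward(response, reference, problem_type, w_acc=0.9, w_fmt=0.1):
    """Weighted sum used as the scalar reward for each sampled response in GRPO."""
    return w_acc * accuracy_reward(response, reference, problem_type) + w_fmt * format_reward(response)

resp = "<think>2 + 2 equals 4</think>\nAnswer: 4"
print(mixed_reward(resp, "4", "numeric"))   # 1.0
```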