机器之心
The Whole Internet Is Arguing: Is Xpeng's Humanoid Robot Actually a Real Person?
机器之心· 2025-11-06 03:28
Core Viewpoint
- Xpeng Motors has transitioned from being solely an automotive company to an AI company, showcasing its humanoid robot IRON, which has sparked significant discussion globally [8][9].

Group 1: Robot Development and Features
- Xpeng has been developing humanoid robots for 7 years, evolving from quadrupedal forms to a fully humanoid design [10].
- The new IRON features a human-like skeletal structure, bionic muscle system, and fully flexible skin, significantly reducing its mechanical appearance [11].
- Standing at approximately 1.78 meters and weighing 70 kg, IRON is taller than robots from competitors like NEO [12].
- IRON has 22 degrees of freedom in its hands and a total of 65 degrees of freedom, allowing it to perform complex tasks such as folding clothes and cleaning [15][16].
- The robot's advanced movement capabilities are supported by a sophisticated control system, although specific details have not been disclosed [18].

Group 2: AI and Interaction
- The core of IRON is powered by Xpeng's self-developed AI brain, utilizing three Turing AI chips with a total computing power of 2,250 TOPS [26].
- It integrates three cognitive models (VLT, VLA, VLM) for visual perception, language understanding, and action decision-making, enabling seamless interaction [27].
- The head of IRON features a 3D curved display that serves as both a face and an interactive interface for more natural human-robot communication [28].

Group 3: Market Strategy and Future Plans
- Xpeng plans to mass-produce IRON by 2026, primarily for use in its own commercial scenarios, such as showroom guides and sales assistants [35][37].
- The company has acknowledged the limitations of using robots in manufacturing, citing inefficiencies compared to human workers [35].
- Xpeng's CEO, He Xiaopeng, anticipates that humanoid robots will enter factories and homes, but the pace of adoption will be gradual, estimating 3-5 years for industrial applications and 5-10 years for household use [36].
- Xpeng will also launch the IRON SDK to invite third-party developers to create additional applications, with initial partnerships including major companies like Baosteel [38].
NeurIPS 2025 Spotlight | Precise Filtering via Selective Knowledge Distillation: Meet AdaSPEC, a Speculative Decoding Accelerator
机器之心· 2025-11-06 03:28
Core Insights
- The article discusses the introduction of AdaSPEC, an innovative selective knowledge distillation method aimed at enhancing speculative decoding in large language models (LLMs) [3][9][16]
- AdaSPEC focuses on improving the alignment between draft models and target models by filtering out difficult-to-learn tokens, thereby increasing the overall token acceptance rate without compromising generation quality [3][11][16]

Research Background
- LLMs excel in reasoning and generation tasks but face high inference latency and computational costs due to their autoregressive decoding mechanism [6]
- Traditional acceleration methods like model compression and knowledge distillation often sacrifice generation quality for speed [6]

Method Overview
- AdaSPEC employs a selective token filtering mechanism that allows draft models to concentrate on "easy-to-learn" tokens, enhancing their alignment with target models [3][9]
- The method utilizes a two-stage training framework: first, it identifies difficult tokens using a reference model, and then it filters the training dataset to optimize the draft model [11][12]

Experimental Evaluation
- The research team conducted systematic evaluations across various model families (Pythia, CodeGen, Phi-2) and tasks (GSM8K, Alpaca, MBPP, CNN/DailyMail, XSUM), demonstrating consistent and robust improvements in token acceptance rates [14]
- Key experimental results indicate that AdaSPEC outperforms the current optimal DistillSpec method, with token acceptance rates increasing by up to 15% across different tasks [15]

Summary and Outlook
- AdaSPEC represents a precise, efficient, and universally applicable paradigm for accelerating speculative decoding, paving the way for future research and industrial deployment of efficient LLM inference [16]
- The article suggests two potential avenues for further exploration: dynamic estimation mechanisms for token difficulty and application of AdaSPEC in multimodal and reasoning-based large models [17]
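The two-stage recipe above (score token difficulty with a reference model, then distill the draft model only on the "easy" tokens) can be sketched as follows. This is a minimal illustration, not AdaSPEC's actual implementation: the scoring rule (per-token cross-entropy), the `keep_ratio` parameter, and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def select_easy_tokens(ref_logits, targets, keep_ratio=0.8):
    """Stage 1 (assumed scoring rule): rank tokens by the reference model's
    per-token loss and keep only the easiest fraction for distillation."""
    # per-token cross-entropy under the reference model, shape (seq_len,)
    ref_loss = F.cross_entropy(ref_logits, targets, reduction="none")
    k = max(1, int(keep_ratio * targets.numel()))
    # indices of the k easiest (lowest-loss) tokens
    _, keep_idx = torch.topk(-ref_loss, k)
    mask = torch.zeros_like(ref_loss, dtype=torch.bool)
    mask[keep_idx] = True
    return mask

def filtered_distill_loss(draft_logits, target_logits, mask, temperature=1.0):
    """Stage 2: KL(target || draft), averaged only over the kept tokens."""
    log_p_draft = F.log_softmax(draft_logits / temperature, dim=-1)
    p_target = F.softmax(target_logits / temperature, dim=-1)
    per_token_kl = (p_target * (p_target.clamp_min(1e-9).log() - log_p_draft)).sum(-1)
    return (per_token_kl * mask).sum() / mask.sum().clamp_min(1)
```

Filtering happens entirely at the loss level, so the same batch pipeline can be reused; only the mask changes between plain distillation and the selective variant.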
The Robotic Hand Truly Comes "Alive": Galbot (银河通用) and Tsinghua Introduce DexNDM, Reshaping Dexterous Manipulation with Neural Dynamics
机器之心· 2025-11-06 03:28
Core Insights
- The article discusses the development of DexNDM, a method aimed at solving the sim-to-real challenge in dexterous robotic manipulation, particularly in achieving stable in-hand rotation of various objects [2][5][24].

Group 1: Background and Challenges
- High dexterity in remote operation of complex tools, such as using a screwdriver or hammer, has been a long-standing challenge in robotics [4].
- Traditional direct mapping remote operation methods are limited to simple tasks and cannot handle complex manipulations requiring fine motor skills [4].
- A semi-autonomous remote operation paradigm is proposed, which breaks down complex tasks into stable atomic skills that robots can execute autonomously [4].

Group 2: DexNDM Methodology
- DexNDM is designed to learn general and stable atomic skills for in-hand rotation, covering a wide range of scenarios including challenging elongated and small objects [5][19].
- The method utilizes a joint-wise neural dynamics model to bridge the gap between simulation and real-world dynamics, enhancing data efficiency and generalization across different hand-object interactions [19][20].

Group 3: Achievements and Capabilities
- DexNDM achieves unprecedented capabilities in continuous rotation of objects under challenging wrist postures, demonstrating superior performance compared to previous methods [9][13].
- The system allows operators to guide dexterous hands in complex tasks such as tightening screws and assembling furniture, showcasing its robustness and adaptability [7][15].
- The method's flexibility enables stable execution of tasks regardless of the wrist orientation or rotation axis required [14][15].

Group 4: Data Collection and Training
- An automated data collection system, termed "Chaos Box," is developed to gather diverse real-world interaction data with minimal human intervention [21].
- A residual policy network is trained to compensate for the dynamics gap between simulation and reality, enhancing the system's performance in real-world applications [23].

Group 5: Conclusion and Future Outlook
- DexNDM represents a significant advancement in addressing the sim-to-real challenge in robotics, achieving dexterous manipulation skills previously deemed impossible [24].
- The authors believe this is just the beginning, with the potential for dexterous hands to play a crucial role in the future of humanoid robotics [25].
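The joint-wise residual idea (learn a data-driven correction on top of a simulator's per-joint prediction, so the combined model matches real-world dynamics) can be sketched roughly as below. The architecture, state dimensions, and names are illustrative assumptions, not DexNDM's actual model.

```python
import torch
import torch.nn as nn

class JointWiseResidualDynamics(nn.Module):
    """Minimal sketch (assumed architecture): a small MLP, shared across joints,
    predicts a residual correction added to the simulator's next-state prediction."""
    def __init__(self, state_dim=2, action_dim=1, hidden=64):
        super().__init__()
        # input is one joint's (state, command); output is a state-sized residual
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sim_next_state, joint_state, joint_cmd):
        # residual learned from real trajectories closes the sim-to-real gap
        residual = self.net(torch.cat([joint_state, joint_cmd], dim=-1))
        return sim_next_state + residual
```

Training would regress the corrected prediction against observed real next states; because the network only has to model the sim-to-real discrepancy rather than full dynamics, far less real data is needed.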
NeurIPS 2025 Spotlight | Is the Video in Your Feed Real? Using the Laws of Physics to Expose Sora's Lies
机器之心· 2025-11-05 06:30
Core Viewpoint
- The article discusses the development of a physics-driven spatiotemporal modeling framework for detecting AI-generated videos, emphasizing the need for a robust detection method that leverages physical consistency rather than superficial features [6][47].

Group 1: Research Background
- The rise of generative AI technologies has led to significant advancements in video synthesis, but the detection of such videos faces new challenges due to the complex spatial and temporal dependencies inherent in video data [7].
- Existing detection methods often focus on superficial inconsistencies, which are less effective against high-quality generated videos that obscure these features [7][8].
- The core dilemma in AI video detection is how to construct a detection framework that is robust to unknown generative models by understanding the physical evolution laws of natural videos [8].

Group 2: Proposed Methodology
- The article introduces the concept of Normalized Spatiotemporal Gradient (NSG) statistics, which quantifies the physical inconsistencies in generated videos by analyzing the differences in NSG distributions between real and generated videos [3][18].
- The NSG-VD method is proposed as a universal video detection approach that models the distribution of natural videos without relying on specific generative models, demonstrating strong detection performance across various scenarios [3][28].

Group 3: Experimental Validation
- The NSG-VD framework was evaluated on the GenVideo benchmark, which includes 10 different generative models, showing superior performance compared to existing baseline methods [40].
- In mixed data training on Kinetics-400 (real videos) and Pika (generated videos), NSG-VD achieved an average recall of 88.02% and an F1 score of 90.87%, significantly outperforming the previous best method, DeMamba [40].
- Even with a limited training dataset of only 1,000 generated videos, NSG-VD maintained robust performance, achieving a recall of 82.14% on the Sora model, indicating high data efficiency [41].

Group 4: Theoretical Foundations
- The theoretical framework of NSG-VD is grounded in the principles of probability flow conservation and continuity equations, which describe the transport of conserved quantities in physical systems [13][14].
- The NSG statistic captures the relationship between spatial probability gradients and temporal density changes, providing a unified measure of consistency across different video scenarios [20][28].

Group 5: Future Directions
- The article suggests that future work will focus on refining the physical models used in NSG-VD, optimizing computational efficiency, and exploring the feasibility of real-time detection applications [48].
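To make the NSG idea concrete, here is a toy sketch in the spirit of the continuity-equation framing: the temporal change between consecutive frames is normalized by the local spatial gradient magnitude, and two videos are compared via the distributions of that statistic. The exact normalization and distance used by NSG-VD are not given in this summary, so every detail below is an assumption.

```python
import numpy as np

def nsg_statistic(frames, eps=1e-6):
    """Toy normalized spatiotemporal gradient (assumed form): the per-pixel
    temporal difference divided by the local spatial gradient magnitude."""
    frames = frames.astype(np.float64)
    dt = np.diff(frames, axis=0)                    # temporal gradient, (T-1, H, W)
    gy, gx = np.gradient(frames[:-1], axis=(1, 2))  # spatial gradients per frame
    spatial_mag = np.sqrt(gx**2 + gy**2)
    return dt / (spatial_mag + eps)

def nsg_distance(nsg_a, nsg_b, bins=64, lo=-5.0, hi=5.0):
    """Compare two videos (e.g. real vs generated) via histograms of NSG values,
    using a simple total-variation-style distance between the densities."""
    ha, _ = np.histogram(np.clip(nsg_a, lo, hi), bins=bins, range=(lo, hi), density=True)
    hb, _ = np.histogram(np.clip(nsg_b, lo, hi), bins=bins, range=(lo, hi), density=True)
    return 0.5 * np.abs(ha - hb).sum() * (hi - lo) / bins
```

A real video and a physically inconsistent generated one would then show different NSG distributions, which a detector can threshold or classify on.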
Embodied AI Steps Into the Scaling Law Era: A 10B+ Foundation Model and 270,000 Hours of Real-World Data
机器之心· 2025-11-05 06:30
Core Viewpoint
- The article discusses the breakthrough achieved by the AI robotics startup Generalist with the introduction of a new embodied foundational model, GEN-0, which is designed for multimodal training on high-fidelity physical interaction data, aiming to enhance robotic intelligence through scalable data and computational power [2][5].

Group 1: GEN-0 Model Features
- GEN-0 is built to capture human-level reflexes and physical common sense, with a parameter count exceeding 10 billion [3][4].
- A core feature of GEN-0 is "Harmonic Reasoning," allowing the model to seamlessly think and act simultaneously, which is crucial for real-world physical systems [5].
- The model has demonstrated strong scaling laws, indicating that increased pre-training data and computational power can predictably enhance performance across various tasks [6][10].

Group 2: Data and Training Insights
- Generalist has pre-trained GEN-0 on over 270,000 hours of diverse real-world operational data, with the dataset growing at a rate of 10,000 hours per week [23][24].
- The company emphasizes that the quality and diversity of data are more critical than sheer quantity, leading to models with different characteristics based on the data mix used [33].
- The scaling experiments revealed that smaller models exhibit "ossification," while larger models continue to improve, highlighting the importance of model size in absorbing complex sensory-motor data [10][11].

Group 3: Applications and Future Directions
- GEN-0 has been successfully tested on various robotic platforms, including humanoid robots with different degrees of freedom [6].
- The company is building the largest and most diverse real-world operational dataset to expand GEN-0's capabilities, covering a wide range of tasks across different environments [28].
- Generalist aims to create a robust infrastructure to support the extensive data collection and processing required for training large-scale robotic models [31].
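The "predictable enhancement" claim refers to scaling laws of the familiar power-law form loss ≈ a · C^(−b), which appear as straight lines in log-log space. As a generic illustration (synthetic numbers, not GEN-0's actual measurements), such a law can be fitted with a simple linear regression on logs:

```python
import numpy as np

# Hypothetical (compute, validation loss) pairs following loss = 2.5 * C**(-0.05);
# real scaling-law fits work the same way on measured points.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = 2.5 * compute ** -0.05

# In log-log space the power law is linear: log(loss) = log(a) - b*log(C),
# so a degree-1 polyfit recovers the exponent and prefactor.
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(log_a), -slope
```

Once `a` and `b` are fitted on small runs, the same line predicts the loss of much larger runs, which is what makes scaling behavior "predictable."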
In a Digital-Life "Petri Dish," AI Learns to Fight, Form Alliances, and Grab Territory
机器之心· 2025-11-05 04:15
Core Concept
- The article discusses the development of a new artificial life simulation system called PD-NCA (Petri Dish Neural Cellular Automata), which allows multiple NCA agents to compete and evolve in a shared environment, focusing on self-replication as their primary goal [2][5].

Group 1: PD-NCA Overview
- PD-NCA differs significantly from traditional NCA frameworks by allowing each NCA to have independent neural network parameters that are continuously optimized during the simulation [3].
- The agents in PD-NCA interact through differentiable attack and defense channels, showcasing a dynamic relationship of both competition and cooperation [5][6].
- The system enables emergent behaviors such as cyclic dynamics, territorial defense, and spontaneous cooperation among the agents [7].

Group 2: Simulation Mechanics
- The simulation operates on a discrete grid, where each cell contains information about attack channels, defense channels, and hidden states [12].
- The simulation progresses through four stages: Processing, Competition, Normalization, and State Update [13].
- A static background environment is introduced to maintain a competitive atmosphere, ensuring that agents must constantly adapt to survive [16][17].

Group 3: Learning and Optimization
- Each agent's optimization goal is to maximize its territory by maximizing its overall survival rate across the grid [29].
- The learning mechanism allows agents to balance between offensive expansion and defensive territory optimization, leading to complex emergent behaviors [30][31].
- The introduction of learning significantly enhances the richness and sustainability of emergent behaviors compared to a non-learning scenario [37][38].

Group 4: Experimental Findings
- Experiments indicate that the number of NCA agents, grid size, and learning processes are critical factors in generating complex dynamics and diverse behaviors within PD-NCA [38].
- The study explores the impact of grid size on NCA behavior, showing variations as the grid expands from 16x16 to 196x196 [39].
- Attempts to encourage the formation of longer hypercycle structures reveal that while modifications to the loss function were made, stable long-length hypercycles were rarely observed [43].
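The four-stage loop (Processing, Competition, Normalization, State Update) can be sketched as a toy NumPy step. In the actual PD-NCA the per-agent updates are learned, differentiable neural networks; here everything, from the random "processing" stand-in to the rival-defense rule and softmax normalization, is an illustrative assumption.

```python
import numpy as np

def pd_nca_step(attack, defense, rng):
    """One toy step over per-agent channel maps of shape (n_agents, H, W):
    Processing -> Competition -> Normalization -> State Update."""
    n, H, W = attack.shape
    # Processing: stand-in for each agent's learned channel update
    attack = attack + 0.1 * rng.standard_normal(attack.shape)
    # Competition: an agent's score is its attack minus the strongest rival defense
    rival_def = np.stack([np.delete(defense, i, axis=0).max(axis=0) for i in range(n)])
    score = np.maximum(attack - rival_def, 0.0)
    # Normalization: per-cell softmax over agents -> occupancy probabilities
    e = np.exp(score - score.max(axis=0, keepdims=True))
    occupancy = e / e.sum(axis=0, keepdims=True)
    # State Update: the highest-occupancy agent owns each cell
    owner = occupancy.argmax(axis=0)
    return occupancy, owner
```

Because occupancy is a smooth function of the channels, a learned version of this loop can backpropagate each agent's territory objective through the competition itself, which is what makes the attack/defense channels "differentiable."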
"Taming" Masked Diffusion Language Models with More Consistent Trajectories and Fewer Decoding Steps: Major Gains in Reasoning Performance and Efficiency
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses the rapid advancements in diffusion large language models (LLMs), highlighting their potential as strong competitors to traditional LLMs [2][7]
- A recent paper from a collaborative research team proposes an efficient decoding strategy combined with reinforcement learning for masked diffusion large language models (MDLM), significantly improving their reasoning performance and efficiency [2][21]

Group 1: Problem Identification
- Masked diffusion large language models like LLaDA exhibit capabilities comparable to autoregressive models but face challenges with full diffusion-style decoding, which is less effective than block-wise decoding [7][9]
- The decoding process of MDLMs often encounters an issue where early generation of <EOS> tokens leads to performance degradation, creating a decoding trap [14][15]

Group 2: Proposed Solutions
- The research team introduces an early rejection mechanism for <EOS> tokens to suppress their confidence during early decoding steps, thus preventing premature termination of generation [15]
- A power-increasing decoding step scheduler is designed to optimize the decoding process, reducing the inference steps from O(L) to O(logL), thereby accelerating reasoning [15][16]

Group 3: Consistency Trajectory Optimization
- The team proposes a consistency trajectory grouping strategy (CJ-GRPO) to address inconsistencies between rollout and optimization trajectories, enhancing training stability and effectiveness [16]
- By combining the early rejection mechanism, increasing step scheduler, and CJ-GRPO, the model can maintain performance comparable to baseline methods while significantly reducing decoding steps [16][24]

Group 4: Experimental Results
- Extensive experiments demonstrate that the proposed methods outperform baseline models in mathematical reasoning and planning tasks, with performance improvements of up to 2-4 times in certain benchmarks [23][24]
- The results indicate that the combination of CJ-GRPO with EOSER and ASS maintains competitive performance in low-step inference scenarios, achieving a balance of speed and quality [24]

Group 5: Future Directions
- The article suggests exploring hybrid reasoning modes that combine the strengths of diffusion and autoregressive models to meet diverse task requirements [26]
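A power-increasing step scheduler of the kind described (unmasking geometrically more tokens at each step, so the total step count grows like O(log L) instead of O(L)) might look like the sketch below. The paper's exact schedule is not given in this summary, so base-2 growth and the function name are assumptions.

```python
def power_increasing_schedule(seq_len, base=2):
    """Return how many masked tokens to reveal at each decoding step:
    1, base, base**2, ... until seq_len tokens are covered. The number of
    steps therefore grows roughly like log_base(seq_len)."""
    schedule, remaining, k = [], seq_len, 1
    while remaining > 0:
        step = min(k, remaining)  # never reveal more tokens than remain masked
        schedule.append(step)
        remaining -= step
        k *= base
    return schedule
```

For a 16-token sequence this yields `[1, 2, 4, 8, 1]`, i.e. 5 decoding steps instead of 16; a 1024-token sequence needs only 11 steps, illustrating the O(L) to O(log L) reduction.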
Chinese-English Bilingual, First in 29 Benchmarks, Pixel-Level Understanding: 360's FG-CLIP 2 Claims the Top Spot Among Image-Text Cross-Modal Models
机器之心· 2025-11-05 04:15
Core Viewpoint
- The article discusses the advancements in AI visual understanding, particularly focusing on the new model FG-CLIP 2 developed by 360, which significantly improves detail recognition and spatial understanding compared to previous models [10][11][21].

Group 1: Model Performance
- FG-CLIP 2 has achieved superior performance in eight categories and 29 tests, surpassing Google and Meta, making it the strongest visual-language model currently available [11][26].
- In English tasks, FG-CLIP 2 scored an average of 81.10, significantly higher than Meta CLIP 2's 72.71, Google SigLIP 2's 71.87, and OpenAI CLIP's 64.10 [30][34].
- The model demonstrates a remarkable ability to understand spatial relationships and fine details, such as distinguishing between different cat breeds based on fur texture and position [18][19].

Group 2: Data Quality and Training
- The core of FG-CLIP 2's capabilities lies in its high-quality dataset, FineHARD, which includes 500 million pairs of images and texts, specifically designed to enhance semantic understanding [36][37].
- The training process involves a two-stage strategy that first establishes a global understanding before focusing on fine details, allowing the model to evolve from general recognition to pixel-level understanding [42][49].
- FG-CLIP 2 incorporates a unique data adaptive resolution strategy, optimizing image processing efficiency and accuracy [54][55].

Group 3: Applications and Impact
- FG-CLIP 2 has been integrated into various business applications, including advertising image matching, IoT camera intelligent retrieval, and content moderation, serving as a foundational technology for these services [57].
- The model's ability to perform detailed image searches and content generation supervision enhances its utility in e-commerce, security, and media management [58].
- 360 aims to leverage FG-CLIP 2 as a core capability for AI development across multiple industries, positioning itself as a leader in the AI landscape [60][61].
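FG-CLIP 2 follows the CLIP dual-encoder paradigm, in which image-text retrieval (the basis of the ad matching and camera search applications above) reduces to cosine similarity between image and text embeddings. A generic sketch with stand-in vectors, not FG-CLIP 2's actual encoders or API:

```python
import numpy as np

def rank_texts(image_emb, text_embs):
    """Generic CLIP-style retrieval: L2-normalize the embeddings, score each
    candidate text by cosine similarity to the image, and return candidate
    indices sorted best-first along with their similarity scores."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                 # cosine similarity per candidate
    order = np.argsort(-sims)        # highest similarity first
    return order, sims[order]
```

Fine-grained models differ from the original CLIP mainly in how the encoders are trained (hard negatives, region-level supervision), not in this retrieval step, which stays a single normalized dot product.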
Tsinghua and Peking University Jointly Release Motion Transfer, Rivaling Gemini Robotics, Letting Robots Learn Skills End-to-End Directly from Human Data
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses the release of Gemini Robotics 1.5 by Google DeepMind, highlighting its Motion Transfer Mechanism (MT), which allows skill transfer between different robot forms without retraining [2]
- A collaborative team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed a new paradigm for zero-shot action transfer from humans to robots, releasing a comprehensive technical report and open-source code [3]

MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot RGB-to-Action skill transfer framework that enables robots to learn human skills without prior demonstrations [8]
- The framework includes a self-developed human data collection system using VR devices, capturing first-person videos, head movements, wrist poses, and hand actions [9]

Implementation of MotionTrans
- The framework allows for zero-shot transfer, enabling robots to learn tasks like pouring water and unplugging devices using only human VR data, achieving a 20% average success rate across 13 tasks [12][17]
- Fine-tuning with a small number of robot demonstrations (5-20 samples) can increase the success rate to approximately 50% and 80%, respectively [20]

Data and Training Techniques
- The team utilized a large-scale human-robot dataset with over 3,200 trajectories and 15 tasks, demonstrating the framework's ability to learn from human data alone [14][16]
- The approach includes techniques like hand redirection and unified action normalization to bridge the gap between human and robot actions [10][13]

Results and Contributions
- MotionTrans has proven that even advanced end-to-end models can unlock new skills under zero-robot-demonstration conditions, changing the perception of human data from a supplementary role to a primary one [25]
- The team has open-sourced all data, code, and models to support future research in this area [26]
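The "unified action normalization" mentioned above maps human and robot actions into one shared numeric range so a single policy can train on both. A minimal sketch under the assumption of per-dimension min-max scaling to [-1, 1]; the paper's exact scheme may differ.

```python
import numpy as np

class ActionNormalizer:
    """Sketch of unified action normalization: scale each action dimension to
    [-1, 1] using per-embodiment statistics, so human and robot trajectories
    occupy one shared action space (scaling scheme is an assumption)."""
    def __init__(self, actions):
        # actions: (num_steps, action_dim) array from one embodiment's dataset
        self.lo = actions.min(axis=0)
        self.hi = actions.max(axis=0)
        self.span = np.maximum(self.hi - self.lo, 1e-8)  # guard constant dims

    def normalize(self, a):
        return 2.0 * (a - self.lo) / self.span - 1.0

    def denormalize(self, a_norm):
        # invert at execution time to recover embodiment-specific commands
        return (a_norm + 1.0) / 2.0 * self.span + self.lo
```

One normalizer would be fitted per embodiment; the policy only ever sees the shared [-1, 1] space, and each robot denormalizes back to its own joint ranges.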
An AI Space Race? Nvidia's H100 Just Went to Orbit, and Now Google's Project Suncatcher Wants to Send TPUs to Space Too
机器之心· 2025-11-05 00:18
Core Insights
- Google has launched Project Suncatcher, a space-based, scalable AI infrastructure system designed to run AI workloads on solar power; the Sun emits more than 100 trillion times humanity's total electricity production [8][11][29]
- The project aims to deploy a constellation of satellites equipped with Tensor Processing Units (TPUs) and free-space optical communication links to enhance machine learning capabilities in space [7][9][10]

Project Overview
- Project Suncatcher is a significant exploration initiative that envisions a solar-powered satellite constellation aimed at expanding the computational scale of machine learning in space [7][8]
- The first satellite launch is scheduled for early 2027, in collaboration with Planet, to test the feasibility of the proposed system [3][29]

Technical Challenges
- The project faces several engineering challenges, including thermal management, high-bandwidth inter-satellite communication, and system reliability in orbit [28][29]
- Achieving data-center-scale inter-satellite links is crucial, requiring connections that support tens of terabits per second [13][14]
- The satellites will operate in a dawn-dusk sun-synchronous low Earth orbit to maximize solar energy collection [13][21]

TPU Radiation Tolerance
- Google's Trillium TPU has undergone radiation testing, demonstrating resilience to total ionizing dose (TID) and single-event effects (SEEs), making it suitable for space applications [21][22]

Economic Viability
- Historical trends suggest that launch costs may fall below $200 per kilogram by the mid-2030s, which would make space-based data centers economically feasible [23][24]
- At that point, the energy costs of operating space-based data centers could become comparable to those of their terrestrial counterparts [24]

Future Directions
- Initial analysis indicates that the core concept of space-based machine learning computing is not blocked by fundamental physics or insurmountable economic barriers [28]
- The next milestone is launching two prototype satellites to validate Google's models and TPU hardware in space [29][30]