CVPR 2025 Highlight | AdaCM2: The First Cross-Modality Adaptive Memory Reduction Framework for Extremely Long Video Understanding
机器之心· 2025-06-09 04:33
The first author of this paper, Yuanbin Man, is a former senior technical expert at Alibaba's DAMO Academy and now a first-year PhD student whose research focuses on efficient multimodal large-model inference and generation systems. The corresponding author is his advisor, Miao Yin, an assistant professor in the Department of Computer Science at the University of Texas at Arlington (UTA). Dr. Yin leads a seven-person research team working on multimodal spatial intelligence systems, aiming to bring spatial AI into real-world use through joint software-system co-design.

In recent years, large language models (LLMs) have kept pushing the boundaries of multimodal understanding. Now that language models can "watch video", tasks such as video question answering, video summarization, and caption generation are stepping into a genuinely intelligent stage. But one practical problem demands a solution: how do you understand extremely long videos efficiently?

To this end, the research team from the Department of Computer Science at the University of Texas at Arlington (UTA) proposed AdaCM2, the first cross-modality memory reduction framework to support extremely long video understanding. The work has been officially accepted to CVPR 2025 and selected as a Highlight paper (a 3% selection rate), a double recognition of its technical innovation and practical value.

Paper title: AdaCM2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

Paper address: https://arxiv.o ...
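The excerpt above does not spell out the mechanism, but the title points at the core idea: as frames stream in, cached visual tokens are scored by their relevance to the text query and only the most relevant ones are kept in memory. Below is a minimal sketch of that idea; the function name, the keep_ratio parameter, and the mean-over-text scoring rule are illustrative assumptions, not the paper's actual algorithm.

```python
import torch

def reduce_visual_memory(visual_mem: torch.Tensor,
                         text_feats: torch.Tensor,
                         keep_ratio: float = 0.2) -> torch.Tensor:
    """Keep only the visual memory tokens most attended by the text query.

    visual_mem: (M, d) visual tokens cached from previously seen frames
    text_feats: (T, d) text-query token features
    keep_ratio: fraction of cached tokens to retain (assumed, not from the paper)
    """
    d = visual_mem.shape[-1]
    # Cross-modality attention of text tokens over cached visual tokens
    attn = torch.softmax(text_feats @ visual_mem.T / d ** 0.5, dim=-1)  # (T, M)
    relevance = attn.mean(dim=0)                                        # (M,)
    k = max(1, int(keep_ratio * visual_mem.shape[0]))
    keep = relevance.topk(k).indices.sort().values  # preserve temporal order
    return visual_mem[keep]
```

Applied once per incoming frame batch, a rule like this keeps the visual memory bounded regardless of video length, which is the property an extremely-long-video setting needs.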
Huawei Ascend's 10,000-Card Cluster Demystified: How Do You Tame the AI Compute "Beast"?
机器之心· 2025-06-09 04:33
Core Viewpoint
- The article discusses the advancements in AI computing power clusters, highlighting their critical role in supporting large-scale AI models and ensuring high availability, fault tolerance, and efficient resource management [2][4][39]

Group 1: High Availability of Super Nodes
- AI training and inference require continuous operation, similar to an emergency system in hospitals: each computer in the cluster has a backup ready to take over in case of failure, ensuring uninterrupted tasks [6][5]
- Huawei's CloudMatrix 384 super node employs a fault-tolerance scheme spanning system-level, business-level, and operational-level fault tolerance, transforming faults into manageable issues [7][8]

Group 2: Cluster Linearity
- The ideal scenario for computing clusters is linear scalability, where the total power of 100 computers is 100 times that of one, achieved through precise task-allocation algorithms [10] (a worked linearity calculation is sketched after this summary)
- Huawei's team has developed key technologies to enhance training linearity for large models, achieving 96% linearity for the Pangu Ultra 135B model on 4K cards [11][13]

Group 3: Rapid Recovery in Large-Scale Training
- When training with thousands of computing units, the system automatically saves progress, allowing quick recovery from faults without starting over and significantly reducing downtime [14][15]
- Innovations such as process-level rescheduling and online recovery cut recovery times to under 3 minutes, and to 30 seconds for specific fault types [16][20]

Group 4: Fault Management and Diagnosis
- A real-time monitoring system continuously checks the health of each computer in the cluster, enabling quick identification and resolution of issues before they escalate [24][26]
- Huawei has developed a comprehensive fault-management framework covering error detection, isolation, and recovery, enhancing the reliability of the computing infrastructure [24][28]

Group 5: Simulation and Modeling
- Before deploying complex AI models, the computing cluster can simulate workloads in a virtual environment to identify potential bottlenecks and optimize resource allocation [29][30]
- The introduction of a Markov modeling and simulation platform allows for multi-dimensional analysis and performance prediction, improving resource efficiency and system stability [30][31]

Group 6: Framework Migration
- Huawei's MindSpore framework has rapidly evolved since its open-source launch, providing tools for seamless migration from other frameworks and enhancing performance during training and inference [37][38]
- The framework supports a wide range of applications, enabling quick deployment of large models and improving inference capabilities [38][39]
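For context, the "linearity" figure above is the standard scaling-efficiency metric: achieved speedup divided by ideal speedup. A small worked sketch, with throughput numbers invented purely for illustration:

```python
def linearity(cluster_throughput: float, single_throughput: float, n_cards: int) -> float:
    """Scaling efficiency: achieved speedup over ideal speedup.

    1.0 means perfectly linear scaling; the article reports ~0.96
    for Pangu Ultra 135B on 4K cards.
    """
    speedup = cluster_throughput / single_throughput
    return speedup / n_cards

# Illustrative numbers only: one card trains 10 samples/s,
# 4096 cards together train 39,322 samples/s.
print(f"{linearity(39_322, 10, 4096):.2f}")  # 0.96
```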
Claiming DeepSeek-R1 and Claude Thinking Can't Reason at All: Has Apple's Controversial Paper Backfired?
机器之心· 2025-06-09 04:33
ICML 2025 Spotlight | Pursuing Pareto-Optimal Probability Mass Allocation: ABKD, a Knowledge Distillation Framework Guided by Generalized α-β Divergence
机器之心· 2025-06-09 04:11
Just In: Ilya Appears at a University of Toronto Convocation Address: AI Will Do Everything We Can Do
机器之心· 2025-06-09 04:11
机器之心 report, 机器之心 editorial desk

The brain is a biological computer; how different, really, is a digital computer?

On June 6, Ilya Sutskever, who has scarcely been seen in public lately, appeared on the University of Toronto campus to receive an honorary Doctor of Science degree from his alma mater.

Since announcing his departure from OpenAI last year, Ilya has rarely appeared in public and seldom posts on social media, leaving many to ask "where did Ilya go?" His new company, Safe Superintelligence (SSI), is also highly secretive; the public impression is only vaguely that its business revolves around developing a safe, powerful superintelligent system.

Looking back at Ilya's student years: he graduated in 2005 with an honours bachelor of science in mathematics, then stayed at the University of Toronto for a master's degree and a PhD in computer science, studying under Geoffrey Hinton, winner of the Turing Award and the Nobel Prize in Physics.

The honorary degree recognizes Ilya's foundational work and global impact as a computer scientist and pioneer of artificial intelligence, as well as his outstanding contributions to advocating for safe, responsible AI.

After the ceremony, Ilya also gave an address at the University of Toronto. He pointed out that we are ...
ICML 2025 | Global Pooling + Local Retention: CCA-Attention Delivers a Breakthrough in Long-Context Modeling for LLMs
机器之心· 2025-06-08 08:21
Core Insights
- The article discusses the introduction of the Core Context Aware Attention mechanism (CCA-Attention), developed by the Pazhou Laboratory and South China University of Technology, which significantly enhances the efficiency of long-context modeling [1][3]
- CCA-Attention achieves reasoning 7.9 times faster than standard self-attention while reducing key-value cache memory usage by 93%, setting a new benchmark for long-text processing [3][26]

Summary by Sections

Introduction
- CCA-Attention has been accepted to ICML 2025; it was submitted to arXiv on December 17, 2024, ahead of related methods such as DeepSeek NSA and Kimi MoBA [3][8]

Research Findings
- Recent studies indicate that attention weights in large language models (LLMs) concentrate on a few tokens, demonstrating significant sparsity that can be leveraged to reduce computational complexity [4][5]

Existing Methods
- Current sparse-attention methods often rely on predefined patterns, which may prevent the model from accessing critical information spread across different positions in the context [6]

Proposed Solution
- CCA-Attention is designed to model long texts efficiently by combining global pooling attention with local retention attention, significantly lowering computational cost while maintaining long-distance dependency modeling [7][11]

Mechanism Details
- The mechanism consists of two complementary modules (a minimal sketch of both follows this summary):
  - Global Pooling Module: extracts core tokens based on the importance of input tokens for subsequent attention calculations [29]
  - Local Retention Module: focuses on nearby tokens to capture fine-grained contextual information, complementing the global pooling module [30]

Performance Evaluation
- CCA-Attention was applied to LLaMA2-7B models and compared against efficient attention methods such as StreamingLLM, LM-Infinite, and MInference, showing superior performance on long-text tasks [20][21]
- On the LongBench-E benchmark, CCA-LLM achieved the highest average score, outperforming the other methods on both LLaMA2-7B-32K and LLaMA2-7B-80K [21][22]

Efficiency Metrics
- CCA-Attention demonstrated significant advantages in inference speed and memory usage, achieving a 5.7x speedup at 64K context length and 7.9x at 128K compared to standard self-attention [26][25]
- Key-value cache memory usage was reduced by up to 93%, highlighting its efficiency for long-sequence modeling [26][31]
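A minimal single-head sketch of the two complementary branches described above. The mean-pooling of core tokens, the equal-weight combination, and the group/window sizes are simplifying assumptions; the published method selects core tokens by learned importance and keeps the global branch causal.

```python
import torch
import torch.nn.functional as F

def cca_attention_sketch(q, k, v, group_size=16, local_window=128):
    """Two-branch attention: global pooling over 'core' tokens plus
    local retention over a causal sliding window. q, k, v: (seq_len, d).
    """
    seq_len, d = q.shape
    scale = d ** -0.5

    # Global pooling branch: compress each group of keys/values into one
    # core token (mean-pooled here; the paper weights tokens by importance).
    g = seq_len // group_size
    k_core = k[: g * group_size].view(g, group_size, d).mean(dim=1)
    v_core = v[: g * group_size].view(g, group_size, d).mean(dim=1)
    global_out = F.softmax(q @ k_core.T * scale, dim=-1) @ v_core

    # Local retention branch: each query attends causally to the most
    # recent `local_window` tokens to keep fine-grained context.
    idx = torch.arange(seq_len)
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= local_window)
    local_attn = (q @ k.T * scale).masked_fill(mask, float("-inf"))
    local_out = F.softmax(local_attn, dim=-1) @ v

    # Combine the two complementary views (simple average as a placeholder).
    return 0.5 * (global_out + local_out)
```

Because the global branch attends to seq_len / group_size pooled tokens and the local branch to a fixed window, the per-query cost drops from O(seq_len) to O(seq_len / group_size + local_window), which is where the reported speedups and KV-cache savings come from.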
A New Breakthrough in Reinforcement Learning for Large Models: The New SPO Paradigm Boosts LLM Reasoning!
机器之心· 2025-06-08 08:21
Core Viewpoint
- The article discusses the potential of Reinforcement Learning (RL) in enhancing the reasoning capabilities of Large Language Models (LLMs), highlighting the effectiveness of models like DeepSeek R1, Kimi K1.5, and Qwen 3 on complex reasoning tasks [1]

Current Challenges
- A fundamental challenge in effective RL is the credit assignment problem: attributing the final evaluation of an LLM's response to specific decision actions (tokens) within the sequence [2]
- The difficulty arises from sparse reward signals, which provide clear success-or-failure feedback only at the end of the sequence [3]

Current Methods
- In RL, advantage value estimation is commonly used to address credit assignment; current methods for LLMs fall into two types by the granularity of the estimate [5]
- Coarse-grained trajectory-level methods, like GRPO used in DeepSeek R1, calculate a single advantage value from the final reward, so they can neither reward correct parts of incorrect answers nor penalize redundant parts of correct answers [6]
- Fine-grained token-level methods, such as PPO, estimate an advantage value for each token but suffer high estimation error, because trajectory distributions differ greatly across prompts and sampling during training is limited [6]

New SPO Framework
- The research team from the Chinese Academy of Sciences and City University of Hong Kong proposed the Segment Policy Optimization (SPO) framework to overcome these limitations [8]
- SPO employs a medium-grained, segment-level advantage estimation approach, dividing generated sequences into connected segments and calculating an advantage value for each segment [11]

Advantages of SPO
- Improved credit assignment: segment-level feedback is localized, allowing the model to reward valuable parts of incorrect answers and penalize redundant segments in correct answers [12]
- More accurate advantage value estimation: fewer estimation points are needed, so Monte Carlo sampling yields unbiased advantage estimates without relying on an unstable critic model [12]
- Flexibility and adaptability: segment division can be defined arbitrarily, allowing the granularity to be adjusted between token level and trajectory level to suit different tasks and applications [12]

Core Components of SPO
- The SPO framework consists of three core components: a flexible segment-division strategy, segment-level advantage estimation based on Monte Carlo sampling, and policy optimization using segment-level advantages (a minimal sketch of the estimation step follows this summary) [13]

Specific Instances of SPO
- The team proposed two specific instances of the framework: SPO-chain for short chain-of-thought scenarios and SPO-tree for long chain-of-thought scenarios, the latter enhancing Monte Carlo sampling efficiency [15]

Token Probability-Mask Strategy
- A token probability-mask strategy selectively computes losses only for low-probability tokens within segments, which are the critical decision points for segment-level advantages [16]

Experimental Results
- In short chain-of-thought scenarios, models trained with SPO achieved higher accuracy than a range of training algorithms [29]
- In long chain-of-thought scenarios, SPO-tree outperformed GRPO in accuracy while using the same base model and training time [31]
- The cutpoint-based segment-division method performed best in short chain-of-thought scenarios compared with other division methods [36]

Conclusion
- The work presents SPO, an RL training framework based on medium-grained, segment-level advantage values that sits between token-level and trajectory-level methods, offering better credit assignment while requiring fewer estimation points [42]
- The effectiveness of the SPO framework and its instances, SPO-chain and SPO-tree, has been validated through experiments [43]
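A minimal, critic-free sketch of the segment-level advantage estimation described above: the value of each prefix is estimated by Monte Carlo rollouts from the current policy, and a segment's advantage is its value gain over the preceding prefix. The toy policy, segment inputs, and rollout count are illustrative assumptions; SPO's actual instances add probability-guided cutpoint selection (SPO-chain) and tree-structured sampling (SPO-tree).

```python
import random

def segment_advantages(policy, prompt, segments, n_rollouts=8):
    """Monte Carlo segment-level advantages without a critic model.

    policy: callable prefix -> sampled final reward in [0, 1]
    segments: the generated response split at chosen cutpoints
    """
    def value(prefix):
        # V(prefix): mean reward of completions sampled from the policy
        return sum(policy(prefix) for _ in range(n_rollouts)) / n_rollouts

    advantages, prev = [], value(prompt)
    prefix = prompt
    for seg in segments:
        prefix += seg
        v = value(prefix)
        advantages.append(v - prev)  # A_k = V(prefix_k) - V(prefix_{k-1})
        prev = v
    return advantages

# Toy policy: prefixes containing "therefore" almost always finish correctly.
def toy_policy(prefix):
    return 1.0 if "therefore" in prefix else float(random.random() < 0.2)

print(segment_advantages(toy_policy, "Q: ...", [" step one,", " therefore,", " x = 2."]))
```

In this toy run the second segment, which introduces the decisive reasoning step, receives most of the credit — exactly the localized feedback a single trajectory-level score cannot provide.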
Say Goodbye to "Amnesiac" AI! The First Open-Source Memory Operating System Framework for Large Models Is Here!
机器之心· 2025-06-08 03:45
This project comes from 百家 AI, a research group at Beijing University of Posts and Telecommunications (BUPT) advised by Associate Professor Ting Bai; the team is devoted to building an emotionally rich, exceptionally retentive intelligent brain for silicon-based beings.

Large language models are constrained by fixed context windows, so "amnesia" and broken memories are frequent in long conversations. The BUPT 百家 AI team has released MemoryOS, the first open-source memory operating system framework for large models. It ingeniously fuses operating-system principles with the brain's hierarchical memory mechanism, building a segmented-paged three-tier storage architecture and four core modules (storage, update, retrieval, generation) that provide a full-pipeline user memory management solution, giving AI agents a lasting "memory" and a deep "personality".

Open-source project: https://github.com/BAI-LAB/MemoryOS

The fixed context window of large language models (LLMs) acts like a narrow information channel, causing AI to "forget" frequently in long-term conversations, which often leads to broken memories, factual inconsistencies, and a much-degraded personalized experience. Existing methods for improving LLM memory each have their own emphasis (knowledge prompting, RAG retrieval optimization, or parameter-driven approaches), but all lack a unified operating system to manage an AI agent's memory systematically and comprehensively.

The BUPT 百家 AI team's breakthrough proposal, MemoryOS, aims to deliver comprehensive, efficient memory management for AI agents. By building a powerful "memory operat ...
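A toy sketch of the segmented-paged, three-tier idea described above, covering three of the four modules (storage, update, retrieval); generation would assemble the retrieved memory into the prompt. Class and method names, the promotion rule, and the heat counter are illustrative assumptions, not the framework's actual API (see the repository above for that).

```python
from collections import deque

class ToyMemoryOS:
    """Three-tier memory: short-term dialogue pages, mid-term topic
    segments promoted by access heat, long-term persona facts."""

    def __init__(self, short_capacity=8, heat_threshold=3):
        self.short_term = deque(maxlen=short_capacity)  # (topic, page) pairs
        self.mid_term = {}                              # topic -> {"pages", "heat"}
        self.long_term = []                             # distilled persona facts
        self.heat_threshold = heat_threshold

    def store(self, topic, page):
        """Storage: page out the oldest short-term entry into its mid-term segment."""
        if len(self.short_term) == self.short_term.maxlen:
            old_topic, old_page = self.short_term[0]    # about to be evicted
            seg = self.mid_term.setdefault(old_topic, {"pages": [], "heat": 0})
            seg["pages"].append(old_page)
        self.short_term.append((topic, page))

    def update(self):
        """Update: distill hot mid-term segments into long-term facts."""
        for topic, seg in self.mid_term.items():
            if seg["heat"] >= self.heat_threshold:
                self.long_term.append(f"user frequently discusses {topic}")
                seg["heat"] = 0

    def retrieve(self, topic):
        """Retrieval: gather context for one topic from all three tiers."""
        seg = self.mid_term.get(topic)
        if seg:
            seg["heat"] += 1
        recent = [page for t, page in self.short_term if t == topic]
        return recent + (seg["pages"] if seg else []) + self.long_term
```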
Why Can Wrong Rewards Still Improve a Model? New Research: Models Learn Ways of Thinking, Not New Knowledge
机器之心· 2025-06-08 03:45
The main authors of this paper are Ang Lv and Ruobing Xie. Ang Lv is a PhD student at Renmin University of China, advised by Professor Rui Yan, working on language-model architecture optimization; Ruobing Xie is a senior researcher at Tencent, working on large language models and recommender systems.

In a recent paper, researchers from Renmin University and Tencent show that language models are robust to reward noise in reinforcement learning: even flipping a substantial fraction of the rewards (e.g., scoring a correct answer 0 and an incorrect answer 1) does not significantly hurt downstream performance.

The researchers' explanation is that RL's gains on downstream tasks depend not only on reward accuracy but, more importantly, on whether the model produces a high-quality thinking process. Merely rewarding the frequency of key "thinking" words in the model's output, rather than rewarding answer correctness, still lets the model reach very high peak performance on downstream tasks. This suggests that RL's benefit comes chiefly from teaching the model to adopt appropriate thinking paths toward the correct answer, while the underlying problem-solving ability was already acquired during pretraining. Capability gains in pretraining therefore remain crucial.

The researchers also show how a minimal, thinking-pattern-based reward can effectively calibrate reward models, improving language-model performance on open-ended NLP tasks and enabling smaller models to successfully acquire thinking abilities via RL.

Paper: https://huggingface.co/papers/2505.22653

Code link: ...
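A minimal sketch of the kind of reward just described: score a response purely by how often it uses reasoning-pattern phrases, ignoring answer correctness entirely. The phrase list and the cap are illustrative assumptions, not the paper's exact reward.

```python
import re

# Illustrative reasoning-pattern phrases; the paper's keyword set may differ.
THINKING_PATTERNS = [
    r"\bfirst\b", r"\btherefore\b", r"\blet'?s check\b",
    r"\bwait\b", r"\bon second thought\b", r"\bstep \d+\b",
]

def thinking_reward(response: str, cap: int = 5) -> float:
    """Reward the frequency of thinking keywords, not the answer.
    Capped so the policy cannot farm reward by repeating one phrase."""
    hits = sum(len(re.findall(p, response.lower())) for p in THINKING_PATTERNS)
    return min(hits, cap) / cap

print(thinking_reward("First, note x = 2. Wait, let's check step 2. Therefore x = 2."))
# -> 1.0, despite never verifying that the answer is actually correct
```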
The Mathematical Universe Breaks Through to Two Dimensions! A Four-Person Team's 230-Page Proof Opens the Mirror Passage for Abelian Surfaces. Is Grand Unification Coming?
机器之心· 2025-06-08 03:45
Core Viewpoint
- The article discusses a significant breakthrough in mathematics: four mathematicians extended the modularity theorem from one-dimensional elliptic curves to the more complex two-dimensional abelian surfaces, a revolutionary step toward a unified theory in mathematics [5][14][46]

Group 1: Historical Context
- Andrew Wiles's 1994 proof of Fermat's Last Theorem was a monumental event in mathematics, resolving a problem that had persisted for over 350 years [9][10]
- Wiles's proof revealed a deep connection between elliptic curves and modular forms, giving mathematicians a powerful method for exploring properties of elliptic curves through their corresponding modular forms [11][12][13] (the elliptic-curve statement is written out after this summary)

Group 2: Recent Breakthrough
- In February 2023, a team of four mathematicians proved that a large class of abelian surfaces has corresponding modular forms, significantly extending the modularity theorem [16][45]
- The team members are Frank Calegari, George Boxer, Toby Gee, and Vincent Pilloni, who collaborated to tackle a problem previously considered nearly impossible [14][16][30]

Group 3: Implications and Future Directions
- The breakthrough is expected to provide new tools for solving unresolved problems in number theory, much as the proof of the modularity of elliptic curves opened new research avenues [20][46]
- The mathematicians aim to prove that all types of abelian surfaces satisfy the modularity condition, which could lead to further discoveries in the field [20][46]
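For orientation, the one-dimensional statement being generalized can be written down concretely; this is the standard modularity theorem for elliptic curves, not the new result itself.

```latex
% Modularity (elliptic-curve case): for an elliptic curve $E$ over
% $\mathbb{Q}$ of conductor $N$ there is a weight-2 newform
% $f(q) = \sum_{n \ge 1} a_n q^n$ of level $N$ such that for every
% prime $p \nmid N$:
\[
  a_p(f) = p + 1 - \#E(\mathbb{F}_p),
  \qquad \text{equivalently} \qquad
  L(E, s) = L(f, s).
\]
% The new work lifts this correspondence from elliptic curves
% (dimension 1) to abelian surfaces (dimension 2), where the modular
% side is played by Siegel modular forms.
```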