机器之心

Compression in space and time! Cambridge proposes the MTLA attention mechanism: 5x faster inference, KV memory cut to 1/8
机器之心· 2025-06-11 00:24
Core Insights
- The article discusses the significance of the Transformer architecture for large language models, emphasizing its irreplaceable role despite challenges around computational complexity and efficiency [1][2][5].

Group 1: Transformer Architecture and Challenges
- The Transformer's self-attention mechanism, while powerful at modeling long-range dependencies, suffers from quadratic computational complexity, which has motivated research into alternatives [1].
- The KV cache grows linearly with sequence length during inference, becoming a critical efficiency bottleneck as model sizes increase [1][2].

Group 2: Innovations in KV Cache Management
- The MLA mechanism proposed by the DeepSeek team compresses the KV cache in a latent space, significantly improving inference efficiency, especially in low-resource settings [2][7].
- Multi-head Temporal Latent Attention (MTLA) combines temporal and latent-space compression, addressing the redundancy that accumulates in the KV cache as sequences grow [2][9].

Group 3: Comparison of Attention Mechanisms
- Current models often use Grouped-Query Attention (GQA), which groups query heads to share KV heads and shrink the cache, striking a balance between efficiency and performance [5].
- MTLA outperforms existing methods such as GQA and MQA, maintaining model quality while compressing the KV cache along both its spatial and temporal dimensions [9][20].

Group 4: Performance and Future Potential
- Across a range of tasks, MTLA achieves over 5x faster inference while cutting GPU memory usage to less than 1/8 of standard MHA [20].
- MTLA's potential for large-scale deployment is significant, as the demand for efficient KV cache management grows with increasing model sizes and sequence lengths [23][24].
Gaokao math full-paper rematch! One problem stumps every large model; newcomer Gemini takes the crown, Doubao and DeepSeek tie for second
机器之心· 2025-06-10 17:56
机器之心 report. Editors: 杨文, +0

AI takes on the full set of Gaokao math problems!

Picking up where we left off: the moment the Gaokao math exam ended, we stayed up overnight testing six large-model products on the 14 newest objective questions, posing them via screenshots the way an ordinary user would. Some readers questioned the rigor of that evaluation, so this time we add the free-response questions and run the whole test again.

This round's contestants are Doubao-1.5-thinking-vision-pro, DeepSeek R1, Qwen3-235b, hunyuan-t1-latest, 文心 X1 Turbo, and o3, plus the widely requested new entrant Gemini 2.5 Pro. Last time we tested through the web interfaces; this time every model except o3 is called via API.

For the questions, we again use the 2025 New Curriculum Standard Paper I: 14 objective questions worth 73 points in total and 5 free-response questions worth 77 points. Because Question 6 involves an image, we set it aside and evaluate multimodal models on it later by uploading a screenshot. All remaining text questions were converted to LaTeX and fed to each model. As before, no System Prompt guidance, no web search, direct output. (Note: Question 17 also involves an image, but its wording is clear enough that the answer is unaffected, so ...
Are diffusion language models really better than autoregressive ones? Theoretical analysis suggests the opposite may hold
机器之心· 2025-06-10 08:41
This work comes from Professor 贺笛's group at the School of Intelligence Science, Peking University, and 武威's team at Ant Group. 贺笛 has received multiple honors in machine learning, including the ICLR 2023 Outstanding Paper Award and an ICLR 2024 Outstanding Paper Award nomination.

Diffusion models have achieved remarkable success in image generation in recent years, with stunning quality and diversity. This naturally raises the question: can this powerful generative paradigm migrate to text and challenge, or even replace, today's mainstream autoregressive language models? Diffusion Language Models, with their potential to generate multiple tokens in parallel, seem to herald an efficiency revolution for text generation. But is the outlook really that rosy? The latest research from Peking University and Ant Group shows the answer is far from a simple "yes" or "no"; in some key scenarios the conclusion may be exactly the opposite.

| Guhao Feng* | Yihan Geng* | Jian Guan | Wei Wu | Liwei Wang |
| --- | --- | --- | --- | --- |
| Peking University | Peking University | Ant Group | Ant Group | Peking University |

Paper title ...
A single .md file earns over 400 stars: this survey dissects 3D scene generation across four paradigms
机器之心· 2025-06-10 08:41
In the race to build artificial general intelligence, world models, and embodied intelligence, one capability is becoming ever more central: high-quality 3D scene generation. Over the past three years, research in this area has grown exponentially, with the number of papers nearly doubling each year, reflecting its key role in multimodal understanding, robotics, autonomous driving, and even virtual reality systems.

Technical roadmap: a full breakdown of the four generative paradigms

Early 3D scene generation relied mainly on procedural generation. Since 2021, with the rise of generative models (diffusion models in particular) and new 3D representations such as NeRF and 3D Gaussians, the field has entered a phase of explosive growth. Methods have diversified, scene-modeling capabilities keep improving, and the volume of research papers has climbed rapidly. This trend underscores the urgent need for a systematic review and comprehensive assessment of the field.

Paper title: 3D Scene Generation: A Survey
Paper link: https://arxiv.org/abs/2505.05474
Curated list: https://github.com/hzxie/Awesome-3D-Scene-Generation

In this survey, the research team builds a systematic taxonomy that divides existing 3D scene generation methods into four mainstream paradigms, each examined in depth through representative works. These four ...
New work from Fei-Fei Li's team: editing DiT architectures without retraining halves model depth and improves quality
机器之心· 2025-06-10 08:41
Core Insights
- The article discusses "Grafting," a technique that lets researchers explore new model architecture designs by editing pre-trained Diffusion Transformers (DiTs) rather than training from scratch [1][5][15].
- Model architecture design is crucial in machine learning, defining model functions, operator selections, and configuration settings [2].
- The high cost of training models from scratch makes research on new architectures difficult, particularly for generative models [3][4].

Grafting Process
- The grafting process consists of two main stages:
  1. Activation distillation: the functionality of the original operators is transferred to new operators through a regression objective [6].
  2. Lightweight fine-tuning: limited data is used for tuning to mitigate the error propagation caused by integrating multiple new operators [7][18].

Experimental Findings
- A testing platform based on DiT-XL/2 was developed to study the impact of grafting on model quality [11].
- Grafting led to hybrid designs that replace Softmax attention with gated convolutions, local attention, and linear attention, achieving good quality with less than 2% of the pre-training compute [12][13].
- A case study demonstrated that grafting can convert sequential Transformer modules into parallel modules, halving model depth while achieving higher quality than other models of the same depth [14].

Self-Grafting
- Self-grafting is introduced as a control setup in which existing operators are replaced by the same operator type with randomly initialized weights, providing a performance baseline [21].
- The choice of regression objective significantly affects performance, with specific objectives yielding better initialization quality [25][26][27].

Experimental Results on MHA and MLP
- The experiments showed that grafting is effective for constructing efficient hybrid architectures with good generative quality under smaller compute budgets [41].
- The grafted model achieved a 1.43x inference speedup while maintaining minimal loss in generative quality [42][43].
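Grafting's first stage, activation distillation, is essentially fitting a new operator by regression onto the recorded activations of the operator it replaces. A minimal sketch under toy assumptions (a made-up scalar "operator" and plain SGD, not the paper's actual DiT attention or MLP blocks):

```python
import random

# Stage 1 of grafting, activation distillation, reduced to a toy: a new
# operator is fit by regression to the cached activations of the operator it
# replaces. The "old operator" here is a hypothetical scalar function.

def old_op(x):
    return 3.0 * x + 1.0          # stands in for the pre-trained operator

random.seed(0)
inputs = [random.uniform(-1.0, 1.0) for _ in range(256)]
targets = [old_op(x) for x in inputs]      # activations recorded from the old operator

w, b, lr = 0.0, 0.0, 0.1                   # new operator: y = w*x + b
for _ in range(200):                       # plain SGD on the regression objective
    for x, t in zip(inputs, targets):
        err = (w * x + b) - t              # prediction error on this activation
        w -= lr * err * x                  # gradient of 0.5 * err**2 w.r.t. w
        b -= lr * err                      # ... and w.r.t. b
print(f"distilled operator: w={w:.3f}, b={b:.3f}")   # recovers w=3, b=1
```

The second stage, lightweight fine-tuning, would then train the grafted model end-to-end on a small amount of data to absorb the residual error that regression alone cannot remove.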
In video generation a 1.3B model crushes 14B, and image generation approaches GPT-4o! HKUST & Kuaishou open-source a new test-time scaling paradigm
机器之心· 2025-06-10 03:58
The paper's first author is 何浩然, a second-year PhD student at the Hong Kong University of Science and Technology whose research covers reinforcement learning, generative flow networks (GFlowNets), and embodied intelligence. The corresponding author is 潘玲, an assistant professor in the Department of Electronic and Computer Engineering and the Department of Computer Science and Engineering at HKUST.

Test-time scaling has dramatically improved large language model performance, producing hits such as OpenAI's o-series models and DeepSeek R1. So what is test-time scaling in the visual domain, and how should it be defined?

To answer this question, HKUST and the Kuaishou Kling (可灵) team recently proposed Evolutionary Search (EvoSearch), a method that substantially raises generation quality by increasing inference-time compute. It supports both image and video generation and works with today's most advanced diffusion-based and flow-based models. EvoSearch requires no training and no gradient updates, yet achieves clearly superior results across a range of tasks, along with strong scaling behavior, robustness, and generalization.

As test-time compute increases, EvoSearch shows that SD2.1 and Flux.1-dev have the potential to rival or even surpass GPT-4o. For video generation, Wan 1.3B can even beat Wa ...
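The core loop of test-time evolutionary search can be sketched without any model at all: maintain a population of candidates, score them, keep the best, and mutate. The 1-D reward below is a hypothetical stand-in for a real image or video quality scorer, and the selection/mutation scheme is an illustrative choice, not EvoSearch's exact recipe.

```python
import random

# Toy sketch of test-time evolutionary search in the spirit of EvoSearch:
# spend extra inference compute evolving a population of candidates toward a
# reward, with no training and no gradient updates.

def reward(x):
    return -(x - 1.7) ** 2        # best possible candidate sits at x = 1.7

random.seed(0)
population = [random.gauss(0.0, 1.0) for _ in range(16)]
for _ in range(30):
    ranked = sorted(population, key=reward, reverse=True)
    parents = ranked[:4]                              # selection: keep the top 4
    children = [p + random.gauss(0.0, 0.1)            # mutation: Gaussian jitter
                for p in parents for _ in range(3)]
    population = parents + children                   # elitism keeps the best-so-far
best = max(population, key=reward)
print(f"best candidate after search: {best:.2f}")     # approaches 1.7
```

More generations or a larger population means more inference compute and a better best candidate, which is the scaling-up behavior the article highlights.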
Do large models watch the world "from inside a cave"? A reinforcement learning heavyweight blows the whistle on a fatal LLM flaw
机器之心· 2025-06-10 03:58
Core Viewpoint
- The article examines the disparity in success between language models (LLMs) and video models, asking why LLMs learn effectively from predicting the next token while video models struggle with next-frame prediction [1][5][21].

Group 1
- AI technology is evolving rapidly, prompting deeper reflection on the limits of AI capabilities and on the similarities and differences between human brains and computers [2][3].
- Sergey Levine argues that current LLMs are merely indirect "scans" of human thought processes: they do not replicate true human cognition but mimic it through a kind of reverse engineering [5][26].
- The success of LLMs raises questions about the current direction of Artificial General Intelligence (AGI) exploration, suggesting the research focus may need adjustment [8][10].

Group 2
- While LLMs have achieved significant success in simulating human intelligence, their limitations warrant fundamental questioning [17][19].
- The core algorithm of LLMs is relatively simple, primarily next-word prediction, which invites speculation about whether this simplicity reflects a universal algorithm used by the human brain [18][24].
- Despite the potential of video models to provide richer information, they have not matched the cognitive capabilities of LLMs, which handle complex reasoning tasks that video models cannot [21][30].

Group 3
- LLMs may learn about the world not through direct observation but by analyzing the human thought processes reflected in text, a form of indirect learning [26][28].
- This indirect learning method allows LLMs to simulate certain cognitive functions without fully understanding the underlying learning algorithms that humans use [30][32].
- For AI development, the implication is that while LLMs can imitate human cognitive skills, they may struggle to learn autonomously from real-world experience, highlighting a gap on the road to true adaptability [36][38].
Just in: Apple's WWDC stirs up an AI restructuring storm! On-device models fully opened to developers, while the AI-powered Siri is the biggest no-show
机器之心· 2025-06-09 23:49
Core Viewpoint
- Apple introduced significant updates at WWDC, including a new naming convention for its operating systems and a focus on integrating artificial intelligence across its products [1][2][12].

Group 1: Operating System Updates
- Apple reformed its operating-system naming convention, moving from version numbers to year-based names, starting with iOS 26 and macOS Tahoe 26 [2][3].
- The new design language, "Liquid Glass," features a smooth, semi-transparent interface that enhances the user experience [5][10].
- The design changes apply to buttons, sliders, navigation bars, and other elements, marking the largest software design overhaul since iOS 7 in 2013 [8][10].

Group 2: Artificial Intelligence Integration
- AI is the central theme of this year's WWDC, with Apple showcasing plans to bring AI capabilities to its products, including real-time translation and intelligent search features [12][14].
- "Apple Intelligence" will enhance user interactions with features such as live translation in messaging and calls and visual intelligence for content recognition [16][19].
- Developers gain access to a new framework for integrating AI capabilities into their apps, enabling offline functionality and privacy protection [31][34].

Group 3: Developer Tools and Support
- Apple released Xcode 26, which adds enhanced AI features and ChatGPT integration, letting developers utilize AI models easily [37][38].
- The new Xcode aims to improve developer productivity with tools for coding assistance and project management [40].

Group 4: Siri and Market Reactions
- Despite the advances in Apple Intelligence, there was no significant Siri update, raising concerns among users and investors about Apple's AI capabilities relative to competitors [42][49].
- Following the WWDC announcements, Apple's stock fell as much as 2.5%, signaling market dissatisfaction with the perceived lack of groundbreaking AI developments [50][51].
No SFT, no RL needed: SLOT, a sample-level inference-time optimizer, arrives with easy +10% accuracy gains
机器之心· 2025-06-09 08:03
Core Viewpoint
- The article discusses SLOT (Sample-specific Language Model Optimization at Test-time), developed by the MAPLE lab at Westlake University, which lets a language model "temporarily learn" from the specific prompt during inference, yielding significant performance gains on complex tasks [1][2][10].

Group 1: Methodology
- SLOT treats each input prompt as a piece of "mini training data," enabling the model to better understand the specific question before generating an answer [2][10].
- The method is simple, requiring only the optimization of a lightweight parameter vector (delta) at the model's last layer, with minimal computational overhead (only a 7.9% increase in inference time) [5][12].
- The optimization minimizes cross-entropy loss using the prompt itself as training data, allowing efficient adaptation without modifying the original model [12][19].

Group 2: Performance Improvements
- The Qwen2.5-7B model rose from 57.54% to 66.19% accuracy on the GSM8K math reasoning task, a gain of 8.65 percentage points [7].
- The DeepSeek-R1-Distill-Llama-70B model reached a new record of 68.69% on the GPQA Diamond task, showcasing SLOT's effectiveness across models [7][21].
- On challenging tasks such as AIME 2024, multiple models improved by more than 10% [7][22].

Group 3: Broader Implications
- SLOT delivers stable enhancements across model sizes and types, from 1.5B to 70B parameters, indicating broad applicability [18][20].
- By adjusting the probability distribution of the output vocabulary, the method encourages deeper reasoning rather than superficial pattern matching [17][19].
- Unlike traditional fine-tuning methods, SLOT requires no extensive training data, complex sampling strategies, or significant computational resources, making it an accessible option for improving model performance [18][19].
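The mechanism described above can be sketched in miniature: add a small per-sample vector `delta` to the final hidden states and optimize it to minimize cross-entropy on the prompt's own tokens before generating. Everything below is a toy stand-in (tiny random "model", finite-difference gradients), not the actual Qwen/DeepSeek implementation.

```python
import math
import random

# Minimal sketch of the SLOT idea: before answering, optimize a lightweight
# per-sample vector `delta`, added to the final hidden states, so the model
# assigns higher probability to the prompt's own next tokens (cross-entropy
# on the prompt as "mini training data"). Sizes and weights are hypothetical.

V, H = 5, 4                                   # vocab size, hidden size
random.seed(0)
W = [[random.gauss(0, 0.5) for _ in range(H)] for _ in range(V)]     # output head
hidden = [[random.gauss(0, 1) for _ in range(H)] for _ in range(3)]  # prompt states
targets = [2, 0, 4]                           # next-token ids observed in the prompt

def ce_loss(delta):
    """Average cross-entropy of the prompt tokens with delta added to each state."""
    total = 0.0
    for h, t in zip(hidden, targets):
        logits = [sum(W[v][j] * (h[j] + delta[j]) for j in range(H)) for v in range(V)]
        m = max(logits)
        logz = m + math.log(sum(math.exp(x - m) for x in logits))
        total += logz - logits[t]             # -log p(target token)
    return total / len(targets)

delta, lr, eps = [0.0] * H, 0.1, 1e-4
for _ in range(100):                          # finite-difference gradient descent
    base = ce_loss(delta)
    grad = []
    for j in range(H):
        bumped = list(delta)
        bumped[j] += eps
        grad.append((ce_loss(bumped) - base) / eps)
    delta = [d - lr * g for d, g in zip(delta, grad)]

print(f"prompt loss before: {ce_loss([0.0] * H):.3f}  after: {ce_loss(delta):.3f}")
```

Because only `delta` changes (the model weights stay frozen) and the optimization runs over just the prompt, the per-sample overhead stays small, which matches the article's reported ~7.9% inference-time cost.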
Ushering in the era of on-device long text! 面壁's all-new architecture makes its "little cannon" MiniCPM up to 220x faster
机器之心· 2025-06-09 08:03
Core Viewpoint
- The article covers the significant advances in edge-side language models, highlighting the launch of MiniCPM 4.0 by the AI startup ModelBest (面壁智能), a transformative innovation in the field [2][3].

Group 1: Model Performance and Innovations
- MiniCPM 4.0 features the industry's first system-level context-sparse language-model innovation, achieving a high sparsity of 5% and enabling long-text reasoning on edge devices [4][5].
- The model ships in two versions, 8B and 0.5B parameters, both setting new performance benchmarks for edge models [5].
- MiniCPM 4.0-8B demonstrates a stable 5x speedup in long-text reasoning over comparable models, with up to 220x acceleration in extreme scenarios [5][10].
- In 128K long-text scenarios, MiniCPM 4.0-8B requires only 1/4 of the cache storage space of Qwen3-8B [16].

Group 2: Technical Architecture and Efficiency
- An efficient dual-frequency shifting mechanism lets the model switch attention modes automatically based on task characteristics, optimizing performance for both long and short texts [13].
- MiniCPM 4.0 integrates a self-developed inference framework, CPM.cu, which combines sparsity, speculation, and quantization for efficient edge inference, achieving a 5x speedup [31].
- The BitCPM quantization algorithm achieves state-of-the-art 4-bit quantization, maintaining excellent performance even after a 90% reduction in model size [32].

Group 3: Market Implications and Future Directions
- These advances are expected to trigger a wave of updates to AI edge models in smartphones and automotive systems, potentially overhauling many applications [19].
- ModelBest emphasizes its application-oriented advantages, having adapted the model to major chip platforms such as Intel, Qualcomm, and Huawei Ascend [18].
- The company plans to continue releasing more foundational models in the MiniCPM series and to explore multimodal models, signaling a commitment to ongoing innovation in AI capabilities [51].
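The general idea behind context-sparse attention can be illustrated with a simple top-k scheme: each query attends only to the highest-scoring ~5% of keys instead of all of them, cutting compute and KV traffic. This is a hypothetical sketch of the concept, not MiniCPM 4.0's actual system-level sparse-attention kernel.

```python
import math

# Illustrative top-k "context sparse" attention: softmax is computed over
# only the best-matching ~5% of key positions rather than the full context.

def sparse_attention(q, keys, values, k_frac=0.05):
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / scale for key in keys]
    k = max(1, int(len(keys) * k_frac))                    # keep ~5% of positions
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    w = [math.exp(scores[i] - m) for i in top]             # softmax over kept keys only
    z = sum(w)
    dim = len(values[0])
    return [sum(wj * values[i][d] for wj, i in zip(w, top)) / z for d in range(dim)]

# 100 keys in the context, but only the top 5 are attended to.
keys = [[float(i % 7), 1.0] for i in range(100)]
values = [[float(i)] for i in range(100)]
out = sparse_attention([1.0, 0.0], keys, values)
print(out)   # averages the values at the 5 best-matching positions
```

In a real system the savings come from never loading the skipped KV entries at all, which is why high sparsity translates directly into lower cache traffic and faster long-context decoding.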