Transformer at risk! Google unveils the MoR architecture: memory halved, inference speed doubled
量子位· 2025-07-17 09:03
Luyu, reporting from Aofeisi. QbitAI | WeChat account QbitAI

Going beyond the Transformer, Google has released a brand-new underlying architecture: Mixture-of-Recursions (MoR). Note that this is not MoE: it doubles inference speed while cutting KV cache memory in half. It is also all-in-one: for the first time, a single framework uses the same set of parameters to handle different tasks while dynamically allocating compute. It is like giving an LLM a two-layer buff, going after both model performance and efficiency. Google DeepMind, together with teams from KAIST AI and Mila, combines unified parameter sharing, adaptive recursion depth, and efficient KV caching to cut compute and memory costs while preserving large-model performance, arriving at a new efficiency optimum. Some netizens have even described it as a "Transformer killer". Others go further, suggesting that the arrival of this architecture may signal that latent-space reasoning will be the next LLM breakthrough. Although the Transformer brought excellent few-shot generalization and reasoning ability, the enormous compute and memory demands that come with it still make training and deployment difficult. Existing optimizations mainly fall into parameter sharing and adaptive computation, but in practice one usually has to pick one or the other; they are hard to combine. The researchers therefore proposed Mixture-of-Recursions (MoR), which fuses both within a single recursive Transformer ...
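Conceptually, MoR reuses one set of shared Transformer weights and lets a router decide how many recursion steps each token receives; tokens that exit early stop updating their hidden state (and, per the article, avoid redundant KV work). The sketch below is a hypothetical simplification of that idea in PyTorch, not the paper's implementation: `RecursiveMixer`, its argmax router, and the chosen dimensions are all illustrative assumptions.

```python
# Minimal sketch of a Mixture-of-Recursions-style block (hypothetical names,
# not the paper's method): one shared Transformer layer is applied a variable
# number of times per token, chosen by a lightweight router.
import torch
import torch.nn as nn

class RecursiveMixer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, max_recursions=4):
        super().__init__()
        # Single set of shared parameters, reused at every recursion step.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        # Router scores how many recursion steps each token should receive.
        self.router = nn.Linear(d_model, max_recursions)
        self.max_recursions = max_recursions

    def forward(self, x):                       # x: (batch, seq, d_model)
        depths = self.router(x).argmax(-1) + 1  # per-token depth in [1, max]
        h = x
        for step in range(self.max_recursions):
            # Tokens whose assigned depth exceeds the current step are
            # refreshed; the rest keep their previous hidden state.
            active = (depths > step).unsqueeze(-1).float()
            h = active * self.shared_block(h) + (1 - active) * h
        return h

x = torch.randn(2, 16, 512)
print(RecursiveMixer()(x).shape)  # torch.Size([2, 16, 512])
```

The argmax router here is only a stand-in; the actual architecture would need a trainable routing mechanism, and the efficiency gains described in the article also depend on how the KV cache is handled for early-exiting tokens.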
First survey of latent-space reasoning! Models can think without relying on tokens, with bandwidth up 2700+ times
量子位· 2025-07-16 01:49
Core Insights
- The article presents a comprehensive overview of latent space reasoning, highlighting its potential to achieve over 2700 times the bandwidth of traditional explicit reasoning chains (CoT) [1][15].

Group 1: Overview of Latent Space Reasoning
- Latent space reasoning is an emerging field that traces its origins to the 2019 ICLR paper "Universal Transformers" by researchers from the University of Amsterdam and Google Brain [7].
- The article introduces a unified framework for latent space reasoning, grounded in mechanistic interpretability and connected to the internal operations of models [3][4].
- The framework aims to facilitate future explorations, such as investigating infinite-depth reasoning through diffusion models [4].

Group 2: Mechanisms of Latent Space Reasoning
- Latent space reasoning employs latent chains of thought, which represent reasoning in a continuous internal form rather than as discrete natural-language tokens [13][14].
- This method significantly increases bandwidth: each token in explicit CoT carries roughly 15 bits, while one latent CoT step in a 2560-dimensional FP16 space carries around 40960 bits (a back-of-the-envelope check appears after this summary) [15].
- The reasoning process is not constrained by a limited vocabulary, allowing richer expressive capability [16].

Group 3: Modes of Latent Space Reasoning
- There are two primary modes of latent space reasoning: vertical cycles and horizontal cycles (a minimal sketch of both appears after this summary) [19].
- Vertical cycles use activation-based methods to extend computational depth, letting a model repeatedly process the same set of layers to strengthen reasoning [20][21].
- Horizontal cycles expand the model's memory and reasoning capability over time, maintaining a compressed hidden state that aggregates information across multiple time steps [28][29].

Group 4: Depth and Reasoning Capacity
- The relationship between layer depth and reasoning capability is critical; studies indicate that a model's implicit reasoning-chain ability is strictly limited by its number of layers [34][40].
- Sufficient layer depth is necessary to execute multi-hop reasoning tasks effectively, as too few layers can prevent the final reasoning result from emerging [36][41].
- Research has established that the achievable length of reasoning chains grows linearly with the number of layers, making layer depth the primary bottleneck for latent reasoning capacity [45].

Group 5: Advanced Reasoning Paradigms
- The concept of "infinite-depth reasoning" is proposed, allowing AI to allocate unlimited "thinking time" to refine solutions without output-length constraints [53].
- This can be achieved through spatially infinite reasoning, which uses text diffusion models, and temporally infinite reasoning, which equates longer sequences with more optimization iterations [54][57].
- The article discusses specific methods for implementing these advanced paradigms, emphasizing their potential to enhance latent space reasoning [58].
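The 2700x bandwidth figure follows from simple arithmetic: a discrete token drawn from a vocabulary of roughly 2^15 entries carries about 15 bits, while one latent step that passes a 2560-dimensional FP16 hidden vector carries 2560 × 16 = 40960 bits. The snippet below just checks that arithmetic; the 32,000-entry vocabulary is an assumption used only to reproduce the ~15-bit figure.

```python
import math

vocab_size = 32_000                       # assumed vocabulary (~2^15 entries)
bits_per_token = math.log2(vocab_size)    # ~15 bits per explicit CoT token

hidden_dim, fp16_bits = 2560, 16
bits_per_latent_step = hidden_dim * fp16_bits   # 2560 * 16 = 40960 bits

print(round(bits_per_token, 2))                  # ~14.97
print(bits_per_latent_step)                      # 40960
print(round(bits_per_latent_step / bits_per_token))  # ~2736, i.e. "2700+"
```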
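To make the vertical/horizontal distinction concrete, here is a minimal, hypothetical PyTorch sketch, not taken from the survey: the vertical cycle reapplies one weight-tied block to deepen computation, while the horizontal cycle updates a compressed state across chunks of a sequence (a GRU cell stands in for whatever state-update rule a given method uses).

```python
# Illustrative sketch (simplified, hypothetical) of the two latent-reasoning modes.
import torch
import torch.nn as nn

d = 256
block = nn.TransformerEncoderLayer(d, 4, batch_first=True)

# Vertical cycle: reuse the same layer several times to extend effective depth.
def vertical_reasoning(x, steps=4):
    h = x
    for _ in range(steps):       # weight-tied depth recurrence
        h = block(h)
    return h

# Horizontal cycle: carry a compressed hidden state across sequence chunks,
# aggregating information over time instead of over depth.
class CompressedMemory(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.update = nn.GRUCell(d, d)   # stand-in for the state-update rule

    def forward(self, chunks, state):    # chunks: list of (batch, d) summaries
        for c in chunks:
            state = self.update(c, state)
        return state

x = torch.randn(1, 8, d)
print(vertical_reasoning(x).shape)                    # torch.Size([1, 8, 256])
mem, state = CompressedMemory(d), torch.zeros(1, d)
print(mem([torch.randn(1, d) for _ in range(3)], state).shape)  # torch.Size([1, 256])
```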