Latent Space Reasoning
The First Survey of Latent Space Reasoning! Models Can Think Without Relying on Tokens, with Bandwidth Up 2700+x
量子位· 2025-07-16 01:49
Core Insights
- The article presents a comprehensive overview of latent space reasoning, highlighting its potential to achieve over 2700 times the bandwidth of traditional explicit chains of thought (CoT) [1][15].

Group 1: Overview of Latent Space Reasoning
- Latent space reasoning is an emerging field whose origins trace to the ICLR 2019 paper "Universal Transformers" by researchers from the University of Amsterdam and Google Brain [7].
- The article introduces a unified framework for latent space reasoning that is grounded in mechanistic interpretability and connects to the internal operations of models [3][4].
- The framework aims to facilitate future explorations, such as investigating infinite-depth reasoning through diffusion models [4].

Group 2: Mechanisms of Latent Space Reasoning
- Latent space reasoning employs latent chains of thought, which represent reasoning in a continuous internal form rather than as discrete natural-language tokens [13][14].
- This significantly increases bandwidth: each token in explicit CoT carries roughly 15 bits, while one latent CoT step in a 2560-dimensional FP16 space carries about 40960 bits [15]. (A worked check of this arithmetic follows this summary.)
- The reasoning process is not constrained by a finite vocabulary, allowing for richer expressive capability [16].

Group 3: Modes of Latent Space Reasoning
- There are two primary modes of latent space reasoning: vertical cycles and horizontal cycles, both sketched in code after this summary [19].
- Vertical cycles use activation-based methods to extend computational depth, letting a model repeatedly pass activations through the same set of layers to deepen reasoning [20][21].
- Horizontal cycles expand the model's memory and reasoning capacity over time, maintaining a compressed hidden state that aggregates information across many time steps [28][29].

Group 4: Depth and Reasoning Capacity
- The relationship between layer depth and reasoning capability is critical: studies indicate that a model's implicit reasoning-chain ability is strictly limited by its number of layers [34][40].
- Sufficient layer depth is necessary to execute multi-hop reasoning tasks; with too few layers, the final reasoning result cannot emerge [36][41].
- Research has established that the achievable reasoning-chain length grows linearly with the number of layers, making layer depth the primary bottleneck on latent reasoning capacity [45].

Group 5: Advanced Reasoning Paradigms
- The article proposes "infinite-depth reasoning", in which an AI can allocate unbounded "thinking time" to refine a solution without output-length constraints [53].
- This can be achieved through spatially infinite reasoning, which uses text diffusion models, and temporally infinite reasoning, which equates longer sequences with more optimization iterations [54][57].
- The article discusses concrete methods for implementing these advanced paradigms, emphasizing their potential to strengthen latent space reasoning [58].
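To make the Group 2 bandwidth figures concrete, here is a worked check of the arithmetic. The only assumption not in the article is that the ~15-bit-per-token figure corresponds to a vocabulary of about 2^15 = 32768 tokens, since information per token is log2(vocab size):

```python
import math

# Bits carried by one explicit CoT token (assumption: ~32k vocabulary).
vocab_size = 32768
bits_per_token = math.log2(vocab_size)          # 15.0 bits per token

# Bits carried by one latent CoT step (figures from the article).
hidden_dim = 2560                               # latent dimension
fp16_bits = 16                                  # bits per FP16 element
bits_per_latent_step = hidden_dim * fp16_bits   # 40960 bits per step

print(f"explicit CoT: {bits_per_token:.0f} bits/token")
print(f"latent CoT:   {bits_per_latent_step} bits/step")
print(f"ratio: {bits_per_latent_step / bits_per_token:.0f}x")  # ~2731x, i.e. 2700+
```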
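A minimal sketch of the vertical-cycle mode from Group 3, in the spirit of Universal Transformers: one weight-tied Transformer layer is applied repeatedly, so effective depth grows with the iteration count rather than the parameter count. The class name and hyperparameters below are illustrative assumptions, not taken from the survey:

```python
import torch
import torch.nn as nn

class RecurrentDepthBlock(nn.Module):
    """Weight-tied block applied n_iters times: each pass over the same
    layer is one extra latent 'reasoning step' over the activations."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_iters: int = 6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_iters = n_iters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reusing the SAME layer extends effective depth without new weights.
        for _ in range(self.n_iters):
            x = self.layer(x)
        return x

x = torch.randn(2, 10, 256)            # (batch, seq, d_model)
print(RecurrentDepthBlock()(x).shape)  # torch.Size([2, 10, 256])
```

The design point is that each extra pass costs compute but adds no parameters, which is what lets depth, and hence the number of latent reasoning steps, scale at inference time.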
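And a minimal sketch of the horizontal-cycle mode: a fixed-size hidden state, updated at every time step, stands in for the full history. The decayed linear update below is an assumed stand-in for the recurrent-memory or linear-attention mechanisms that actual methods use; all names are illustrative:

```python
import torch

def horizontal_cycle(xs: torch.Tensor, d_state: int = 64) -> torch.Tensor:
    """Compress a (batch, seq, d_model) sequence into one constant-size state."""
    batch, seq, d_model = xs.shape
    W = torch.randn(d_model, d_state) / d_model**0.5
    decay = 0.9                       # how quickly old information fades
    h = torch.zeros(batch, d_state)   # memory size is fixed, regardless of seq
    for t in range(seq):
        h = decay * h + xs[:, t] @ W  # aggregate step t into the state
    return h                          # one vector summarizes all seq steps

print(horizontal_cycle(torch.randn(2, 128, 32)).shape)  # torch.Size([2, 64])
```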
Chain of Draft Replaces Chain of Thought: 80% of Reasoning Tokens Cut, Significantly Lowering Compute Cost and Latency
量子位· 2025-03-10 03:29
梦晨, reporting from 凹非寺 | 量子位 QbitAI

Reasoning tokens are cut by 80%-90% with little change in accuracy, and on some tasks accuracy even improves. The Zoom team proposes "Chain of Draft" (草稿链), a replacement for chain of thought that significantly reduces latency and compute cost.

The principle is simple: the model is asked to generate concise, information-dense tokens for each reasoning step. The idea is inspired by how humans solve problems: people usually do not spell out every detail, but simply jot down the key intermediate results as a draft to aid their thinking.

Moreover, Chain of Draft is simple and easy to implement. It requires no model changes, fine-tuning, or reinforcement learning; only the examples in the prompt need to be updated (a prompt sketch follows below). The code and data have been open-sourced on GitHub.

The research team argues that, compared with "reasoning in continuous latent space", another approach to reducing latency and compute cost, Chain of Draft preserves interpretability and can be applied to closed-source black-box models.

A third-party analysis estimates that an enterprise handling 1 million reasoning requests per month could cut costs from $3,800 with chain of thought to $760, saving more than $3,000 per month (a worked check follows the prompt sketch).

The experiments follow the original chain-of-thought paper and evaluate three task categories: arithmetic reasoning, commonsense reasoning, and symbolic reasoning. For arithmetic reasoning on the GSM8k dataset, standard prompting yields only 53.3% accuracy for GPT-4o and 64.6% for Claude 3.5 Sonnet; chain of thought lifts both above 95%, and Chain of Draft also reaches about 91% ...
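Since the article says the method is applied purely through prompting, here is a sketch of what that looks like: a different instruction plus few-shot examples written in draft style. The exact wording below is an illustrative assumption, not the paper's verbatim prompt:

```python
# Standard chain-of-thought instruction: verbose step-by-step reasoning.
COT_PROMPT = (
    "Think step by step to answer the following question. "
    "Return the final answer after '####'."
)

# Chain-of-draft instruction: keep each step to a terse, information-dense draft.
COD_PROMPT = (
    "Think step by step, but keep only a minimal draft for each thinking "
    "step, a few words at most. Return the final answer after '####'."
)

# A GSM8k-style few-shot example in draft style: key intermediate results
# only, instead of full sentences.
COD_EXAMPLE = (
    "Q: Jason had 20 lollipops. He gave Denny some lollipops. "
    "Now Jason has 12 lollipops. How many did he give to Denny?\n"
    "A: 20 - 12 = 8. #### 8"
)

if __name__ == "__main__":
    print(COD_PROMPT)
    print(COD_EXAMPLE)
```

Because the change lives entirely in the prompt, the same technique works against closed-source black-box models via any chat API.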
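And a worked check of the third-party cost estimate quoted above, using only the article's figures (an ~80% token reduction at 1 million requests per month):

```python
# Cost scales roughly with reasoning tokens generated.
cot_cost_usd = 3800          # monthly cost with chain of thought
token_reduction = 0.80       # lower bound of the reported 80%-90% cut
cod_cost_usd = cot_cost_usd * (1 - token_reduction)

print(cod_cost_usd)                 # 760.0
print(cot_cost_usd - cod_cost_usd)  # 3040.0 -> more than $3,000 saved per month
```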