Faster, but Far Less Capable? The Bitter Truth About Diffusion Models as Agents
机器之心· 2026-02-12 04:00
Agents built on autoregressive language models have demonstrated the ability to complete complex tasks in many settings, but high inference cost and low execution efficiency remain key bottlenecks for agentic workflows. Unlike traditional autoregressive models, diffusion-based language models decode tokens in parallel, dramatically increasing generation speed and seemingly opening a new path past this bottleneck.

Existing studies of diffusion language models such as LLaDA and Dream show that, while greatly improving generation efficiency, these models retain general capability comparable to autoregressive models on benchmarks such as MMLU and GSM8K. Their performance on agentic tasks, however, has lacked systematic evaluation.

This work reveals a bitter lesson: although diffusion language models achieve efficient parallel inference, that same mechanism markedly weakens their causal reasoning and reflection abilities, making them unreliable on the long-chain reasoning tasks embodied agents require. At the same time, parallel decoding makes outputs more uncertain, which poses a major challenge for tool-calling tasks that demand exact outputs.

Paper title: The Bitter Lesson of Diffusion Language Models for Agentic Workflows: AC ...
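The speed/reliability trade-off described above comes down to how the two decoding schemes schedule model calls. A minimal toy sketch (the "model" here is a deterministic stand-in, not any real LLaDA or Dream API) contrasts left-to-right autoregressive decoding, which spends one step per token, with diffusion-style decoding, which starts fully masked and commits several high-confidence positions per step:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_confidences(tokens):
    """Stand-in for a denoising model: propose a (token, confidence)
    pair for every still-masked position. Purely illustrative."""
    return {
        i: (VOCAB[i % len(VOCAB)], random.random())
        for i, t in enumerate(tokens) if t == MASK
    }

def autoregressive_decode(length):
    """Left-to-right decoding: one model call per token."""
    out = [VOCAB[i % len(VOCAB)] for i in range(length)]
    return out, length  # (tokens, number of decoding steps)

def parallel_diffusion_decode(length, tokens_per_step=2):
    """Diffusion-style decoding: unmask the highest-confidence
    positions each step. Fewer steps, but positions are committed
    without conditioning on not-yet-decoded left context."""
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        guesses = toy_confidences(tokens)
        # Keep only the most confident proposals this round.
        best = sorted(guesses.items(), key=lambda kv: -kv[1][1])
        for pos, (tok, _) in best[:tokens_per_step]:
            tokens[pos] = tok
        steps += 1
    return tokens, steps
```

The step counts make the trade-off concrete: decoding 6 tokens takes 6 autoregressive steps but only 3 parallel steps at 2 tokens per step. The cost, as the article argues, is that each parallel commitment is made with less causal context, which is exactly where long-chain agent tasks break down.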
DeepSeek Releases DeepSeek-OCR 2, Teaching AI "Human Visual Logic"
Zhi Tong Cai Jing· 2026-01-27 07:53
Core Insights
- DeepSeek has launched the new DeepSeek-OCR2 model, which uses the innovative DeepEncoder V2 method to dynamically rearrange image components based on their meaning, enhancing visual understanding beyond traditional left-to-right scanning [1][2]
- The model significantly outperforms traditional vision-language models (VLMs) on complex layouts, scoring 91.09% on the OmniDocBench v1.5 benchmark, a 3.73% improvement over its predecessor [1]

Group 1
- DeepSeek-OCR2 maintains high accuracy while controlling computational cost, with visual token counts limited to between 256 and 1120, in line with Google's Gemini-3Pro [2]
- In practical applications, the model reduces repetition rates by 2.08% on online user logs and 0.81% on PDF pre-training data, indicating high practical maturity [2]

Group 2
- The release of DeepSeek-OCR2 represents not only an upgrade in OCR performance but also significant architectural exploration, validating the potential of language-model architectures as visual encoders [2]
- The DeepEncoder V2 architecture inherits advances from the LLM community, such as the mixture-of-experts (MoE) architecture and efficient attention mechanisms [2]
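The two mechanisms the summary attributes to DeepEncoder V2, semantic rearrangement of image components and a bounded visual token budget, can be sketched in a few lines. The actual DeepEncoder V2 internals are not described here, so the relevance scores and the clamping rule below are assumptions for illustration; only the [256, 1120] token window comes from the article:

```python
def clamp_token_budget(n_tokens, low=256, high=1120):
    """Keep the visual token count inside the reported [256, 1120]
    window. The bounds come from the article; this simple clamping
    rule is an assumption."""
    return max(low, min(high, n_tokens))

def reorder_patches(patches, relevance):
    """Reorder image patches by a per-patch semantic relevance score
    instead of raster (left-to-right, top-to-bottom) order. In a real
    encoder the scores would be model-predicted; here they are given."""
    order = sorted(range(len(patches)), key=lambda i: -relevance[i])
    return [patches[i] for i in order]
```

The design point is that the decoder then consumes patches in meaning order rather than scan order, which is what the article means by moving "beyond traditional left-to-right scanning."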