Where Does DeepSeek-R1's Reasoning Intelligence Come From? New Google Research: Multiple Roles Arguing It Out Inside the Model
36Kr · 2026-01-26 09:14
Over the past two years, the reasoning ability of large models has taken a clear leap. On complex tasks such as mathematics, logic, and multi-step planning, reasoning models like OpenAI's o series, DeepSeek-R1, and QwQ-32B have begun to consistently pull ahead of conventional instruction-tuned models. Intuitively, they simply seem to think for longer: longer Chains-of-Thought and higher test-time compute are the most frequently cited explanations.

But push the question deeper: is the essence of reasoning ability really just computing a few more steps? A recent paper by researchers at Google, the University of Chicago, and other institutions offers a more structural answer: the improvement in reasoning does not come merely from more computation steps, but from the model implicitly simulating a complex, multi-agent-like interaction structure during reasoning, which they call a "society of thought."

Put simply, the study finds that to solve hard problems, reasoning models sometimes simulate internal dialogues between different roles, like a debate team inside their digital brain. The roles argue, correct one another, express surprise, and reconcile differing viewpoints to reach the correct answer. Human intelligence likely evolved through social interaction, and a similar intuition seems to apply to artificial intelligence.

By classifying reasoning outputs and applying mechanistic interpretability methods to reasoning traces, the researchers found that models such as DeepSeek-R ...
Where Does DeepSeek-R1's Reasoning Intelligence Come From? New Google Research: Multiple Roles Arguing It Out Inside the Model
机器之心 · 2026-01-26 04:08
Core Insights
- The article discusses the significant leap in reasoning capabilities of large models over the past two years, highlighting the advances made by models like OpenAI's o series, DeepSeek-R1, and QwQ-32B on complex tasks such as mathematics and logic [1][2]
- It emphasizes that the improvement in reasoning ability is not merely due to increased computational steps but rather stems from a complex, multi-agent-like interaction structure termed "society of thought," in which models simulate internal dialogues among different roles to arrive at correct answers [2][3]

Group 1: Reasoning Mechanisms
- The research indicates that reasoning models exhibit a higher diversity of perspectives than baseline models, activating a broader range of features related to personality and expertise during reasoning tasks [2][3]
- Controlled reinforcement learning experiments show that even with reasoning accuracy as the only reward signal, base models spontaneously increase dialogic behaviors, suggesting that socialized thinking structures enhance exploration of solution spaces [3][4]

Group 2: Dialogic Behaviors
- The study identifies four types of dialogic behavior in reasoning trajectories: question-answer sequences, perspective shifts, viewpoint conflicts, and viewpoint harmonization, which collectively enhance cognitive strategies [7][8]
- The Gemini-2.5-Pro model's evaluations show high consistency with human scoring, indicating reliable identification of these dialogic behaviors [9][13]

Group 3: Social Emotional Roles
- The analysis categorizes social emotional roles in reasoning trajectories into 12 types, further summarized into four high-level categories, demonstrating balanced interaction among roles rather than isolated usage [10][22]
- The Jaccard index is used to measure the co-occurrence of roles, revealing that models like DeepSeek-R1 organize different roles in a more coordinated manner during reasoning [10][22]

Group 4: Cognitive Behaviors
- The study identifies four cognitive behaviors that influence reasoning accuracy: information provision, information inquiry, positive emotional roles, and negative emotional roles [11][12]
- The consistency of Gemini-2.5-Pro's evaluations with human scoring reinforces the reliability of these cognitive behavior classifications [13]

Group 5: Experimental Findings
- The findings demonstrate that even at similar reasoning-trajectory lengths, reasoning models exhibit a higher frequency of dialogic behaviors and social emotional roles, particularly on complex tasks [16][23]
- Experiments show that steering dialogic features positively impacts reasoning accuracy, with a notable increase from 27.1% to 54.8% on a specific task when dialogic "surprise" features are positively reinforced [24][29]

Group 6: Reinforcement Learning Insights
- A self-taught reinforcement learning experiment indicates that dialogic structures can spontaneously emerge and accelerate the formation of reasoning strategies when only correct answers are rewarded [30]
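The Jaccard-index measurement of role co-occurrence mentioned under Group 3 can be sketched as follows. The role labels and trajectory annotations below are hypothetical stand-ins; the paper's actual 12-role taxonomy is not reproduced here.

```python
from itertools import combinations

def role_cooccurrence(trajectories):
    """Pairwise Jaccard index between roles across reasoning trajectories.

    trajectories: list of sets, each holding the role labels annotated in
    one reasoning trace. For a role pair (a, b) the score is
    |traces containing both| / |traces containing either|.
    """
    roles = sorted(set().union(*trajectories))
    scores = {}
    for a, b in combinations(roles, 2):
        has_a = {i for i, t in enumerate(trajectories) if a in t}
        has_b = {i for i, t in enumerate(trajectories) if b in t}
        union = has_a | has_b
        scores[(a, b)] = len(has_a & has_b) / len(union) if union else 0.0
    return scores

# Hypothetical annotations: three traces, three role labels.
traces = [{"proposer", "skeptic"}, {"skeptic", "verifier"}, {"proposer", "verifier"}]
print(role_cooccurrence(traces)[("proposer", "skeptic")])  # 1 shared trace of 3 -> 1/3
```

Higher pairwise scores across many role pairs would indicate the coordinated, non-isolated role usage the summary attributes to DeepSeek-R1.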
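The feature-steering result under Group 5 (27.1% to 54.8%) relies on pushing hidden activations along a learned feature direction during generation. A minimal sketch of that style of intervention, with a random vector standing in for a real feature direction; the layer choice, direction, and scale `alpha` are all assumptions, not the paper's settings:

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Shift hidden states along a unit-normalized feature direction.

    hidden:    (seq_len, d_model) activations at one transformer layer
    direction: (d_model,) feature vector, e.g. one extracted for a
               dialogic "surprise" feature (hypothetical here)
    alpha:     steering strength; a negative value suppresses the feature
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit  # broadcast over sequence positions

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 16))  # stand-in activations
d = rng.standard_normal(16)       # stand-in feature direction
h2 = steer(h, d, alpha=4.0)
# Each position's projection onto the feature direction grows by exactly alpha.
```

In the reported experiments, applying such positive steering to dialogic features increased accuracy, while suppression would correspond to a negative `alpha`.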
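Group 6 describes rewarding only final-answer correctness and letting dialogic structure emerge on its own. A toy version of such an outcome-only reward function; the last-line answer-extraction convention is a hypothetical simplification, not the paper's actual protocol:

```python
def accuracy_reward(completion: str, gold: str) -> float:
    """Binary outcome-only reward: 1.0 iff the final answer matches the
    reference. Nothing in the trace itself (questions, conflicts,
    harmonization) is rewarded directly, so any dialogic structure that
    appears under this signal has emerged spontaneously."""
    lines = [ln.strip() for ln in completion.strip().splitlines() if ln.strip()]
    answer = lines[-1] if lines else ""
    return 1.0 if answer == gold else 0.0

print(accuracy_reward("Hmm, wait.\nLet me double-check that.\n42", "42"))  # -> 1.0
```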