Where Does DeepSeek-R1's Reasoning Intelligence Come From? New Google Research: Multiple Personas Inside the Model Are Arguing It Out

Core Insights

- The reasoning capabilities of large models have improved markedly over the past two years, particularly on complex tasks involving mathematics, logic, and multi-step planning; models such as OpenAI's o series, DeepSeek-R1, and QwQ-32B show a clear advantage over traditional instruction-tuned models [1]
- Recent research indicates that the gain in reasoning ability does not come merely from additional computational steps: during reasoning the models simulate a complex, multi-agent interaction structure, which the researchers call a "society of thought" [2]

Group 1: Reasoning Mechanisms

- The study finds that reasoning models such as DeepSeek-R1 and QwQ-32B show far greater diversity of perspectives than base and instruction-tuned models, activating a broader range of features tied to personality and expertise and producing sharper conflicts among those features (see the diversity sketch below) [3]
- These multi-agent-like interactions surface as dialogic behaviors, including question-answer sequences, perspective shifts, and the integration of conflicting viewpoints, which together enrich the model's cognitive strategies and explain its accuracy advantage on reasoning tasks [3][4]

Group 2: Social Interaction and Cognitive Strategies

- The findings suggest that this social organization of thought supports more efficient exploration of the solution space, and Google proposes a new research direction that systematically harnesses "collective intelligence" through the organization of agents [4]
- Controlled reinforcement learning experiments show that even when accuracy is the sole reward signal, base models spontaneously increase their dialogic behaviors, and that adding conversational scaffolding accelerates the growth of reasoning ability well beyond that of untuned base models (see the scaffolding sketch below) [3][4]

Group 3: Dialogic Behaviors and Emotional Roles

- The study identifies four types of dialogic behavior in reasoning trajectories: question-answer sequences, perspective shifts, viewpoint conflicts, and viewpoint reconciliation, all of which are crucial to reasoning accuracy (see the tagging sketch below) [10][11]
- Analysis of social-emotional roles within reasoning trajectories shows that models like DeepSeek-R1 adopt more reciprocal emotional role structures, with both positive and negative emotional interactions flowing in both directions, whereas instruction-tuned models mostly exhibit one-way guidance (see the reciprocity sketch below) [16][17]

Group 4: Experimental Results and Implications

- The results confirm that even at similar reasoning-trajectory lengths, reasoning models display a higher frequency of dialogic behaviors and social-emotional roles, particularly on complex tasks, indicating that the dialogic features themselves, not mere trajectory length, drive the gains in reasoning performance [13][18]
- Steering experiments show that guiding dialogic features in the positive direction can nearly double accuracy on reasoning tasks, while negative guidance strongly suppresses these behaviors, underscoring how central dialogic interaction is to effective problem-solving (see the steering sketch below) [18][20]
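The summary does not specify how the paper quantifies perspective diversity. As a minimal sketch of one plausible metric, assuming persona- and expertise-related feature activations have already been extracted from each trace and labeled (the role labels below are hypothetical), Shannon entropy over those activations captures "broader range of features":

```python
import math
from collections import Counter

def perspective_diversity(feature_hits: list[str]) -> float:
    """Shannon entropy (bits) over persona/expertise feature activations.

    feature_hits holds one label per activation event in a reasoning
    trace; higher entropy means the trace draws on a broader, more
    evenly balanced mix of perspectives.
    """
    if not feature_hits:
        return 0.0
    counts = Counter(feature_hits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy comparison: a trace dominated by a single voice vs. one that
# cycles through several conflicting personas.
instruct_trace = ["assistant"] * 9 + ["verifier"]
reasoning_trace = ["proposer", "skeptic", "verifier", "proposer",
                   "skeptic", "mediator", "verifier", "proposer"]
print(perspective_diversity(instruct_trace))   # ~0.47 bits
print(perspective_diversity(reasoning_trace))  # ~1.91 bits
```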
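For the controlled RL experiments, the summary states only that accuracy is the sole reward and that conversational scaffolding speeds learning. The sketch below shows what such a setup could look like; the scaffold wording, the prompt format, and the `Answer:` extraction convention are assumptions, not the paper's protocol:

```python
# Hypothetical two-voice scaffold; the paper's actual scaffold text is not given.
SCAFFOLD = (
    "Reason as a short dialogue between two voices.\n"
    "Proposer: suggests the next solution step.\n"
    "Skeptic: questions or attacks it.\n"
    "When the voices agree, give the result after 'Answer:'."
)

def build_prompt(question: str, scaffolded: bool) -> str:
    """Same question, posed with or without the conversational scaffold."""
    prefix = SCAFFOLD + "\n\n" if scaffolded else ""
    return f"{prefix}Question: {question}\n"

def accuracy_reward(completion: str, gold: str) -> float:
    """Accuracy as the sole reward: 1 if the final answer matches, else 0."""
    answer = completion.rsplit("Answer:", 1)[-1].strip()
    return float(answer == gold)

print(accuracy_reward("Proposer: try 7*13. Skeptic: 7*13 = 91, agreed. Answer: 91", "91"))  # 1.0
```

The reported finding is that dialogic behaviors rise under this reward even without the scaffold, and that adding the scaffold accelerates the gains.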
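How the four dialogic behaviors were annotated is not stated in the summary; a trained classifier or LLM judge seems likely. Purely as an illustration of the taxonomy, here is a keyword-based proxy tagger whose surface cues are guesses:

```python
import re

# Rough surface cues for the four behavior types; treat these as placeholders.
DIALOGIC_CUES = {
    "question_answer":          re.compile(r"\?|what if|why would", re.I),
    "perspective_shift":        re.compile(r"on the other hand|alternatively|from another angle", re.I),
    "viewpoint_conflict":       re.compile(r"\bbut wait\b|can't be right|contradicts", re.I),
    "viewpoint_reconciliation": re.compile(r"putting these together|so both|reconcil", re.I),
}

def dialogic_profile(trace: str) -> dict[str, float]:
    """Per-sentence frequency of each dialogic behavior in a reasoning trace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", trace) if s]
    return {
        behavior: sum(bool(cue.search(s)) for s in sentences) / len(sentences)
        for behavior, cue in DIALOGIC_CUES.items()
    }

trace = ("Is 91 prime? It looks prime. But wait, that can't be right: 7 * 13 = 91. "
         "On the other hand, 89 has no small factors. Putting these together, 89 "
         "is the prime we want.")
print(dialogic_profile(trace))  # each behavior appears in 1 of 5 sentences
```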
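The reciprocity claim can be made concrete as a simple graph statistic: the fraction of directed role-to-role interactions whose reverse also occurs in the same trace. The input format (annotated (source, target) role pairs) is an assumption, since the summary does not describe the paper's annotation scheme:

```python
from collections import Counter

def reciprocity(interactions: list[tuple[str, str]]) -> float:
    """Fraction of directed (source_role, target_role) interactions
    whose reverse direction also appears in the trace."""
    edges = Counter(interactions)
    if not edges:
        return 0.0
    reciprocated = sum(n for (a, b), n in edges.items() if (b, a) in edges)
    return reciprocated / sum(edges.values())

# Instruction-tuned pattern: one voice guides, nothing flows back.
print(reciprocity([("guide", "student")] * 4))  # 0.0
# R1-like pattern: praise and criticism flow in both directions.
print(reciprocity([("critic", "proposer"), ("proposer", "critic"),
                   ("encourager", "critic"), ("critic", "encourager")]))  # 1.0
```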
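The positive/negative guidance experiments in Group 4 read like standard activation steering: shift hidden states along a "dialogic" feature direction at generation time. A minimal PyTorch sketch under that assumption follows; how the paper derives the direction and picks layers and strengths is not given in the summary, and the model and layer names in the usage comment are hypothetical.

```python
import torch

def add_steering_hook(layer: torch.nn.Module,
                      direction: torch.Tensor,
                      alpha: float):
    """Shift a layer's hidden states along a dialogic feature direction.

    alpha > 0 amplifies the feature (positive guidance); alpha < 0
    suppresses it (negative guidance). `direction` is assumed to be a
    unit vector of size d_model, e.g. read off a sparse autoencoder.
    Returns the removable hook handle.
    """
    def hook(module, inputs, output):
        # Transformer blocks often return tuples; steer the hidden states only.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Usage (hypothetical model and layer names):
#   handle = add_steering_hook(model.transformer.h[20], dialogic_dir, alpha=4.0)
#   ... generate and score reasoning tasks ...
#   handle.remove()
```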