Where Does DeepSeek-R1's Reasoning Intelligence Come From? New Google Research: Multiple Roles Inside the Model Argue It Out

Core Insights
- The article discusses the significant leap in the reasoning capabilities of large models over the past two years, highlighting the advances made by models such as OpenAI's o series, DeepSeek-R1, and QwQ-32B on complex tasks such as mathematics and logic [1][2]
- It emphasizes that the improvement in reasoning ability is not merely due to more computational steps but stems from a complex, multi-agent-like interaction structure termed the "society of thought," in which models simulate internal dialogues among different roles to arrive at correct answers [2][3]

Group 1: Reasoning Mechanisms
- The research indicates that reasoning models exhibit a higher diversity of perspectives than baseline models, activating a broader range of features related to personality and expertise during reasoning tasks [2][3]
- Controlled reinforcement learning experiments show that even with reasoning accuracy as the only reward signal, base models spontaneously increase dialogic behaviors, suggesting that socialized thinking structures improve exploration of the solution space [3][4]

Group 2: Dialogic Behaviors
- The study identifies four types of dialogic behaviors in reasoning trajectories: question-answer sequences, perspective shifts, viewpoint conflicts, and viewpoint harmonization, which together enrich the model's cognitive strategies [7][8]
- Gemini-2.5-Pro's evaluations show high consistency with human scoring, indicating that these dialogic behaviors are identified reliably [9][13]

Group 3: Social Emotional Roles
- The analysis categorizes the social emotional roles appearing in reasoning trajectories into 12 types, further summarized into four high-level categories, and finds a balanced interplay among roles rather than isolated usage [10][22]
- The Jaccard index is used to measure the co-occurrence of roles, revealing that models like DeepSeek-R1 organize different roles in a more coordinated manner during reasoning (a minimal sketch of the metric appears after this summary) [10][22]

Group 4: Cognitive Behaviors
- The study identifies four cognitive behaviors that influence reasoning accuracy: information provision, information inquiry, positive emotional roles, and negative emotional roles [11][12]
- The consistency of Gemini-2.5-Pro's evaluations with human scoring reinforces the reliability of these cognitive behavior classifications [13]

Group 5: Experimental Findings
- The findings demonstrate that even at similar reasoning trajectory lengths, reasoning models exhibit a higher frequency of dialogic behaviors and social emotional roles than base models, particularly on complex tasks [16][23]
- Experiments show that steering dialogic features positively impacts reasoning accuracy: on one task, accuracy rose from 27.1% to 54.8% when dialogic surprise features were positively reinforced (see the steering sketch after this summary) [24][29]

Group 6: Reinforcement Learning Insights
- A self-taught reinforcement learning experiment indicates that when only correct answers are rewarded, dialogic structures emerge spontaneously and accelerate the formation of reasoning strategies (see the reward sketch after this summary) [30]
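To make the Group 3 metric concrete, here is a minimal sketch of scoring role co-occurrence with the Jaccard index. The role labels and trajectories below are hypothetical stand-ins for illustration; the paper's actual annotation scheme and data are not reproduced here.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical data: each reasoning trajectory is annotated with the
# set of social emotional roles it activates (labels are illustrative).
trajectories = [
    {"skeptic", "explainer", "verifier"},
    {"skeptic", "verifier"},
    {"explainer", "encourager"},
]

def role_cooccurrence(role_a: str, role_b: str) -> float:
    """Trajectories containing both roles over trajectories containing either."""
    with_a = {i for i, t in enumerate(trajectories) if role_a in t}
    with_b = {i for i, t in enumerate(trajectories) if role_b in t}
    return jaccard(with_a, with_b)

all_roles = sorted(set().union(*trajectories))
for r1, r2 in combinations(all_roles, 2):
    print(f"J({r1}, {r2}) = {role_cooccurrence(r1, r2):.2f}")
```

A higher pairwise index means two roles tend to appear in the same trajectories, which is one way to quantify the "coordinated" role usage the article attributes to models like DeepSeek-R1.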
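The feature-guiding experiment in Group 5 reads like standard activation steering: add a scaled feature direction to a layer's hidden activations and observe the effect on behavior. Below is a minimal PyTorch sketch under that assumption; the toy model, layer choice, and `dialogic_direction` are hypothetical placeholders, not the study's actual setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 16

# Toy stand-in for one transformer block; a real experiment would hook
# a chosen layer of the actual reasoning model.
model = nn.Sequential(
    nn.Linear(d_model, d_model),
    nn.ReLU(),
    nn.Linear(d_model, d_model),
)

# Hypothetical unit-norm direction for a "dialogic" feature (e.g., one
# decoder row of a sparse autoencoder trained on the model's activations).
dialogic_direction = torch.randn(d_model)
dialogic_direction = dialogic_direction / dialogic_direction.norm()

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Return a forward hook that adds alpha * direction to a layer's output."""
    def hook(module, inputs, output):
        return output + alpha * direction
    return hook

# Positive alpha amplifies the feature; a negative alpha would suppress it.
handle = model[0].register_forward_hook(
    make_steering_hook(dialogic_direction, alpha=4.0)
)
x = torch.randn(1, d_model)
steered = model(x)
handle.remove()
unsteered = model(x)
print("steering shifted the output:", not torch.allclose(steered, unsteered))
```

The reported jump from 27.1% to 54.8% would correspond to running such positively steered generation on the evaluation task and re-scoring accuracy.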
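For Group 6, the key property is that the reward depends only on final-answer correctness, with no shaping toward dialogue, length, or style. Here is a minimal sketch of such a reward function, assuming a hypothetical `\boxed{...}` answer convention that is not specified in the article.

```python
import re

def accuracy_only_reward(trajectory: str, gold_answer: str) -> float:
    """Reward 1.0 iff the trajectory's final boxed answer matches the gold answer.

    No shaping terms: dialogic structure earns nothing directly, so any
    dialogic behavior that emerges under this reward does so only because
    it helps the model reach correct answers.
    """
    matches = re.findall(r"\\boxed\{([^}]*)\}", trajectory)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

print(accuracy_only_reward(r"Wait, let me double-check... \boxed{42}", "42"))  # 1.0
print(accuracy_only_reward(r"I think it is \boxed{41}", "42"))                 # 0.0
```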