System 2 Thinking

Superpowers in Big History | Book Recommendation
腾讯研究院 (Tencent Research Institute) · 2025-07-18 08:18
Core Viewpoint
- The article traces the evolution of intelligence from the earliest nervous systems to modern AI, arguing that intelligence can compensate for physical limitations and that contingent historical events have significantly shaped its development [3][4][11].

Group 1: Evolution of Intelligence
- The first breakthrough in brain evolution occurred about 550 million years ago, when organisms with only a few hundred neurons became able to differentiate between stimuli and produce basic emotional responses [4].
- The second breakthrough was the advanced use of dopamine in vertebrates, which let them quantify the likelihood of rewards and develop curiosity through complex exploratory actions [5].
- The third breakthrough was the neocortex in mammals, which enabled imagination and planning, akin to the slow thinking described by Daniel Kahneman [5][6].

Group 2: AI and Intelligence
- AI has improved significantly through reinforcement learning that rewards processes rather than just outcomes, so a model learns from each intermediate step instead of waiting for the final result (see the sketch after this summary) [5].
- Current AI models, particularly large language models, demonstrate an understanding of language that goes beyond mere memorization, marking a significant advance in AI capabilities [7][10].
- Future breakthroughs may come from combining human and AI intelligence, enabling AI to simulate multiple worlds or grasp complex rules in novel ways [11][12].

Group 3: Historical Context of Breakthroughs
- Historical contingencies, such as the asteroid impact that drove the dinosaurs extinct, opened the way for mammals to evolve and for intelligence to develop [3][15].
- The article suggests that significant changes in the world more often arise from unexpected, radical shifts than from gradual improvement [16][17].
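The process-versus-outcome distinction in Group 2 is concrete enough to sketch. The minimal Python example below illustrates only the general idea of per-step (process) rewards versus a single terminal (outcome) reward; the function names and the toy step scorer are invented for this illustration, not taken from the article.

```python
# Minimal sketch contrasting outcome-only vs. per-step (process) rewards.
# All names here (score_step, toy_scorer) are hypothetical illustrations.

from typing import Callable, List


def outcome_reward(steps: List[str], is_correct: bool) -> List[float]:
    """Outcome supervision: every step inherits one terminal signal."""
    return [1.0 if is_correct else 0.0] * len(steps)


def process_reward(steps: List[str], score_step: Callable[[str], float]) -> List[float]:
    """Process supervision: each intermediate step is scored on its own,
    so learning does not have to wait for the final result."""
    return [score_step(s) for s in steps]


if __name__ == "__main__":
    chain = ["parse problem", "set up equation", "solve equation", "check answer"]
    # A toy per-step scorer; a real process reward model would be learned.
    toy_scorer = lambda s: 1.0 if "equation" in s else 0.5
    print(outcome_reward(chain, is_correct=True))  # [1.0, 1.0, 1.0, 1.0]
    print(process_reward(chain, toy_scorer))       # [0.5, 1.0, 1.0, 0.5]
```

The contrast is about credit assignment: under outcome supervision every step shares the same terminal signal, while process supervision gives each intermediate step its own feedback, which is exactly the "learning from each step" the summary highlights.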
A New Paradigm Arrives: New Energy-Based Model Breaks the Transformer++ Scaling Ceiling, with 35% Faster Training Scalability
机器之心 (Synced) · 2025-07-07 04:48
Core Insights
- The article presents Energy-Based Transformers (EBTs), which learn to "think" through unsupervised learning alone, giving the model reasoning capabilities akin to human System 2 thinking [9][10].

Group 1: System 2 Thinking and Model Development
- Human thinking is commonly divided into System 1 (fast thinking) and System 2 (slow thinking), with the latter being crucial for complex tasks [3][4].
- Current large language models excel at System 1 tasks but struggle with System 2 tasks, prompting researchers to explore methods that strengthen System 2 reasoning [4][5].
- EBTs assign an energy value to each pair of input and candidate prediction, then refine the candidate through gradient descent on that energy, simulating a thinking process (see the first sketch after this summary) [9][10].

Group 2: Performance and Scalability
- During training, EBTs scale up to 35% faster than the mainstream Transformer++ recipe across metrics such as data volume and model depth [11].
- In reasoning tasks, EBTs outperform Transformer++ by 29% on language tasks, indicating that performance keeps improving as more inference-time computation is spent [12].
- In image denoising, EBTs require fewer forward passes than diffusion Transformers while achieving better results [13].

Group 3: Generalization and Robustness
- EBTs generalize better on out-of-distribution data, outperforming existing models even when their pre-training performance is similar or worse [14].
- EBTs learn to express uncertainty in their predictions, effectively capturing how difficult each token is to predict [62][65].
- Performance gains grow roughly linearly as the distribution shift increases, underscoring the models' value in cross-distribution generalization tasks [68][69].

Group 4: Experimental Results and Comparisons
- EBTs outperform Transformer++ on various scalability metrics, including data efficiency and computational efficiency, suggesting they would excel in large-scale training scenarios [46][72].
- Despite slightly higher pre-training perplexity, EBTs achieve lower perplexity on downstream tasks, indicating stronger generalization capabilities [74].
- In image denoising, EBTs significantly outperform DiT models, achieving better peak signal-to-noise ratios (PSNR; see the second sketch after this summary) with 99% fewer forward passes [81][92].
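The energy-descent loop in Group 1 can be made concrete. The PyTorch sketch below illustrates the general mechanism only (score a candidate against the input, refine it by gradient descent on the energy, read the final energy as confidence); it is not the paper's actual EBT architecture, and `EnergyNet`, the layer sizes, the optimizer, and the step count are all assumptions made for this example.

```python
# A minimal sketch of the prediction loop the summary describes: a model
# assigns an energy to (input, candidate-prediction) pairs and "thinks"
# by refining the candidate with gradient descent on that energy.
# Illustration only; EnergyNet and all hyperparameters are assumptions.

import torch
import torch.nn as nn


class EnergyNet(nn.Module):
    """Scores how well a candidate prediction fits the given context
    (lower energy = better fit)."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(2 * dim, 64),
            nn.SiLU(),
            nn.Linear(64, 1),
        )

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        return self.score(torch.cat([context, candidate], dim=-1)).squeeze(-1)


def think(model: EnergyNet, context: torch.Tensor, n_steps: int = 8, lr: float = 0.1):
    """Refine a candidate by descending the energy surface; more steps
    means more 'thinking'. The final energy doubles as an uncertainty
    signal: if it stays high, no good candidate was found."""
    for p in model.parameters():       # freeze the model; only the
        p.requires_grad_(False)        # candidate is updated while thinking
    candidate = torch.zeros_like(context, requires_grad=True)
    optimizer = torch.optim.SGD([candidate], lr=lr)
    for _ in range(n_steps):
        optimizer.zero_grad()
        model(context, candidate).sum().backward()  # per-sample energies
        optimizer.step()
    with torch.no_grad():
        final_energy = model(context, candidate)
    return candidate.detach(), final_energy


if __name__ == "__main__":
    model = EnergyNet()
    ctx = torch.randn(4, 32)           # a batch of 4 contexts
    pred, energy = think(model, ctx)
    print(pred.shape, energy)          # harder inputs end at higher energy
```

Two of the summary's claims fall out of this loop directly: running more descent steps is a natural way to spend extra compute "thinking" about hard inputs, and a final energy that stays high is a built-in signal that the model is uncertain about its prediction.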
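For reference, the denoising metric cited in Group 4 is standard and easy to state in code. The snippet below is the textbook PSNR definition (higher is better); nothing in it is specific to EBTs or DiT, and the noise level in the usage example is arbitrary.

```python
# Peak signal-to-noise ratio, the image-denoising metric cited above.

import torch


def psnr(clean: torch.Tensor, denoised: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """PSNR in decibels: 10 * log10(MAX^2 / MSE)."""
    mse = torch.mean((clean - denoised) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    img = torch.rand(3, 64, 64)
    noisy = (img + 0.05 * torch.randn_like(img)).clamp(0.0, 1.0)
    print(f"{psnr(img, noisy):.2f} dB")  # roughly 26 dB at this noise level
```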