System 2 Thinking

Superpowers in Big History | Book Recommendation
腾讯研究院 (Tencent Research Institute) · 2025-07-18 08:18
Wan Weigang, science writer and author of the 精英日课 (Elite Daily Class) column on the 得到 (Dedao) app

By roughly 100 million years ago, the descendants of one branch of the cynodonts had evolved into mammals. Setting everything else aside, their newest feature was a neocortex growing in the brain, and it gave them a "superpower."

Imagine you are crouched in a burrow. Within sight outside there is prey, and you badly want to strike.

A reptile in this situation simply attacks whenever it feels right and improvises if anything goes wrong along the way, but our mammalian ancestors did not act that way. Their "superpower" was imagination and short-term planning.

When I sprint in a moment, which way will the prey run? If I give chase, will I run into a dinosaur? If a dinosaur shows up, which way do I flee? How good are my odds of escape?

If you can briefly step away from the bustle of daily life and the intrigues of geopolitics and have time to think about something bigger, let me first tell you a story.

The ancestor of all mammals on Earth is the cynodont, which first appeared roughly 260 million years ago. Its body was as small as a modern mouse's, and its strength and speed were unremarkable; it could not possibly contend with the dinosaurs that then ruled the planet, so it kept a low profile and mostly hid in burrows.

But the cynodont had an advantage of its own: it was warm-blooded. That meant it could still move nimbly at night, when temperatures dropped. Dinosaurs, by contrast, being reptiles, had body temperatures that fell with the air temperature, leaving them stiff and sluggish. So the cynodont owned the night. Yet warm blood also had an obvious drawback, namely ...
A New Paradigm Arrives! Energy-Based Models Break the Transformer++ Scaling Ceiling, with 35% Faster Training Scalability
机器之心 (Machine Heart) · 2025-07-07 04:48
Core Insights
- The article discusses the development of Energy-Based Transformers (EBTs), which learn to "think" on their own through unsupervised learning, giving the model reasoning capabilities akin to human System 2 thinking [9][10].

Group 1: System 2 Thinking and Model Development
- Human thinking is commonly divided into System 1 (fast thinking) and System 2 (slow thinking), with the latter being crucial for complex tasks [3][4].
- Current large language models excel at System 1 tasks but struggle with System 2 tasks, prompting researchers to explore ways to strengthen System 2 reasoning [4][5].
- EBTs assign an energy value to each pair of input and candidate prediction, then refine the candidate by gradient descent to simulate a thinking process (a toy sketch of this loop follows the summaries below) [9][10].

Group 2: Performance and Scalability
- During training, EBTs scale up to 35% faster than the mainstream Transformer++ recipe across metrics such as data volume and model depth [11].
- At inference time, EBTs improve language-task performance over Transformer++ by 29%, indicating that additional computation translates into better results [12].
- On image denoising, EBTs need fewer forward passes than diffusion Transformers while achieving better results [13].

Group 3: Generalization and Robustness
- EBTs show stronger generalization, particularly on out-of-distribution data, outperforming existing models even when their pre-training performance is similar or worse [14].
- EBTs can learn and express uncertainty in their predictions, effectively capturing how difficult each token is to predict [62][65].
- As the distribution shift grows, EBT performance improves along a linear trend, underscoring their value for cross-distribution generalization [68][69].

Group 4: Experimental Results and Comparisons
- EBTs outperform Transformer++ on several scalability metrics, including data efficiency and computational efficiency, suggesting they will excel in large-scale training scenarios [46][72].
- Despite slightly higher pre-training perplexity, EBTs achieve lower perplexity on downstream tasks, indicating stronger generalization [74].
- On image denoising, EBTs significantly outperform DiT models, achieving higher peak signal-to-noise ratio (PSNR) with 99% fewer forward passes [81][92].
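To make the "assign an energy, then refine the candidate" idea concrete, here is a minimal sketch in PyTorch. It is not the paper's implementation: ToyEnergyModel, the think() loop, and the hyperparameters (steps, lr) are illustrative assumptions, and a real EBT would use a Transformer to score context/candidate pairs. The sketch only shows the core mechanic that "thinking" means running gradient descent on the candidate prediction itself, not on the model weights.

```python
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    """Hypothetical stand-in for an energy head: maps a (context, candidate)
    pair to a scalar energy, where lower energy means a more plausible prediction."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, 1),
        )

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        return self.scorer(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def think(energy_model: nn.Module, context: torch.Tensor,
          steps: int = 8, lr: float = 0.1) -> torch.Tensor:
    """'Thinking' as iterative refinement: start from a random candidate and
    descend the energy landscape with respect to the candidate."""
    candidate = torch.randn_like(context, requires_grad=True)
    for _ in range(steps):
        energy = energy_model(context, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        # Gradient step on the prediction, then re-attach it to the graph.
        candidate = (candidate - lr * grad).detach().requires_grad_(True)
    return candidate.detach()

# Usage: each extra refinement step spends more inference compute
# in exchange for a lower-energy (more refined) prediction.
model = ToyEnergyModel(dim=64)
ctx = torch.randn(4, 64)            # batch of 4 context embeddings
pred = think(model, ctx, steps=16)
```

Under this framing, the number of refinement steps is the knob that trades extra inference-time compute for better predictions, which is the behavior the inference-scaling results above are measuring.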