Stanford's Latest Paper Uncovers the Basis of Theory of Mind in Large Language Models
36Kr · 2025-09-24 11:04
Core Insights
- The article discusses how AI, specifically large language models (LLMs), is beginning to exhibit "Theory of Mind" (ToM) capabilities traditionally considered unique to humans [2][5]
- A recent Stanford University study finds that the capacity for complex social reasoning in these models is concentrated in a mere 0.001% of their total parameters, challenging previous assumptions about how cognitive abilities are distributed across neural networks [8][21]
- The research highlights structured order and an understanding of sequence in language processing as foundational to the emergence of advanced cognitive abilities in AI [15][20]

Group 1: Theory of Mind in AI
- "Theory of Mind" refers to the ability to understand others' thoughts, intentions, and beliefs, which is crucial for social interaction [2][3]
- Recent benchmarks indicate that LLMs such as Llama and Qwen can respond accurately to tests designed to evaluate ToM, suggesting they can simulate perspectives and reason about information gaps [5][6]

Group 2: Key Findings from the Stanford Study
- The study finds that the parameters driving ToM capabilities are highly concentrated, contradicting the belief that such abilities are widely distributed across the model [8][9]
- The researchers used a sensitivity analysis method based on the Hessian matrix to pinpoint the parameters responsible for ToM, revealing a "mind core" that is critical for social reasoning [7][8] (a sketch of this style of analysis follows this summary)

Group 3: Mechanisms Behind Cognitive Abilities
- The findings suggest that the attention mechanism, particularly in models using RoPE (Rotary Positional Encoding), is directly linked to social reasoning capability [9][14] (a RoPE sketch also follows below)
- Disrupting the identified "mind core" parameters in models that use RoPE collapses their ToM abilities, while models that do not use RoPE are more resilient [8][14]

Group 4: Emergence of Intelligence
- The study posits that advanced cognitive abilities in AI emerge from a foundational understanding of sequence and structure in language, which is essential for higher-level reasoning [15][20]
- The emergence of ToM is seen as a byproduct of mastering basic language structures and statistical patterns in human language, rather than a standalone cognitive module [20][23]
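The summary above does not specify which Hessian-based estimator the Stanford paper uses, so the following is only a minimal sketch of one common approach from the pruning literature (LeCun et al., "Optimal Brain Damage"): approximate the Hessian diagonal with averaged squared gradients (the empirical Fisher) and score each parameter by theta^2 * H_ii. The names `parameter_sensitivity`, `loss_fn`, and `batches` are illustrative assumptions, not from the paper.

```python
import torch

def parameter_sensitivity(model, loss_fn, batches):
    """Score each parameter by theta_i^2 * H_ii, with the Hessian
    diagonal approximated by the empirical Fisher, E[grad^2].

    This is one conventional saliency estimator, not necessarily the
    paper's exact method. `loss_fn(model, batch)` is assumed to return
    a scalar loss.
    """
    sq_grads = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                sq_grads[n] += p.grad.detach() ** 2  # accumulate grad^2
    # Saliency: squared parameter value times averaged squared gradient.
    return {n: (p.detach() ** 2) * (sq_grads[n] / len(batches))
            for n, p in model.named_parameters()}
```

Ranking all scores and keeping only the top 0.001% would localize a candidate set of highly sensitive parameters, analogous to the "mind core" described above.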
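Since Group 3 ties the finding to RoPE specifically, here is a minimal sketch of standard Rotary Positional Encoding (Su et al., 2021) for readers unfamiliar with it; it illustrates the general technique, not the paper's setup, and `apply_rope` is an illustrative name. Each pair of channels is rotated by an angle proportional to the token's position, so the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x by position-dependent angles.

    x: (seq_len, dim) with dim even. Channel pair (2i, 2i+1) at
    position p is rotated by angle p * base^(-2i/dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) * 2.0 / dim)
    # angles[p, i] = p * freqs[i]
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]      # even / odd channels
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before the attention dot product.
q, k = torch.randn(128, 64), torch.randn(128, 64)
q_rot, k_rot = apply_rope(q), apply_rope(k)
```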
3,700 Pre-training Runs in Search of the "Linear Attention" Non-Consensus: A MiniMax-01 Developer on a Four-Year Exploration
晚点LatePost · 2025-03-09 12:00
"我们跑的是下半场,赌的就是未来的长文本需求。" MiniMax 在今年 1 月发布了参数为 4560 亿的开源大模型 MiniMax-01,该模型就用到了他们开发的线 性注意力机制 "Lightning Attention"。 我们邀请了这个项目的负责人,MiniMax 高级研究总监钟怡然,来与我们一起聊线性注意力的研发过 程。钟怡然在 MiniMax 负责大模型网络架构设计,目前正开发多模态深度推理模型。 钟怡然曾担任上海人工智能实验室青年科学家,是新架构探索组的 PI(项目负责人);他在澳洲国立大 学获得博士学位,师从李宏东教授和 Richard Hartley 院士。他和他的团队已在一些国际顶级学术会议和 期刊上发表了 20 余篇关于模型新架构的论文,覆盖了当前多类非 Transformer 架构,如线性注意力机制 (线性注意力)、长卷积(Long Convolution)和线性循环网络(Linear RNN)。 在 2021 年,线性注意力还是一个 "看起来很美好的泡泡",怡然和团队就开始探索线性架构的实现。 嘉宾 丨 钟怡然 整理 丨 刘倩 程曼祺 上期播客中, 我们与清华的两位博士生,肖朝军和傅 ...