NVIDIA Reportedly Set to Acquire AI21 Labs for Up to $3 Billion, Accelerating Its Push into Large Models and the AI-Agent Ecosystem
Huanqiu Wang Zixun · 2025-12-31 04:12
According to reports, Maestro is AI21's commercial mainstay, with annualized revenue of roughly $50 million. The platform helps enterprises preprocess structured and unstructured data, optimize inputs so AI agents can analyze them efficiently, and verify the accuracy of generated results before formatting them for output.

People familiar with the matter say NVIDIA intends to integrate Maestro into its enterprise AI software suite, NVIDIA AI Enterprise, to strengthen its end-to-end capabilities in AI-agent development and deployment. The move would further enrich the suite's pretrained models, development tools, and workflow-management features, reinforcing NVIDIA's leadership in the generative-AI infrastructure market.

The potential acquisition follows another major technology deal: NVIDIA recently announced a $20 billion licensing arrangement for the technology of AI-chip startup Groq, also bringing on its founding CEO and core team. Groq focuses on high-performance inference processors, and its technology is expected to complement NVIDIA's existing GPU architecture.

Foreign media report that AI21 Labs closed a funding round in 2023 with participation from NVIDIA, Google, and Samsung Electronics, and quietly raised another $300 million earlier this year at a valuation roughly flat with the previous round. Despite the existing investment relationship, the proposed acquisition signals that NVIDIA is shifting from financial backer to strategic integrator, aiming to vertically integrate key AI software capabilities and consolidate its hardware-software ecosystem advantage in the large-model era. (Qingyun) Source: Huanqiu.com
New Stanford Paper Reveals the Foundations of Theory of Mind in Large Language Models
36Kr · 2025-09-24 11:04
Core Insights
- The article discusses how AI, specifically large language models (LLMs), is beginning to exhibit "Theory of Mind" (ToM) capabilities traditionally considered unique to humans [2][5]
- A recent study from Stanford University reveals that the ability for complex social reasoning in these models is concentrated in a mere 0.001% of their total parameters, challenging previous assumptions about how cognitive abilities are distributed in neural networks [8][21]
- The research highlights structured order and an understanding of sequence in language processing as the foundation from which advanced cognitive abilities emerge in AI [15][20]

Group 1: Theory of Mind in AI
- "Theory of Mind" refers to the ability to understand others' thoughts, intentions, and beliefs, which is crucial for social interaction [2][3]
- Recent benchmarks indicate that LLMs such as Llama and Qwen can respond accurately to tests designed to evaluate ToM, suggesting they can simulate perspectives and understand information gaps [5][6]

Group 2: Key Findings from the Stanford Study
- The study finds that the parameters driving ToM capabilities are highly concentrated, contradicting the belief that such abilities are widely distributed across the model [8][9]
- The researchers used a sensitivity analysis based on the Hessian matrix to pinpoint the parameters responsible for ToM, revealing a "mind core" that is critical for social reasoning [7][8]

Group 3: Mechanisms Behind Cognitive Abilities
- The findings suggest that the attention mechanism, particularly in models using RoPE (Rotary Positional Encoding), is directly linked to social reasoning capability [9][14]
- Disrupting the identified "mind core" parameters in models that use RoPE causes their ToM abilities to collapse, while models that do not use RoPE are more resilient [8][14]

Group 4: Emergence of Intelligence
- The study posits that advanced cognitive abilities in AI emerge from a foundational understanding of sequence and structure in language, which is essential for higher-level reasoning [15][20]
- The emergence of ToM is seen as a byproduct of mastering basic language structures and statistical patterns in human language, rather than a standalone cognitive module [20][23]
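The study's headline claim is that ToM behaviour hinges on a vanishingly small set of parameters located via a Hessian-based sensitivity analysis. The paper's exact estimator is not given in this summary, so the following is only a minimal sketch of the general recipe, using a common pruning-style proxy for the Hessian diagonal, (gradient × weight)², on a toy quadratic loss; `tom_loss`, the toy parameter vector, and the 0.001% budget are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a single parameter vector standing in for all weights.
theta = rng.normal(size=100_000)

def tom_loss(params):
    # Hypothetical loss on a Theory-of-Mind benchmark; here just a
    # stand-in quadratic so the example runs end to end.
    return 0.5 * np.sum((params - 1.0) ** 2)

def grad_tom_loss(params):
    # Gradient of the stand-in loss (analytic here; in practice, backprop).
    return params - 1.0

# 1) Score each parameter. A common proxy for Hessian-based sensitivity
#    in the pruning literature is (gradient * weight)^2, an estimate of
#    the loss increase incurred by zeroing that weight.
g = grad_tom_loss(theta)
scores = (g * theta) ** 2

# 2) Keep only the top 0.001% most sensitive parameters (the "mind core").
k = max(1, int(len(theta) * 1e-5))
core_idx = np.argsort(scores)[-k:]

# 3) Ablate the core and compare benchmark loss before and after.
theta_ablated = theta.copy()
theta_ablated[core_idx] = 0.0

print("loss (intact) :", tom_loss(theta))
print("loss (ablated):", tom_loss(theta_ablated))
print("parameters ablated:", k, "of", len(theta))
```

In the real setting, steps 1 to 3 would run over an actual LLM's weights with ToM-benchmark gradients, and the reported finding is that ablating this tiny core collapses ToM performance in RoPE-based models.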
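Since the summary ties the effect to RoPE specifically, here is a self-contained NumPy sketch of rotary positional encoding in the half-split convention, purely as a generic illustration of the mechanism the study points at; it is not the implementation of Llama, Qwen, or any particular model.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional encoding to a (seq_len, dim) array.

    The channels are split into two halves, and each (x1[i], x2[i]) pair
    is rotated by an angle that grows linearly with the token position,
    so relative order is encoded directly in the query/key geometry.
    """
    seq_len, dim = x.shape
    half = dim // 2
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    freq = base ** (-np.arange(half) / half)   # (half,) decreasing frequencies
    angle = pos * freq                         # (seq_len, half)
    cos, sin = np.cos(angle), np.sin(angle)

    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.default_rng(0).normal(size=(6, 8))   # 6 tokens, head dim 8
print(rope(q).shape)                               # (6, 8)
```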
3,700 Pre-training Runs in Search of the "Linear Attention" Non-Consensus: A MiniMax-01 Developer Recounts a 4-Year Exploration
晚点LatePost · 2025-03-09 12:00
"We are playing the second half of the game; our bet is on future demand for long context."

In January this year, MiniMax released MiniMax-01, an open-source large model with 456 billion parameters that uses the linear attention mechanism they developed, "Lightning Attention".

We invited the project's lead, Zhong Yiran (钟怡然), Senior Research Director at MiniMax, to talk with us about how linear attention was developed. Zhong Yiran is responsible for large-model network architecture design at MiniMax and is currently building a multimodal deep-reasoning model.

Zhong Yiran previously served as a young scientist at the Shanghai AI Laboratory, where he was the PI (project lead) of the new-architecture exploration group. He earned his PhD at the Australian National University under Professor Li Hongdong (李宏东) and Academician Richard Hartley. He and his team have published more than 20 papers on new model architectures at top international conferences and journals, covering most of today's non-Transformer families, including linear attention, long convolution, and linear RNNs.

Back in 2021, when linear attention still looked like "a beautiful-looking bubble", Yiran and his team were already exploring how to realize linear architectures.

Guest | Zhong Yiran (钟怡然)  Edited by | Liu Qian (刘倩), Cheng Manqi (程曼祺)

In the previous podcast episode, we spoke with two PhD students from Tsinghua, Xiao Chaojun (肖朝军) and Fu ...
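Lightning Attention itself is not described in this excerpt, so as a generic illustration of why linear attention matters for long contexts, the sketch below contrasts standard softmax attention, whose cost is dominated by an n × n score matrix, with a kernelized linear variant that reorders the matrix products and never materializes that matrix. The feature map `phi` and the tiny shapes are assumptions for the demo, not MiniMax's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4                     # sequence length, head dimension
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

def softmax_attention(Q, K, V):
    # Standard attention: the n x n score matrix makes cost quadratic in n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized linear attention: apply a positive feature map phi to
    # Q and K, then compute (phi(Q) @ (phi(K)^T V)) so the cost stays
    # linear in sequence length.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                          # (d, d) summary of keys/values
    z = Qp @ Kp.sum(axis=0)                # per-query normalizer, shape (n,)
    return (Qp @ kv) / z[:, None]

print(softmax_attention(Q, K, V).shape)    # (8, 4)
print(linear_attention(Q, K, V).shape)     # (8, 4)
```

The key design point is the reordering: once keys and values are collapsed into a small d × d summary, memory and compute no longer grow quadratically with sequence length, which is what makes very long contexts attractive for this family of architectures.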