Self-Attention Mechanism
Humans spent 100 years drawing the brain map; AI did it in a few hours, and even charted new brain regions
量子位 (QbitAI) · 2026-02-10 11:59
听雨, reporting from Aofeisi | QbitAI (WeChat official account QbitAI)

Good news: AI can now help scientists draw brain maps!

Recently, a neuroscience team at the University of California, San Francisco proposed a new machine learning algorithm, CellTransformer, which took only a few hours to classify and map the brain atlases of five mice.

The gene data from these five mouse brains covered 10.4 million cells, each carrying hundreds of genes. Using this new algorithm, the team not only cleanly delineated known regions of the mouse brain but also charted new brain regions. More striking still, the technique may well be applied to humans next.

The latest brain-mapping technology: CellTransformer

Brain atlas drawing is an old discipline, and the traditional method was laborious: scientists drew boundary lines on brain images by hand, connecting different regions with pencil strokes. The Allen Mouse Brain Common Coordinate Framework, released in 2020, was produced this way. Based on brain data from 1,675 mice and covering more than 1,000 distinct brain regions, it is highly valuable. But atlases built on such heavily hand-crafted features inevitably share one problem: subjectivity. Penn State College of Medicine neuroanatomist 金永洙 (Yon ...
Spatio-temporal compression! Cambridge proposes the MTLA attention mechanism: 5× faster inference, GPU memory cut to 1/8
机器之心 (Synced) · 2025-06-11 00:24
Core Insights
- The article discusses the significance of the Transformer architecture for large language models, emphasizing its irreplaceable role despite challenges related to computational complexity and efficiency [1][2][5].

Group 1: Transformer Architecture and Challenges
- The self-attention mechanism of the Transformer, while powerful for modeling long-range dependencies, suffers from computational complexity that is quadratic in sequence length, which has motivated research on alternatives [1].
- The KV cache grows linearly with sequence length during inference, becoming a critical efficiency bottleneck as model sizes increase [1][2].

Group 2: Innovations in KV Cache Management
- The MLA mechanism proposed by the DeepSeek team compresses the KV cache in a latent space, significantly improving inference efficiency, especially in low-resource scenarios [2][7].
- Multi-head Temporal Latent Attention (MTLA) combines temporal and latent-space compression, addressing the redundancy that accumulates in the KV cache as sequences grow [2][9].

Group 3: Comparison of Attention Mechanisms
- Current models often use Grouped-Query Attention (GQA), which reduces KV cache size by letting groups of query heads share KV heads, striking a balance between efficiency and performance [5].
- MTLA outperforms existing methods like GQA and MQA by compressing both the spatial and the temporal dimensions of the KV cache while maintaining model performance [9][20].

Group 4: Performance and Future Potential
- MTLA demonstrates strong performance across various tasks, achieving over 5× faster inference and cutting GPU memory usage to less than 1/8 of standard MHA [20].
- MTLA's potential in large-scale deployments is significant, as demand for efficient KV cache management grows with increasing model sizes and sequence lengths [23][24].
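The KV-cache arithmetic behind these comparisons can be sketched in a few lines. The sketch below is illustrative only: the model dimensions are hypothetical (not taken from the MTLA paper), and the MLA/MTLA rows are simplified stand-ins (a single latent vector per cached step, with temporal compression modeled as halving the number of cached steps). It shows why sharing KV heads or compressing into a latent space shrinks cache memory by large constant factors.

```python
def kv_cache_mib(seq_len, n_layers, n_kv_heads, head_dim,
                 bytes_per_val=2, kv_tensors=2):
    """KV cache size in MiB: kv_tensors (K and V) cached per layer, per token."""
    vals = kv_tensors * seq_len * n_layers * n_kv_heads * head_dim
    return vals * bytes_per_val / 2**20

# Hypothetical decoder: 32 layers, 32 heads of dim 128, fp16, 4096-token context.
seq, layers, heads, dim = 4096, 32, 32, 128

mha = kv_cache_mib(seq, layers, heads, dim)        # one KV head per query head
gqa = kv_cache_mib(seq, layers, heads // 8, dim)   # 8 query heads share each KV head
# Latent-style cache: one 512-dim latent per step replaces separate K/V tensors;
# MTLA-style temporal compression additionally halves the cached steps.
mla = kv_cache_mib(seq, layers, 1, 512, kv_tensors=1)
mtla = kv_cache_mib(seq // 2, layers, 1, 512, kv_tensors=1)

print(f"MHA  cache: {mha:6.0f} MiB")   # 2048 MiB
print(f"GQA  cache: {gqa:6.0f} MiB")   #  256 MiB
print(f"MLA  cache: {mla:6.0f} MiB")   #  128 MiB
print(f"MTLA cache: {mtla:6.0f} MiB")  #   64 MiB
```

The cache size is a pure product of its dimensions, which is why each technique's saving is a clean multiplicative factor: grouping divides by the group size, latent compression divides by the ratio of (2 × heads × head_dim) to the latent width, and temporal compression divides by the stride.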
Understanding DeepSeek and OpenAI in one article: Why do entrepreneurs need cognitive innovation?
混沌学园 (Hundun Academy) · 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving AI landscape [1][2].

Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission of counteracting the monopolistic power of major tech companies in AI, aiming for an open and safe AI for all [9][10][12].
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15].
- OpenAI's belief in the Scaling Law led to unprecedented investments in AI, resulting in groundbreaking language models that exhibit emergent capabilities [17][19].

Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, thus lowering the barrier to AI usage [22][24].
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future developments [25].

Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, in contrast to the resource-heavy approaches of larger AI firms [32][34].
- The company achieved high performance at low cost through innovative model architecture and training methods, emphasizing high-quality data selection and algorithmic efficiency [36][38].
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities without human feedback, marking a significant advance in AI technology [45][48].

Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing open collaboration and resource sharing among researchers [54][56].
- Its dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62].
- The company's approach challenges traditional hierarchical models, advocating a culture that empowers individuals to explore and innovate freely [64][70].

Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements signal a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78].
- The article calls for a departure from the belief that Chinese companies should focus only on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82].
NVIDIA: My fate rests with heaven, not with me
虎嗅APP (Huxiu) · 2025-03-07 10:35
Core Viewpoint
- The article traces the journey of NVIDIA and its CEO Jensen Huang, highlighting the challenges faced and the strategies employed to succeed in the competitive GPU market, particularly in the context of AI advancements.

Group 1: NVIDIA's Financial Performance
- After the release of its financial report, NVIDIA's stock dropped over 8% on two separate days, a market-value loss equivalent to two Xiaomi companies, even though revenue beat expectations with profit growth of 80% [3][4].
- NVIDIA's revenue is likened to that of four Kweichow Moutais, underscoring its substantial financial performance [3].

Group 2: Jensen Huang's Leadership Style
- Jensen Huang is portrayed as a charismatic yet ruthless leader, known for a demanding management style that includes publicly berating employees over project failures [5][6].
- Huang's approach to competition includes aggressive tactics such as poaching talent from rivals and engaging in legal battles to undermine competitors [6][8].

Group 3: Strategic Decisions and Failures
- NVIDIA faced significant setbacks, including failed ventures in mobile devices and the acquisition of Icera, which did not yield the expected returns [15][16].
- Pressure from Starboard Value to cut unprofitable projects led the company to focus on profitable ventures, ultimately boosting its stock price [16][17].

Group 4: Resilience and Adaptation
- Huang's ability to adapt to failure is emphasized: he pivoted from unsuccessful strategies to capitalize on emerging opportunities, such as the development of the CUDA platform [18][20].
- The article highlights Huang's commitment to CUDA despite external pressure, recognizing its long-term potential for scientific applications [20][21].

Group 5: NVIDIA's Competitive Advantages
- NVIDIA's current market dominance rests on three advantages: strong GPU technology, the CUDA platform, and the acquisition of Mellanox, which enhances data-throughput capabilities [25][26][27].
- NVIDIA's foresight in AI and computing power has positioned it uniquely in the market, allowing it to outpace competitors [30][31].