Circuit Sparsity Models
OpenAI Suddenly Open-Sources a New Model: 99.9% of Its Weights Are Zero, and a New Sparsity Method Could Replace MoE
36Kr · 2025-12-15 03:29
Core Insights
- The article discusses the open-source release of Circuit Sparsity technology, which aims to improve the interpretability of large language models by introducing a sparse structure that makes internal decision-making processes easier to understand [2][4].

Group 1: Circuit Sparsity Technology
- Circuit Sparsity is a large-language-model variant that enforces sparsity on internal connections, making the model's computation easier to trace and interpret (a minimal code sketch follows this summary) [4].
- The technique targets the "black box" problem of traditional dense Transformers, offering clearer insight into how the AI reaches decisions and reducing reliance on potentially misleading outputs [4][10].

Group 2: Comparison with MoE Models
- The article suggests that the extreme sparsity and functional decoupling of Circuit Sparsity may threaten the currently popular Mixture of Experts (MoE) models, which rely on a coarser approximation of sparsity [5][12].
- MoE models face challenges such as fragmented feature flow and knowledge redundancy, whereas Circuit Sparsity allows a more precise dissection of model mechanisms [12][14].

Group 3: Performance and Efficiency
- Experimental data indicate that the task-specific circuits of sparse models are 16 times smaller than those of dense models at the same pre-training loss, allowing individual logical steps to be tracked precisely [12].
- Circuit Sparsity still has a major drawback: training is 100 to 1000 times more computationally expensive than for traditional dense models [14].

Group 4: Future Directions
- The research team plans to scale the technology to larger models to uncover more complex reasoning circuits, describing this work as an early step in exploring AI interpretability [14][16].
- Two possible routes around the training-efficiency problem are identified: extracting sparse circuits from existing dense models, and optimizing the training mechanics of new interpretable sparse models [16].
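As a rough illustration of the enforced weight sparsity described in Group 1, here is a minimal PyTorch sketch of a linear layer that keeps only its top-k largest-magnitude weights. The 0.1% density echoes the article's "99.9% of weights are zero" figure, but the layer sizes, class name, and masking scheme are illustrative assumptions, not OpenAI's actual training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLinear(nn.Module):
    """Linear layer that zeroes all but the top-k weights by magnitude.

    Hypothetical sketch: density=0.001 mirrors the article's
    "99.9% zero" figure; the real training procedure may differ.
    """

    def __init__(self, in_features, out_features, density=0.001):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.k = max(1, int(density * self.weight.numel()))

    def forward(self, x):
        # Keep only the k largest-magnitude weights; the surviving
        # entries act as the fixed "wires" of the circuit.
        threshold = self.weight.abs().flatten().topk(self.k).values.min()
        mask = (self.weight.abs() >= threshold).to(self.weight.dtype)
        return F.linear(x, self.weight * mask)


layer = SparseLinear(512, 512)
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Because the mask is recomputed from the current weights on every forward pass, gradients still flow to the surviving entries during training, while the vast majority of connections stay pinned at zero.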
OpenAI Suddenly Open-Sources a New Model! 99.9% of Its Weights Are Zero, and a New Sparsity Method Could Replace MoE
量子位 (QbitAI) · 2025-12-14 05:17
Wen Le (闻乐) · QbitAI | 公众号 QbitAI

The key to curbing AI's tendency to talk nonsense turns out to be severing 99.9% of a large model's wiring?

OpenAI has quietly open-sourced a new model with just 0.4B parameters, 99.9% of whose weights are zero: an open-source implementation of the Circuit Sparsity technique.

This is a large-language-model variant that deliberately constrains the sparsity of the model's internal connections so that its computation can be taken apart and understood. At heart, it addresses the black-box problem of traditional dense Transformers: the internal computational circuits become legible to humans, revealing how the AI makes its decisions, so we don't have to take its nonsense on faith (doge).

Some have even argued outright that this "extreme sparsity + functional decoupling" approach could spell the end of today's popular MoE (Mixture of Experts) models.

So what happens when a Transformer's weights are trained to be almost entirely zero?

Abandoning coarse approximations in pursuit of native sparsity

First, why can this model's thought process be read as easily as a circuit diagram?

The traditional large models we use every day have densely interconnected neurons: their weight matrices are almost entirely nonzero, and information is passed along in a highly superposed state, like a tangle of wires that cannot be pulled apart. No one can say exactly how such a model arrives at a given conclusion.

In the sparse model, by contrast, the nonzero weight connections that survive act like the wires in a circuit diagram, so information can travel only along fixed paths. In addition, the model uses a mean-ablation pruning method to carve out, for each task, a dedicated ...
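To make the mean-ablation pruning idea concrete, here is a minimal, hypothetical PyTorch sketch: each hidden unit's activation is replaced by its mean over a batch, and units whose ablation leaves the task loss essentially unchanged are dropped from that task's circuit. The toy network, task, and threshold are all assumptions for illustration, not the procedure used in the actual paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for one MLP block; the real models are transformers.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(256, 16)   # stand-in inputs for one task
y = x[:, :1]               # stand-in targets
loss_fn = nn.MSELoss()


def task_loss(ablate_unit=None):
    """Task loss, optionally with one hidden unit mean-ablated:
    its activation is replaced by its mean over the batch."""
    h = torch.relu(net[0](x))
    if ablate_unit is not None:
        h = h.clone()
        h[:, ablate_unit] = h[:, ablate_unit].mean()
    return loss_fn(net[2](h), y).item()


base = task_loss()
# Units whose mean ablation barely moves the loss are pruned away;
# the survivors form the task-specific circuit.
circuit = [u for u in range(32) if task_loss(u) - base > 1e-4]
print(f"kept {len(circuit)}/32 units in the circuit")
```

Mean ablation (rather than zeroing a unit outright) keeps the rest of the network operating at typical activation levels, so the measured loss change reflects the unit's task-specific contribution rather than a distribution shift.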