Circuit Sparsity Model
OpenAI Suddenly Open-Sources a New Model: 99.9% of Weights Are Zero, a New Sparsity Method to Replace MoE
36Kr · 2025-12-15 03:29
Core Insights
- The article covers the open-source release of Circuit Sparsity, a technique that improves the interpretability of large language models by imposing a sparse structure that exposes their internal decision-making processes [2][4].

Group 1: Circuit Sparsity Technology
- Circuit Sparsity is a variant of large language models that enforces sparsity in internal connections, making the model's computation easier to follow and interpret [4].
- The technique targets the "black box" problem of traditional dense Transformers, giving clearer insight into how the model reaches its decisions and reducing reliance on potentially misleading outputs [4][10].

Group 2: Comparison with MoE Models
- The article argues that the extreme sparsity and functional decoupling of Circuit Sparsity could challenge the currently dominant Mixture of Experts (MoE) approach, which relies on a much coarser form of sparsity [5][12].
- MoE models suffer from fragmented feature flow and knowledge redundancy, whereas Circuit Sparsity permits a more precise dissection of model mechanisms [12][14].

Group 3: Performance and Efficiency
- Experimental data indicate that the task-specific circuits of sparse models are 16 times smaller than those of dense models at the same pre-training loss, allowing individual logical steps to be tracked precisely [12].
- However, Circuit Sparsity still has a significant drawback: training is extremely expensive, 100 to 1000 times more computationally demanding than for traditional dense models [14].

Group 4: Future Directions
- The research team plans to scale the technique to larger models to unlock more complex reasoning circuits, describing this as an early step in exploring AI interpretability [14][16].
- Two potential ways to overcome the training-efficiency problems of sparse models are identified: extracting sparse circuits from existing dense models, and improving the training mechanisms of new, interpretable sparse models [16].
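The "99.9% of weights are zero" claim can be made concrete with a small sketch. The snippet below keeps only the top 0.1% of a weight matrix's entries by magnitude and zeroes the rest; note this is an illustrative post-hoc pruning toy, not the actual Circuit Sparsity training procedure, which (per the article) enforces sparsity during training itself:

```python
import numpy as np

def sparsify_weights(w: np.ndarray, keep_fraction: float = 0.001) -> np.ndarray:
    """Zero out all but the largest-magnitude entries of a weight matrix.

    Illustrative only: Circuit Sparsity enforces sparsity during
    optimization, not by post-hoc magnitude pruning like this.
    """
    k = max(1, int(w.size * keep_fraction))
    # Threshold = magnitude of the k-th largest entry (flattened view)
    threshold = np.sort(np.abs(w), axis=None)[-k]
    mask = np.abs(w) >= threshold
    return w * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))
sparse_w = sparsify_weights(w, keep_fraction=0.001)
sparsity = 1.0 - np.count_nonzero(sparse_w) / sparse_w.size  # ~0.999
```

The surviving 0.1% of connections form the "circuit" that the interpretability analysis then traces.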
OpenAI Suddenly Open-Sources a New Model! 99.9% of Weights Are Zero, a New Sparsity Method to Replace MoE
量子位 (QbitAI) · 2025-12-14 05:17
Core Viewpoint
- The article introduces Circuit Sparsity, a technique that drastically reduces the number of connections in large language models: only 0.1% of connections are retained, yet performance remains comparable to traditional dense models while interpretability improves significantly [1][3][6].

Group 1: Circuit Sparsity Technology
- Circuit Sparsity enforces sparsity in a model's internal connections, making the computation easier to follow and addressing the black-box nature of traditional dense Transformers [6][10].
- With only 0.1% of connections retained, the model's decision-making process becomes clear and traceable, much like a circuit diagram [10][12].
- Experimental data show that the task-specific circuits of sparse models are 16 times smaller than those of dense models while remaining both necessary and sufficient for task completion [14].

Group 2: Comparison with MoE Models
- The article contrasts Circuit Sparsity with Mixture of Experts (MoE) models, which use a gating network to split the model into multiple expert sub-networks, leading to feature fragmentation and knowledge redundancy [16][18].
- Circuit Sparsity aims for native sparsity, yielding cleaner feature representations and avoiding the interference seen in MoE models [18].
- Despite these advantages, Circuit Sparsity is currently 100 to 1000 times more computationally demanding than traditional dense models, which may limit its near-term applicability in industry [20][21].

Group 3: Future Directions
- The team plans to extend Circuit Sparsity to larger models to unlock more complex reasoning circuits, reflecting ongoing research in AI interpretability [22].
- Two potential ways to overcome the training-efficiency challenges of sparse models have been identified: extracting sparse circuits from existing dense models, and improving the training mechanisms of new, interpretable sparse models [24].
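For contrast with the weight-level sparsity above, the MoE routing that the article criticizes works at the much coarser granularity of whole expert sub-networks. The sketch below is a minimal, assumed single-token MoE forward pass (simplified: no load balancing, plain NumPy, hypothetical shapes), where a gating network picks the top-k experts and only those experts run:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Route one input vector through the top-k experts chosen by a gate.

    Minimal sketch of expert-level (coarse) sparsity; real MoE layers
    add batching, load-balancing losses, and learned expert MLPs.
    """
    logits = x @ gate_w                        # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over selected experts
    # Only the selected experts are evaluated; all others stay idle
    return sum(w * (x @ expert_ws[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts)
```

Here sparsity is "which experts run," not "which individual weights exist," which is the granularity gap the article says Circuit Sparsity closes.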