MoE (Mixture of Experts)
A Conversation with 张津剑 (Zhang Jinjian): Four Years Ago Nobody Believed in AGI; Now MiniMax Is Worth 300 Billion
投中网· 2026-02-26 01:57
People praise him as "the boldest investor in AI," though reading between the lines it is hard to say how, in essence, that differs from Don Quixote charging at windmills.

By 蒲凡 · Source: 投中网

On October 9, 2019, 绿洲资本 was founded. On WeChat Moments, Zhang Jinjian said the date was chosen for the pun 十全九美 ("nine parts perfect out of ten"). Before the first check even arrived, the pandemic did. An early 绿洲 LP recalls that on Valentine's Day 2020, the firm sent out a letter, backed by considerable data analysis, arguing that the epidemic would profoundly shape what came next. The LP figured the newly founded firm was overthinking things and phoned Zhang to reassure him. Zhang laughed: "We have to plan for the worst. But this is also the fog of war, and it is 绿洲's best beachhead."

It was precisely this worst-case-first strategy that made 绿洲's second USD fund an unusually smooth raise; reportedly every existing LP increased its commitment. In the spring of 2022, Zhang, with $300 million in hand and ready to go big, was instead ordered into home quarantine, among the first batch flagged for monitoring.

More anxiety-inducing still, the era of easy money had ended. In early 2022, under the pressure of high inflation and other factors, the dollar entered a rate-hike cycle; the IPO market stalled, cutting off unicorns' valuation upside and their exit paths. Venture capital quickly fell from a hugely profitable "art of investing" back into uncertainty-laden "alternative investment," rapidly losing the market's trust. That year, the share of "down round" financings rose from 8% ...
OpenAI Suddenly Open-Sources a New Model: 99.9% of the Weights Are Zero, a New Sparsity Method to Replace MoE
36Kr · 2025-12-15 03:29
Core Insights
- The article discusses the open-source implementation of Circuit Sparsity, a technique that aims to enhance the interpretability of large language models by introducing a sparse structure that makes internal decision-making processes legible [2][4].

Group 1: Circuit Sparsity Technology
- Circuit Sparsity is a large-language-model variant that enforces sparsity in internal connections, making the model's computation process more understandable and interpretable [4].
- The technique aims to address the "black box" problem of traditional dense Transformers, allowing clearer insight into how the AI makes decisions and reducing reliance on potentially misleading outputs [4][10].

Group 2: Comparison with MoE Models
- The article suggests that the extreme sparsity and functional decoupling of Circuit Sparsity may threaten the currently popular Mixture of Experts (MoE) models, which rely on a coarser approximation of sparsity [5][12].
- MoE models face challenges such as fragmented feature flow and knowledge redundancy, while Circuit Sparsity offers a more precise dissection of model mechanisms [12][14].

Group 3: Performance and Efficiency
- Experimental data indicate that the task-specific circuits of sparse models are 16 times smaller than those of dense models at the same pre-training loss, allowing precise tracking of logical steps [12].
- However, Circuit Sparsity currently has significant drawbacks, including extremely high computational costs: 100 to 1,000 times those of traditional dense models [14].

Group 4: Future Directions
- The research team plans to scale the technique to larger models to unlock more complex reasoning circuits, noting that this is an early step in exploring AI interpretability [14][16].
- Two potential routes around the training-efficiency problem of sparse models are identified: extracting sparse circuits from existing dense models, and optimizing training mechanisms for new interpretable sparse models [16].
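The core idea in Group 1, keeping only a tiny fraction of non-zero connections so that information flows along a few traceable paths, can be illustrated with a minimal sketch. This is a toy magnitude-based mask, not OpenAI's actual training procedure, which the article does not detail:

```python
import numpy as np

def sparsify(weights: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Zero out all but the largest-magnitude entries of a weight matrix.

    A toy stand-in for 'extreme sparsity': after masking, information
    can only flow along the few surviving connections.
    """
    k = max(1, int(weights.size * keep_fraction))
    # Threshold = magnitude of the k-th largest entry.
    threshold = np.partition(np.abs(weights).ravel(), -k)[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 1000))
w_sparse = sparsify(w, keep_fraction=0.001)  # keep ~0.1% of weights
print(f"fraction zero: {np.mean(w_sparse == 0):.4f}")
```

In a real interpretable-sparsity setup the constraint would be imposed during training rather than applied post hoc, but the resulting object is the same: a weight matrix where 99.9% of entries are exactly zero.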
OpenAI Suddenly Open-Sources a New Model! 99.9% of the Weights Are Zero, a New Sparsity Method to Replace MoE
量子位· 2025-12-14 05:17
闻乐, reporting from 凹非寺 · 量子位 | WeChat account QbitAI

Could the key to stopping AI from talking nonsense really be severing 99.9% of a large model's connections?

That is what the newly open-sourced implementation of Circuit Sparsity does. It is a large-language-model variant that deliberately constrains the sparsity of the model's internal connections so that its computation can be decomposed and understood. In essence, it addresses the black-box problem of traditional dense Transformers: the internal computational circuits become legible to humans, so we can see how the AI reaches its decisions instead of naively believing its confabulations.

OpenAI quietly open-sourced the new model: just 0.4B parameters, with 99.9% of the weights equal to zero. Some commenters go so far as to say this "extreme sparsity + functional decoupling" approach could put today's popular MoE (Mixture of Experts) models on the road to obsolescence.

So what happens when a Transformer's weights are trained to be nearly all zero?

Abandoning coarse approximation, pursuing native sparsity

First, why can this model's thinking be read like a circuit diagram? In the conventional large models we use every day, internal neurons are densely connected: the weight matrices are almost entirely non-zero, and information propagates in a highly superposed state, like a tangle of wires that no one can unpick to explain how a given conclusion was reached.

In the sparse model, by contrast, the surviving non-zero connections act like wires in a circuit diagram, so information can only travel along fixed paths. At the same time, the model uses a mean-ablation pruning method to carve out, for each task, a dedicated ...
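The mean-ablation pruning mentioned above (均值屏蔽) is commonly understood in interpretability work as replacing a component's activation with its dataset mean and keeping the component in the task circuit only if the loss visibly changes. A minimal sketch under that reading, with illustrative helper names that are not OpenAI's API:

```python
import numpy as np

def mean_ablation_keep_mask(activations: np.ndarray,
                            loss_fn,
                            tol: float = 1e-3) -> np.ndarray:
    """Decide which units belong to a task circuit via mean ablation.

    activations: (n_samples, n_units) recorded on the task dataset.
    loss_fn: maps an activation matrix to a scalar task loss.
    A unit is pruned (excluded from the circuit) if replacing its
    activation with its dataset mean barely changes the loss.
    """
    base_loss = loss_fn(activations)
    means = activations.mean(axis=0)
    keep = np.zeros(activations.shape[1], dtype=bool)
    for unit in range(activations.shape[1]):
        ablated = activations.copy()
        ablated[:, unit] = means[unit]          # mean-ablate this unit
        keep[unit] = abs(loss_fn(ablated) - base_loss) > tol
    return keep

# Toy task: the loss depends only on unit 0, so only unit 0 survives.
rng = np.random.default_rng(1)
acts = rng.normal(size=(64, 4))
loss = lambda a: float(np.mean((a[:, 0] - 1.0) ** 2))
print(mean_ablation_keep_mask(acts, loss))  # → [ True False False False]
```

Applied layer by layer, this kind of test is what lets a task-specific circuit be "carved out" of the already-sparse model: units whose mean-ablation leaves the task loss untouched are simply not part of that task's wiring.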