MoE (Mixture of Experts)
A Conversation with 张津剑: Four Years Ago No One Believed in AGI; Today MiniMax Is Worth 300 Billion
投中网· 2026-02-26 01:57
Core Viewpoint
- The article discusses the journey of Oasis Capital, focusing on its investment strategy in AI and the challenges faced during the pandemic and subsequent economic shifts. It highlights the importance of optimism, belief in innovation, and the role of young entrepreneurs in driving the AI sector forward [3][4][5].
Group 1: Investment Strategy and Market Conditions
- Oasis Capital was founded in 2019, and its initial investment strategy was built around worst-case scenarios, which proved beneficial during the pandemic [3].
- In 2022, the venture capital landscape shifted dramatically due to rising inflation and interest rates, with the share of "down round" financings increasing from 8% to 20% [3][4].
- The firm made AI its core investment direction in November 2022, predicting the release of new AI models, which positioned it ahead of the curve [5][14].
Group 2: Key Investments and Achievements
- In 2023, Oasis Capital completed investments in over 10 AI projects, capitalizing on the "GPT moment" despite skepticism about domestic AI models [7].
- The firm's first IPO in the AI sector was MiniMax, which rose 109% on its first trading day and reached a market cap of over 300 billion HKD shortly after [7][8].
- The firm has also invested in AI startups including 千寻智能, Vast, and 逐际动力, showing a commitment to supporting innovative companies in the AI space [8][20].
Group 3: Entrepreneurial Insights and Philosophy
- The article emphasizes the importance of optimism and belief in the potential of young entrepreneurs, particularly in the context of AI innovation [5][29].
- MiniMax founder 闫俊杰 is highlighted for his focus and dedication, which resonated with Oasis Capital's philosophy of backing passionate, innovative founders [28][40].
- The narrative suggests that the essence of successful investing lies in believing in and supporting visionary entrepreneurs, even when the broader market is skeptical [25][32].
OpenAI Suddenly Open-Sources a New Model: 99.9% of the Weights Are Zero, a New Sparsity Method Replaces MoE
36Ke· 2025-12-15 03:29
Core Insights
- The article discusses the open-source release of Circuit Sparsity, a technique that aims to improve the interpretability of large language models by introducing a sparse structure that exposes their internal decision-making processes [2][4].
Group 1: Circuit Sparsity Technology
- Circuit Sparsity is a variant of large language models that enforces sparsity in internal connections, making the model's computation process more understandable and interpretable [4].
- The technique targets the "black box" problem of traditional dense Transformers, allowing clearer insight into how the AI reaches its decisions and reducing reliance on potentially misleading outputs [4][10] (a minimal illustrative sketch follows this summary).
Group 2: Comparison with MoE Models
- The article suggests that the extreme sparsity and functional decoupling of Circuit Sparsity may threaten the current popularity of Mixture of Experts (MoE) models, which rely on a coarser approximation of sparsity [5][12].
- MoE models face challenges such as feature-flow fragmentation and knowledge redundancy, while Circuit Sparsity offers a more precise dissection of model mechanisms [12][14].
Group 3: Performance and Efficiency
- Experimental data indicates that the task-specific circuits of sparse models are 16 times smaller than those of dense models while maintaining the same pre-training loss, allowing logical steps to be tracked precisely [12].
- However, Circuit Sparsity currently has significant drawbacks, including extremely high computational costs: training is 100 to 1000 times more demanding than for traditional dense models [14].
Group 4: Future Directions
- The research team plans to extend the technique to larger models to unlock more complex reasoning circuits, describing this as an early step in exploring AI interpretability [14][16].
- Two potential ways to overcome the training-efficiency problems of sparse models are identified: extracting sparse circuits from existing dense models, and optimizing training mechanisms for new interpretable sparse models [16].
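To make the connection-level sparsity described above concrete, here is a minimal sketch of a linear layer whose weight matrix is masked so that only about 0.1% of its entries can be non-zero, leaving each output dependent on a small, traceable set of inputs. This is an illustration in PyTorch under stated assumptions: the class name SparseLinear, the keep_fraction parameter, and the magnitude-based mask are hypothetical choices for exposition, not OpenAI's released implementation.

```python
# Illustrative sketch of weight-level sparsity (not the released model code):
# a linear layer keeps only ~0.1% of its connections via a fixed binary mask.
import torch
import torch.nn as nn


class SparseLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, keep_fraction: float = 0.001):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Keep only the largest-magnitude connections; everything else is zeroed out.
        k = max(1, int(keep_fraction * self.weight.numel()))
        flat = self.weight.detach().abs().flatten()
        threshold = flat.topk(k).values.min()
        self.register_buffer("mask", (self.weight.detach().abs() >= threshold).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the retained connections participate in the computation, so each
        # output unit depends on a small, inspectable subset of the inputs.
        return nn.functional.linear(x, self.weight * self.mask, self.bias)


if __name__ == "__main__":
    layer = SparseLinear(1024, 1024, keep_fraction=0.001)
    x = torch.randn(2, 1024)
    y = layer(x)
    print(y.shape, "non-zero connections:", int(layer.mask.sum().item()))
```

In this toy version the mask is fixed at construction time; the article describes training the model so that sparsity is enforced throughout, which is where the reported 100 to 1000 times higher training cost comes from.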
OpenAI Suddenly Open-Sources a New Model! 99.9% of the Weights Are Zero, a New Sparsity Method Replaces MoE
量子位· 2025-12-14 05:17
Core Viewpoint
- The article introduces Circuit Sparsity, a technique that drastically reduces the connections in large language models, making them more interpretable and efficient by retaining only 0.1% of the connections while achieving performance similar to traditional dense models [1][3][6].
Group 1: Circuit Sparsity Technology
- Circuit Sparsity enforces sparsity in the internal connections of a model, making the computation process more understandable and addressing the black-box nature of traditional dense Transformers [6][10].
- The model retains only 0.1% of its connections, allowing a clear and traceable decision-making process, akin to a circuit diagram [10][12].
- Experimental data shows that the task-specific circuits of sparse models are 16 times smaller than those of dense models while remaining necessary and sufficient for completing the task [14].
Group 2: Comparison with MoE Models
- The article contrasts Circuit Sparsity with the Mixture of Experts (MoE) approach, which uses a gating network to split the model into multiple expert sub-networks, leading to issues such as feature fragmentation and knowledge redundancy [16][18] (a contrasting routing sketch follows this summary).
- Circuit Sparsity aims for native sparsity, allowing clearer feature representation and avoiding the interference seen in MoE models [18].
- Despite its advantages, Circuit Sparsity currently faces high computational costs, being 100 to 1000 times more demanding than traditional dense models, which may limit its immediate applicability in industry [20][21].
Group 3: Future Directions
- The team plans to extend Circuit Sparsity to larger models to unlock more complex reasoning circuits, as part of ongoing research into AI interpretability [22].
- Two potential ways to overcome the training-efficiency challenges of sparse models have been identified: extracting sparse circuits from existing dense models, and optimizing training mechanisms for new interpretable sparse models [24].
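For contrast with the connection-level sparsity above, the following is a minimal sketch of the coarse, expert-level sparsity that MoE-style gating provides: a router assigns each token to one expert sub-network, so whole blocks are switched on or off rather than individual connections. The TinyMoE class, the top-1 routing rule, and the layer sizes are illustrative assumptions, not a description of any specific production MoE.

```python
# Illustrative MoE-style gating: sparsity is coarse-grained (one expert per
# token) rather than per-connection as in Circuit Sparsity.
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, d_model: int = 64, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Top-1 routing: each token is handled by a single expert.
        gate_logits = self.router(x)
        weights, expert_idx = gate_logits.softmax(dim=-1).max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = expert_idx == e
            if sel.any():
                out[sel] = weights[sel].unsqueeze(-1) * expert(x[sel])
        return out


if __name__ == "__main__":
    moe = TinyMoE()
    tokens = torch.randn(8, 64)
    print(moe(tokens).shape)  # torch.Size([8, 64])
```

Because the router only decides which expert block runs, features for related inputs can end up scattered across experts and duplicated between them, which is the fragmentation and redundancy the article attributes to MoE and which Circuit Sparsity's per-connection approach is meant to avoid.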