Mechanistic Interpretability

 After Sutton Declared "LLMs Are a Dead End," a New Interview Reveals AI's Predicament
 机器之心· 2025-10-15 07:33
 Core Viewpoint
 - The article discusses Rich Sutton's critical perspective on large language models (LLMs), suggesting they may not align with the principles outlined in his essay "The Bitter Lesson" and highlighting their limitations in learning from real-world interactions [1][3][22].

 Group 1: Limitations of LLMs
 - Sutton argues that LLMs have significant flaws, particularly their inability to learn from ongoing interactions with the environment [3][21].
 - He emphasizes that true intelligence should emerge from continuous reinforcement learning through dynamic interaction, rather than from extensive pre-training and supervised fine-tuning [3][4][22].
 - Because LLMs rely on human knowledge and data, they may fail to scale and fall short of expectations, as they are fundamentally limited by the biases present in their training data [24][25][26].

 Group 2: Alternative Perspectives on Intelligence
 - Experts in the discussion, including Suzanne Gildert and Niamh Gavin, are skeptical that pure reinforcement learning is achievable, noting that current systems often revert to imitation learning because universal reward functions are hard to define [7][11].
 - The conversation highlights the need for systems that can learn autonomously in new environments, the way a squirrel learns to hide nuts, rather than relying solely on pre-existing data [8][10].
 - There is consensus that while LLMs exhibit impressive capabilities, they do not amount to true intelligence, since they cannot effectively explore and learn from their environment [33][35].

 Group 3: The Future of AI Development
 - The article suggests that the AI field is at a crossroads, where the dominance of certain paradigms may stifle innovation and lead to a cycle of self-limitation [28][29].
 - Sutton warns that the current trajectory of LLMs, heavily reliant on human imitation, may not yield the breakthroughs needed for genuine understanding and reasoning [22][24].
 - The discussion points to a shift toward more robust learning mechanisms that prioritize experience and exploration over mere data absorption [28][30].
 MIT Technology Review Announces Its 2025 "35 Innovators Under 35" List!
 机器人圈· 2025-09-12 10:05
 Core Viewpoint
 - The article highlights the achievements of 35 innovators under the age of 35 across climate and energy, artificial intelligence, biotechnology, computing, and materials science, showcasing their groundbreaking contributions and potential impact on their industries [6][11][60].

 Climate and Energy
 - Innovators in this sector are developing advanced decarbonization technologies with applications across shipping, fashion, and other industries, and are exploring new methods for sustainable energy and new uses for captured carbon [11].
 - Iwnetim Abate is working on producing ammonia using underground heat and pressure, aiming to cut the carbon emissions of conventional ammonia production, which accounts for 1% to 2% of global CO2 emissions [13].
 - Sarah Lamaison's company, Dioxycle, is developing a method to produce chemicals using electricity instead of fossil fuels, significantly reducing greenhouse gas emissions [16][17].
 - Gaël Gobaille-Shaw's Mission Zero focuses on direct air capture technology to extract CO2 from the atmosphere, while his second company, Supercritical, aims to produce hydrogen efficiently [19][20].

 Artificial Intelligence
 - Aditya Grover has developed ClimaX, an AI model that predicts weather and climate events using extensive datasets for improved accuracy [22][23].
 - Neel Nanda is researching the interpretability of AI models to ensure their safe and beneficial development, focusing on understanding how these models make decisions [34][35].
 - Mark Chen has led advances in AI models for image processing and code generation, contributing to OpenAI's DALL·E and Codex [38][39].
 - Akari Asai is working on retrieval-augmented generation to reduce AI hallucinations by letting models consult stored data before generating responses [51][52].
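The retrieve-before-generate idea behind Asai's work can be sketched roughly as follows. This is a toy illustration, not her actual system: the corpus, the word-overlap scorer, and the prompt format are all invented for the example.

```python
# Toy retrieval-augmented generation step: before answering, look up the
# most relevant stored document and condition the prompt on it, so the
# model can cite evidence instead of guessing.
corpus = [
    "The Eiffel Tower is in Paris.",
    "Sparse autoencoders decompose LLM activations.",
    "Ammonia production emits 1-2% of global CO2.",
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def augmented_prompt(query: str) -> str:
    """Prepend retrieved evidence to the question before generation."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(retrieve("Where is the Eiffel Tower?")[0])
```

A production system would replace the word-overlap scorer with dense embeddings and a vector index, but the control flow, retrieve then generate, is the same.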
 Biotechnology
 - Christian Kramme's company, Gameto, is developing artificial-ovary technology to assist IVF patients, aiming to reduce hormonal injections and stress during the process [62][63].
 - Kevin Eisenfrats founded Contraline to create a long-lasting male contraceptive gel, with clinical trials underway to validate its effectiveness [64][65].

 Computing and Materials Science
 - Pierre Forin's company, Calcarea, is developing a system to capture and store CO2 emissions from ships, with commercial deployment planned for 2027 or 2028 [28][29].
 - Neeka Mashouf's Rubi Laboratories is developing a method to produce textiles from CO2 extracted directly from the atmosphere, aiming at sustainable fashion [25][26].
 How Do Large Models Actually "Think"? The First Systematic Survey of SAEs Is Here
 机器之心· 2025-06-22 05:57
 Core Viewpoint
 - The article argues that large language models (LLMs) should be not just "talkative" but also "explainable," highlighting the Sparse Autoencoder (SAE) as a leading method in mechanistic interpretability for understanding LLMs [2][10].

 Group 1: Introduction to Sparse Autoencoder (SAE)
 - SAE is a technique that helps interpret the internal mechanisms of LLMs by decomposing high-dimensional representations into sparse, semantically meaningful features [7][10].
 - By examining which features activate, SAE offers a window into the model's "thought process," enabling a better understanding of how LLMs process information [8][10].

 Group 2: Technical Framework of SAEs
 - The framework consists of an encoder that decomposes an LLM's high-dimensional vectors into sparse feature vectors, and a decoder that attempts to reconstruct the original LLM representation [14].
 - Architectural variants and improvement strategies, such as Gated SAE and TopK SAE, address specific challenges like shrinkage bias [15].

 Group 3: Explainability Analysis of SAEs
 - SAE enables concept discovery by automatically mining semantically meaningful features from the model, shedding light on aspects like temporal awareness and emotional inclination [16].
 - It supports model steering, activating or suppressing specific features to guide model outputs, and aids anomaly detection by flagging potential biases or safety risks [16].

 Group 4: Evaluation Metrics and Methods
 - Evaluation of SAEs involves both structural assessment (e.g., reconstruction accuracy and sparsity) and functional assessment (e.g., usefulness for understanding the LLM and feature stability) [18].

 Group 5: Applications in Large Language Models
 - SAE is applied in practical scenarios including model manipulation, behavior analysis, hallucination control, and emotional steering, showcasing its versatility [19].
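The encoder/decoder framework and the TopK variant described above can be sketched in a few lines of numpy. The dimensions and random weights here are illustrative, a real SAE learns its dictionary from LLM activations, and the final lines compute the structural metrics (reconstruction error, active-feature count) that the survey lists under evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat, k = 8, 32, 4   # LLM hidden size, dictionary size, TopK

# Illustrative random weights; a trained SAE learns these by minimizing
# reconstruction error under a sparsity constraint.
W_enc = rng.normal(scale=0.1, size=(d_model, d_feat))
W_dec = rng.normal(scale=0.1, size=(d_feat, d_model))
b_enc, b_dec = np.zeros(d_feat), np.zeros(d_model)

def encode(x):
    """Decompose an activation vector into sparse feature activations."""
    a = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU
    a[np.argsort(a)[:-k]] = 0.0              # TopK: zero all but k largest
    return a

def decode(a):
    """Reconstruct the original activation from the sparse features."""
    return a @ W_dec + b_dec

x = rng.normal(size=d_model)                 # stand-in for an LLM hidden state
feats = encode(x)
x_hat = decode(feats)

# Structural evaluation metrics mentioned in Group 4:
mse = float(np.mean((x - x_hat) ** 2))       # reconstruction accuracy
l0 = int((feats > 0).sum())                  # sparsity: active features <= k
```

Model steering (Group 3) then amounts to editing `feats` before decoding, for example scaling one feature up or zeroing it out, to nudge the model's behavior.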
 Group 6: Comparison with Probing Methods
 - The article compares SAE with traditional probing methods, highlighting SAE's unique potential in model manipulation and feature extraction while acknowledging its limitations in complex scenarios [20].

 Group 7: Current Research Challenges and Future Directions
 - Despite its promise, SAE faces challenges such as unstable semantic explanations and high training costs; future breakthroughs are anticipated in cross-modal expansion and automated explanation generation [21].

 Conclusion
 - The article concludes that future explainable AI systems should not only visualize model behavior but also provide structured understanding and operational capability, with SAE offering a promising pathway [23].
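The probing baseline that the survey contrasts with SAEs can be illustrated with a minimal linear probe. The toy "activations" and the concept dimension below are invented for the example; real probes are trained on activations extracted from an actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "activations": 200 vectors in which dimension 2 linearly encodes a
# binary concept. A probe is a small supervised model trained to read a
# known concept back out of fixed activations.
X = rng.normal(size=(200, 6))
labels = (X[:, 2] > 0).astype(float)

# Least-squares linear probe with a bias column (a simple stand-in for
# the logistic probes common in the literature).
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, labels, rcond=None)

preds = (Xb @ w > 0.5).astype(float)
accuracy = float((preds == labels).mean())
print(accuracy)   # high here, since the concept is linearly decodable
```

The contrast the survey draws follows from the setup: a probe needs labeled examples of a concept chosen in advance, while an SAE discovers its feature dictionary unsupervised, which is what makes SAEs usable for open-ended concept discovery and steering.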



