Mechanistic Interpretability
MIT Technology Review releases its 2025 "35 Innovators Under 35" list!
机器人圈· 2025-09-12 10:05
The following article is from DeepTech深科技 (author: MIT Technology Review). DeepTech is a resource-enablement and services organization focused on emerging technologies, dedicated to advancing innovation in science and technology; it is also the exclusive operator of MIT Technology Review in China. Every year, MIT Technology Review names 35 young researchers, scientists, and entrepreneurs under the age of 35 to its "35 Innovators Under 35" list for pioneering work in their fields. This year's 35 honorees, working in climate and energy, artificial intelligence, biotechnology, computing, and materials science, were selected from several hundred candidates worldwide on the strength of their outstanding achievements and innovative contributions. The youngest honoree is 22-year-old Victoria de León. The list of honorees is compiled below.

Global honorees of the 2025 "35 Innovators Under 35" (table truncated in the source):

| Name | Affiliation | Age |
| --- | --- | --- |
| Iwnetim Abate | MIT | 32 |
| Sarah Lamaison | Dioxycle | 31 |
| Gaël Gobaille-Shaw | Mission Zero Technologies Ltd. | … |
| … | … | … |
How do large models actually "think"? The first systematic survey of SAE is here
机器之心· 2025-06-22 05:57
Core Viewpoint
- The article argues that we need large language models (LLMs) that are not just "talkative" but also "explainable", and highlights the Sparse Autoencoder (SAE) as a leading method in mechanistic interpretability for understanding LLMs [2][10].

Group 1: Introduction to Sparse Autoencoder (SAE)
- SAE helps interpret the internal mechanisms of LLMs by decomposing high-dimensional representations into sparse, semantically meaningful features [7][10].
- Inspecting which features activate offers a window into the model's "thought process", giving a clearer picture of how LLMs process information [8][10].

Group 2: Technical Framework of SAEs
- An SAE consists of an encoder that decomposes the LLM's high-dimensional activation vectors into sparse feature vectors, and a decoder that reconstructs the original activation from those features [14] (a minimal sketch appears after this summary).
- The survey also covers architectural variants and training improvements such as Gated SAE and TopK SAE, which address specific issues like shrinkage bias [15] (see the TopK sketch below).

Group 3: Explainability Analysis of SAEs
- SAE supports concept discovery by automatically mining semantically meaningful features from the model, e.g. features tied to temporal awareness or emotional inclination [16].
- It also enables model steering, activating or suppressing specific features to guide outputs, and anomaly detection to surface potential biases or safety risks [16] (see the steering sketch below).

Group 4: Evaluation Metrics and Methods
- Evaluating an SAE involves both structural assessment (e.g., reconstruction accuracy and sparsity) and functional assessment (e.g., how well it explains the LLM and how stable its features are) [18] (see the evaluation sketch below).

Group 5: Applications in Large Language Models
- SAE has been applied in practical scenarios including model manipulation, behavior analysis, hallucination control, and emotional steering, showcasing its versatility [19].

Group 6: Comparison with Probing Methods
- The article compares SAE with traditional probing methods, highlighting SAE's distinctive potential for model manipulation and feature extraction, while acknowledging its limitations in complex scenarios [20].

Group 7: Current Research Challenges and Future Directions
- Despite its promise, SAE still faces challenges such as unstable semantic explanations and high training costs; future breakthroughs are anticipated in cross-modal extension and automated explanation generation [21].

Conclusion
- The article concludes that future explainable AI systems should not only visualize model behavior but also provide structured understanding and the ability to intervene, and that SAE offers a promising pathway toward this goal [23].
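To make the encoder/decoder framework in Group 2 concrete, here is a minimal PyTorch sketch of a sparse autoencoder, assuming a plain ReLU encoder trained with an L1 sparsity penalty. The dimensions, initialization, and `l1_coef` value are illustrative choices, not the survey's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Encoder: d_model -> d_features (overcomplete, d_features >> d_model)
        self.W_enc = nn.Parameter(torch.randn(d_model, d_features) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_features))
        # Decoder: d_features -> d_model, reconstructs the original activation
        self.W_dec = nn.Parameter(torch.randn(d_features, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps features non-negative; most should end up at zero (sparse)
        return F.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return f @ self.W_dec + self.b_dec

    def forward(self, x: torch.Tensor, l1_coef: float = 1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        recon_loss = F.mse_loss(x_hat, x)           # reconstruction accuracy
        sparsity_loss = f.abs().sum(dim=-1).mean()  # L1 proxy for sparsity
        return x_hat, f, recon_loss + l1_coef * sparsity_loss
```

The loss couples the two structural criteria named in Group 4: reconstruction accuracy (the MSE term) and sparsity (the L1 term).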
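The TopK variant mentioned in Group 2 can be sketched as follows, reusing the `SparseAutoencoder` from the previous block: instead of penalizing the L1 norm, only the k largest pre-activations per token are kept, which is one way to avoid the shrinkage bias that L1 regularization induces. The value of `k` is an arbitrary assumption here.

```python
import torch
import torch.nn.functional as F

def topk_encode(sae: "SparseAutoencoder", x: torch.Tensor, k: int = 32) -> torch.Tensor:
    # Pre-activations in feature space
    pre = (x - sae.b_dec) @ sae.W_enc + sae.b_enc
    # Keep only the k largest entries per token, zero out the rest
    top = torch.topk(pre, k, dim=-1)
    f = torch.zeros_like(pre)
    f.scatter_(-1, top.indices, F.relu(top.values))
    return f
```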
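Feature steering (Groups 3 and 5) can be illustrated with a forward hook that adds a scaled copy of one feature's decoder direction to a layer's output. The model path, layer index, `feature_id`, and `scale` below are hypothetical placeholders, and the sketch assumes the hooked module returns a single activation tensor of shape (batch, seq, d_model).

```python
import torch

def make_steering_hook(sae: "SparseAutoencoder", feature_id: int, scale: float = 5.0):
    # The feature's "writing direction" back into model space
    direction = sae.W_dec[feature_id]
    def hook(module, inputs, output):
        # Returning a tensor from a forward hook replaces the module's output
        return output + scale * direction
    return hook

# Hypothetical usage with a GPT-2-style module path:
# handle = model.transformer.h[layer].register_forward_hook(
#     make_steering_hook(sae, feature_id=1234, scale=8.0))
# ...generate text, then handle.remove() to restore normal behavior.
```

Negative values of `scale` suppress the feature instead of amplifying it, which is the "activating or suppressing" intervention described in Group 3.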
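For the structural metrics in Group 4, a minimal evaluation sketch might report the average number of active features per token (L0) and the fraction of activation variance left unexplained by the reconstruction. `acts` stands in for a batch of collected LLM activations, an assumption of this sketch.

```python
import torch

@torch.no_grad()
def evaluate_sae(sae: "SparseAutoencoder", acts: torch.Tensor) -> dict:
    f = sae.encode(acts)
    x_hat = sae.decode(f)
    # Sparsity: mean number of non-zero features per token
    l0 = (f > 0).float().sum(dim=-1).mean()
    # Fraction of variance unexplained by the reconstruction (lower is better)
    resid_var = (acts - x_hat).pow(2).sum()
    total_var = (acts - acts.mean(dim=0)).pow(2).sum()
    return {"l0": l0.item(), "fvu": (resid_var / total_var).item()}
```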