MUSE
AI "Workhorses" Now Learn by Doing! Shanghai AI Lab and Partners Launch a New Self-Evolving Agent Framework
QbitAI (量子位) · 2025-10-21 23:50
Core Viewpoint - The article discusses the MUSE framework, which aims to enhance LLM agents by enabling them to accumulate experience and evolve continuously, addressing the challenges of long-horizon tasks and memory limitations [1][5].
Group 1: MUSE Framework Overview
- MUSE stands for Memory-Utilizing and Self-Evolving. It is designed as a closed-loop system that lets LLM agents learn from experience and evolve over time [5].
- The framework is built around a hierarchical memory module that organizes experience at different levels: strategic, procedural, and tool memory [7][8].
Group 2: Key Mechanisms of MUSE
- The first step is the hierarchical memory module, which lets agents retain and apply historical knowledge, overcoming the "forgetfulness" of traditional LLMs [7].
- The second step is self-reflection: agents evaluate their own task execution and convert raw execution trajectories into structured experience, refining their standard operating procedures (SOPs) [10][11].
- The third step is self-evolution: agents improve continuously through a cycle of planning, execution, reflection, and experience extraction [13][15].
Group 3: Experimental Results
- MUSE achieved state-of-the-art (SOTA) performance on the TAC benchmark with a score of 51.78%, surpassing existing methods that used larger models [16].
- Because the framework accumulates experience, its performance improves over time, which points to its potential for long-term productivity tasks [19].
Group 4: Future Prospects
- The MUSE framework marks a new phase of experience-driven lifelong learning for AI agents, moving beyond static testing [29].
- Future research directions include optimizing memory, enriching experience sources, integrating human feedback, and developing comprehensive evaluation standards for long-term tasks [30][31].
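The summary describes MUSE only at a high level. The Python sketch below is a toy illustration, under our own assumptions, of how a three-level memory and a plan, execute, reflect, extract cycle might fit together. Every name in it (HierarchicalMemory, plan, execute, reflect, extract, run_task) is hypothetical, the executor is a stub that simply marks listed steps as failing, and none of it should be read as the MUSE authors' actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Hypothetical three-level memory, loosely mirroring the levels named in the summary."""
    strategic: list[str] = field(default_factory=list)              # high-level lessons learned
    procedural: dict[str, list[str]] = field(default_factory=dict)  # task type -> SOP steps
    tool: dict[str, str] = field(default_factory=dict)              # tool name -> usage note (unused in this toy run)

    def recall(self, task_type: str) -> list[str]:
        """Return a copy of the stored SOP for a task type, or an empty list if none exists."""
        return list(self.procedural.get(task_type, []))


def plan(memory: HierarchicalMemory, task_type: str, default_steps: list[str]) -> list[str]:
    # Reuse a stored (non-empty) SOP when one exists; otherwise fall back to the default plan.
    return memory.recall(task_type) or list(default_steps)


def execute(steps: list[str], failing_steps: set[str]) -> list[tuple[str, bool]]:
    # Stub for real tool calls: the raw trajectory is a list of (step, succeeded) pairs.
    return [(step, step not in failing_steps) for step in steps]


def reflect(trajectory: list[tuple[str, bool]]) -> tuple[list[str], list[str]]:
    # Turn the raw trajectory into structured experience: successful steps vs. failed steps.
    kept = [step for step, ok in trajectory if ok]
    failed = [step for step, ok in trajectory if not ok]
    return kept, failed


def extract(memory: HierarchicalMemory, task_type: str, kept: list[str], failed: list[str]) -> None:
    # Refine the SOP to the steps that worked and record a strategic lesson per failure.
    memory.procedural[task_type] = kept
    for step in failed:
        memory.strategic.append(f"avoid '{step}' for {task_type}")


def run_task(memory: HierarchicalMemory, task_type: str,
             default_steps: list[str], failing_steps: set[str]) -> list[tuple[str, bool]]:
    # One pass of the plan -> execute -> reflect -> extract cycle.
    steps = plan(memory, task_type, default_steps)
    trajectory = execute(steps, failing_steps)
    kept, failed = reflect(trajectory)
    extract(memory, task_type, kept, failed)
    return trajectory


# Usage: run the same (hypothetical) task twice and compare trajectories.
memory = HierarchicalMemory()
default = ["open spreadsheet", "guess column names", "compute totals", "write report"]
first = run_task(memory, "monthly_report", default, {"guess column names"})
second = run_task(memory, "monthly_report", default, {"guess column names"})
print(sum(ok for _, ok in first), len(first))    # 3 4
print(sum(ok for _, ok in second), len(second))  # 3 3
```

On the second run the agent plans from the refined SOP and skips the step that failed before, a toy version of the accumulate-then-improve behavior the summary attributes to MUSE. A real system would replace the stub executor with LLM-driven tool calls and the string lessons with richer structured experience.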
Peer Review on the Brink of Collapse: $450 per Review Report? Scientists No Longer Willing to Work for Free
36Kr · 2025-09-01 07:54
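The Very Large Telescope (VLT) in Chile hosts an instrument called MUSE that lets researchers detect the most distant galaxies. It is so sought-after that, for the observing season running from October to the following April, scientists around the world requested more than 3,000 hours of observing time in total.
Here is the problem: that amounts to 379 full nights of work, while the observing season lasts only seven months. Even if MUSE were a cosmic time machine, there would simply not be enough time.
In the past, the European Southern Observatory (ESO), which manages the telescope, convened expert panels to pick the most valuable projects from the flood of proposals. But as the number of proposals exploded, the experts gradually became overwhelmed.
So in 2022, ESO came up with a new approach: hand the reviewing over to the applicants. In other words, any team that wants to use the telescope must also help review the proposals of its competitors.
This "applicant peer review" model is becoming a popular answer to the labor shortage in peer review. As the number of academic papers keeps growing, journal editors complain that finding reviewers is getting harder and harder, and funding bodies such as ESO are likewise struggling to find enough reviewers.
What are the consequences of a system under this much strain?
Declining research quality: many point out that low-quality, even error-riddled studies are now appearing in some journals, a sign that peer review is failing to act as a quality gate.
Innovative ideas buried: others complain that the current review process is overly cumbersome and rigid, so that some genuinely exciting ideas cannot get ...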