MIRIX
That Day, AI Large Models Remembered the Shackles of "Amnesia" That Bound Them
Jiqizhixin (机器之心) · 2025-08-31 05:33
Core Insights
- The article surveys recent advances in memory for large language models (LLMs), highlighting how Google, OpenAI, and Anthropic are building memory features into their AI systems to improve continuity across conversations [1][3][10].

Memory Capabilities of LLMs
- Google's Gemini has gained memory capabilities that let it retain information across multiple conversations, making interactions more natural and coherent [1].
- OpenAI's ChatGPT has had a memory feature since February 2024; users can tell the model to remember specific details, so its responses become better tailored over time [3][42].
- Anthropic's Claude has also added memory functionality, allowing it to recall previous discussions when the user asks it to [3][6].

Types of Memory in LLMs
- Memory can be categorized into sensory memory, short-term memory, and long-term memory; for LLMs the focus is on long-term memory [16][17].
- Contextual memory is a form of short-term memory in which relevant information is placed in the model's context window [18].
- External memory stores information in an external database and retrieves it during interactions; it is a common way to build long-term memory [22][23].
- Parametric memory attempts to encode information directly into the model's parameters, providing a deeper form of memory [24][29].

Innovations in Memory Systems
- Startups focused on AI memory systems are emerging, with products such as Letta AI's MemGPT and RockAI's Yan 2.0 Preview [11][12].
- Hybrid memory systems, which combine different types of memory to improve an AI's adaptability and performance, are gaining traction [37][38].

Notable Memory Implementations
- OpenAI's ChatGPT lets users manage their memory entries, while Anthropic's Claude retrieves past conversations only when requested [42][44].
- Gemini lets users add and manage memories themselves, which improves its ability to remember user preferences [45].
- M3-Agent, developed by ByteDance, Zhejiang University, and Shanghai Jiao Tong University, integrates long-term memory across multiple modalities, including video and audio [10][70].

Future Trends in AI Memory
- AI memory is expected to evolve toward multi-modal, integrated memory systems that support a more comprehensive understanding of user interactions [97][106].
- There is growing emphasis on memory systems that can autonomously manage and optimize what they store, akin to human cognitive processes [101][106].
- The ultimate goal is AI systems that exhibit distinct personalities and form emotional connections through their memory, potentially leading to the emergence of artificial general intelligence (AGI) [109][110].
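The external and contextual memory types described in this summary can be illustrated with a minimal sketch. It assumes a toy keyword-overlap score in place of a real embedding model, and the names `ExternalMemory`, `recall`, and `build_prompt` are hypothetical; none of them comes from Gemini, ChatGPT, Claude, or any product mentioned above.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a toy stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ExternalMemory:
    """Long-term store kept outside the model (hypothetical class)."""
    entries: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.entries.append(fact)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return up to k stored facts that share the most words with the query."""
        q = tokenize(query)
        scored = [(len(q & tokenize(entry)), entry) for entry in self.entries]
        # Drop facts with no overlap, then rank by overlap (stable for ties).
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [entry for _, entry in ranked[:k]]


def build_prompt(memory: ExternalMemory, user_message: str) -> str:
    """Contextual memory: copy the recalled facts into the context window."""
    recalled = memory.recall(user_message)
    notes = "\n".join(f"- {fact}" for fact in recalled) or "- (nothing relevant)"
    return f"Known about the user:\n{notes}\n\nUser: {user_message}"


memory = ExternalMemory()
memory.remember("The user prefers replies in French.")
memory.remember("The user is training for a marathon in October.")
memory.remember("The user's cat is named Miso.")

prompt = build_prompt(memory, "Suggest a marathon training plan for this week")
print(prompt)
```

Only the retrieved fact reaches the prompt, so the store can grow across sessions without the context window growing with it; a production system would typically replace the word-overlap score with vector similarity.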
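A Global First: "AI Memory" Goes Open Source as MIRIX Simultaneously Launches an App
36Kr · 2025-07-30 03:32
Yu Wang, a PhD student at the University of California, San Diego, and Xi Chen, a professor at New York University, have jointly released and open-sourced MIRIX, the world's first truly multi-modal, multi-agent AI memory system. The MIRIX team has simultaneously launched a desktop app that can be downloaded and used directly.

Remember the delight of first using GPT to write an email? You have surely also run into today's AI "forgetfulness": however deep the conversation goes, once the window is closed the history vanishes. The researchers therefore argue that moving from "conversation" to "memory" is a necessary step in AI's evolution.

On ScreenshotVQA, a challenging benchmark that requires deep multi-modal understanding, MIRIX's accuracy is 35% higher than that of traditional RAG methods with 99.9% lower storage overhead; compared with long-context methods, it is 410% higher with 93.3% lower overhead. On the LOCOMO long-conversation task, MIRIX scores 85.4%, significantly surpassing all existing methods and setting a new performance benchmark.

The researchers have also launched an application for Mac. With this out-of-the-box app, anyone can build a personal AI assistant of their own.

Desktop app use cases

The app can be downloaded directly from the official website. Paper link: https://arxiv.org/abs/2507.07957 Official website: h ...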
Tencent Research Institute AI Express 20250730
Tencent Research Institute · 2025-07-29 16:01
Group 1
- Anthropic announced weekly usage limits for Claude Pro and Max users, affecting less than 5% of subscribers [1]
- Extreme cases were reported in which users on a $200 plan ran the model continuously and consumed tens of thousands of dollars' worth of usage [1]
- Users complained that usage reporting lacks transparency, leading many to look for alternative products [1]

Group 2
- Microsoft Edge introduced a "Copilot mode" that adds context awareness across tabs, allowing it to read and analyze all open pages at once [2]
- The new interface features a simplified input box that understands user intent and supports voice control and thematic journey functions [2]
- The feature is currently free in all Copilot markets but may be bundled with a subscription service in the future [2]

Group 3
- Wuwen Xinqiong (Infinigence AI) launched a comprehensive AI efficiency solution built on three core products: Wuqiong AI Cloud, Wujie Intelligent Computing Platform, and Wuyin Terminal Intelligence [3]
- The solution covers 26 provinces and cities with 53 core data centers, integrates more than 15 mainstream chip architectures, and reaches a total computing power exceeding 25,000 P [3]
- On the edge side, innovations include the world's first edge-native model, "Wuqiong Tianquan," which maintains cloud-level intelligence with 21 billion parameters while keeping memory usage at the 7-billion-parameter level [3]

Group 4
- Step 3 launched a new AI research assistant called "Jieyue Deep Research," which can complete complex research tasks and generate in-depth professional reports within ten minutes [4][5]
- The assistant achieved a pass rate of 70% in the xbench-DeepSearch evaluation [5]
- It is based on reinforcement learning and a multi-agent architecture, enabling autonomous thinking, reasoning, and dynamic tool use on complex real-world tasks [5]

Group 5
- JD.com upgraded its large model brand to JoyAI, introducing offerings such as the JoyAgent intelligent agent platform, JoyInside embedded intelligence, and digital humans [6]
- JoyAgent is described as the first 100% open-source enterprise-level intelligent agent; it has received over 2,000 GitHub stars and offers a complete, product-grade closed-loop capability [6]
- JoyAI products have been deployed in a range of scenarios: its digital humans serve more than 20,000 brands, and the interactive AI toy Fuzozo sold out in its first pre-sale [6]

Group 6
- Researchers from UC San Diego and NYU launched and open-sourced MIRIX, the world's first multi-modal, multi-agent AI memory system, along with a desktop app [7]
- The system divides memory into six modules (core, episodic, semantic, procedural, resource, and knowledge vault), coordinated by a meta-memory manager and six per-module memory managers [7]
- MIRIX achieved 35% higher accuracy than traditional RAG on the ScreenshotVQA test while reducing storage by 99.9%, and set a record of 85.4% on the LOCOMO long-dialogue task [7]

Group 7
- The National Satellite Meteorological Center, Nanchang University, and Huawei jointly released "Fengyu," the world's first full-chain space weather AI forecasting model [8]
- The model uses a pioneering chained training structure that links solar wind, Earth's magnetic field, and ionosphere models [8]
- In practical tests, "Fengyu" kept its prediction error for global electron density at around 10% and performed well during multiple major magnetic storm events; applications have been filed for 11 national invention patents [8]

Group 8
- Shanghai AI Lab released and open-sourced the "Shusheng" scientific multi-modal large model Intern-S1, which surpasses top closed-source models in scientific capabilities [9]
- The model features a "cross-modal scientific analysis engine" that can accurately interpret complex scientific data such as chemical formulas and protein structures [9]
- The research team proposed a method for synthesizing scientific data that combines general reasoning with multiple top-tier specialist abilities, and reduced the cost of reinforcement learning training [9]

Group 9
- a16z partner Martin Casado said the AI large model competition will evolve into an oligopoly similar to the cloud computing battle, creating a new brand effect [10]
- In the AI race the application layer lacks a technological moat, so the rational business decision is to "sacrifice profits for distribution"; value will come from foundational infrastructure and from going deep into vertical domains [10]
- AI will not turn ordinary developers into super engineers; rather, it will make "10x engineers" another 2x more productive, simplifying programming by eliminating tedious tasks and returning it to the essence of creation [10]

Group 10
- Tencent's Robotics X Lab and Futian Lab jointly launched the embodied intelligence open platform Tairos, aimed at strengthening software capabilities for robot developers and application developers [11]
- The platform is based on the SLAP³ technology system and provides three core capabilities: large planning models, large multi-modal perception models, and large joint perception-action models [11]
- Five major trends in the future development of embodied intelligence were identified: integration of virtual and real worlds, lower technical barriers, intelligent evolution, agentification, and multi-modal perception [11]
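The MIRIX architecture summarized in Group 6, in which a meta-memory manager sits above six memory modules, can be pictured with a small routing sketch. Everything here is an illustrative assumption: the class names are hypothetical, the memory-type names follow the commonly used core, episodic, semantic, procedural, resource, and knowledge vault labels, and the keyword rules merely stand in for whatever routing decision the real system makes (the source does not describe it).

```python
from dataclasses import dataclass, field

MEMORY_TYPES = ("core", "episodic", "semantic", "procedural",
                "resource", "knowledge_vault")

# Hypothetical keyword cues standing in for a model-based routing decision.
# Rules are checked in insertion order; the first match wins.
ROUTING_RULES = {
    "core": ("my name is", "i prefer"),
    "episodic": ("yesterday", "today", "last week"),
    "procedural": ("how to", "step 1", "steps"),
    "resource": ("http://", "https://", ".pdf"),
    "knowledge_vault": ("password", "api key", "phone number"),
}


@dataclass
class MemoryManager:
    """Owns the entries of a single memory type (hypothetical class)."""
    kind: str
    items: list[str] = field(default_factory=list)

    def store(self, text: str) -> None:
        self.items.append(text)


class MetaMemoryManager:
    """Decides which per-type manager should receive each new observation."""

    def __init__(self) -> None:
        self.managers = {kind: MemoryManager(kind) for kind in MEMORY_TYPES}

    def route(self, text: str) -> str:
        lowered = text.lower()
        for kind, cues in ROUTING_RULES.items():
            if any(cue in lowered for cue in cues):
                return kind
        return "semantic"  # general facts are the fallback bucket

    def ingest(self, text: str) -> str:
        kind = self.route(text)
        self.managers[kind].store(text)
        return kind


meta = MetaMemoryManager()
observations = [
    "My name is Ada and I prefer short answers",
    "Yesterday I reviewed the Q3 budget with Lin",
    "How to deploy: run the build, then push the tag",
    "Spec is at https://example.com/spec.pdf",
    "My phone number is 555-0100",
    "Paris is the capital of France",
]
routed = [meta.ingest(text) for text in observations]
print(routed)
```

Splitting storage by memory type lets each manager apply its own retention and retrieval policy, which is the design idea the summary attributes to MIRIX; the routing logic itself is the part most likely to differ in a real implementation.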
Tencent Research Institute AI Express 20250716
Tencent Research Institute · 2025-07-15 15:09
Group 1
- The U.S. government has granted Nvidia permission to resume sales of the H20 AI chip to China, following a meeting between Jensen Huang and President Trump [1]
- Nvidia reported record revenue of $26.044 billion for Q1 FY2025, a 262% year-over-year increase, with data center revenue of $22.6 billion as the main growth driver [1]

Group 2
- Meta is building the "Prometheus" AI supercomputer cluster, expected to reach 1GW of capacity by 2026, comparable to the output of a nuclear power plant or the consumption of a city of one million residents [2]
- The "Hyperion" plan for 2027 aims to deploy more than 5GW of capacity, and Meta plans to build a natural gas power plant to secure supply [2]

Group 3
- Elon Musk launched the Grok 4 "smart companion" feature, which includes animated characters with interactive voice capabilities, although the functionality is still at an early stage [3]
- Grok 4 can generate playable HTML5 games and integrate 3D models and textures, showing Musk's ambitions in AI companions and gaming [3]

Group 4
- Amazon introduced a new IDE tool called Kiro, which offers "vibe coding" and "planning" (spec) modes and enables specification-driven development through specs and hooks [4][5]
- Kiro can turn simple requirements into complete specifications, generate technical design diagrams, and automate tasks [5]

Group 5
- Google's first Gemini embedding model scored 68.37 in the MTEB evaluation, surpassing OpenAI's score of 58.93 and making it the strongest embedding model currently available [6]
- The new model is priced at $0.15 per million tokens and has an open API for independent creators [6]

Group 6
- BitAI's newly launched DeepResearch uses a visual problem chain to display the AI's thought process and delivers detailed research reports and interactive web pages [7]
- Free users are limited to 100 searches per day, while annual members can search up to 500 times per day, making it a cost-effective option compared to other AI services [7]

Group 7
- The MIRIX multi-modal AI memory system, developed by UCSD and NYU, achieved 35% higher accuracy than traditional RAG methods while reducing storage by 99.9% [8]
- MIRIX is modeled on six types of human memory and supports multi-modal input; memories are stored locally in SQLite databases to protect privacy [8]

Group 8
- Microsoft's AI4S team developed the Orbformer model to balance precision and efficiency in quantum chemistry calculations, achieving chemical accuracy while significantly reducing computational cost [10]
- The model consists of three main modules and has shown improved performance in various chemical tests [10]

Group 9
- An article in The New Yorker discusses the potential of AI companions to alleviate loneliness but warns that complete reliance on them may hinder personal growth and the development of real relationships [11]
- The article suggests that AI companions should be available to those in genuine need, such as elderly or cognitively impaired people, while cautioning against over-reliance among the general population [11]

Group 10
- An OpenAI engineer argues that coding represents only 10-20% of a programmer's core value, with structured communication accounting for the other 80-90% [12]
- The engineer emphasizes specifications over code, because specifications capture intent and values more comprehensively [12]