Retrieval-Augmented Generation (RAG)

Say goodbye to error accumulation and noise interference: EviNote-RAG opens a new paradigm for RAG
机器之心· 2025-09-12 00:51
The first author of this paper is Dai Yuqin, a PhD student at Tsinghua University. The work was completed during Dai Yuqin's internship at Ant Group's Big Security unit and belongs to the unit's Venus series, which is devoted to building search agents and UI agents. The corresponding author is Lü Shuai, an associate professor at the same university whose research covers large language models, multimodal generation, and AI4Design. The co-corresponding author is Shen Yongliang, a Hundred Talents Program researcher and doctoral supervisor at Zhejiang University whose research covers large-model reasoning, retrieval-augmented generation (RAG), and multimodal generative models. As retrieval-augmented generation (RAG) develops at breakneck speed, the biggest difficulty researchers face is not "generation" but "stability". A low signal-to-noise ratio buries key information in redundant documents, and error accumulation makes reasoning chains collapse like falling dominoes. These two stubborn problems keep existing RAG systems from being truly reliable on complex tasks. Recently, a study completed jointly by Ant Group, Tsinghua University, Zhejiang University, MIT, UC Berkeley, the University of Hong Kong, and the National University of Singapore proposed a new approach: EviNote-RAG. It not only achieves significant performance gains on multiple authoritative benchmarks but also delivers a qualitative leap in training stability and reasoning reliability. The core secret lies in two innovations: together they change the picture fundamentally, with training curves that no longer oscillate and answer reasoning that is more robust. Ablation and supplementary experiments further confirm this: SEN is the cornerstone of the performance gains, while EQ ...
Qwen3-Max-Preview goes live, officially billed as the most powerful language model in the Tongyi Qianwen series
Sou Hu Cai Jing· 2025-09-06 10:03
Core Insights
- Alibaba's Tongyi Qianwen team has launched the latest Qwen3-Max-Preview model, described as the most powerful language model in the Qwen series [1]
- Qwen3-Max offers significant improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version [1][3]
- The model supports over 100 languages and is optimized for retrieval-augmented generation (RAG) and tool invocation, although it does not include a dedicated "thinking" mode [1][3]
Pricing and Performance
- Input is priced at $1.20 per million tokens and output at $6 per million tokens (a worked cost example follows this summary) [2][5]
- The model can handle a context of up to 256,000 tokens, with a maximum output of 32,800 tokens [5]
Technical Enhancements
- Qwen3-Max delivers higher accuracy on mathematical, coding, logic, and scientific tasks, and reliably follows complex instructions in both Chinese and English [1][3]
- The model reduces hallucinations and generates higher-quality responses for open-ended questions, writing, and conversation [1][3]
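As a quick worked example of the published pricing, the cost of a single call can be estimated from the two per-million-token rates quoted above; the request sizes in the snippet are invented purely for illustration.

```python
# Rough cost estimate for a single Qwen3-Max-Preview call, using the quoted
# rates of $1.20 per million input tokens and $6.00 per million output tokens.
# The token counts below are arbitrary illustrative values, not real usage data.

INPUT_PRICE_PER_M = 1.20   # USD per 1,000,000 input tokens
OUTPUT_PRICE_PER_M = 6.00  # USD per 1,000,000 output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a long RAG prompt well within the 256K-token context window,
# answered with a 2,000-token completion.
print(f"${call_cost(200_000, 2_000):.4f}")  # -> $0.2520
```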
New copyright concerns around retrieval-augmented generation (RAG)
36Ke· 2025-08-14 10:11
Group 1
- The core viewpoint of the articles is the evolution of generative artificial intelligence (AIGC) from a reliance on model training (AIGC 1.0) to a new phase (AIGC 2.0) that integrates authoritative third-party information to enhance the accuracy, timeliness, and professionalism of generated content [2][3]
- Amazon's unexpected partnerships with major media outlets like The New York Times and Hearst mark a significant shift in the industry, especially given The New York Times' previous legal actions against AI companies for copyright infringement [2][3]
- OpenAI's collaboration with The Washington Post is part of a broader trend, as OpenAI has partnered with over 20 publishers to provide users with reliable and accurate information [2][3]
Group 2
- The rise of "Retrieval-Augmented Generation" (RAG) technology is attributed to its ability to combine pre-trained model knowledge with external knowledge retrieval, addressing issues like "model hallucination" and "temporal gaps" in information [4][5]
- RAG allows models to provide accurate answers using real-time external data without needing to retrain model parameters, thus enhancing the relevance of responses [6]
- The process of RAG involves two stages, data retrieval and content integration, which raises copyright concerns due to the use of large volumes of copyrighted material [6][8]
Group 3
- The first copyright infringement lawsuit related to RAG occurred in October 2024, highlighting the legal challenges faced by AI companies in utilizing copyrighted content [8]
- In February 2025, a group of major publishers sued an AI company for allegedly using their content without permission through RAG technology, indicating a growing trend of legal disputes in this area [8]
- The European Court of Justice is also involved in a case concerning copyright disputes related to generative AI, reflecting the complexity of these legal issues [9]
Group 4
- The collection of works during the data retrieval phase raises questions about copyright infringement, particularly regarding the distinction between temporary and permanent copies of copyrighted material [11]
- The legality of using copyrighted works in RAG systems depends on whether the retrieval process constitutes long-term copying, which is generally considered infringing without authorization [11][12]
- The handling of copyrighted works in RAG systems must also consider the potential for bypassing technical protections, which could lead to legal violations [12][13]
Group 5
- The evaluation of how RAG utilizes works during the content integration phase is crucial for determining potential copyright infringement, including direct and indirect infringement scenarios [14]
- Direct infringement may occur if the output content violates copyright laws by reproducing or adapting protected works without permission [14]
- Indirect infringement could arise if the AI model facilitates the spread of infringing content, depending on the model's design and the actions taken upon discovering such infringement [15]
Group 6
- The concept of "fair use" in copyright law is a significant factor in determining the legality of RAG systems, with different jurisdictions applying varying standards for what constitutes fair use [17][18]
- The relationship between copyright technical measures and fair use is complex, as circumventing technical protections may affect the assessment of fair use claims [17][18]
- The output of RAG systems must be carefully evaluated to ensure that it does not exceed reasonable limits of use, as this could lead to copyright infringement [19]
New copyright concerns around retrieval-augmented generation (RAG)
腾讯研究院· 2025-08-14 08:33
Group 1
- The article discusses the evolution of AIGC (Artificial Intelligence Generated Content) from the 1.0 phase, which relied solely on model training, to the 2.0 phase, characterized by "Retrieval-Augmented Generation" (RAG) that integrates authoritative third-party information to enhance content accuracy and timeliness [6][10]
- Major collaborations between AI companies and media organizations, such as Amazon's partnerships with The New York Times and OpenAI's collaboration with The Washington Post, highlight the industry's shift towards providing reliable and factual information [3][6]
- RAG combines language generation models with information retrieval techniques, allowing models to access real-time external data without needing to retrain their parameters, thus addressing issues like "model hallucination" and "temporal disconnection" [8][10]
Group 2
- The rise of RAG is attributed to the need to overcome inherent flaws in traditional large models, such as generating unreliable information and lacking real-time updates [8][9]
- RAG's process involves two stages, data retrieval and content integration, where the model first retrieves relevant information before generating a response [11]
- Legal disputes surrounding RAG have emerged, with cases like the lawsuit against Perplexity AI highlighting concerns over copyright infringement due to unauthorized use of protected content [14][16]
Group 3
- The article outlines the complexities of copyright issues related to RAG, including the distinction between long-term and temporary copying, which can affect the legality of data retrieval methods [17][18]
- Technical protection measures are crucial in determining the legality of content retrieval, as bypassing such measures may violate copyright laws [19][20]
- The article emphasizes the need for careful evaluation of how RAG outputs utilize copyrighted works, as both direct and indirect infringement can occur depending on the nature of the content generated [21][23]
Group 4
- The concept of "fair use" is explored in the context of RAG, with varying interpretations based on the legality of data sources and the extent of content utilization [25][27]
- The relationship between copyright technical measures and fair use is highlighted, indicating that circumventing protective measures can affect the assessment of fair use claims [28]
- The article concludes with the ongoing debate regarding the balance between utilizing copyrighted content for AI training and respecting copyright laws, as well as the implications for future AI development [29][30]
A 10,000-word deep dive! A full walkthrough of RAG in practice: one year of exploration
自动驾驶之心· 2025-08-07 09:52
Core Viewpoint
- The article discusses the Retrieval-Augmented Generation (RAG) method, which combines retrieval-based models and generative models to enhance the quality and relevance of generated text, addressing issues such as hallucination, knowledge timeliness, and long-text processing in large models [1]
Group 1: Background and Challenges
- RAG was proposed by Meta in 2020 to enable language models to access external information beyond their internal knowledge [1]
- RAG faces three main challenges: retrieval quality, the augmentation process, and generation quality [2]
Group 2: Challenges in Retrieval Quality
- Semantic ambiguity can arise from vector representations, leading to irrelevant results [5]
- User input has become more complex, transitioning from keywords to natural dialogue, which complicates retrieval [5]
- Document segmentation methods affect how well document chunks match user queries [5]
- Extracting and representing multimodal content (e.g., tables, charts) poses significant challenges [5]
- Integrating context from retrieved passages into the current generation task is crucial for coherence [5]
- Redundancy and repetition in retrieved content can lead to duplicated information in generated outputs [5]
- Determining the relative importance of multiple retrieved passages for the generation task is challenging [5]
- Over-reliance on retrieved content can exacerbate hallucination issues [5]
- Generated answers may be irrelevant to the query [5]
- Toxicity or bias in generated answers is another concern [5]
Group 3: Overall Architecture
- The product architecture consists of four layers: a model layer, an offline understanding layer, an online Q&A layer, and a scenario layer [7]
- The RAG framework is divided into three main components: query understanding, the retrieval model, and the generation model (a minimal pipeline sketch follows this summary) [10]
Group 4: Query Understanding
- The query understanding module aims to improve retrieval by interpreting user queries and generating structured queries [14]
- Intent recognition helps select relevant modules based on user queries [15]
- Query rewriting uses an LLM to rephrase user queries for better retrieval [16]
- Query expansion breaks complex questions into simpler sub-questions for more effective retrieval [22]
Group 5: Retrieval Model
- The retrieval model's effectiveness depends on the accuracy of embedding models [33]
- Document loaders facilitate loading document data from various sources [38]
- Text converters prepare documents for retrieval by segmenting them into smaller, semantically meaningful chunks [39]
- Document embedding models create vector representations of text to enable semantic search [45]
- Vector databases support efficient storage and search of embedded data [47]
Group 6: Generation Model
- The generation model uses retrieved information to generate coherent responses to user queries [60]
- Different prompt-assembly strategies are employed to improve response generation [62][63]
Group 7: Attribution Generation
- Attribution in RAG is crucial for aligning generated content with reference information, ensuring accuracy [73]
- Dynamic computation methods can enhance the generation process by matching generated text with reference sources [76]
Group 8: Evaluation
- The article emphasizes the importance of defining metrics and evaluation methods for assessing RAG system performance [79]
- Various evaluation frameworks, such as RGB and RAGAS, are introduced to benchmark RAG systems [81]
Group 9: Conclusion
- The article summarizes the key modules in RAG practice and highlights the need for continued research and development to refine these technologies [82]
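To make the retrieval-model and generation-model stages above concrete, here is a minimal, self-contained sketch of the basic RAG loop: chunk, embed, retrieve by similarity, assemble the prompt, generate. The `embed` and `generate` functions and the sample document are placeholders standing in for a real embedding model and LLM; the sketch only illustrates the control flow described in the article, not any particular implementation.

```python
"""Minimal RAG loop: chunk -> embed -> retrieve -> assemble prompt -> generate.

embed() and generate() are stubs for a real embedding model and LLM; only the
control flow is meant to mirror the stages discussed in the article.
"""
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub embedding: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    # Stub LLM call: a real system would send the assembled prompt to a model.
    return "[placeholder answer grounded in the retrieved context]"

def chunk(document: str, size: int = 200) -> list[str]:
    # Naive fixed-size splitter; production systems use semantics-aware splitting.
    return [document[i:i + size] for i in range(0, len(document), size)]

def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
    chunks = [c for d in documents for c in chunk(d)]
    return chunks, np.stack([embed(c) for c in chunks])

def answer(query: str, chunks: list[str], matrix: np.ndarray, k: int = 3) -> str:
    scores = matrix @ embed(query)          # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]      # indices of the top-k chunks
    context = "\n---\n".join(chunks[i] for i in top)
    prompt = (
        "Answer using only the context below.\n---\n"
        f"{context}\n---\nQuestion: {query}"
    )
    return generate(prompt)

chunks, matrix = build_index(["RAG retrieves external documents at query time ..."])
print(answer("What does RAG retrieve?", chunks, matrix))
```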
Forget Her: Memento is the required course for LLM agents
Founder Park· 2025-07-29 08:05
Core Insights
- The article discusses the evolution of AI from chatbots to agents, highlighting a significant shift in focus towards task decomposition, tool utilization, and autonomous planning as of 2025 [4][5]
- It draws parallels between the character Leonard from the film "Memento" and the concept of AI agents, emphasizing the importance of context engineering in enabling agents to function effectively in complex environments [5][10]
Context Engineering
- Context engineering is defined as a comprehensive technology stack for managing information input and output around the limited attention span of large language models (LLMs) [5][13]
- Its goal is to provide agents with the right information at each decision point, which is crucial to their success [5]
Three Pillars of Context Engineering
- External Knowledge Management: a memory-extension module that helps agents overcome short-term memory limitations by supplying the necessary historical information at decision points [19][20]
- Context Distillation & Structuring: processing and filtering information to extract essential facts, so that agents do not become overwhelmed by excessive data (see the sketch after this summary) [21][25]
- Hierarchical Memory Management: a layered memory architecture that lets agents stay focused on their core mission while managing dynamic task-related information [26][30]
Challenges in Agent Design
- The article identifies two critical vulnerabilities in agent design: context poisoning, where agents process misleading information, and self-reinforcing cognitive prisons, where agents rely on their own flawed conclusions [32][34]
- It stresses the importance of a verification and reflection module to mitigate these risks, enabling agents to compare outcomes with expected goals and adjust accordingly [35][36]
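The Context Distillation & Structuring pillar can be pictured as a token-budgeted filter: score candidate snippets against the agent's current goal and keep only what fits. The sketch below illustrates that idea under simple assumptions (keyword-overlap scoring and whitespace word counts as stand-ins for real relevance models and tokenizers); it is not the article's actual mechanism.

```python
def distill_context(goal: str, snippets: list[str], token_budget: int = 300) -> list[str]:
    """Keep the snippets most relevant to the current goal within a token budget.

    Relevance here is plain keyword overlap and "tokens" are whitespace words,
    crude stand-ins for an LLM-based summarizer or a real tokenizer.
    """
    goal_words = set(goal.lower().split())

    def score(snippet: str) -> int:
        return len(goal_words & set(snippet.lower().split()))

    kept, used = [], 0
    for snippet in sorted(snippets, key=score, reverse=True):
        cost = len(snippet.split())
        if used + cost > token_budget:
            continue                      # skip anything that would blow the budget
        kept.append(snippet)
        used += cost
    return kept

history = [
    "User asked to book a flight to Osaka next Tuesday.",
    "Small talk about the weather in three cities.",
    "User's frequent-flyer number is on file and prefers aisle seats.",
]
# Keeps the booking request and the seating preference, drops the small talk.
print(distill_context("book flight Osaka Tuesday aisle seat", history, token_budget=25))
```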
We combed through 1,400 research papers to compile a comprehensive guide to context engineering | Jinqiu Select
锦秋集· 2025-07-21 14:03
Core Insights
- The article discusses the emerging field of context engineering, emphasizing the need for a systematic theoretical framework to complement the practical experience shared by the Manus team [1][2]
- A comprehensive survey titled "A Survey of Context Engineering for Large Language Models" has been published, analyzing over 1,400 research papers to establish a complete technical system for context engineering [1][2]
Context Engineering Components
- Context engineering is built on three interrelated components (context retrieval and generation, context processing, and context management), which together form a complete framework for optimizing context in large models [2]
- The first component, context retrieval and generation, covers engineering methods for effectively acquiring and constructing context information for models, including prompt engineering, external knowledge retrieval, and dynamic context assembly [2]
Prompting Techniques
- Prompting is the starting point of model interaction: an effective prompt can unlock deeper capabilities of the model [3]
- Zero-shot prompting gives direct instructions that rely on pre-trained knowledge, while few-shot prompting supplies a handful of examples to guide the model in understanding the task requirements (a small prompt-building sketch follows this summary) [4]
Advanced Reasoning Frameworks
- Complex tasks call for structured thinking: Chain-of-Thought (CoT) prompting makes models reason step by step, significantly improving accuracy on complex tasks [5]
- Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT) further enhance reasoning by allowing exploration of multiple paths and dependencies, improving success rates on tasks requiring extensive exploration [5]
Self-Refinement Mechanisms
- Self-refinement lets models iteratively improve their outputs through self-feedback, without additional supervised training data [8][9]
- Techniques such as N-CRITICS and Agent-R enable models to evaluate and correct their reasoning paths in real time, enhancing output quality [10][11]
External Knowledge Retrieval
- External knowledge retrieval, particularly through Retrieval-Augmented Generation (RAG), addresses the static nature of model knowledge by integrating dynamic information from external databases [12][13]
- Advanced RAG architectures introduce adaptive retrieval mechanisms and hierarchical processing strategies to improve retrieval efficiency [14][15]
Context Processing Challenges
- Processing long contexts is computationally expensive because of the quadratic complexity of the Transformer self-attention mechanism [28]
- Innovations such as state space models and linear attention aim to reduce this complexity, allowing models to handle longer sequences more efficiently [29][30]
Context Management Strategies
- Effective context management is crucial for organizing, storing, and utilizing information, addressing issues such as context overflow and collapse [46][47]
- Memory architectures inspired by operating systems and cognitive models are being developed to enhance the memory capabilities of language models [48][50]
Tool-Integrated Reasoning
- Tool-integrated reasoning transforms language models from passive text generators into active agents that can interact with the external world through function calling and integrated reasoning frameworks [91][92]
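To show the step from zero-shot to few-shot to chain-of-thought prompting in code rather than prose, here is a small prompt-building sketch; the worked examples, the test question, and the `ask` stub are invented for illustration, and any chat-completion API could be substituted for `ask`.

```python
"""Build zero-shot, few-shot, and chain-of-thought prompts for one question.

The worked examples and the ask() stub are illustrative placeholders.
"""

FEW_SHOT_EXAMPLES = [
    ("A shirt costs $20 and is 25% off. What is the sale price?",
     "25% of 20 is 5, so the sale price is 20 - 5 = $15."),
    ("A train travels 60 km in 45 minutes. What is its speed in km/h?",
     "45 minutes is 0.75 hours, so speed = 60 / 0.75 = 80 km/h."),
]

def zero_shot(question: str) -> str:
    # Direct instruction, relying only on the model's pre-trained knowledge.
    return f"Answer the question.\nQ: {question}\nA:"

def few_shot(question: str) -> str:
    # Prepend a handful of solved examples to show the task format.
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES)
    return f"{shots}\nQ: {question}\nA:"

def chain_of_thought(question: str) -> str:
    # The canonical CoT trigger: ask the model to reason step by step first.
    return f"Q: {question}\nA: Let's think step by step."

def ask(prompt: str) -> str:
    return "<model completion would appear here>"   # placeholder LLM call

question = "A pool fills at 3 L/min and drains at 1 L/min. How long to add 30 L?"
for build in (zero_shot, few_shot, chain_of_thought):
    prompt = build(question)
    print(f"--- {build.__name__} ---\n{prompt}\n{ask(prompt)}\n")
```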
Multi-agent collaboration is on the rise: is RAG destined to be just a transitional solution?
机器之心· 2025-07-19 01:31
Group 1: Core Insights
- The AI memory system is evolving from retrieval-augmented generation (RAG) toward multi-level, dynamically evolving state, enabling agents to retain experience and manage memory dynamically [1][2]
- Various AI memory projects have emerged, moving from short-term responses to long-term interaction and giving agents a "sustained experience" capability [2][3]
- MemoryOS introduces a hierarchical storage architecture that divides dialogue memory into short-term, medium-term, and long-term layers, with dynamic migration and updates driven by FIFO and segmented-paging mechanisms (a toy sketch of this layering follows below) [2][3]
- MemGPT takes an operating-system approach, treating the fixed-length context as "main memory" and using paging to manage large-document analysis and multi-turn conversations [2][3]
- Commercial platforms such as ChatGPT Memory operate on RAG, retrieving user-relevant information through vector indexing to strengthen memory of user preferences and history [2][3]
Group 2: Challenges Facing AI Memory
- AI memory systems face several challenges, including the limits of static storage, chaotic multimodal and multi-agent collaboration, conflicts between retrieval and expansion, and weak privacy controls [4][5]
- Hierarchical and state-filtering mechanisms are critical, as is the ability to manage enterprise-level multi-tasking and permissions effectively [4][5]
- These challenges not only test the flexibility of the technical architecture but also push memory systems to evolve toward greater intelligence, security, and efficiency [4][5]
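The layered FIFO-and-migration idea attributed to MemoryOS can be illustrated with a toy store: new dialogue turns enter a short-term queue, overflow migrates to a medium-term layer, and that layer is periodically condensed into long-term summaries. The class below is a sketch of that layering only, with invented names and a string join standing in for LLM summarization; it is not the MemoryOS implementation.

```python
from collections import deque

class LayeredMemory:
    """Toy three-layer dialogue memory: short-term FIFO, medium-term overflow,
    and condensed long-term summaries. Illustrative only."""

    def __init__(self, short_cap: int = 4, mid_cap: int = 8):
        self.short = deque()   # most recent turns, FIFO
        self.mid = deque()     # overflow migrated from the short-term layer
        self.long = []         # condensed summaries of older segments
        self.short_cap = short_cap
        self.mid_cap = mid_cap

    def add_turn(self, turn: str) -> None:
        self.short.append(turn)
        if len(self.short) > self.short_cap:          # FIFO migration downward
            self.mid.append(self.short.popleft())
        if len(self.mid) > self.mid_cap:              # condense the oldest segment
            segment = [self.mid.popleft() for _ in range(self.mid_cap // 2)]
            self.long.append("summary: " + " | ".join(segment))  # stand-in for an LLM summary

    def context(self) -> str:
        # What the agent would actually see at the next decision point.
        return "\n".join(self.long + list(self.mid) + list(self.short))

mem = LayeredMemory()
for i in range(20):
    mem.add_turn(f"turn {i}")
print(mem.context())
```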
Why has 2025 become the year agents finally reach real-world deployment?
虎嗅APP· 2025-07-18 10:20
Core Insights
- The article discusses the rapid evolution and shifting landscape of the large-model industry, noting a move from many players to a few dominant ones focused on capital and technology battles [2][29]
- The focus has shifted from model performance to putting large models to work on business productivity, with "Agent" technology emerging as a key solution [4][8]
Group 1: Industry Trends
- The "hundred-model battle" of 2023 has given way to a market dominated by a few players, underscoring the importance of converting large-model capability into business value [2][29]
- The rise of agentic AI is driven by advances in agent-orchestration frameworks and standardized protocols, making it easier to build and deploy agents across industries [10][19]
Group 2: Agentic AI Development
- AWS's recent summit positioned agentic AI as a transformative technology that lets large models take proactive action rather than merely responding to prompts [8][10]
- The article outlines six key challenges that must be addressed for agents to move from proof of concept to production, including security, memory management, and tool discovery [12][13]
Group 3: Amazon Bedrock AgentCore
- AWS introduced Amazon Bedrock AgentCore to lower the barrier to building enterprise-grade agents, providing a comprehensive solution that includes runtime environments, memory systems, and identity management [15][19]
- The AgentCore framework allows developers to deploy agents without deep knowledge of cloud-native environments, enabling faster and safer deployment [15][19]
Group 4: Customization and Advanced Features
- For enterprises with specific needs, AWS offers advanced features such as S3 Vectors for efficient vector storage and retrieval, and Amazon Nova for model customization [21][25]
- The introduction of Kiro, an AI IDE product, aims to improve coding efficiency by integrating product requirements and documentation into the development process [26]
10,000 tokens is the new benchmark for long-context testing; past it, 18 large models collectively fall apart
量子位· 2025-07-17 02:43
Core Insights
- The article discusses how the performance of large language models (LLMs) declines as input context length increases, noting that the decline is not uniform but appears at specific token lengths [10][21][44]
- A recent study by the Chroma team tested 18 mainstream LLMs, finding that models such as GPT-4.1 and Claude Sonnet 4 suffer significant accuracy drops when processing longer inputs [8][9][19]
Group 1: Performance Decline
- As input length grows, model performance deteriorates, with a notable drop around 10,000 tokens, where accuracy can fall to roughly 50% [4][21]
- Different models exhibit different thresholds for the decline, with some losing accuracy earlier than others [6][7][19]
- Semantic similarity between the "needle" (the target information) and the question significantly affects performance, with lower similarity leading to larger declines [19][21]
Group 2: Experimental Findings
- Four controlled experiments assessed the impact of input length on model performance, examining factors such as semantic similarity, interference information, and text structure (a toy harness sketch follows below) [17][35][41]
- The first experiment showed that as input length increased, models struggled more when semantic similarity was low, producing a sharper performance drop [19][21]
- The second experiment showed that interference items significantly reduce accuracy, with multiple interference items causing a 30%-50% drop relative to baseline performance [26][28]
Group 3: Structural Impact
- The structure of the background text (the "haystack") also matters, with coherent structure leading to larger accuracy declines than shuffled structure [40][42]
- Most models performed worse on coherently structured inputs as length increased, while the decline was milder on shuffled inputs [41][44]
- The findings suggest that LLMs struggle to process complex logical structure in long texts, indicating a need for better handling of such inputs [41][44]
Group 4: Implications and Future Directions
- The results highlight the limitations of current LLMs on long-context tasks, prompting suggestions for clearer instructions and context-management strategies [44]
- Chroma, the team behind the research, aims to address these challenges by releasing open-source tools that help LLM applications process long texts [45][48]
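The experimental setup described above (plant one "needle" fact in filler text of growing length and check whether the model can still answer) can be mocked up in a few lines. The filler text, lengths, and `call_model` stub below are illustrative assumptions rather than the Chroma team's actual benchmark code.

```python
"""Toy needle-in-a-haystack harness: plant one fact in filler of growing length
and measure retrieval accuracy per length. call_model() is a stub standing in
for a real LLM API; word counts approximate tokens."""
import random

NEEDLE = "The access code for the archive room is 7341."
QUESTION = "What is the access code for the archive room?"
FILLER_SENTENCE = "The quarterly report was filed without any notable remarks. "

def build_haystack(total_words: int, seed: int = 0) -> str:
    random.seed(seed)
    filler = (FILLER_SENTENCE * (total_words // 9 + 1)).split()[:total_words]
    insert_at = random.randint(0, len(filler))        # needle depth varies per seed
    return " ".join(filler[:insert_at] + [NEEDLE] + filler[insert_at:])

def call_model(context: str, question: str) -> str:
    # Stub: a real harness would send context + question to an LLM here.
    return "7341" if "7341" in context else "unknown"

def accuracy_at(length: int, trials: int = 5) -> float:
    hits = sum(
        "7341" in call_model(build_haystack(length, seed=t), QUESTION)
        for t in range(trials)
    )
    return hits / trials

for length in (1_000, 5_000, 10_000, 20_000):        # approximate token budgets
    print(f"{length:>6} words: accuracy {accuracy_at(length):.0%}")
```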