Retrieval-Augmented Generation (RAG)
Building LLMs: The Knowledge Graph Foundation Every AI Project Needs
36Kr · 2025-11-13 00:49
"Mr. Schwartz, I have reviewed your brief in opposition," federal judge Kevin Castel began, his tone measured but pointed. "You cited six cases in support of your client's position. I would like to discuss Varghese v. China Southern Airlines." Steven Schwartz, a lawyer with decades of experience, straightened in his chair. "Yes, Your Honor. That is a 2019 decision of the Eleventh Circuit, and it directly supports..." "I cannot find it," the judge interrupted. "The citation you provided, 925 F.3d 1339, does not appear in any database my clerk has checked. Can you provide the court with a complete copy of the opinion?" Schwartz felt a flicker of concern. "Of course, Your Honor. I will submit it immediately." Back at his office, Schwartz went back to his source. He typed into ChatGPT: "Is Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019) a real case?" The reply came back confidently: "Yes, Varghese v. China Southern Airlines, 925 F.3d 1339 is a real case. You can find it in authoritative legal databases such as LexisNexis and Westlaw." Reassured, Schwartz asked ChatGPT for further details about the case. The AI obligingly generated what appeared to be excerpts from the opinion, including ...
Dongfang Materials' Japanese Subsidiary Releases Tenzai Model-v1.1: Hundred-Billion-Parameter Finance-and-Taxation Model Achieves a "Cognitive AI" Breakthrough
Quanjing Wang · 2025-10-31 02:29
Core Insights
- The launch of Tenzai Model-v1.1 by Dongfang Materials' Japanese subsidiary marks a significant advancement in the application of AI within the finance and taxation sector, transitioning from "execution automation" to "cognitive intelligence" [1][4]

Technology Foundation
- Tenzai Model-v1.1 is built on a hundred-billion-parameter architecture, utilizing a Transformer model optimized for finance and taxation scenarios and incorporating over 5 million real tax documents, 1 million high-quality Q&A pairs, a 50-year database of Japanese tax laws, and over 100,000 real business cases [1][2]
- The model employs Domain-adaptive Continued Pre-training and Multi-task Fine-tuning to achieve near-human cognitive abilities in semantic understanding, logical reasoning, and judgment suggestions [2]

Innovative Architecture
- The system integrates Retrieval-Augmented Generation (RAG) technology to address potential inaccuracies in professional content, ensuring that every recommendation is backed by legal references and case studies [2]
- Tenzai Model-v1.1 features multi-modal understanding, capable of processing images, text, and tabular data, achieving a recognition accuracy of 99.8% for complex documents [2]

System Performance
- The model supports a context length of up to 32K tokens, with an average response time of under 2 seconds, processing 1,200 documents per hour, significantly outperforming current market solutions [2]
- It includes a continuous learning mechanism for monthly updates on tax laws and supports private deployment and a flexible SaaS architecture [2]

Application Depth
- Tenzai Model-v1.1 represents a leap beyond traditional automation systems, enabling semantic understanding, contextual reasoning, and proactive risk alerts in tax-related queries [2][3]
- The system has been integrated with major Japanese accounting software, supporting cloud, private, and hybrid deployments, with plans for a mobile app and international versions by 2026 [3]

Industry Impact
- The release of Tenzai Model-v1.1 signifies a maturation of vertical large models in professional services, transforming unstructured tax knowledge into computable, inferable, and interactive AI capabilities [4]
Chinese Academy of Sciences Reports Progress in Intelligent Carbon Footprint Accounting Research
Huanqiu Wang Zixun · 2025-10-22 02:51
Core Insights
- The article discusses the introduction of Chat-LCA, an intelligent life cycle assessment (LCA) solution that integrates large language models (LLM) to enhance carbon accounting efficiency and accuracy in the context of China's "dual carbon" strategy [1][3].

Group 1: Technology and Innovation
- Chat-LCA represents a significant advancement by integrating cutting-edge AI technologies such as retrieval-augmented generation (RAG), Text2SQL, chain of thought (CoT), and chain of code (CoC) into the entire LCA process [3].
- The system automates the entire workflow from knowledge acquisition to report generation, effectively breaking down knowledge barriers and data silos [3][4].

Group 2: Performance Metrics
- Chat-LCA has demonstrated high accuracy and efficiency, achieving a BERTScore of 0.85 in answering professional questions across ten industries, a Text2SQL execution accuracy of 0.9692 on real LCI databases, and a report generation accuracy of 0.9832 with a readability score of 8.42 out of 10 [4].
- The system can reduce traditional LCA analysis time from weeks to just a few hours, marking a qualitative leap in carbon accounting efficiency [4].

Group 3: Practical Applications
- In practical applications, such as assessing the carbon footprint of lithium-sulfur batteries, Chat-LCA identified raw material acquisition (47.2%) and production stages (31.3%) as major carbon emission hotspots, providing targeted emission reduction suggestions like clean energy alternatives [4].
- The solution significantly lowers the technical barriers to carbon accounting and expands the applicability of LCA methods across various industrial and policy scenarios, supporting the realization of "dual carbon" goals with actionable technological and decision-making tools [4].
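The Text2SQL stage mentioned above can be illustrated with a minimal sketch. Everything below is hypothetical: the LCI table schema, the data, and the stubbed `text2sql` function (a real system like Chat-LCA would prompt an LLM with the schema and question, then validate the generated SQL before executing it).

```python
import sqlite3

# Hypothetical life-cycle-inventory (LCI) table; Chat-LCA's real schema is not public.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lci (process TEXT, stage TEXT, co2_kg REAL)")
conn.executemany(
    "INSERT INTO lci VALUES (?, ?, ?)",
    [
        ("sulfur cathode", "raw material", 47.2),
        ("cell assembly", "production", 31.3),
        ("transport", "distribution", 21.5),
    ],
)

def text2sql(question: str) -> str:
    """Stub for the Text2SQL step: a real system would generate this SQL with
    an LLM conditioned on the schema, then check it parses before running."""
    if "by stage" in question:
        return "SELECT stage, SUM(co2_kg) FROM lci GROUP BY stage ORDER BY 2 DESC"
    raise ValueError("question not covered by this sketch")

sql = text2sql("What are the emissions by stage?")
rows = conn.execute(sql).fetchall()
for stage, total in rows:
    print(f"{stage}: {total} kg CO2e")
```

Executing the generated SQL against the database (rather than trusting the model's prose) is what makes execution accuracy, like the 0.9692 figure cited above, directly measurable.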
Goodbye to Error Accumulation and Noise Interference: EviNote-RAG Opens a New RAG Paradigm
机器之心· 2025-09-12 00:51
Core Insights
- The article discusses the development of EviNote-RAG, a new framework aimed at enhancing retrieval-augmented generation (RAG) models by addressing the low signal-to-noise ratio and error accumulation that arise in complex tasks [4][10][11].

Group 1: EviNote-RAG Framework
- EviNote-RAG introduces a three-stage process of retrieval, note-taking, and answering, which contrasts with traditional RAG methods that rely directly on retrieval results [14][22].
- The framework utilizes Supportive-Evidence Notes (SEN) to filter out noise and highlight key information, mimicking human note-taking habits [20][22].
- An Evidence Quality Reward (EQR) is incorporated to ensure that the notes genuinely support the final answer, reducing shallow matching and error accumulation [20][22].

Group 2: Performance Improvements
- EviNote-RAG has shown significant performance improvements across open-domain question-answering benchmarks, achieving a 20% increase in F1 score on HotpotQA, a 40% increase on Bamboogle, and a 91% increase on 2Wiki [25][24].
- The framework has demonstrated enhanced generalization capabilities and training stability, making it one of the most reliable RAG frameworks available [6][18].

Group 3: Training Dynamics
- The introduction of SEN and EQR has transformed the training dynamics from unstable to robust, yielding a smoother training curve and improved performance [27][28].
- Key findings indicate that structured instructions lead to stability, while noise filtering through SEN significantly enhances computational efficiency [28][29].

Group 4: Experimental Validation
- Ablation studies confirm that both SEN and EQR are crucial for robust reasoning, with SEN providing structured constraints and EQR offering logical-consistency supervision [41][45].
- The experiments highlight that effective supervision depends more on how supportive evidence is organized and marked than on merely enforcing summaries [42][45].
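The retrieve, take-notes, answer loop described above can be condensed into a toy sketch. The corpus, the keyword-overlap retriever, and the containment check standing in for the Evidence Quality Reward are all simplified stand-ins for illustration, not the paper's implementation.

```python
# Toy sketch of EviNote-RAG's three stages: retrieve, take notes (SEN), answer.
CORPUS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin hosts an annual film festival.",
]
STOPWORDS = {"what", "is", "the", "of", "a", "an", "was"}

def words(text):
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(query, corpus, k=2):
    """Stage 1: rank documents by keyword overlap with the query
    (a real system would call a search engine or dense retriever)."""
    return sorted(corpus, key=lambda d: -len(words(query) & words(d)))[:k]

def take_notes(query, docs):
    """Stage 2: keep only sentences sharing content words with the query,
    standing in for model-written Supportive-Evidence Notes (SEN)."""
    content = words(query) - STOPWORDS
    return [d for d in docs if content & words(d)]

def evidence_reward(notes, answer):
    """Crude proxy for the Evidence Quality Reward (EQR): 1.0 only if the
    answer is literally supported (contained) by some note, else 0.0."""
    return 1.0 if any(answer.lower() in n.lower() for n in notes) else 0.0

query = "What is the capital of France?"
notes = take_notes(query, retrieve(query, CORPUS))
print(notes, evidence_reward(notes, "Paris"))
```

The point of the sketch is the ordering: the answer is generated from the filtered notes rather than the raw retrieval, and the reward scores the notes-to-answer link, which is where the noise filtering and anti-error-accumulation effects come from.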
Qwen3-Max-Preview Goes Live; Officially Billed as the Most Powerful Language Model in the Tongyi Qianwen Series
Sohu Caijing · 2025-09-06 10:03
Core Insights
- Alibaba's Tongyi Qianwen team has launched the latest Qwen3-Max-Preview model, described as the most powerful language model in the Tongyi Qianwen series [1]
- Qwen3-Max offers significant improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version [1][3]
- The model supports over 100 languages and is optimized for retrieval-augmented generation (RAG) and tool invocation, although it does not include a dedicated "thinking" mode [1][3]

Pricing and Performance
- The input price for Qwen3-Max is $1.20 per million tokens, while the output price is $6 per million tokens [2][5]
- The model can handle a context of up to 256,000 tokens, with a maximum output of 32,800 tokens [5]

Technical Enhancements
- Qwen3-Max provides higher accuracy on mathematical, coding, logic, and scientific tasks, and reliably follows complex instructions in both Chinese and English [1][3]
- The model reduces hallucinations and generates higher-quality responses for open-ended questions, writing, and conversation [1][3]
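At those list prices, the cost of a call scales linearly with token counts; a quick sketch using the per-million-token prices quoted above (the helper function and the example request sizes are illustrative):

```python
# Cost estimate at the quoted Qwen3-Max preview prices (USD per million tokens).
INPUT_PRICE = 1.20   # $ per 1M input tokens
OUTPUT_PRICE = 6.00  # $ per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Linear cost model: (tokens / 1e6) * price, summed over input and output."""
    return input_tokens / 1e6 * INPUT_PRICE + output_tokens / 1e6 * OUTPUT_PRICE

# A request that fills the 256K context and produces the 32,800-token maximum output:
cost = call_cost(256_000, 32_800)
print(f"${cost:.4f}")
```

A maximally large single call therefore costs on the order of fifty cents, with output tokens priced five times higher than input tokens.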
New Copyright Concerns Around Retrieval-Augmented Generation (RAG)
36Kr · 2025-08-14 10:11
Group 1
- The core viewpoint of the articles is the evolution of generative artificial intelligence (AIGC) from a reliance on model training (AIGC 1.0) to a new phase (AIGC 2.0) that integrates authoritative third-party information to enhance the accuracy, timeliness, and professionalism of generated content [2][3]
- Amazon's unexpected partnerships with major media outlets like The New York Times and Hearst mark a significant shift in the industry, especially given The New York Times' previous legal actions against AI companies for copyright infringement [2][3]
- OpenAI's collaboration with The Washington Post is part of a broader trend, as OpenAI has partnered with over 20 publishers to provide users with reliable and accurate information [2][3]

Group 2
- The rise of "Retrieval-Augmented Generation" (RAG) technology is attributed to its ability to combine pre-trained model knowledge with external knowledge retrieval, addressing issues like "model hallucination" and "temporal gaps" in information [4][5]
- RAG allows models to provide accurate answers using real-time external data without needing to retrain model parameters, thus enhancing the relevance of responses [6]
- The process of RAG involves two stages, data retrieval and content integration, which raises copyright concerns due to the use of large volumes of copyrighted material [6][8]

Group 3
- The first copyright infringement lawsuit related to RAG occurred in October 2024, highlighting the legal challenges faced by AI companies in utilizing copyrighted content [8]
- In February 2025, a group of major publishers sued an AI company for allegedly using their content without permission through RAG technology, indicating a growing trend of legal disputes in this area [8]
- The European Court of Justice is also involved in a case concerning copyright disputes related to generative AI, reflecting the complexity of these legal issues [9]

Group 4
- The collection of works during the data retrieval phase raises questions about copyright infringement, particularly regarding the distinction between temporary and permanent copies of copyrighted material [11]
- The legality of using copyrighted works in RAG systems depends on whether the retrieval process constitutes long-term copying, which is generally considered infringing without authorization [11][12]
- The handling of copyrighted works in RAG systems must also consider the potential for bypassing technical protections, which could lead to legal violations [12][13]

Group 5
- Evaluating how RAG utilizes works during the content integration phase is crucial for determining potential copyright infringement, including both direct and indirect infringement scenarios [14]
- Direct infringement may occur if the output content reproduces or adapts protected works without permission [14]
- Indirect infringement could arise if the AI model facilitates the spread of infringing content, depending on the model's design and the actions taken upon discovering such infringement [15]

Group 6
- The concept of "fair use" in copyright law is a significant factor in determining the legality of RAG systems, with different jurisdictions applying varying standards for what constitutes fair use [17][18]
- The relationship between copyright technical measures and fair use is complex, as circumventing technical protections may affect the assessment of fair use claims [17][18]
- The output of RAG systems must be carefully evaluated to ensure that it does not exceed reasonable limits of use, as this could lead to copyright infringement [19]
New Copyright Concerns Around Retrieval-Augmented Generation (RAG)
Tencent Research Institute · 2025-08-14 08:33
Group 1
- The article discusses the evolution of AIGC (Artificial Intelligence Generated Content) from the 1.0 phase, which relied solely on model training, to the 2.0 phase, characterized by "Retrieval-Augmented Generation" (RAG), which integrates authoritative third-party information to enhance content accuracy and timeliness [6][10]
- Major collaborations between AI companies and media organizations, such as Amazon's partnerships with The New York Times and OpenAI's collaboration with The Washington Post, highlight the industry's shift towards providing reliable and factual information [3][6]
- RAG combines language generation models with information retrieval techniques, allowing models to access real-time external data without retraining their parameters, thus addressing issues like "model hallucination" and "temporal disconnection" [8][10]

Group 2
- The rise of RAG is attributed to the need to overcome inherent flaws of traditional large models, such as generating unreliable information and lacking real-time updates [8][9]
- RAG's process involves two stages, data retrieval and content integration, where the model first retrieves relevant information before generating a response [11]
- Legal disputes surrounding RAG have emerged, with cases like the lawsuit against Perplexity AI highlighting concerns over copyright infringement through unauthorized use of protected content [14][16]

Group 3
- The article outlines the complexities of copyright issues related to RAG, including the distinction between long-term and temporary copying, which can affect the legality of data retrieval methods [17][18]
- Technical protection measures are crucial in determining the legality of content retrieval, as bypassing such measures may violate copyright laws [19][20]
- The article emphasizes the need for careful evaluation of how RAG outputs utilize copyrighted works, as both direct and indirect infringements can occur depending on the nature of the content generated [21][23]

Group 4
- The concept of "fair use" is explored in the context of RAG, with varying interpretations based on the legality of data sources and the extent of content utilization [25][27]
- The relationship between copyright technical measures and fair use is highlighted, indicating that circumventing protective measures can affect the assessment of fair use claims [28]
- The article concludes with the ongoing debate over balancing the use of copyrighted content for AI against respect for copyright law, and the implications for future AI development [29][30]
A 10,000-Word Deep Dive! A Complete Analysis of RAG in Practice: One Year of Exploration
自动驾驶之心· 2025-08-07 09:52
Core Viewpoint
- The article discusses the Retrieval-Augmented Generation (RAG) method, which combines retrieval-based and generative models to enhance the quality and relevance of generated text, addressing issues such as hallucination, knowledge timeliness, and long-text processing in large models [1].

Group 1: Background and Challenges
- RAG was proposed by Meta in 2020 to enable language models to access external information beyond their internal knowledge [1].
- RAG faces three main challenges: retrieval quality, the enhancement process, and generation quality [2].

Group 2: Challenges in Retrieval Quality
- Semantic ambiguity can arise from vector representations, leading to irrelevant results [5].
- User input has become more complex, transitioning from keywords to natural dialogue, which complicates retrieval [5].
- Document segmentation methods can affect the degree of matching between document blocks and user queries [5].
- Extracting and representing multimodal content (e.g., tables, charts) poses significant challenges [5].
- Integrating context from retrieved paragraphs into the current generation task is crucial for coherence [5].
- Redundancy and repetition in retrieved content can lead to duplicated information in generated outputs [5].
- Determining the relative importance of multiple retrieved paragraphs for the generation task is difficult [5].
- Over-reliance on retrieved content can exacerbate hallucination issues [5].
- Generated answers can be irrelevant to the query [5].
- Toxicity or bias in generated answers is another concern [5].

Group 3: Overall Architecture
- The product architecture consists of four layers: a model layer, an offline understanding layer, an online Q&A layer, and a scenario layer [7].
- The RAG framework is divided into three main components: query understanding, the retrieval model, and the generation model [10].

Group 4: Query Understanding
- The query understanding module aims to improve retrieval by interpreting user queries and generating structured queries [14].
- Intent recognition helps select relevant modules based on user queries [15].
- Query rewriting uses an LLM to rephrase user queries for better retrieval [16].
- Query expansion breaks complex questions into simpler sub-questions for more effective retrieval [22].

Group 5: Retrieval Model
- The retrieval model's effectiveness depends on the accuracy of embedding models [33].
- Document loaders facilitate loading document data from various sources [38].
- Text converters prepare documents for retrieval by segmenting them into smaller, semantically meaningful chunks [39].
- Document embedding models create vector representations of text to enable semantic search [45].
- Vector databases support efficient storage and search of embedded data [47].

Group 6: Generation Model
- The generation model utilizes retrieved information to generate coherent responses to user queries [60].
- Different prompt-assembly strategies are employed to enhance response generation [62][63].

Group 7: Attribution Generation
- Attribution in RAG is crucial for aligning generated content with reference information, ensuring accuracy [73].
- Dynamic computation methods can enhance the generation process by matching generated text with reference sources [76].

Group 8: Evaluation
- The article emphasizes the importance of defining metrics and evaluation methods for assessing RAG system performance [79].
- Various evaluation frameworks, such as RGB and RAGAS, are introduced to benchmark RAG systems [81].

Group 9: Conclusion
- The article summarizes the key modules in RAG practice and highlights the need for continued research and development to refine these technologies [82].
Forget "Her"; "Memento" Is the Required Course for LLM Agents
Founder Park· 2025-07-29 08:05
Core Insights
- The article discusses the evolution of AI from chatbots to agents, highlighting a significant shift in focus towards task decomposition, tool utilization, and autonomous planning as of 2025 [4][5]
- It draws parallels between the character Leonard from the film "Memento" and the concept of AI agents, emphasizing the importance of context engineering in enabling agents to function effectively in complex environments [5][10]

Context Engineering
- Context engineering is defined as a comprehensive technology stack designed to manage information input and output around the limited attention span of large language models (LLMs) [5][13]
- The goal of context engineering is to provide agents with the right information at each decision point, which is crucial to their success [5]

Three Pillars of Context Engineering
- **External Knowledge Management**: A memory-extension module that helps agents overcome short-term memory limitations by providing the necessary historical information at decision points [19][20]
- **Context Distillation & Structuring**: Processing and filtering information to extract essential facts, ensuring that agents are not overwhelmed by excessive data [21][25]
- **Hierarchical Memory Management**: A layered memory architecture that lets agents stay focused on their core mission while managing dynamic task-related information [26][30]

Challenges in Agent Design
- The article identifies two critical vulnerabilities in agent design: context poisoning, where agents process misleading information, and self-reinforcing cognitive prisons, where agents rely on their own flawed conclusions [32][34]
- It stresses the importance of incorporating a verification and reflection module to mitigate these risks, enabling agents to compare outcomes with expected goals and adjust accordingly [35][36]
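A layered memory of the kind the third pillar describes can be sketched as a small class: a pinned core mission that is always in context, plus a bounded working layer that evicts the oldest task notes first. The class name, two-layer split, and FIFO eviction policy are illustrative choices, not the article's design.

```python
from collections import deque

class HierarchicalMemory:
    """Two-layer agent memory: a pinned core mission plus a bounded working
    layer whose oldest entries are evicted first (FIFO). Illustrative only."""

    def __init__(self, mission: str, working_capacity: int = 3):
        self.mission = mission                         # never evicted
        self.working = deque(maxlen=working_capacity)  # dynamic task notes

    def note(self, item: str) -> None:
        self.working.append(item)  # deque(maxlen=...) silently drops the oldest

    def context(self) -> str:
        """Assemble the prompt context: mission first, then recent notes."""
        return "\n".join([f"MISSION: {self.mission}", *self.working])

mem = HierarchicalMemory("Book a flight to Tokyo", working_capacity=2)
for step in ["opened airline site", "compared fares", "selected a flight"]:
    mem.note(step)
print(mem.context())
```

The point is the asymmetry: working notes churn as the task progresses, but the mission line survives every eviction, which is what keeps the agent anchored to its core goal.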
Surveying 1,400 Research Papers to Compile a Comprehensive Guide to Context Engineering | Jinqiu Select
锦秋集· 2025-07-21 14:03
Core Insights
- The article discusses the emerging field of Context Engineering, emphasizing the need for a systematic theoretical framework to complement the practical experiences shared by Manus' team [1][2]
- A comprehensive survey titled "A Survey of Context Engineering for Large Language Models" has been published, analyzing over 1,400 research papers to establish a complete technical system for Context Engineering [1][2]

Context Engineering Components
- Context Engineering rests on three interrelated components, Information Retrieval and Generation, Information Processing, and Information Management, forming a complete framework for optimizing context in large models [2]
- The first component, Context Retrieval and Generation, focuses on engineering methods to effectively acquire and construct context information for models, including practices like prompt engineering, external knowledge retrieval, and dynamic context assembly [2]

Prompting Techniques
- Prompting is the starting point of model interaction, where effective prompts can unlock deeper capabilities of the model [3]
- Zero-shot prompting provides direct instructions relying on pre-trained knowledge, while few-shot prompting supplies a few examples to guide the model in understanding task requirements [4]

Advanced Reasoning Frameworks
- For complex tasks, structured thinking is necessary; Chain-of-Thought (CoT) prompts models to think step by step, significantly improving accuracy on complex tasks [5]
- Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT) further enhance reasoning by allowing exploration of multiple paths and dependencies, improving success rates on tasks requiring extensive exploration [5]

Self-Refinement Mechanisms
- Self-Refinement allows models to iteratively improve their outputs through self-feedback, without requiring additional supervised training data [8][9]
- Techniques like N-CRITICS and Agent-R enable models to evaluate and correct their reasoning paths in real time, enhancing output quality [10][11]

External Knowledge Retrieval
- External knowledge retrieval, particularly through Retrieval-Augmented Generation (RAG), addresses the static nature of model knowledge by integrating dynamic information from external databases [12][13]
- Advanced RAG architectures introduce adaptive retrieval mechanisms and hierarchical processing strategies to improve retrieval efficiency [14][15]

Context Processing Challenges
- Processing long contexts presents significant computational challenges due to the quadratic complexity of Transformer self-attention [28]
- Innovations like State Space Models and linear attention aim to reduce computational complexity, allowing models to handle longer sequences more efficiently [29][30]

Context Management Strategies
- Effective context management is crucial for organizing, storing, and utilizing information, addressing issues like context overflow and collapse [46][47]
- Memory architectures inspired by operating systems and cognitive models are being developed to enhance the memory capabilities of language models [48][50]

Tool-Integrated Reasoning
- Tool-Integrated Reasoning transforms language models from passive text generators into active agents capable of interacting with the external world through function calling and integrated reasoning frameworks [91][92]
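The zero-shot versus few-shot distinction above ultimately comes down to how the prompt string is assembled. A minimal sketch (the sentiment task and examples are made up for illustration):

```python
def build_prompt(instruction, examples, query):
    """Few-shot prompt: instruction, worked examples, then the new input.
    With examples=[], this degenerates to a zero-shot prompt."""
    shots = "".join(f"Input: {x}\nOutput: {y}\n" for x, y in examples)
    return f"{instruction}\n{shots}Input: {query}\nOutput:"

INSTRUCTION = "Classify the sentiment as positive or negative."

few_shot = build_prompt(
    INSTRUCTION,
    [("I loved this film", "positive"), ("Utterly boring", "negative")],
    "A delightful surprise",
)
zero_shot = build_prompt(INSTRUCTION, [], "A delightful surprise")
print(few_shot)
```

The same template serves both regimes: the examples teach the model the input/output format by demonstration, and ending the prompt with a bare `Output:` cues the model to complete the final answer.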