In-Context Learning

A 10,000-Word Deep Dive! A Complete Analysis of RAG in Practice: One Year of Exploration
自动驾驶之心· 2025-08-07 09:52
Core Viewpoint
- The article discusses Retrieval Augmented Generation (RAG), which combines retrieval-based and generative models to improve the quality and relevance of generated text, addressing hallucination, knowledge timeliness, and long-text processing in large models [1]

Group 1: Background and Challenges
- RAG was proposed by Meta in 2020 to let language models access external information beyond their internal knowledge [1]
- RAG faces three main challenges: retrieval quality, the augmentation process, and generation quality [2]

Group 2: Challenges in Retrieval Quality
- Semantic ambiguity in vector representations can lead to irrelevant results [5]
- User input has shifted from keywords to natural dialogue, which complicates retrieval [5]
- The document segmentation method affects how well document chunks match user queries [5]
- Extracting and representing multimodal content (e.g., tables, charts) remains a significant challenge [5]
- Integrating context from retrieved passages into the current generation task is crucial for coherence [5]
- Redundancy and repetition in retrieved content can produce duplicated information in the generated output [5]
- Weighing the relative importance of multiple retrieved passages for the generation task is difficult [5]
- Over-reliance on retrieved content can exacerbate hallucination [5]
- Generated answers may be irrelevant to the query [5]
- Generated answers may contain toxicity or bias [5]

Group 3: Overall Architecture
- The product architecture consists of four layers: a model layer, an offline understanding layer, an online Q&A layer, and a scenario layer [7]
- The RAG framework is divided into three main components: query understanding, the retrieval model, and the generation model [10]

Group 4: Query Understanding
- The query understanding module improves retrieval by interpreting user queries and producing structured queries [14]
- Intent recognition selects the relevant modules for a given user query [15]
- Query rewriting uses an LLM to rephrase user queries for better retrieval [16]
- Query expansion breaks complex questions into simpler sub-questions for more effective retrieval [22]

Group 5: Retrieval Model
- The retrieval model's effectiveness depends on the accuracy of its embedding models [33]
- Document loaders load document data from various sources [38]
- Text splitters prepare documents for retrieval by segmenting them into smaller, semantically meaningful chunks [39]
- Document embedding models create vector representations of text to enable semantic search [45]
- Vector databases support efficient storage and search of the embedded data [47]

Group 6: Generation Model
- The generation model uses the retrieved information to generate coherent responses to user queries [60]
- Different prompt-assembly strategies are employed to improve response generation [62][63]; an end-to-end sketch of the retrieval-to-generation flow appears after the conclusion below

Group 7: Attribution Generation
- Attribution in RAG aligns generated content with its reference information, which is crucial for accuracy [73]
- Dynamic computation methods can improve generation by matching generated text against reference sources [76]

Group 8: Evaluation
- The article emphasizes the importance of defining metrics and evaluation methods for assessing RAG system performance [79]
- Evaluation frameworks such as RGB and RAGAS are introduced for benchmarking RAG systems [81]; a toy example of one evaluation dimension follows this group
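Frameworks like RGB and RAGAS score a RAG system along dimensions such as context relevance and answer faithfulness. As a toy illustration of one such dimension (not the implementation of either framework), the sketch below scores retrieved chunks against a query by embedding cosine similarity; the embedding model and the mean-similarity aggregation are assumptions made for the example.

    # Toy context-relevance scorer, illustrative only.
    # Assumes sentence-transformers is installed and the model name below is available.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def context_relevance(query: str, retrieved_chunks: list[str]) -> float:
        """Mean cosine similarity between the query and each retrieved chunk."""
        vectors = model.encode([query] + retrieved_chunks)
        q, chunks = vectors[0], vectors[1:]
        sims = chunks @ q / (np.linalg.norm(chunks, axis=1) * np.linalg.norm(q))
        return float(sims.mean())

    # A score near 1.0 suggests the retriever surfaced on-topic chunks for this query.
    score = context_relevance(
        "How does RAG reduce hallucination?",
        ["RAG grounds generation in retrieved documents ...",
         "Vector databases store and search embeddings ..."],
    )
    print(f"context relevance: {score:.3f}")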
Group 9: Conclusion
- The article summarizes the key modules in RAG practice and highlights the need for continuous research and development to refine these technologies [82]
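Putting the modules together, here is a minimal, framework-agnostic sketch of the retrieve-then-generate flow the article describes: chunk documents, embed the chunks, retrieve the nearest chunks for a query, and assemble them into a prompt. The chunk size, embedding model, and prompt template are illustrative assumptions, not the article's exact configuration.

    # Minimal retrieve-then-generate sketch: chunking, embedding, top-k retrieval, prompt assembly.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def chunk(text: str, size: int = 300, overlap: int = 50) -> list[str]:
        """Fixed-size character chunks with overlap; production systems often split on semantic boundaries."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
        """Embed every chunk once; a vector database would persist and index this matrix."""
        chunks = [c for doc in documents for c in chunk(doc)]
        vectors = embedder.encode(chunks, normalize_embeddings=True)
        return chunks, vectors

    def retrieve(query: str, chunks: list[str], vectors: np.ndarray, k: int = 3) -> list[str]:
        """Cosine similarity (dot product on normalized vectors) picks the top-k chunks."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        top = np.argsort(vectors @ q)[::-1][:k]
        return [chunks[i] for i in top]

    def assemble_prompt(query: str, context_chunks: list[str]) -> str:
        """One common prompt-assembly strategy: instructions, numbered context, then the question."""
        context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
        return ("Answer the question using only the context below. "
                "Cite the bracketed chunk numbers you rely on.\n\n"
                f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

    chunks, vectors = build_index(["RAG was proposed by Meta in 2020 ...",
                                   "Vector databases index embeddings for semantic search ..."])
    prompt = assemble_prompt("When was RAG proposed?",
                             retrieve("When was RAG proposed?", chunks, vectors))
    # `prompt` is then sent to the generation model of choice.

Asking the model to cite the bracketed chunk numbers also mirrors the attribution idea described in Group 7.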
How to Feed Large Models Precisely Without Massive Data? Shanghai Jiao Tong's Data Whisperer: A Training-Free Data Selection Method Where 10% of the Data Approaches Full-Dataset Performance
机器之心· 2025-07-29 06:38
Core Viewpoint
- The article introduces "Data Whisperer," a framework for efficient data selection when fine-tuning large language models (LLMs) that requires no additional training and achieves near-optimal performance with only 10% of the data compared to the full dataset [2][4][36]

Group 1: Methodology and Mechanism
- Data Whisperer uses the in-context learning (ICL) capabilities of the pre-trained model itself to select "golden training samples," with no separate scoring model required [2][6]
- The framework scores candidates using the model's own outputs and attention weights, keeping the selection process stable and well-grounded [10][12]; a simplified scoring sketch follows this summary
- It introduces a new efficiency metric, the Selection-to-Tuning Ratio (STR), under which Data Whisperer significantly outperforms traditional methods in time efficiency [17][18]

Group 2: Performance Metrics
- Data Whisperer reached 72.46% accuracy on GSM8K using only 10% of the data, surpassing the full-dataset result of 71.39% [19]
- It also outperformed existing state-of-the-art methods on the DialogSum and BioInstruct tasks by notable margins [19][21]

Group 3: Robustness and Adaptability
- Data Whisperer is robust to input scale, with identified optimal configurations for the number of demonstration and query samples, indicating that it selects core samples rather than relying on sheer volume [26][28]
- It supports a weak-to-strong mechanism in which a smaller model selects data for a larger model, reducing computational cost while maintaining performance [22][24]

Group 4: Comparative Analysis
- Data Whisperer outperforms mainstream data selection methods in accuracy, efficiency, and stability, particularly in low-budget scenarios [35]
- Its theoretical foundation is the relationship between ICL and fine-tuning, which lets it estimate training utility without adjusting any model parameters [36][37]

Group 5: Future Directions
- Potential future directions include applying the method to complex structured tasks in fields such as law and medicine, strengthening task alignment, and integrating human feedback [41][42]
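The core idea, letting the pre-trained model itself judge candidate training samples through in-context predictions rather than training a separate scorer, can be sketched as follows. This is a simplified illustration under stated assumptions: a Hugging Face causal LM as the scorer, the log-likelihood of held-out answers as the score, and fixed Q/A formatting. The paper's actual attention-weighted scoring and sampling scheme are more involved.

    # Training-free, ICL-based sample scoring in the spirit of Data Whisperer (simplified proxy).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-0.5B"  # assumed small scorer model
    tok = AutoTokenizer.from_pretrained(MODEL)
    lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()

    @torch.no_grad()
    def answer_loglik(prompt: str, answer: str) -> float:
        """Sum of log-probabilities the model assigns to the answer tokens following the prompt."""
        prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
        full_ids = tok(prompt + answer, return_tensors="pt").input_ids
        answer_len = full_ids.shape[1] - prompt_len  # approximate; tokenization boundaries may shift by a token
        logprobs = torch.log_softmax(lm(full_ids).logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        return logprobs[-answer_len:].gather(1, targets[-answer_len:, None]).sum().item()

    def score_candidate(candidate: dict, held_out_queries: list[dict]) -> float:
        """How much does showing this candidate as a demonstration help the model answer held-out queries?"""
        demo = f"Q: {candidate['q']}\nA: {candidate['a']}\n\n"
        return sum(answer_loglik(demo + f"Q: {q['q']}\nA: ", q["a"])
                   for q in held_out_queries) / len(held_out_queries)

    # Rank the candidate pool by score_candidate and keep, say, the top 10% for fine-tuning.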
50 Model Calls for a Single Task, Yet Costs Slashed by 90%? Manus Shares Its Context Engineering Secrets Publicly for the First Time: Lessons Earned Through Repeated Rewrites
AI前线· 2025-07-21 07:04
Core Insights
- The article emphasizes the importance of context engineering in building AI agents, and the need to iterate and improve rapidly as models and techniques evolve [1][2]

Group 1: KV Cache Design
- KV cache hit rate is the single most important metric for AI agents in production, directly affecting latency and cost [4]
- Manus's average input-to-output token ratio is roughly 100:1, so KV caching pays off strongly: cached input tokens cost $0.30 per MTok versus $3 per MTok uncached [5]
- Key practices for improving the hit rate are keeping the prompt prefix stable, appending content only, and marking cache breakpoints explicitly [8][9][10]; a minimal sketch of these constraints appears at the end of this summary

Group 2: Tool Management
- As agents gain capabilities the action space grows more complex, and dynamically adding or removing tools mid-iteration introduces inefficiencies [11][14]
- Manus uses a context-aware state machine to manage tool availability without removing tool definitions, avoiding model confusion and preserving KV cache integrity [14][15][16]

Group 3: Context as a File System
- Context windows in modern large language models are limited; the file system can act as effectively unlimited context, with the agent reading and writing files as structured external memory [21]
- Manus applies a recoverable compression strategy, retaining essential information such as URLs while reducing context length [24]

Group 4: Attention Manipulation
- Manus keeps a "todo.md" file to track tasks, which helps the model stay focused on its goals during long, complex tasks [26][30]
- Retaining errors in the context lets the model learn from its mistakes and reduces the likelihood of repeating them [32][35]

Group 5: Sample Diversity
- Few-shot prompting in agent systems can backfire, pushing the model toward repetitive and suboptimal actions [36]
- Introducing structured variation into actions and observations breaks these patterns and redirects the model's attention, improving overall performance [37][38]

Group 6: Conclusion
- Context engineering is essential for AI agents, shaping their speed, recovery ability, and scalability [39]
- The future of agents will hinge on constructing context well, underscoring the importance of deliberate design [40]
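The KV cache guidance above boils down to keeping everything already sent to the model byte-identical across turns. The sketch below shows an append-only context builder under those constraints; class and field names are illustrative, not Manus's actual code. Note the absence of a timestamp in the system prompt and the deterministic JSON serialization, since a single divergent token invalidates the cache from that point onward. The cost helper uses the per-MTok prices quoted in the article.

    # Append-only, cache-friendly agent context (illustrative, not Manus's implementation).
    import json

    SYSTEM_PROMPT = (
        "You are an agent. Use the tools provided.\n"
        # Deliberately no timestamp or other per-request value here: one changed token
        # at the start of the prompt invalidates the KV cache for everything after it.
    )

    class AgentContext:
        def __init__(self) -> None:
            self.events: list[dict] = [{"role": "system", "content": SYSTEM_PROMPT}]

        def append(self, role: str, content: dict | str) -> None:
            """Append-only: earlier events are never edited or reordered, so their prefix stays cacheable."""
            self.events.append({"role": role, "content": content})

        def serialize(self) -> str:
            """Deterministic serialization: sort_keys keeps dict key order stable across turns."""
            return "\n".join(
                e["content"] if isinstance(e["content"], str)
                else json.dumps(e["content"], sort_keys=True, ensure_ascii=False)
                for e in self.events
            )

    def input_cost_usd(total_mtok: float, cache_hit_rate: float) -> float:
        """Rough input cost using the article's figures: $0.30/MTok cached vs $3/MTok uncached."""
        return total_mtok * (cache_hit_rate * 0.30 + (1 - cache_hit_rate) * 3.00)

    # e.g. input_cost_usd(1.0, 0.9) ≈ $0.57, versus $3.00 with no cache reuse.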