Retrieval-Augmented Generation (RAG)
Scale up retrieval, keep generation light: CMU team systematically evaluates RAG's corpus-versus-model trade-off
机器之心· 2026-01-06 00:31
Core Insights
- The core argument of the research is that expanding the retrieval corpus can significantly enhance Retrieval-Augmented Generation (RAG) performance, often providing benefits that can partially substitute for increasing model parameters, although diminishing returns occur at larger corpus sizes [4][22]

Group 1: Research Findings
- The study reveals that RAG performance is determined by both the retrieval module, which provides evidence, and the generation model, which interprets the question and integrates evidence to form an answer [7]
- Smaller models can achieve performance levels comparable to larger models by increasing the retrieval corpus size, with a consistent pattern observed across multiple datasets [11][12]
- The most significant performance gains occur when moving from no retrieval to having retrieval, with diminishing returns as the corpus size increases [13]

Group 2: Experimental Design
- The research employed a full factorial design, varying only corpus size and model size while keeping other variables constant, using a large dataset of approximately 264 million real web documents [9]
- The evaluation covered three open-domain question-answering benchmarks (Natural Questions, TriviaQA, and WebQuestions), using common metrics such as F1 and Exact Match [9]

Group 3: Mechanisms of Improvement
- Increasing the corpus size raises the probability of retrieving answer-containing segments, providing more reliable evidence for the generation model [16]
- The study defines the Gold Answer Coverage Rate, which measures the probability that at least one of the top chunks provided to the generation model contains the correct answer string; this rate increases monotonically with corpus size (a minimal computation sketch follows this summary) [16]

Group 4: Practical Implications
- When resources are constrained, prioritizing expansion of the retrieval corpus and improving coverage can allow medium-sized generation models to perform close to larger models [20]
- Answer coverage and utilization rates should be tracked as diagnostic metrics to identify whether bottlenecks lie in the retrieval or generation components [20]
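The Gold Answer Coverage Rate described above is straightforward to compute from retrieval outputs. Below is a minimal illustrative sketch, not the paper's reference code: the function name and the case-insensitive substring-match criterion are assumptions.

```python
from typing import List

def gold_answer_coverage(
    retrieved_chunks: List[List[str]],  # top-k chunks per question
    gold_answers: List[List[str]],      # acceptable answer strings per question
) -> float:
    """Fraction of questions where at least one top-k chunk contains
    one of the gold answer strings (case-insensitive match)."""
    hits = 0
    for chunks, answers in zip(retrieved_chunks, gold_answers):
        lowered = [c.lower() for c in chunks]
        if any(ans.lower() in chunk for ans in answers for chunk in lowered):
            hits += 1
    return hits / len(gold_answers) if gold_answers else 0.0

# Example: two questions with top-2 and top-1 retrieved chunks
chunks = [["Paris is the capital of France.", "France is in Europe."],
          ["The Nile flows north."]]
answers = [["Paris"], ["Amazon"]]
print(gold_answer_coverage(chunks, answers))  # 0.5
```

Tracking this number alongside end-task F1 is what lets the authors' diagnostic distinction work: if coverage is low, the bottleneck is retrieval; if coverage is high but answers are wrong, the bottleneck is the generator.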
One survey is all you need to get up to speed on Deep Research
机器之心· 2026-01-01 04:33
In recent years, applications of large models have been moving from dialogue and creative writing toward more open-ended, complex research problems. Although methods represented by Retrieval-Augmented Generation (RAG) have eased the knowledge-acquisition bottleneck, their static "retrieve once + generate once" paradigm struggles to support multi-step reasoning and long-horizon research workflows, giving rise to the new direction of Deep Research (DR).

However, as related work rapidly accumulates, the concept of DR has been expanding and fragmenting: different works differ significantly in system implementation, task assumptions, and evaluation, and the use of similar terminology further blurs the boundaries of its capabilities.

Against this backdrop, researchers from Shandong University, Tsinghua University, CMU, UIUC, Tencent, and other institutions jointly wrote and released the most comprehensive survey of deep-research agents to date, "Deep Research: A Systematic Survey". The paper first proposes a three-stage capability development path from shallow to deep, then systematically organizes the key components from a systems perspective, and further summarizes the corresponding training and optimization methods.

What is Deep Research

DR is not a specific model or technique but a progressively evolving capability path. The survey characterizes how research-oriented agents advance from information acquisition to complete scientific workflows. Based on a review of existing work, this evolution can be divided into three stages.

Stage 1: "Agentic Search". Models begin to acquire the capability for active search and multi-step information acquisition ...
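The "Agentic Search" stage can be made concrete with a minimal decision loop. This is a hypothetical sketch rather than code from the survey; `search` and `llm` are assumed stand-ins for a real search API and language model.

```python
# Minimal agentic-search loop: the model decides whether to keep
# searching or to answer, instead of one fixed retrieve-then-generate pass.

def agentic_search(question: str, search, llm, max_steps: int = 5) -> str:
    """`search(query) -> list[str]` and `llm(prompt) -> str` are
    assumed stand-ins for a real search API and language model."""
    evidence: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n"
            "Evidence so far:\n" + "\n".join(evidence) +
            "\nReply 'SEARCH: <query>' to gather more, or 'ANSWER: <answer>'."
        )
        action = llm(prompt)
        if action.startswith("SEARCH:"):
            evidence.extend(search(action[len("SEARCH:"):].strip()))
        elif action.startswith("ANSWER:"):
            return action[len("ANSWER:"):].strip()
    # Budget exhausted: force a final answer from accumulated evidence
    return llm(f"Answer now. Question: {question}\nEvidence: {evidence}")
```

The key departure from static RAG is visible in the loop itself: retrieval happens zero or more times, conditioned on what the model has already found.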
2025 AI large-model resource compilation
Sou Hu Cai Jing· 2025-12-24 10:45
In 2025 the AI large-model industry underwent structural change: competition shifted from a pure capability race to a contest of sustainability, with deep transformations along four dimensions (technical paradigm, market landscape, application forms, and global governance) jointly reshaping the industry's trajectory.

On the technical side, multiple breakthrough developments arrived. The training paradigm shifted wholesale from RLHF, which relies on subjective feedback, to objectively verifiable RLVR, with models achieving a leap in reasoning ability through self-verification, the year's most critical technical inflection point. Mixture-of-Experts (MoE) architectures made a strong comeback, using sparse activation to balance parameter scale against compute cost in pursuit of maximal cost-effectiveness. Multi-agent self-play and synthetic-data fine-tuning became the norm, freeing models from dependence on human annotation, while Retrieval-Augmented Generation (RAG) became standard in enterprise applications, effectively addressing hallucination and knowledge-freshness problems. In addition, models exhibit a "jagged" capability profile: rapid progress in formal intellectual domains such as mathematics and programming, but persistent gaps in commonsense reasoning.

The market landscape shows a dual tension between concentration and democratization. Google Gemini 3, leveraging in-house TPU v5 chips and multimodal strengths, ended OpenAI's long-standing lead, while Chinese models overtook on cost-effectiveness. The market is concentrating toward the top: leading startups such as Anthropic raised enormous funding and second- and third-tier players face a shakeout, but the open-source wave provides a counterweight, with Alibaba's Tongyi Qianwen, 01.ai's Yi ...
Memory in the era of AI agents: a survey of forms, functions, and dynamics
Xin Lang Cai Jing· 2025-12-17 04:42
Memory has become, and will remain, a core capability of foundation-model-based agents. It underpins long-horizon reasoning, continual adaptation, and effective interaction with complex environments. As research on agent memory expands rapidly and attracts unprecedented attention, the field is also growing increasingly fragmented. Work currently lumped together as "agent memory" often differs enormously in motivation, implementation, assumptions, and evaluation protocols, and the proliferation of loosely defined memory terminology further blurs conceptual clarity. Traditional taxonomies such as long-/short-term memory have proven insufficient to capture the diversity and dynamism of contemporary agent memory systems.

Among these agents' core capabilities, memory is especially critical: it explicitly enables the transition from static large language models, whose parameters cannot be updated quickly, to adaptive agents that continually adapt through environmental interaction (Zhang et al., 2025r; Wu et al., 2025g). From an application perspective, many domains demand agents with active memory management rather than ephemeral, forgetful behavior: personalized chatbots (Chhikara et al., 2025; Li et al., 2025b), recommender systems (Liu et al., 2025b), social simulation (Park et al., 2023; Yang et al., 2025), and financial investigation (Zhang et al., 2024) all rely on agents to process, store, and manage ...
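As a concrete illustration of what "active memory management" means in code, here is a generic toy sketch, not taken from the survey; the class names, dot-product scoring, and least-used eviction policy are all assumptions.

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    embedding: list[float]
    created: float = field(default_factory=time.time)
    uses: int = 0

class AgentMemory:
    """Toy episodic store: write, retrieve by similarity, forget."""
    def __init__(self, embed, capacity: int = 1000):
        self.embed = embed  # embed(text) -> list[float], assumed stand-in
        self.items: list[MemoryItem] = []
        self.capacity = capacity

    def write(self, text: str) -> None:
        self.items.append(MemoryItem(text, self.embed(text)))
        if len(self.items) > self.capacity:
            # Forget the least-used, oldest memory first
            self.items.sort(key=lambda m: (m.uses, m.created))
            self.items.pop(0)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = self.embed(query)
        def sim(m): return sum(a * b for a, b in zip(q, m.embedding))
        top = sorted(self.items, key=sim, reverse=True)[:k]
        for m in top:
            m.uses += 1  # usage feeds the forgetting policy above
        return [m.text for m in top]
```

Even this toy version shows why the long-/short-term dichotomy falls short: write, retrieval, and forgetting policies interact dynamically, and real systems vary along each axis independently.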
Hundsun Technologies helps Guoyuan Securities build an intelligent knowledge center: large models empower knowledge management and efficient application
Zheng Quan Ri Bao Zhi Sheng· 2025-12-11 13:38
(Reporter: Jiao Yue) Recently, Hundsun Technologies Inc. ("Hundsun") helped Guoyuan Securities Co., Ltd. ("Guoyuan Securities") successfully launch an intelligent knowledge center. By introducing frontier technologies such as large models and Retrieval-Augmented Generation (RAG), the center delivers integrated services including unified knowledge management, real-time knowledge updates, and intelligent knowledge Q&A, improving brokerage staff's knowledge-retrieval efficiency and the precision of question answering in business scenarios.

Previously, Guoyuan Securities' knowledge assets were scattered across multiple independent business systems, forming "information silos". Staff had to cross-check repeatedly across systems when retrieving knowledge, and compliance checks relied on manual screening, which was inefficient and error-prone. The strength of large models in natural language processing lets them not only handle massive text corpora but also automatically extract key information and features from text via deep learning. In addition, RAG, by combining information retrieval with text generation, further improves a model's accuracy and efficiency on specific tasks.

The intelligent knowledge center is now in use across more than 20 departments at Guoyuan Securities, effectively resolving long-standing knowledge-management pain points and aligning precisely with the company's core strategy of empowering business through technology and driving high-quality development. Through a unified knowledge entry point, fine-grained permission governance, and a closed-loop operating mechanism, the platform has brought the massive knowledge assets accumulated by departments over the years truly to life, achieving the construction goals of "cutting costs, improving efficiency, and controlling risk" and providing Guoyuan Securities' future business innovation, risk management, and organizational efficiency with a high-quality digital ...
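The retrieve-then-generate pattern the article refers to has a standard shape. The following is a generic minimal sketch, not Hundsun's actual system; `retriever` and `llm` are assumed stand-ins.

```python
def rag_answer(question: str, retriever, llm, k: int = 4) -> str:
    """Generic RAG: ground the model's answer in retrieved passages
    instead of parametric memory alone.
    `retriever(q, k) -> list[str]` and `llm(prompt) -> str` are assumed."""
    passages = retriever(question, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using only the passages below and cite passage numbers.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```

Grounding answers in retrieved internal documents, rather than the model's training data, is what lets a system like this stay current as policies change and keep compliance answers traceable to a source passage.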
Embracing the era of "RAG for everything": the latest survey reveals a vast unexplored space of 50+ multimodal combinations
机器之心· 2025-12-02 09:18
Core Insights
- The article discusses the emergence of Multimodal Retrieval-Augmented Generation (MM-RAG) as a new field, highlighting its potential applications and the current state of research, which is still in its infancy [2][5][17]
- A comprehensive survey published by researchers from Huazhong University of Science and Technology, Fudan University, China Telecom, and the University of Illinois at Chicago covers nearly all possible combinations of modalities for input and output in MM-RAG [4][17]

Summary by Sections

Overview of MM-RAG
- MM-RAG is an evolution of traditional Retrieval-Augmented Generation (RAG) that incorporates multiple modalities such as text, images, audio, video, code, tables, knowledge graphs, and 3D objects [2][4]
- Current research primarily focuses on limited combinations of modalities, leaving many potential applications unexplored [2][5]

Potential Combinations
- The authors identify a vast space of potential input-output modality combinations, revealing that of 54 proposed combinations, only 18 have existing research [5][6]
- Notably, combinations like "text + video as input, generating video as output" remain largely untapped [5]

Classification Framework
- A new classification framework for MM-RAG is established, systematically organizing existing research and clearly presenting the core technical components of different MM-RAG systems [6][15]
- This framework serves as a reference for future research and development in the field [6][15]

MM-RAG Workflow
- The MM-RAG workflow is divided into four key stages (sketched in code after this summary):
1. Pre-retrieval: organizing data and preparing queries [11]
2. Retrieval: efficiently finding relevant information from a multimodal knowledge base [12]
3. Augmentation: integrating retrieved multimodal information into the large model [13]
4. Generation: producing high-quality multimodal outputs based on input and augmented information [14][15]

Practical Guidance
- The survey provides a one-stop guide for building MM-RAG systems, covering training, evaluation, and application strategies [17][18]
- It discusses training methods to maximize retrieval and generation capabilities, summarizes existing evaluation metrics, and explores potential applications across various fields [18]
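A minimal skeleton of the four-stage workflow listed above. This is a generic sketch under stated assumptions, not code from the survey: every class and function name is invented, and the retrieval stage is stubbed where a real system would call a cross-modal encoder.

```python
from dataclasses import dataclass

@dataclass
class Item:
    modality: str   # e.g. "text", "image", "audio", "table"
    payload: object

def pre_retrieval(query: Item, kb: list[Item]) -> tuple[Item, list[Item]]:
    # Stage 1: normalize/expand the query; chunk and index the knowledge base.
    return query, kb

def retrieve(query: Item, kb: list[Item], k: int = 5) -> list[Item]:
    # Stage 2: stub ranking; a real system scores items in a shared
    # embedding space via a cross-modal encoder (e.g. a CLIP-style model).
    return kb[:k]

def augment(query: Item, hits: list[Item]) -> str:
    # Stage 3: fold retrieved multimodal evidence into the model input,
    # e.g. interleaving image tokens with text for a multimodal LLM.
    return f"{query.payload} | evidence modalities: {[h.modality for h in hits]}"

def generate(prompt: str) -> Item:
    # Stage 4: a multimodal generator would emit text/image/video here.
    return Item("text", f"answer grounded in -> {prompt}")

# One pass through the pipeline
q, kb = pre_retrieval(Item("text", "What is shown?"), [Item("image", "...")])
print(generate(augment(q, retrieve(q, kb))).payload)
```

The survey's 54-combination space corresponds to varying the `modality` of the query, the knowledge base, and the generated output independently across this same four-stage skeleton.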
Building LLMs: the knowledge-graph foundation every AI project needs
36Kr· 2025-11-13 00:49
Core Viewpoint
- The case involving attorney Steven Schwartz highlights the critical misunderstanding of the capabilities of large language models (LLMs) in legal research, leading to the submission of fabricated court cases and citations [3][4][5]

Group 1: Case Overview
- Judge Kevin Castel addressed the submission of six cases by Schwartz, which were later found to be entirely fabricated and non-existent [3][4]
- Schwartz initially believed that LLMs like ChatGPT could serve as reliable legal research tools, equating them to a "super search engine" [4][5]

Group 2: Limitations of LLMs
- The case illustrates a fundamental misunderstanding of LLMs' capabilities, particularly in the context of legal research, which requires precise and verifiable information [5][7]
- LLMs are known to produce "hallucinations", or false information, which poses significant risks in fields requiring high accuracy, such as law [5][7][9]
- The architecture of LLMs presents challenges, including lack of transparency, difficulty in updating knowledge, and absence of domain-specific expertise [7][8][9]

Group 3: Knowledge Graphs as a Solution
- Knowledge graphs (KGs) are proposed as a solution to enhance the reliability of AI systems by providing structured, verifiable, and up-to-date information [10][12][19]
- KGs support dynamic updates and maintain a clear audit trail, which is essential for accountability in professional environments [12][20]
- The integration of KGs with LLMs can mitigate the risks associated with hallucinations and improve the accuracy of domain-specific applications [19][20]

Group 4: Future of AI in Professional Fields
- The future of AI in critical applications, such as legal research, hinges on the development of intelligent advisory systems that combine the strengths of KGs and LLMs [21]
- Professionals deploying AI tools must ensure that their systems support accountability and accuracy, rather than undermine them [21]
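One common way to combine the two is to answer only from facts retrieved out of a curated graph, keeping each fact's provenance for auditing. This is a hedged sketch of that pattern, not code from the article: the in-memory triple list and `llm` stand-in replace what would be a real graph database (e.g. SPARQL over an RDF store).

```python
# Toy KG of (subject, predicate, object, source) triples.
KG = [
    ("Roe v. Wade", "decided_in", "1973", "410 U.S. 113"),
    ("Roe v. Wade", "court", "Supreme Court of the United States",
     "410 U.S. 113"),
]

def kg_lookup(entity: str):
    """Return verifiable facts plus their sources for one entity."""
    return [t for t in KG if t[0] == entity]

def grounded_answer(question: str, entity: str, llm) -> str:
    """`llm(prompt) -> str` is an assumed stand-in."""
    facts = kg_lookup(entity)
    if not facts:
        # Refuse rather than hallucinate: the failure mode in the case above
        return "No verified facts available."
    context = "\n".join(f"{s} {p} {o} [source: {src}]"
                        for s, p, o, src in facts)
    return llm(f"Answer strictly from these facts, citing sources:\n"
               f"{context}\nQuestion: {question}")
```

The refusal branch is the point: a KG-grounded system can fail closed when it has no verified facts, whereas a bare LLM fails open by generating plausible-sounding fabrications.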
Dongfang Materials' Japanese subsidiary releases Tenzai Model-v1.1, a hundred-billion-parameter finance-and-tax large model achieving a "cognitive AI" breakthrough
Quan Jing Wang· 2025-10-31 02:29
Core Insights
- The launch of Tenzai Model-v1.1 by Dongfang Materials' Japanese subsidiary marks a significant advancement in the application of AI within the finance and taxation sector, transitioning from "execution automation" to "cognitive intelligence" [1][4]

Technology Foundation
- Tenzai Model-v1.1 is built on a hundred-billion-parameter architecture, utilizing a Transformer model optimized for finance and taxation scenarios, incorporating over 5 million real tax documents, 1 million high-quality Q&A pairs, a 50-year database of Japanese tax laws, and over 100,000 real business cases [1][2]
- The model employs domain-adaptive continued pre-training and multi-task fine-tuning to achieve near-human cognitive abilities in semantic understanding, logical reasoning, and judgment suggestions [2]

Innovative Architecture
- The system integrates Retrieval-Augmented Generation (RAG) technology to address potential inaccuracies in professional content, ensuring that every recommendation is backed by legal references and case studies [2]
- Tenzai Model-v1.1 features multimodal understanding, capable of processing images, text, and tabular data, achieving a recognition accuracy of 99.8% for complex documents [2]

System Performance
- The model supports a context length of up to 32K tokens, with an average response time under 2 seconds, processing 1,200 documents per hour, significantly outperforming current market solutions [2]
- It includes a continuous-learning mechanism for monthly updates on tax laws and supports private deployment and a flexible SaaS architecture [2]

Application Depth
- Tenzai Model-v1.1 represents a leap from traditional automation systems, enabling semantic understanding, contextual reasoning, and proactive risk alerts in tax-related queries [2][3]
- The system has been integrated with major Japanese accounting software, supporting cloud, private, and hybrid deployments, with plans for a mobile app and international versions by 2026 [3]

Industry Impact
- The release of Tenzai Model-v1.1 signifies a maturation of vertical large models in professional services, transforming unstructured tax knowledge into computable, inferable, and interactive AI capabilities [4]
Chinese Academy of Sciences makes progress on intelligent carbon-footprint accounting
Huan Qiu Wang Zi Xun· 2025-10-22 02:51
Core Insights
- The article discusses the introduction of Chat-LCA, an intelligent life cycle assessment (LCA) solution that integrates large language models (LLMs) to enhance carbon-accounting efficiency and accuracy in the context of China's "dual carbon" strategy [1][3]

Group 1: Technology and Innovation
- Chat-LCA represents a significant advancement by integrating cutting-edge AI technologies such as retrieval-augmented generation (RAG), Text2SQL, chain of thought (CoT), and code chain (CoC) into the entire LCA process [3]
- The system automates the entire workflow from knowledge acquisition to report generation, effectively breaking down knowledge barriers and data silos [3][4]

Group 2: Performance Metrics
- Chat-LCA has demonstrated high accuracy and efficiency, achieving a BERTScore of 0.85 in answering professional questions across ten industries, a Text2SQL execution accuracy of 0.9692 on real LCI databases, and a report-generation accuracy of 0.9832 with a readability score of 8.42 out of 10 [4]
- The system can reduce traditional LCA analysis time from weeks to just a few hours, marking a qualitative leap in carbon-accounting efficiency [4]

Group 3: Practical Applications
- In practical applications, such as assessing the carbon footprint of lithium-sulfur batteries, Chat-LCA identified raw-material acquisition (47.2%) and production stages (31.3%) as major carbon-emission hotspots, providing targeted emission-reduction suggestions like clean-energy alternatives [4]
- The solution significantly lowers the technical barriers for carbon accounting and expands the applicability of LCA methods across various industrial and policy scenarios, supporting the realization of "dual carbon" goals with actionable technological and decision-making tools [4]
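Of the techniques listed, Text2SQL is the most mechanical to illustrate: the LLM translates a natural-language question into SQL against a known schema, and the system executes it. The following is a minimal hypothetical sketch; Chat-LCA's actual prompts and database schema are not public, so the table and column names are invented, with sample values mirroring the hotspot shares cited above.

```python
import sqlite3

SCHEMA = "CREATE TABLE lci (process TEXT, stage TEXT, share_pct REAL);"

def text2sql(question: str, llm) -> str:
    """Ask the model for a single SELECT over the known schema.
    `llm(prompt) -> str` is an assumed stand-in."""
    return llm(
        f"Schema:\n{SCHEMA}\n"
        f"Write one SQLite SELECT statement answering: {question}\nSQL:"
    ).strip()

def run_query(question: str, llm) -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO lci VALUES (?, ?, ?)", [
        ("Li-S battery", "raw material acquisition", 47.2),
        ("Li-S battery", "production", 31.3),
    ])
    sql = text2sql(question, llm)
    # Basic guardrail: execute only read-only queries from the model
    if not sql.lower().lstrip().startswith("select"):
        raise ValueError("refusing non-SELECT statement")
    return conn.execute(sql).fetchall()
```

Executing generated SQL against the database, rather than asking the model to recall numbers, is what allows an execution-accuracy metric like the 0.9692 reported above to be measured at all: the query either returns the right rows or it does not.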
Goodbye to error accumulation and noise interference: EviNote-RAG opens a new RAG paradigm
机器之心· 2025-09-12 00:51
Core Insights
- The article discusses the development of EviNote-RAG, a new framework aimed at enhancing retrieval-augmented generation (RAG) models, addressing issues of low signal-to-noise ratio and error accumulation in complex tasks [4][10][11]

Group 1: EviNote-RAG Framework
- EviNote-RAG introduces a three-stage process of retrieval, note-taking, and answering, which contrasts with traditional RAG methods that directly rely on retrieval results [14][22]
- The framework utilizes Supportive-Evidence Notes (SEN) to filter out noise and highlight key information, mimicking human note-taking habits [20][22]
- An Evidence Quality Reward (EQR) is incorporated to ensure that the notes genuinely support the final answer, thus reducing shallow matching and error accumulation [20][22]

Group 2: Performance Improvements
- EviNote-RAG has shown significant performance improvements across various open-domain question-answering benchmarks, achieving a 20% increase in F1 score on HotpotQA, a 40% increase on Bamboogle, and a 91% increase on 2Wiki [25][24]
- The framework has demonstrated enhanced generalization capabilities and training stability, making it one of the most reliable RAG frameworks available [6][18]

Group 3: Training Dynamics
- The introduction of SEN and EQR has transformed the training dynamics from unstable to robust, allowing for a smoother training curve and improved performance [27][28]
- Key findings indicate that structured instructions lead to stability, while noise filtering through SEN significantly enhances computational efficiency [28][29]

Group 4: Experimental Validation
- Ablation studies confirm that both SEN and EQR are crucial for robust reasoning, with SEN providing structured constraints and EQR offering logical-consistency supervision [41][45]
- The experiments highlight that effective supervision is more about how supportive evidence is organized and marked than about merely enforcing summaries [42][45]
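A schematic of the retrieve, note, answer flow with an entailment-style evidence check. This is a hedged reconstruction from the description above, not the authors' code: `retriever`, `llm`, and `entails` are assumed stand-ins, and in the paper EQR is a training reward rather than the inference-time filter shown here.

```python
def evinote_rag(question: str, retriever, llm, entails) -> str:
    """retriever(q) -> list[str]; llm(prompt) -> str;
    entails(premise, hypothesis) -> float in [0, 1]; all assumed."""
    docs = retriever(question)

    # Stage 2: distill noisy retrievals into a Supportive-Evidence Note
    note = llm(
        "Keep only evidence that helps answer the question; "
        "mark key spans, discard the rest.\n"
        f"Question: {question}\nDocuments:\n" + "\n---\n".join(docs)
    )

    # Stage 3: answer from the note alone, not from the raw documents
    answer = llm(f"Note:\n{note}\nQuestion: {question}\nAnswer:")

    # EQR-style check: does the note actually entail the answer?
    # (In the paper this signal rewards the policy during RL training.)
    if entails(note, f"{question} {answer}") < 0.5:
        answer = llm(f"Documents:\n{docs}\nQuestion: {question}\nAnswer:")
    return answer
```

Conditioning the answering step on the distilled note rather than the raw retrievals is what breaks the error-accumulation chain the article describes: noise that never enters the note cannot propagate into the answer.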