Reasoning-Intensive Information Retrieval
Breaking the Bottleneck and Teaching RAG to Think: USTC, BAAI, and Others Release BGE-Reasoner, a Reasoning Retrieval Framework
36Kr · 2025-08-27 13:04
Core Insights
- The article discusses BGE-Reasoner, a significant advance in Reasoning-Intensive Information Retrieval (IR) developed by a collaborative team from several Chinese institutions, which addresses a critical challenge for AI and RAG technologies [1][2].

Group 1: BGE-Reasoner Overview
- BGE-Reasoner is an end-to-end solution for reasoning-intensive information retrieval tasks and significantly improves search performance on them [1].
- It scored 45.2 on the BRIGHT benchmark, setting a new record and leading submissions from major institutions such as Ant Group and Baidu by a margin of 3.6 points [5][7].
- Its architecture is a three-stage modular framework (Rewriter, Embedder, and Reranker) designed to handle complex queries efficiently [6][10].

Group 2: Technical Innovations
- The core innovations are a replicable framework for complex query processing, a data-driven approach to generating high-quality training data, and the use of reinforcement learning to improve model performance [6][12].
- To address the scarcity of training data for reasoning-intensive retrieval, the team uses large language models to synthesize training data spanning multiple domains, such as mathematics and coding [10][11].
- The BGE-Reasoner-Embed and BGE-Reasoner-Reranker components are fine-tuned to improve retrieval and ranking, respectively, and show superior performance on the BRIGHT benchmark [11][12].

Group 3: Future Directions
- The success of BGE-Reasoner highlights the importance of reinforcement learning and synthetic data for reasoning-intensive information retrieval and paves the way for future work on Agent Search [14].
- The research team aims to keep improving the capabilities and versatility of the BGE series models while fostering collaborations with other research institutions and industry partners [14].
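The three-stage flow described above (rewrite the query, retrieve candidates with an embedder, rerank them) can be sketched as a toy pipeline. Every component here is a hypothetical stand-in, not BGE-Reasoner itself: `rewrite` appends terms from a hand-written expansion table instead of running an LLM Rewriter, `embed` uses bag-of-words counts instead of BGE-Reasoner-Embed, and `rerank` scores query-term coverage instead of a learned Reranker. Only the data flow between the stages is meant to mirror the description.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rewrite(query: str) -> str:
    """Hypothetical Rewriter: expand the query using a toy lookup table."""
    expansions = {"sort": "algorithm ordering", "prime": "number divisor"}
    extra = [expansions[t] for t in tokenize(query) if t in expansions]
    return " ".join([query, *extra])


def embed(text: str) -> Counter:
    """Hypothetical Embedder: sparse bag-of-words vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Stage 2: keep the top-k documents by embedding similarity."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Hypothetical Reranker: order candidates by fraction of query terms covered."""
    q_terms = set(tokenize(query))

    def score(doc: str) -> float:
        return len(q_terms & set(tokenize(doc))) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=score, reverse=True)


def search(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rewriter -> Embedder (candidate retrieval) -> Reranker."""
    rewritten = rewrite(query)
    candidates = retrieve(rewritten, corpus, k)
    return rerank(rewritten, candidates)


if __name__ == "__main__":
    docs = [
        "quicksort is a sorting algorithm based on ordering by pivot",
        "a prime number has no divisor other than one and itself",
        "the weather today is sunny",
    ]
    print(search("how to sort a list", docs, k=2)[0])
    # → quicksort is a sorting algorithm based on ordering by pivot
```

Keeping the stages as separate functions reflects the modular design the summary emphasizes: any one stage could be swapped for a real model without changing the others.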
Breaking the Bottleneck and Teaching RAG to Think: USTC, BAAI, and Others Release BGE-Reasoner, a Reasoning Retrieval Framework
Jiqizhixin (Synced) · 2025-08-27 08:36
Core Viewpoint
- The article discusses BGE-Reasoner, an end-to-end solution for Reasoning-Intensive Information Retrieval (IR) developed by a collaborative team from several Chinese institutions. It addresses a critical bottleneck in the development of RAG and AI agents and significantly improves their performance on complex reasoning tasks [2][3].

Group 1: BGE-Reasoner Overview
- BGE-Reasoner scored 45.2 on the BRIGHT benchmark, setting a new record and demonstrating its effectiveness on reasoning-intensive retrieval tasks [2][12].
- The model is a significant milestone for the BGE series and offers a new paradigm for the industry's reasoning-intensive retrieval challenges [3].

Group 2: Technical Innovations
- The team proposed a replicable framework of three modular components (Rewriter, Embedder, and Reranker) to handle complex queries efficiently [3].
- The team explored whether large models can synthesize high-quality, multi-domain reasoning training data, addressing the critical problem of data scarcity in this field [4].
- Reinforcement learning was successfully applied to Reranker training, improving the model's reasoning and generalization on hard samples [5].

Group 3: Performance Comparison
- BGE-Reasoner outperformed submissions from major institutions such as Ant Group, Baidu, and ByteDance, topping the BRIGHT leaderboard by a margin of 3.6 points [12][14].
- The embedding model, BGE-Reasoner-Embed, also outperformed other leading baseline models, confirming the effectiveness of the synthesized training data [12][22].

Group 4: System Workflow
- BGE-Reasoner follows the classic three-module structure: the Rewriter rewrites the original query, the Embedder retrieves candidate documents, and the Reranker ranks them to produce the final results [19][24].
- The query understanding module uses synthesized data to generate reasoning paths, significantly improving the model's query understanding and rewriting capabilities [21].
- The embedding model and the Reranker are fine-tuned on high-quality synthetic training data, improving their performance on reasoning-intensive retrieval tasks [22][24].

Group 5: Future Directions
- The research team aims to keep advancing embedding models and retrieval-augmentation technologies, collaborating with more research institutions and industry partners to advance retrieval and artificial intelligence [25].
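The System Workflow notes that the embedding model (BGE-Reasoner-Embed) is fine-tuned on high-quality synthetic training data, but the summary does not say which training objective is used. Embedding models are commonly fine-tuned with a contrastive, InfoNCE-style loss over a query, one positive passage, and several hard negatives. The sketch below assumes that objective purely for illustration; it is not confirmed as BGE-Reasoner's recipe, and the function names, toy vectors, and temperature of 0.05 are all hypothetical.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: list[float]) -> list[float]:
    """Scale to unit length (assumes a non-zero vector)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def info_nce_loss(
    query: list[float],
    positive: list[float],
    negatives: list[list[float]],
    temperature: float = 0.05,  # hypothetical value
) -> float:
    """Assumed contrastive objective: -log softmax probability of the positive
    among {positive} + negatives, using temperature-scaled cosine similarity."""
    q = normalize(query)
    sims = [dot(q, normalize(d)) / temperature for d in [positive, *negatives]]
    # Numerically stable log-sum-exp over all candidates
    m = max(sims)
    log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_denom - sims[0]  # index 0 is the positive


if __name__ == "__main__":
    # Positive aligned with the query: loss near 0
    print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    # Hard negative aligned with the query instead: large loss
    print(info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

Minimizing this loss pulls the query vector toward its positive and away from the hard negatives. In a real setup the vectors would be encoder outputs and the loss would be minimized by gradient descent, which this sketch omits.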