Reasoning-Intensive Information Retrieval
Breaking the Bottleneck and Teaching RAG to Think: USTC, BAAI, and Others Release the Reasoning Retrieval Framework BGE-Reasoner
36Kr · 2025-08-27 13:04
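The wave of artificial intelligence is pushing us into a new era defined by RAG and AI agents. Yet for these agents to be truly "intelligent" rather than mere carriers of information, they must overcome a core problem that confronts every leading team: reasoning-intensive information retrieval (Reasoning-Intensive IR).

It is not only the key bottleneck in the current development of RAG and AI agent technology; it is also decisive for the success of application scenarios such as large-model agents and deep research (DeepResearch).

As researchers worldwide search for a breakthrough, a contribution has come from China: BGE-Reasoner.

BGE-Reasoner was developed by a joint team from the University of Science and Technology of China, the Beijing Academy of Artificial Intelligence (BAAI), Beijing University of Posts and Telecommunications, The Hong Kong Polytechnic University, and other institutions. It is an innovative end-to-end solution for reasoning-intensive information retrieval tasks. Through systematic query understanding, vector retrieval, and reranking, it can significantly improve search engine performance on such tasks.

On the authoritative BRIGHT benchmark, BGE-Reasoner achieved a test score of 45.2, setting a new best result for the benchmark by a clear margin.

As another major milestone in the BGE model series, BGE-Reasoner not only achieves a breakthrough in performance, but also, for solving reasoning-intensive retrieval, this ...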
Breaking the Bottleneck and Teaching RAG to Think: USTC, BAAI, and Others Release the Reasoning Retrieval Framework BGE-Reasoner
机器之心 (Jiqizhixin) · 2025-08-27 08:36
Core Viewpoint
- The article discusses BGE-Reasoner, an innovative end-to-end solution for reasoning-intensive information retrieval (IR) developed by a joint team from several Chinese institutions. It addresses a critical bottleneck in the development of RAG and AI agents and significantly improves their performance on complex reasoning tasks [2][3].

Group 1: BGE-Reasoner Overview
- BGE-Reasoner scored 45.2 on the BRIGHT benchmark, surpassing the previous best result and demonstrating its effectiveness on reasoning-intensive retrieval tasks [2][12].
- The model is a significant milestone in the BGE series and offers a new paradigm for tackling the industry challenge of reasoning-intensive retrieval [3].

Group 2: Technical Innovations
- The team proposed a replicable framework of three modular components (Rewriter, Embedder, and Reranker) for handling complex queries efficiently [3].
- The team explored using large models to synthesize high-quality, multi-domain reasoning training data, addressing the scarcity of training data in this field [4].
- Reinforcement learning was applied to Reranker training, improving the model's reasoning and generalization on challenging samples [5].

Group 3: Performance Comparison
- BGE-Reasoner outperformed submissions from major institutions such as Ant Group, Baidu, and ByteDance, leading the BRIGHT leaderboard by 3.6 points [12][14].
- The embedding model, BGE-Reasoner-Embed, also outperformed other leading baseline models, confirming the effectiveness of the synthesized training data [12][22].

Group 4: System Workflow
- The BGE-Reasoner system follows a classic three-module structure: the Rewriter rewrites the original query, the Embedder retrieves candidates, and the Reranker produces the final ranking [19][24].
- The query understanding module uses synthesized data to generate reasoning paths, significantly improving the model's query understanding and rewriting capabilities [21].
- The embedding model and the Reranker are fine-tuned on high-quality synthetic training data, improving their performance on reasoning-intensive retrieval tasks [22][24].

Group 5: Future Directions
- The research team aims to keep advancing vector models and retrieval-augmentation technologies, and to collaborate with more research institutions and industry partners to promote the development of retrieval and artificial intelligence [25].
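
The three-module workflow described under Group 4 (rewrite, retrieve, rerank) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BGE-Reasoner's actual code or API: the functions `rewrite`, `embed`, `retrieve`, `rerank`, and `search` are hypothetical names, and each trained model is replaced by a toy stand-in (a hand-written hint table, bag-of-words cosine similarity, and term-overlap scoring).

```python
import math
from collections import Counter


def rewrite(query: str) -> str:
    """Hypothetical Rewriter stand-in: append hand-written reasoning hints.

    A real rewriter would be a language model; this lookup table is an assumption.
    """
    hints = {"leaves": "chlorophyll pigment", "yellow": "chlorophyll breakdown"}
    extra = [hints[w] for w in query.lower().split() if w in hints]
    return " ".join([query] + extra)


def embed(text: str) -> Counter:
    """Hypothetical Embedder stand-in: sparse bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Return the k documents most similar to the query vector."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Hypothetical Reranker stand-in: order by fraction of query terms covered."""
    q_terms = set(query.lower().split())

    def score(doc: str) -> float:
        return len(q_terms & set(doc.lower().split())) / len(q_terms)

    return sorted(candidates, key=score, reverse=True)


def search(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rewrite the query, retrieve k candidates, then rerank them."""
    rewritten = rewrite(query)
    candidates = retrieve(rewritten, corpus, k)
    return rerank(rewritten, candidates)


if __name__ == "__main__":
    docs = [
        "chlorophyll breakdown reveals yellow pigment in autumn leaves",
        "stock prices turn volatile in autumn",
        "how to bake bread at home",
    ]
    for doc in search("why do leaves turn yellow", docs):
        print(doc)
```

Keeping the three stages as separate functions mirrors the modular structure the summary describes: any one stand-in could be swapped for a trained model without changing `search`.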