ICLR 2026 Oral | Revela: Redefining Dense Retriever Training with Language Modeling
机器之心 · 2026-03-26 11:41
Core Insights
- The article discusses Revela, a new approach to training dense retrievers within retrieval-augmented generation (RAG) systems, accepted as an Oral at ICLR 2026 for its innovative methodology [2][24].

Group 1: Challenges in Dense Retriever Training
- Training high-quality dense retrievers is difficult because it relies on manually annotated data, which is costly to obtain in specialized fields such as law and code [4].
- Negative sample mining adds further complexity: randomly sampled negatives provide only weak training signals [4].
- The contrastive loss is disconnected from the mainstream language-model pre-training objective, making it hard to leverage pre-trained knowledge effectively [4].

Group 2: Revela's Approach
- Revela unifies the dense-retriever training objective under a language modeling framework, providing a more natural training path [6].
- It introduces an in-batch attention mechanism that dynamically references other relevant documents in the batch when predicting the next token, so the language-modeling loss provides a training signal for the similarity scores between text chunks [6][13].
- The architecture pairs a retriever, which encodes text and computes similarities, with a language model that supplies the training signal; the two are optimized jointly [10].

Group 3: Advantages of Revela
- The training objective aligns closely with language modeling, activating semantic-understanding capabilities already present in pre-trained models [11].
- Training is fully self-supervised, greatly reducing the need for manual annotation, which is advantageous in data-scarce professional domains [11].
- Revela demonstrates strong scalability, with performance improving as retriever size and batch size increase [11].

Group 4: Experimental Results
- In code retrieval (CoIR), Revela-3B achieved an average nDCG@10 of 60.1, surpassing several supervised models trained on large annotated datasets [18].
- In reasoning-intensive retrieval (BRIGHT), Revela-3B outperformed commercial APIs, achieving an average nDCG@10 of 20.1 while training on Wikipedia text alone [21].
- For general retrieval (BEIR), Revela-3B matched the performance of a weakly supervised baseline while using significantly less training data and resources [22].

Group 5: Future Directions
- Revela opens avenues for dynamic index construction, which could enhance the semantic relevance of batches but poses computational challenges [24].
- Further scaling of the model and training data could yield additional performance gains [24].
- Insights gained from the retriever could in turn inform improvements to language-model training, suggesting a reciprocal enhancement potential [24].
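The data flow behind Group 2's in-batch attention can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the real Revela retriever and language model are transformers, whereas here `encode` is a toy bag-of-words embedder and the language-modeling loss is only described in the comments. What the sketch does show is the key structural idea: each chunk's similarity to every *other* chunk in the batch is turned into a softmax attention distribution, and because that distribution is differentiable in the embeddings, any loss computed through it (in Revela, the next-token prediction loss) trains the retriever.

```python
import numpy as np

def encode(chunks, vocab):
    # Toy stand-in for the retriever: L2-normalized bag-of-words vectors.
    embs = []
    for chunk in chunks:
        v = np.zeros(len(vocab))
        for tok in chunk.split():
            v[vocab[tok]] += 1.0
        embs.append(v / (np.linalg.norm(v) + 1e-9))
    return np.stack(embs)

def in_batch_attention(embs):
    # Pairwise similarities of each chunk to every other chunk in the batch.
    sims = embs @ embs.T
    # Mask the diagonal so a chunk cannot attend to itself.
    np.fill_diagonal(sims, -np.inf)
    # Softmax over the remaining chunks: each row is an attention
    # distribution over the other documents in the batch.
    w = np.exp(sims - sims.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

chunks = [
    "binary search sorted array",
    "search in a sorted array",
    "contract law liability clause",
]
vocab = {t: i for i, t in enumerate(sorted({w for c in chunks for w in c.split()}))}
weights = in_batch_attention(encode(chunks, vocab))
# The two code-related chunks attend to each other far more strongly
# than either attends to the unrelated legal chunk.
```

In the full method, the language model would condition each chunk's next-token prediction on the other chunks weighted by these attention scores; minimizing that loss pushes similarity scores up for chunks that genuinely help prediction, which is how the self-supervised signal described in Group 3 reaches the retriever.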