RAG (Retrieval-Augmented Generation)
One-Click Enterprise RAG Pipeline with DDN Infinia & NVIDIA NeMo | High-Performance AI Data Solution
DDN· 2025-08-08 16:27
Today we'll be showing how DDN enables a one-click, high-performance RAG pipeline for enterprise use. Our RAG pipeline solution is enterprise-class and easy to deploy in any environment, whether NCP cloud, hyperscaler cloud, or, of course, on-prem. Let's take a closer look at the architecture. This RAG pipeline solution is made up of several NVIDIA NIM microservices within the NVIDIA NeMo framework. These NIM host the embedding, reranking, and LLM models, alongside a Milvus vector database. We have a front-end chatbot for input, outp ...
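The retrieval step in a pipeline like this can be sketched in a few lines. This is a toy, self-contained illustration, not DDN's deployment: in the described architecture, embedding, reranking, and generation run as NVIDIA NIM endpoints and the vectors live in Milvus; here a bag-of-words embedding and an in-memory list stand in for both.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for a call to an embedding-model endpoint.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # In a real pipeline this is a Milvus similarity search followed
    # by a reranking pass; here it is a brute-force cosine sort.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Milvus is a vector database for similarity search.",
    "NIM microservices host embedding and LLM models.",
    "DDN Infinia provides high-performance AI data storage.",
]
print(retrieve("which database stores vectors", docs, k=1))
```

The retrieved chunks would then be packed into the LLM prompt by the chatbot front end.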
When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge - Sam Julien, Writer
AI Engineer· 2025-07-22 16:30
Enterprise knowledge bases are filled with "dense mapping," thousands of documents where similar terms appear repeatedly, causing traditional vector retrieval to return the wrong version or irrelevant information. When our customers kept hitting this wall with their RAG systems, we knew we needed a fundamentally different approach. In this talk, I'll share Writer's journey developing a graph-based RAG architecture that achieved 86.31% accuracy on the RobustQA benchmark while maintaining sub-second response ...
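The core idea behind graph-based retrieval can be illustrated minimally (a hypothetical sketch, not Writer's actual architecture): instead of matching a query vector against every chunk, documents are linked through the entities they mention, and retrieval walks the graph from the entities found in the query, which helps disambiguate near-duplicate versions.

```python
from collections import defaultdict

def build_entity_graph(docs: dict[str, set[str]]) -> dict[str, set[str]]:
    # docs maps doc_id -> set of entities mentioned in that document.
    graph: dict[str, set[str]] = defaultdict(set)  # entity -> doc_ids
    for doc_id, entities in docs.items():
        for ent in entities:
            graph[ent].add(doc_id)
    return graph

def graph_retrieve(query_entities: set[str], graph: dict[str, set[str]]) -> set[str]:
    # Union of all documents reachable from the query's entities.
    hits: set[str] = set()
    for ent in query_entities:
        hits |= graph.get(ent, set())
    return hits

docs = {
    "policy_v1": {"vacation", "2023"},
    "policy_v2": {"vacation", "2024"},
    "expenses":  {"travel", "2024"},
}
graph = build_entity_graph(docs)
print(graph_retrieve({"vacation", "2024"}, graph))
```

Requiring the year entity "2024" is what lets this scheme prefer policy_v2 over the stale policy_v1, the version-confusion failure the talk describes.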
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval-Augmented Generation (RAG) & Large Language Models (LLMs)
- RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3]
- RAG is considered a more reliable, faster, and cheaper approach than fine-tuning or long context windows for utilizing external knowledge [7]
- Retrieval accuracy has improved significantly over the past 18 months, driven by advances in embedding models [11][12]
- The industry averages approximately 80% accuracy across 100 datasets, indicating roughly 20% remaining headroom in retrieval tasks [12][13]

Vector Embeddings & Storage Optimization
- Techniques like matryoshka learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17]
- Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21]

RAG Enhancement Techniques
- Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18]
- Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20]

Future of RAG
- The industry predicts a shift toward more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30]
- Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating separate data-extraction and embedding steps [32]
- Context-aware and auto-chunking embeddings aim to automate the chunking process and incorporate cross-chunk information, optimizing retrieval and cost [33][36]
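The two storage tricks mentioned above can be sketched together (illustrative numbers, not Voyage AI's implementation): matryoshka-style truncation keeps only the leading dimensions of an embedding, which matryoshka-trained models pack with the most information, and scalar quantization stores each remaining dimension as int8 instead of float32.

```python
import math

def truncate(vec: list[float], dims: int) -> list[float]:
    # Keep the prefix of the embedding and re-normalize it.
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def quantize_int8(vec: list[float]) -> list[int]:
    # Map floats in [-1, 1] to int8 values in [-127, 127].
    return [max(-127, min(127, round(x * 127))) for x in vec]

full = [0.5, -0.25, 0.1, 0.05] * 256        # 1024-dim float32: ~4096 bytes
small = quantize_int8(truncate(full, 128))  # 128-dim int8: 128 bytes
print(len(full) * 4 // len(small))          # 32x smaller
```

Truncating 1024 -> 128 dimensions (8x) and float32 -> int8 (4x) multiply to the 32x shown here; more aggressive settings are how vendors reach figures like the 100x quoted in the talk.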
X @Avi Chawla
Avi Chawla· 2025-06-17 19:11
I built an MCP-powered RAG over video files. You can:
- Ingest a video file.
- Ask questions and get natural-language responses.
- Extract precise timestamps that answer the question.
Check the explainer thread below: https://t.co/hy5aVCQj9p
Avi Chawla (@_avichawla): Let's build an MCP-powered RAG over videos, step-by-step: ...
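The timestamp-extraction step can be sketched in miniature (a hypothetical illustration, not the thread's actual MCP pipeline): after transcription, each transcript segment carries its start time, so answering "when" reduces to retrieving the best-matching segment and returning its timestamp.

```python
# Transcript segments as (start_time_seconds, text) pairs, as produced
# by a typical speech-to-text step during video ingestion.
transcript = [
    (0.0,  "welcome to the video"),
    (42.5, "now we configure the vector database"),
    (90.0, "finally we run the evaluation"),
]

def find_timestamp(query: str) -> float:
    # Toy lexical-overlap scoring; a real system would embed and
    # vector-search the segments instead.
    def overlap(seg_text: str) -> int:
        return len(set(query.lower().split()) & set(seg_text.split()))
    time, _ = max(transcript, key=lambda seg: overlap(seg[1]))
    return time

print(find_timestamp("how do I configure the database"))  # 42.5
```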