RAG (Retrieval-Augmented Generation)
X @Avi Chawla
Avi Chawla· 2026-03-12 07:25
How to build a RAG app on AWS!

The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know.

At its core, RAG is a two-stage pattern:
- Ingestion (prepare knowledge)
- Querying (use knowledge)

Below is how each stage works in practice.

> Ingestion: Turning raw data into searchable knowledge
- Your documents live in S3 or any internal data source.
- Whenever something new is added, a Lambda ingestion function kicks in.
- It cleans, processes, and chunks the file i ...
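The chunking step inside such a Lambda ingestion function can be sketched as a plain function; the chunk size and overlap here are illustrative choices, not AWS defaults:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    Overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "word " * 100  # a 500-character toy document
chunks = chunk_text(doc)  # 3 overlapping chunks of <= 200 characters
```

In a real pipeline, each chunk would then be sent to an embedding model and the resulting vectors written to the vector store.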
X @Avi Chawla
Avi Chawla· 2026-02-27 06:33
8 RAG architectures for AI Engineers (explained with usage):

1) Naive RAG
- Retrieves documents purely based on vector similarity between the query embedding and stored embeddings.
- Works best for simple, fact-based queries where direct semantic matching suffices.

2) Multimodal RAG
- Handles multiple data types (text, images, audio, etc.) by embedding and retrieving across modalities.
- Ideal for cross-modal retrieval tasks like answering a text query with both text and image context.

3) HyDE (Hypothetical Documen ...
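Naive RAG's similarity-based retrieval (architecture 1 above) can be sketched in a few lines; the store and its toy 3-dimensional vectors are made up for illustration, standing in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec: list[float], store: list[dict], k: int = 2) -> list[str]:
    """Naive RAG retrieval: rank stored chunks by cosine similarity to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return [item["text"] for item in ranked[:k]]

# Toy store: vectors are hand-picked so geography-related texts cluster.
store = [
    {"text": "Paris is the capital of France.",   "vec": [0.9, 0.1, 0.0]},
    {"text": "The Eiffel Tower is in Paris.",     "vec": [0.8, 0.2, 0.1]},
    {"text": "Python is a programming language.", "vec": [0.0, 0.1, 0.9]},
]
top = retrieve([1.0, 0.0, 0.0], store, k=2)  # the two Paris chunks rank first
```

A production system swaps the list scan for an approximate-nearest-neighbor index, but the ranking logic is the same.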
X @Avi Chawla
Avi Chawla· 2026-02-23 09:37
The open-source RAG stack (2025): https://t.co/1uyW98OH9S ...
X @Avi Chawla
Avi Chawla· 2026-01-23 18:30
RT Avi Chawla (@_avichawla)

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves:

Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.

When you ask "What were the debt trends in 2023?", a vector search returns chunks that ...
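The post is cut off before detailing the method, but the "similarity ≠ relevance" gap it describes can be illustrated with a retrieval sketch that uses only a metadata filter and keyword overlap, with no embeddings or vector search involved. This sketch is not the researchers' actual approach, just an illustration of embedding-free retrieval:

```python
def filter_retrieve(query_terms: list[str], year: int, docs: list[dict]) -> list[str]:
    """Structured retrieval: hard metadata filter (year) plus keyword overlap.

    No embeddings, no vector similarity: the year constraint is exact,
    so a 2023 query can never surface 2022 text.
    """
    hits = []
    for doc in docs:
        if doc["year"] != year:
            continue  # metadata filter: wrong year is excluded outright
        overlap = len(set(query_terms) & set(doc["text"].lower().split()))
        if overlap:
            hits.append((overlap, doc["text"]))
    hits.sort(reverse=True)
    return [text for _, text in hits]

docs = [
    {"year": 2023, "text": "debt rose 12 percent driven by new borrowing"},
    {"year": 2022, "text": "debt fell as repayments accelerated"},
    {"year": 2023, "text": "revenue grew 8 percent on strong sales"},
]
results = filter_retrieve(["debt", "trends"], 2023, docs)
```

A pure vector search, by contrast, could rank the semantically similar 2022 debt chunk above the relevant 2023 one.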
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-12-23 04:06
Performance Highlights
- Grok 4.1 demonstrates superior RAG (Retrieval-Augmented Generation) performance, especially with lengthy and intricate documents where precision is crucial [1]
- Grok 4.1 either matches or surpasses industry leaders in certain tasks [1]

Benchmark Results
- Grok 4.1 achieved a coding score of 86, the best overall [2]
- Grok 4.1 achieved a finance score of 93.0, demonstrating a clear lead [2]
- Grok 4.1 is competitive with leading models in legal tasks [2]
X @Avi Chawla
Avi Chawla· 2025-11-15 19:12
RAG System Architecture on AWS
- RAG (Retrieval Augmented Generation) is a two-stage pattern involving ingestion and querying [1][3]
- The architecture remains consistent even with enhancements like better chunking, smarter retrieval, caching, orchestration, streaming, eval pipelines, or multi-source ingestion [3]

Ingestion Process
- Raw data from sources like S3 is processed and chunked by a Lambda function [3]
- Chunks are embedded using Bedrock Titan Embeddings [3]
- Embeddings are stored in a vector database such as OpenSearch Serverless, DynamoDB, or Aurora, creating a searchable knowledge store [3]
- Strategic reindexing with smart diffing, incremental updates, and metadata checks is important for efficiency and cost reduction [2]

Querying Process
- User questions are processed through API Gateway to a Lambda function [4]
- Questions are embedded using Bedrock Titan Embeddings [4]
- The embedding is matched against the vector database to retrieve relevant chunks [4]
- Retrieved chunks are passed to a Bedrock LLM (Claude or OpenAI) to generate the final answer [4]
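The querying stage above can be sketched end to end in memory. The `embed` function here is a toy bag-of-letters stand-in for Bedrock Titan Embeddings, and the prompt-assembly step stands in for the Bedrock LLM call; only the flow (embed question → match store → build prompt) mirrors the architecture:

```python
def embed(text: str) -> list[float]:
    """Toy stand-in for Bedrock Titan Embeddings: a bag-of-letters vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def answer_prompt(question: str, store: list[str], k: int = 1) -> str:
    """Query stage: embed the question, match it against the stored chunks,
    and assemble the prompt that would be sent to the LLM."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    qv = embed(question)
    chunks = sorted(store, key=lambda c: dot(qv, embed(c)), reverse=True)[:k]
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

store = [
    "lambda embeds the query with titan",
    "pricing table for s3 storage",
]
prompt = answer_prompt("how does lambda embed the query", store)
```

In the real architecture, `embed` is an `InvokeModel` call to Titan and the returned `prompt` is sent to the Bedrock LLM, but the orchestration inside the Lambda handler has exactly this shape.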
DDN One-Click RAG Pipeline Demo: DDN Infinia & NVIDIA NIMs
DDN· 2025-11-11 18:56
Welcome to this demonstration. Today we'll be showing how DDN enables a one-click, high-performance RAG pipeline for enterprise use. Our RAG pipeline solution is enterprise class and easy to deploy and use in any cloud environment, whether AWS, GCP, Azure, any NCP cloud, and of course on prem.

Let's take a closer look at the architecture. This RAG pipeline solution is made up of several NVIDIA NeMo NIMs, or NVIDIA Inference Microservices, which host the embedding, reranking, and LLM models, a Milvus vector database, a front-end ...
X @Avi Chawla
Avi Chawla· 2025-10-21 19:09
RT Avi Chawla (@_avichawla): The open-source RAG stack (2025): https://t.co/jhDZXzOPzC ...
X @Avi Chawla
Avi Chawla· 2025-10-21 06:30
Technology Trends
- The report focuses on the open-source RAG (Retrieval-Augmented Generation) stack in 2025 [1]
X @Avi Chawla
Avi Chawla· 2025-10-12 19:29
Core Problem of Traditional RAG
- Most retrieved chunks in traditional RAG setups do not effectively aid the LLM, yet they still add computational cost, latency, and context-processing overhead [1][5]
- Classic RAG fetches similar chunks from a vector database and feeds the retrieved context directly into the LLM [5]

REFRAG Solution by Meta AI
- Meta AI's REFRAG compresses and filters context at the vector level, keeping only what is relevant [1][2]
- REFRAG employs chunk compression, an RL-trained relevance policy, and selective expansion to process only essential information [2]
- The process involves encoding documents, finding relevant chunks, using the relevance policy to select which chunks to expand, and concatenating token-level representations [3][4]

Performance Metrics of REFRAG
- REFRAG outperforms LLaMA on 16 RAG benchmarks [5][7]
- REFRAG achieves 30.85x faster time-to-first-token [5][7]
- REFRAG handles 16x larger context windows [5][7]
- REFRAG uses 2-4x fewer tokens [5][7]
- REFRAG shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [7]
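REFRAG's selective-expansion idea can be sketched as a selection step: only the chunks the relevance policy scores highest are expanded to full token text, while the rest stay as single compressed slots. This is a simplified illustration; the scores below are made up, and real REFRAG uses an RL-trained policy over chunk embeddings rather than fixed numbers:

```python
def refrag_select(chunks: list[str], scores: list[float], budget: int = 2) -> list[str]:
    """Selective expansion (simplified sketch): expand only the
    highest-scoring chunks to full text; every other chunk is replaced
    by a single placeholder slot standing in for its compressed embedding."""
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    expand = set(ranked[:budget])  # indices the policy chooses to expand
    context = []
    for i, chunk in enumerate(chunks):
        if i in expand:
            context.append(chunk)                # full token sequence
        else:
            context.append(f"<chunk_{i}_emb>")   # one compressed slot
    return context

chunks = ["a", "b", "c"]
scores = [0.1, 0.9, 0.5]          # hypothetical relevance-policy scores
context = refrag_select(chunks, scores, budget=2)
```

Because unexpanded chunks cost one slot instead of their full token length, the same window holds far more source material, which is where the larger-context and fewer-token numbers above come from.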