Vector Database
X @Avi Chawla
Avi Chawla· 2026-03-12 07:25
How to build a RAG app on AWS!

The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know.

At its core, RAG is a two-stage pattern:
- Ingestion (prepare knowledge)
- Querying (use knowledge)

Below is how each stage works in practice.

> Ingestion: Turning raw data into searchable knowledge
- Your documents live in S3 or any internal data source.
- Whenever something new is added, a Lambda ingestion function kicks in.
- It cleans, processes, and chunks the file i ...
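The "cleans, processes, and chunks" step of the ingestion Lambda can be sketched as a minimal Python function. The chunk size and overlap values below are illustrative assumptions, not figures from the post:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split cleaned text into overlapping chunks (sizes are illustrative)."""
    text = " ".join(text.split())  # normalize whitespace as a stand-in for "cleaning"
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # overlap keeps context across chunk boundaries
    return chunks
```

Each chunk would then be embedded and written to the vector store; the overlap is a common trick so that a sentence cut at a boundary still appears whole in one of the chunks.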
X @Avi Chawla
Avi Chawla· 2026-01-25 06:31
Vector Index vs Vector Database, clearly explained!

Devs typically use these terms interchangeably. But understanding the distinction is necessary, since conflating them leads to problems down the line.

Here's how to think about it:

A vector index is basically a search algorithm. You give it vectors, it organizes them into something searchable (like HNSW), and it finds similar items fast. FAISS is another example.

But here's the thing: that's all it does. It doesn't handle storage, it doesn't filter by metadata, and it doesn't ...
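To make the distinction concrete, here is a toy "vector index" in plain Python: it only organizes vectors for similarity search (brute-force cosine similarity here, where HNSW or FAISS would use far more efficient structures). Everything else a vector database adds, such as persistence, metadata filtering, and CRUD, is deliberately absent. All names are illustrative:

```python
import math

class ToyVectorIndex:
    """A bare search structure: add vectors, find nearest. Nothing else."""

    def __init__(self):
        self._vectors = []  # (id, vector) pairs held in memory only; no storage layer

    def add(self, vec_id, vector):
        self._vectors.append((vec_id, vector))

    def search(self, query, k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        scored = sorted(self._vectors, key=lambda iv: cosine(query, iv[1]), reverse=True)
        return [vec_id for vec_id, _ in scored[:k]]
```

A vector database wraps something like this in durable storage, metadata filters, replication, and an API; the index itself is just the search core.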
X @Avi Chawla
Avi Chawla· 2026-01-23 18:30
RT Avi Chawla (@_avichawla)

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves:

Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance. When you ask "What were the debt trends in 2023?", a vector search returns chunks that ...
KIOXIA AiSAQ™ Technology Integrated into Milvus Vector Database
Businesswire· 2025-12-17 02:51
Core Insights
- Kioxia Corporation has integrated its KIOXIA AiSAQ™ technology into the open-source vector database Milvus, starting with version 2.6.4 [1]

Company Summary
- Kioxia Corporation is enhancing its product offerings by integrating KIOXIA AiSAQ™ into Milvus, a significant step in expanding its capabilities in the database sector [1]
X @Avi Chawla
Avi Chawla· 2025-11-15 19:12
RAG System Architecture on AWS
- RAG (Retrieval-Augmented Generation) is a two-stage pattern involving ingestion and querying [1][3]
- The architecture remains consistent even with enhancements like better chunking, smarter retrieval, caching, orchestration, streaming, eval pipelines, or multi-source ingestion [3]

Ingestion Process
- Raw data from sources like S3 is processed and chunked by a Lambda function [3]
- Chunks are embedded using Bedrock Titan Embeddings [3]
- Embeddings are stored in a vector database such as OpenSearch Serverless, DynamoDB, or Aurora, creating a searchable knowledge store [3]
- Strategic reindexing with smart diffing, incremental updates, and metadata checks is important for efficiency and cost reduction [2]

Querying Process
- User questions are routed through API Gateway to a Lambda function [4]
- Questions are embedded using Bedrock Titan Embeddings [4]
- The embedding is matched against the vector database to retrieve relevant chunks [4]
- Retrieved chunks are passed to a Bedrock LLM (Claude or OpenAI) to generate the final answer [4]
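The querying process above can be sketched end-to-end in miniature. A toy bag-of-characters embedding stands in for Bedrock Titan, and a plain in-memory list stands in for the vector store; every function name here is an illustrative assumption, not an AWS API:

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def toy_embed(text):
    """Stand-in for Bedrock Titan: a bag-of-characters vector (illustrative only)."""
    counts = Counter(text.lower())
    return [counts.get(c, 0) for c in ALPHABET]

def retrieve(question, store, k=2):
    """Match the question embedding against stored (chunk, embedding) pairs."""
    q = toy_embed(question)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(store, key=lambda ce: dot(q, ce[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble the context + question payload that would go to the LLM."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the real architecture, `toy_embed` becomes a call to the embeddings model, `store` becomes the vector database query, and `build_prompt`'s output is what the Lambda sends to the Bedrock LLM.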
X @Avi Chawla
Avi Chawla· 2025-10-12 19:29
Core Problem of Traditional RAG
- Most retrieved chunks in traditional RAG setups do not effectively aid the LLM, leading to increased computational cost, latency, and context processing [1][5]
- Classic RAG fetches similar chunks from a vector database and feeds the retrieved context directly to the LLM [5]

REFRAG Solution by Meta AI
- Meta AI's REFRAG introduces a novel approach by compressing and filtering context at a vector level, focusing on relevance [1][2]
- REFRAG employs chunk compression, a relevance policy (RL-trained), and selective expansion to process only essential information [2]
- The process involves encoding documents, finding relevant chunks, using the relevance policy to select chunks, and concatenating token-level representations [3][4]

Performance Metrics of REFRAG
- REFRAG outperforms LLaMA on 16 RAG benchmarks, demonstrating enhanced performance [5][7]
- REFRAG achieves 30.85x faster time-to-first-token, significantly improving processing speed [5][7]
- REFRAG handles 16x larger context windows, allowing for more extensive information processing [5][7]
- REFRAG utilizes 2-4x fewer tokens, reducing computational resource consumption [5][7]
- REFRAG shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [7]
X @Avi Chawla
Avi Chawla· 2025-10-12 06:31
Core Innovation - REFRAG
- Meta's REFRAG fundamentally rethinks retrieval in RAG setups by compressing and filtering context at a vector level [1]
- REFRAG compresses each chunk into a single compressed embedding and uses a relevance policy trained via RL to select the most relevant chunks [1][2]
- Only selected chunks are expanded back into full embeddings and passed to the LLM, processing only what matters [2]

Technical Details
- REFRAG encodes documents and stores them in a vector database [2]
- It encodes the full user query, finds relevant chunks, and computes token-level embeddings for both [3]
- A relevance policy, trained via RL, selects which chunks to keep [3][5]
- Token-level representations of the input query are concatenated with the selected chunks and a compressed single-vector representation of the rejected chunks before being sent to the LLM [3]

Performance Metrics
- REFRAG outperforms LLaMA on 16 RAG benchmarks [4][6]
- It achieves 30.85x faster time-to-first-token, which is 3.75x better than the previous state-of-the-art [4][6]
- REFRAG handles 16x larger context windows [4][6]
- It utilizes 2-4x fewer tokens [4][6]
- REFRAG shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [6]
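The compress / select / expand mechanism described above can be sketched in miniature: every chunk is stored as one compressed vector (here, the mean of its token embeddings), a stand-in scoring function plays the role of the RL-trained relevance policy, and only the winning chunks are expanded back to full token-level embeddings. This is a loose illustration of the idea, not Meta's implementation:

```python
def compress(token_vectors):
    """Compress a chunk to a single vector: the mean of its token embeddings."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(tv[d] for tv in token_vectors) / n for d in range(dim)]

def select_and_expand(query_vec, chunks, keep=1):
    """Score compressed chunks against the query; expand only the winners.

    A dot-product score stands in for the RL-trained relevance policy.
    Returns (expanded token-level chunks, compressed vectors of rejected chunks).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(chunks, key=lambda c: dot(query_vec, compress(c)), reverse=True)
    expanded = ranked[:keep]                         # full token-level embeddings
    rejected = [compress(c) for c in ranked[keep:]]  # stay as single vectors
    return expanded, rejected
```

The LLM then sees full token-level detail only for the kept chunks, while each rejected chunk contributes just one vector, which is where the token savings and faster time-to-first-token come from.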
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
Embedding Models
- A new embedding model reduces vector DB costs by approximately 200x [1]
- The new model outperforms OpenAI and Cohere models [1]

Industry Focus
- The report provides a complete breakdown with visuals on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAGs (Retrieval-Augmented Generation) [1]