X @Avi Chawla - Reportify

RAG System Architecture on AWS - RAG (Retrieval Augmented Generation) is a two-stage pattern involving ingestion and querying [1][3] - The architecture remains consistent even with enhancements like better chunking, smarter retrieval, caching, orchestration, streaming, eval pipelines, or multi-source ingestion [3] Ingestion Process - Raw data from sources like S3 is processed and chunked by a Lambda function [3] - Chunks are embedded using Bedrock Titan Embeddings [3] - Embeddings are stored in a vector database such as OpenSearch Serverless, DynamoDB, or Aurora, creating a searchable knowledge store [3] - Strategic reindexing with smart diffing, incremental updates, and metadata checks is important for efficiency and cost reduction [2] Querying Process - User questions are processed through API Gateway to a Lambda function [4] - Questions are embedded using Bedrock Titan Embeddings [4] - The embedding is matched against the vector database to retrieve relevant chunks [4] - Retrieved chunks are passed to a Bedrock LLM (Claude or OpenAI) to generate the final answer [4]