OpenSearch Serverless - filings, earnings calls, financial reports, news

Retrieval Augmented Generation (RAG)

Amazon S3

OpenSearch Serverless

Large Language Models (LLMs)

AWS Lambda

Amazon S3

OpenSearch Serverless

X @Avi Chawla

Avi Chawla· 2026-03-12 07:25

How to build a RAG app on AWS!The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know.At its core, RAG is a two-stage pattern:- Ingestion (prepare knowledge)- Querying (use knowledge)Below is how each stage works in practice.> Ingestion: Turning raw data into searchable knowledge- Your documents live in S3 or any internal data source.- Whenever something new is added, a Lambda ingestion function kicks in.- It cleans, processes, and chunks the file i ...

LLM (Large Language Model)

Data Ingestion

LLM (Large Language Model)

Data Ingestion

X @Avi Chawla

Avi Chawla· 2025-11-15 19:12

RAG System Architecture on AWS - RAG (Retrieval Augmented Generation) is a two-stage pattern involving ingestion and querying [1][3] - The architecture remains consistent even with enhancements like better chunking, smarter retrieval, caching, orchestration, streaming, eval pipelines, or multi-source ingestion [3] Ingestion Process - Raw data from sources like S3 is processed and chunked by a Lambda function [3] - Chunks are embedded using Bedrock Titan Embeddings [3] - Embeddings are stored in a vector database such as OpenSearch Serverless, DynamoDB, or Aurora, creating a searchable knowledge store [3] - Strategic reindexing with smart diffing, incremental updates, and metadata checks is important for efficiency and cost reduction [2] Querying Process - User questions are processed through API Gateway to a Lambda function [4] - Questions are embedded using Bedrock Titan Embeddings [4] - The embedding is matched against the vector database to retrieve relevant chunks [4] - Retrieved chunks are passed to a Bedrock LLM (Claude or OpenAI) to generate the final answer [4]

Bedrock LLM

Bedrock LLM