Vector Database
X @Avi Chawla
Avi Chawla· 2025-11-15 19:12
RT Avi Chawla (@_avichawla)
How to build a RAG app on AWS!

The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know.

At its core, RAG is a two-stage pattern:
- Ingestion (prepare knowledge)
- Querying (use knowledge)

Below is how each stage works in practice.

> Ingestion: Turning raw data into searchable knowledge
- Your documents live in S3 or any internal data source.
- Whenever something new is added, a Lambda ingestion function kicks in.
- It cleans, proce ...
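The tweet is truncated, but the ingestion flow it describes (S3 event triggers a Lambda that cleans, chunks, and embeds new documents for the vector store) can be sketched as below. This is a minimal illustration, not the author's actual stack: the `body` field, the chunk sizes, and the `embed` stub are placeholders — in AWS you would fetch the object with boto3 and call a real embedding service such as Bedrock.

```python
import re

def clean(text: str) -> str:
    """Normalize whitespace before chunking."""
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunks with overlap, a common default."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunks: list[str]) -> list[list[float]]:
    """Placeholder: swap in a real embedding model (e.g. via Bedrock)."""
    return [[float(len(c))] for c in chunks]

def handler(event, context=None):
    """Triggered on new uploads; emits records ready for the vector store.
    `body` is a stand-in here — a real Lambda would read the S3 object."""
    records = []
    for rec in event["Records"]:
        key = rec["s3"]["object"]["key"]
        text = clean(rec["body"])
        chunks = chunk(text)
        for c, v in zip(chunks, embed(chunks)):
            records.append({"source": key, "text": c, "vector": v})
    return records
```

The querying stage is the mirror image: embed the user's question with the same model and search the store for the nearest chunks.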
X @Avi Chawla
Avi Chawla· 2025-10-12 19:29
Core Problem of Traditional RAG
- Most retrieved chunks in traditional RAG setups do not effectively aid the LLM, leading to increased computational costs, latency, and context processing [1][5]
- Classic RAG involves fetching similar chunks from a vector database and directly inputting the retrieved context into the LLM [5]

REFRAG Solution by Meta AI
- Meta AI's REFRAG introduces a novel approach by compressing and filtering context at a vector level, focusing on relevance [1][2]
- REFRAG employs chunk compression, a relevance policy (RL-trained), and selective expansion to process only essential information [2]
- The process involves encoding documents, finding relevant chunks, using a relevance policy to select chunks, and concatenating token-level representations [3][4]

Performance Metrics of REFRAG
- REFRAG outperforms LLaMA on 16 RAG benchmarks, demonstrating enhanced performance [5][7]
- REFRAG achieves 30.85x faster time-to-first-token, significantly improving processing speed [5][7]
- REFRAG handles 16x larger context windows, allowing for more extensive information processing [5][7]
- REFRAG utilizes 2-4x fewer tokens, reducing computational resource consumption [5][7]
- REFRAG leads to no accuracy loss across RAG, summarization, and multi-turn conversation tasks [7]
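The selection step above (score every chunk, expand only the most relevant to full tokens, keep the rest compressed) can be approximated in a few lines. Note the hedge: plain cosine similarity stands in for REFRAG's RL-trained relevance policy, and the "compressed" label stands in for its single-vector chunk representation.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def refrag_style_select(query_vec, chunk_vecs, expand_k=2):
    """Score every chunk against the query, expand only the top-k to
    full tokens, and keep the rest as compressed vector placeholders."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    expanded = set(ranked[:expand_k])
    return [("tokens" if i in expanded else "compressed", i)
            for i in range(len(chunk_vecs))]
```

Because only k chunks ever reach the decoder as tokens, the effective prompt shrinks even as the retrieved context grows — which is where the faster time-to-first-token and larger context windows come from.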
X @Avi Chawla
Avi Chawla· 2025-10-12 06:31
Researchers from Meta built a new RAG approach that:
- outperforms LLaMA on 16 RAG benchmarks.
- has 30.85x faster time-to-first-token.
- handles 16x larger context windows.
- utilizes 2-4x fewer tokens.

Here's the core problem with a typical RAG setup that Meta solves:

Most of what we retrieve in RAG setups never actually helps the LLM.

In classic RAG, when a query arrives:
- You encode it into a vector.
- Fetch similar chunks from the vector DB.
- Dump the retrieved context into the LLM.

It typically works, but a ...
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
Embedding Models
- A new embedding model reduces vector DB costs by approximately 200x [1]
- The new model outperforms OpenAI and Cohere models [1]

Industry Focus
- The report provides a complete breakdown with visuals on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAG (Retrieval-Augmented Generation) [1]
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
Compared to OpenAI-v3-large (float, 3072d), voyage-context-3 (binary, 512d) delivers:
- 99.48% lower vector DB costs.
- 0.73% better retrieval quality.

Check this 👇 https://t.co/7pLYG2Vkot ...
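The cost figures in these posts follow directly from per-vector storage size. A quick check, assuming the standard encodings (float32 at 4 bytes per dimension, int8 at 1 byte, binary at 1 bit):

```python
def bytes_per_vector(dims: int, encoding: str) -> float:
    """Storage per vector for common embedding encodings."""
    bits = {"float32": 32, "int8": 8, "binary": 1}[encoding]
    return dims * bits / 8

baseline = bytes_per_vector(3072, "float32")       # OpenAI-v3-large: 12288 bytes

binary_512 = bytes_per_vector(512, "binary")       # 64 bytes
saving_binary = (1 - binary_512 / baseline) * 100  # 99.48% lower

int8_2048 = bytes_per_vector(2048, "int8")         # 2048 bytes
saving_int8 = (1 - int8_2048 / baseline) * 100     # ~83% lower
```

The binary-512 configuration works out to 12288 / 64 = 192x smaller per vector, which is where the "approximately 200x" headline figure comes from; the int8-2048 configuration gives the 83% figure quoted for that variant. Actual vector DB bills also depend on index overhead and metadata, so these are storage ratios, not exact invoices.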
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Capabilities
- Voyage-context-3 supports 2048, 1024, 512, and 256 dimensions, with quantization options [1]

Cost Efficiency
- Voyage-context-3 (int8, 2048 dimensions) reduces vector database costs by 83% compared to OpenAI-v3-large (float, 3072 dimensions) [1]

Performance
- Voyage-context-3 delivers 8.60% better retrieval quality [1]
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Performance
- A new embedding model outperforms OpenAI and Cohere models [1]
- The new model reduces vector database costs by approximately 200 times [1]
Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
AI Engineer· 2025-07-29 15:30
Overview of Alice and 11X
- 11X is building digital workers for the go-to-market organization, including Alice, an AI SDR, and Julian, a voice agent [2]
- Alice sends approximately 50,000 emails per day, significantly more than a human SDR's 20-50 emails, and runs campaigns for about 300 business organizations [6]
- The knowledge base centralizes seller information, allowing users to upload source material for message generation [18]

Technical Architecture and Pipeline
- The knowledge base pipeline consists of parsing, chunking, storage, retrieval, and visualization [22]
- Parsing converts non-text resources into text, making them legible to large language models [23]
- Chunking breaks down markdown into semantic entities for embedding in the vector DB, preserving markdown structure [37][38]
- Pinecone was selected as the vector database because it is a well-known, cloud-hosted solution that is easy to use, bundles embedding models, and offers customer support [46][47][48][49]
- A deep research agent, built using Leta, is used for retrieval, creating a plan with one or many context retrieval steps [51][52]

Vendor Selection and Considerations
- The company chose to work with vendors for parsing, prioritizing speed to market and confidence in outcome over building in-house [26][27]
- Llama Parse was selected for documents and images due to its support for numerous file types and its vendor support [32]
- Firecrawl was chosen for websites due to familiarity and the availability of their crawl endpoint [33][34]
- Cloudglue was selected for audio and video because it supports both formats and extracts information from the video itself [36]

Lessons Learned and Future Plans
- RAG (Retrieval-Augmented Generation) is complex, requiring many micro-decisions and technology evaluations [58]
- The company recommends getting to production first before benchmarking and improving [59]
- Future plans include tracking and addressing hallucinations, evaluating parsing vendors on accuracy and completeness, experimenting with hybrid RAG, and reducing costs [60][61]
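The chunking step described in the talk — splitting markdown into semantic entities while preserving its structure — can be sketched with a heading-based splitter. This illustrates the general technique, not 11x's actual implementation; the chunk schema here is made up for the example.

```python
import re

def chunk_markdown(md: str) -> list[dict]:
    """Split markdown at headings so each chunk is one semantic section,
    carrying its heading path as metadata for the vector DB."""
    chunks, path, buf = [], [], []

    def flush():
        text = "\n".join(buf).strip()
        if text:
            chunks.append({"heading_path": " > ".join(path), "text": text})
        buf.clear()

    for line in md.splitlines():
        m = re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            flush()                       # close out the previous section
            level = len(m.group(1))       # heading depth from the number of '#'
            path[:] = path[:level - 1] + [m.group(2).strip()]
        else:
            buf.append(line)
    flush()
    return chunks
```

Keeping the heading path with each chunk is what "preserving markdown structure" buys you: at retrieval time the LLM sees not just a paragraph but where it sits in the document.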
Practical GraphRAG: Making LLMs smarter with Knowledge Graphs — Michael, Jesus, and Stephen, Neo4j
AI Engineer· 2025-07-22 17:59
Graph RAG Overview
- Graph RAG aims to enhance LLMs by incorporating knowledge graphs, addressing limitations like lack of domain knowledge, unverifiable answers, hallucinations, and biases [1][3][4][5][9][10]
- Graph RAG leverages knowledge graphs (collections of nodes, relationships, and properties) to provide more relevant, contextual, and explainable results compared to basic RAG systems using vector databases [8][9][10][12][13][14]
- Microsoft research indicates Graph RAG can achieve better results with lower token costs, supported by studies showing improvements in capabilities and analyst trends [15][16]

Knowledge Graph Construction
- Knowledge graph construction involves structuring unstructured information, extracting entities and relationships, and enriching the graph with algorithms [19][20][21][22]
- Lexical graphs represent documents and elements (chunks, sections, paragraphs) with relationships based on document structure, temporal sequence, and similarity [25][26]
- Entity extraction utilizes LLMs with graph schemas to identify entities and relationships from text, potentially integrating with existing knowledge graphs or structured data like CRM systems [27][28][29][30]
- Graph algorithms (clustering, link prediction, PageRank) enrich the knowledge graph, enabling cross-document topic identification and summarization [20][30][34]

Graph RAG Retrieval and Applications
- Graph RAG retrieval involves an initial index search (vector, full-text, or hybrid) followed by traversing relationships to fetch additional context, considering user context for tailored results [32][33]
- Modern LLMs are increasingly trained on graph processing, allowing them to effectively utilize node-relationship-node patterns provided as context [34]
- Tools and libraries are available for knowledge graph construction from various sources (PDFs, YouTube transcripts, web articles), with open-source options for implementation [35][36][39][43][45]
- Agentic approaches in Graph RAG break down user questions into tasks, using domain-specific retrievers and tools in sequence or loops to generate comprehensive answers and visualizations [42][44]
- Industry leaders are adopting Graph RAG for production applications, such as LinkedIn's customer support, which saw a 28.6% reduction in median per-issue resolution time [17][18]
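The retrieval pattern the talk describes — an initial index search, then traversing relationships to pull in extra context — can be illustrated with a toy in-memory graph. In a real deployment the index search would be a vector or full-text query and the traversal a Neo4j Cypher `MATCH`; both are simple stand-ins here, and all node contents are invented.

```python
# Toy graph: nodes hold text, edges are typed relationships between them.
nodes = {
    "doc1": "Acme's refund policy allows returns within 30 days.",
    "doc2": "Refunds are processed by the billing team.",
    "team1": "Billing team: contact billing@acme.example.",
}
edges = [("doc1", "MENTIONS", "doc2"), ("doc2", "HANDLED_BY", "team1")]

def index_search(query: str) -> list[str]:
    """Stand-in for the initial vector/full-text index search."""
    return [nid for nid, text in nodes.items()
            if any(w in text.lower() for w in query.lower().split())]

def graph_rag_retrieve(query: str, hops: int = 1) -> list[str]:
    """Seed with index hits, then expand context along relationships."""
    found = set(index_search(query))
    frontier = set(found)
    for _ in range(hops):
        frontier = {dst for src, _, dst in edges if src in frontier} - found
        found |= frontier
    return [nodes[nid] for nid in sorted(found)]
```

The payoff is the last node: a plain similarity search for "refund policy" would never surface the billing team's contact details, but one hop along `HANDLED_BY` brings them into context.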
Elastic(ESTC) - 2025 Q4 - Earnings Call Transcript
2025-05-29 22:02
Financial Data and Key Metrics Changes
- Total revenue in Q4 was $388 million, growing 16% year-over-year on an as-reported and constant currency basis [30]
- Subscription revenue in Q4 totaled $362 million, also growing 16% as reported and 17% in constant currency [30]
- Elastic Cloud revenue grew 23% on an as-reported and constant currency basis [30]
- Non-GAAP operating margin for Q4 was 15%, with a gross margin of 77% [35][36]
- Adjusted free cash flow margin improved by approximately 600 basis points to end the year at 19% [36]

Business Line Data and Key Metrics Changes
- The number of customers with over $1 million in annual contract value grew approximately 27%, adding about 45 net new customers [34]
- Customers with over $100,000 in annual contract value grew approximately 14%, adding about 180 net new customers [34]
- Subscription revenue excluding Monthly Cloud was $315 million, growing 19% in Q4 [32]

Market Data and Key Metrics Changes
- Strong growth was observed in the APJ region, followed by EMEA and the Americas, while some pressure was noted in the U.S. public sector [34]
- Over 2,000 Elastic Cloud customers are using Elastic for Gen AI use cases, with over 30% of these customers spending $100,000 or more annually [12]

Company Strategy and Development Direction
- The company is focusing on leveraging AI to automate business processes and drive innovation, positioning itself as a strategic partner for enterprises [11][18]
- Elastic aims to strengthen its position as the preferred vector database, enhancing its offerings with new technologies like better binary quantization [13][19]
- The company is committed to maintaining a balance between growth and profitability while continuing to innovate and expand its product offerings [40][43]

Management's Comments on Operating Environment and Future Outlook
- Management acknowledged potential uncertainty in the macro environment but expressed confidence in the healthy pipeline and demand signals [39]
- The company expects continued growth and strong margins in FY 2026, projecting total revenue in the range of $1.655 billion to $1.670 billion [42]

Other Important Information
- Elastic Cloud now accounts for over 50% of subscription revenue, with strong growth in cloud adoption [18]
- The company announced a strategic collaboration agreement with AWS to enhance solution integrations and accelerate AI innovation [25]

Q&A Session Summary

Question: Guidance and Metrics
- Inquiry about the conservativeness of guidance and leading indicators of business performance [45]
- Response highlighted the balance of positive demand signals with macro uncertainty, emphasizing the importance of CRPO and subscription revenue metrics [46][49]

Question: Partnerships and Market Opportunities
- Question regarding the impact of recent partnerships, particularly with AWS and NVIDIA, on market opportunities [53]
- Management noted the growing acceptance of Elastic as a leading vector database and the importance of partnerships for driving cloud adoption [54]

Question: Retrieval Augmented Generation (RAG)
- Inquiry about the durability of RAG architectures and Elastic's positioning [59]
- Management affirmed the critical role of retrieval in enterprise data management and the growing adoption of their vector database for RAG use cases [60][61]

Question: Cloud Performance and Consumption Hesitation
- Question about the sequential growth in cloud performance and the impact of the leap year [62]
- Management clarified that the leap year and fewer days in Q4 affected consumption rates, but normalized growth rates remained strong [64][66]

Question: Go-to-Market Strategy and Changes
- Inquiry about the effectiveness of go-to-market changes made in the previous fiscal year [69]
- Management confirmed that the changes have settled and are yielding positive results, with plans to continue hiring sales capacity [70][72]

Question: AI Commitments and Emerging Use Cases
- Question about the $1 million AI commitments and emerging use cases [93]
- Management clarified that 25% of $1 million customers are using Elastic for AI workloads, with a variety of sophisticated use cases emerging across industries [94][96]