Embedding Models
X @Avi Chawla
Avi Chawla· 2026-02-02 06:30
Your embedding stack forces a 100% re-index just to change models. And most teams treat that as unavoidable.

Imagine you built a RAG pipeline with a large embedding model for high retrieval quality, and it ships to production. Six months later, your application traffic and your embedding model costs are soaring while your pipeline struggles to scale. You want to switch to a model that prioritizes cost and latency in order to meet this new demand.

But your existing embeddings live in one vector space, while the ...
X @Avi Chawla
Avi Chawla· 2026-01-24 06:45
100%. That's how much data you re-index when you change embedding models. And most teams treat that as unavoidable.

Imagine you built a RAG pipeline using an embedding model with high retrieval quality, and it ships to production. Six months later, a better embedding model is released that delivers similar quality at a lower cost. But your existing embeddings live in one vector space, while the new model produces embeddings in a different one, which makes them incompatible.

Switching models now means rebuilding th ...
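The incompatibility these threads describe can be sketched with toy stand-ins: two "models" that project the same underlying features into unrelated vector spaces. The random projections below are illustrative assumptions, not real embedding models; the point is only that vectors indexed under one model cannot be searched with queries embedded by another.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two embedding models: random projections of a shared
# 50-dim feature vector into two unrelated vector spaces.
proj_a = rng.normal(size=(50, 384))  # "large" model, 384-dim space
proj_b = rng.normal(size=(50, 256))  # "small" model, 256-dim space

def embed(features, proj):
    v = features @ proj
    return v / np.linalg.norm(v)

doc = rng.normal(size=50)                  # pretend: a document's features
query = doc + 0.05 * rng.normal(size=50)   # a near-duplicate query

# Within one model's space, the near-duplicate scores close to 1...
same_space = embed(query, proj_a) @ embed(doc, proj_a)

# ...but across spaces the dimensions do not even match, so old document
# vectors cannot be compared with new query vectors at all.
try:
    cross_space = embed(query, proj_a) @ embed(doc, proj_b)
except ValueError:
    cross_space = None  # shapes (384,) and (256,) are incompatible

print(f"same-space similarity: {same_space:.3f}")
print(f"cross-space comparison: {cross_space}")
```

Even when two models happen to share a dimension, their axes carry different meanings, so mixed-model similarity scores are meaningless rather than merely degraded. That is why a model swap normally forces a full re-index.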
How to look at your data — Jeff Huber (Chroma) + Jason Liu (567)
AI Engineer· 2025-08-06 16:22
Retrieval System Evaluation
- Industry should prioritize fast and inexpensive evaluations ("fast evals") using query and document pairs to enable rapid experimentation [7]
- Industry can leverage LLMs to generate queries, but should focus on aligning synthetic queries with real-world user queries to avoid misleading results [9][11]
- Industry can empirically validate the performance of new embedding models on specific data using fast evals, rather than relying solely on public benchmarks like MTEB [12]
- Weights & Biases chatbot analysis reveals that the original embedding model (text-embedding-3-small) performed the worst, while Voyage-3-large performed the best, highlighting the importance of data-driven evaluation [17][18]

Output Analysis and Product Development
- Industry should extract structured data from user conversations (summaries, tools used, errors, satisfaction, frustration) to identify patterns and inform product development [28][29]
- Industry can use extracted metadata to find clusters and identify segments for targeted improvements, similar to how marketing uses user segmentation [29][26]
- The Kura library enables summarization, clustering, and aggregation of conversations to compare evals across different KPIs, helping to identify areas for improvement [32]
- Industry should focus on providing the right infrastructure and tools to support AI agents, rather than solely focusing on improving the AI itself [39]
- Industry should define evals, find clusters, and compare KPIs across clusters to make informed decisions on what to build, fix, and ignore [40][41]
- Industry should monitor query types and performance over time to understand how the product is being used and identify opportunities for improvement [45]
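A "fast eval" in the sense described above is just recall@k over labeled (query, expected document) pairs, cheap enough to rerun on every candidate model. The sketch below uses a toy bag-of-characters encoder as a hypothetical stand-in for whatever embedding model is under test; the corpus and pairs are invented for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: normalized character counts."""
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

docs = [
    "How to reset your password",
    "Billing and invoice questions",
    "Exporting data to CSV",
]
# (query, index of the document it should retrieve) — these pairs could
# come from real user queries or LLM-generated ones aligned with them.
pairs = [("reset password", 0), ("invoice billing", 1), ("csv export", 2)]

doc_matrix = np.stack([embed(d) for d in docs])

def recall_at_k(pairs, k: int = 1) -> float:
    hits = 0
    for query, gold in pairs:
        scores = doc_matrix @ embed(query)          # cosine scores
        top_k = np.argsort(scores)[::-1][:k]        # best k documents
        hits += int(gold in top_k)
    return hits / len(pairs)

print(f"recall@1 = {recall_at_k(pairs, k=1):.2f}")
```

Swapping `embed` for each candidate model and comparing recall@k on your own pairs is the data-driven check the talk recommends over trusting public leaderboard numbers alone.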
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval Augmented Generation (RAG) & Large Language Models (LLMs)
- RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3]
- RAG is considered a more reliable, faster, and cheaper approach compared to fine-tuning and long context windows for utilizing external knowledge [7]
- The industry has seen significant improvements in retrieval accuracy over the past 18 months, driven by advancements in embedding models [11][12]
- The industry averages approximately 80% accuracy across 100 datasets, indicating roughly 20% headroom for improvement in retrieval tasks [12][13]

Vector Embeddings & Storage Optimization
- Techniques like Matryoshka representation learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17]
- Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21]

RAG Enhancement Techniques
- Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18]
- Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20]

Future of RAG
- The industry predicts a shift towards more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30]
- Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating the need for separate data extraction and embedding steps [32]
- Context-aware and auto-chunking embeddings aim to automate the chunking process and incorporate cross-chunk information, optimizing retrieval and cost [33][36]
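The storage-optimization bullets combine two independent levers: Matryoshka-style truncation (keep only the leading dimensions) and scalar quantization (float32 to int8). A minimal sketch, assuming a Matryoshka-trained 1024-dim model whose leading 256 dimensions retain most of the signal; the random vectors here are placeholders for real embeddings, used only to account for storage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder corpus: 1,000 normalized float32 embeddings, 1024 dims each.
full = rng.normal(size=(1000, 1024)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Lever 1: Matryoshka truncation — keep the first 256 of 1024 dims (4x),
# then re-normalize the shortened vectors.
short = full[:, :256].copy()
short /= np.linalg.norm(short, axis=1, keepdims=True)

# Lever 2: scalar int8 quantization — 4 bytes -> 1 byte per value (4x).
scale = np.abs(short).max()
q = np.round(short / scale * 127.0).astype(np.int8)

bytes_full = full.nbytes  # 1000 * 1024 * 4 bytes
bytes_q = q.nbytes        # 1000 *  256 * 1 bytes
print(f"{bytes_full / bytes_q:.0f}x smaller")  # 16x with these settings
```

Here the two levers multiply to 16x; more aggressive truncation or binary quantization is how the cited talk reaches reductions near 100x, at the cost of the quoted 5-10% accuracy loss.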