How to look at your data — Jeff Huber (Chroma) + Jason Liu (567)
AI Engineer · 2025-08-06 16:22
Retrieval System Evaluation
- Industry should prioritize fast, inexpensive evaluations ("fast evals") built from query–document pairs to enable rapid experimentation [7]
- LLMs can be used to generate synthetic queries, but these must be aligned with real-world user queries or the results will be misleading [9][11]
- Fast evals let teams empirically validate new embedding models on their own data, rather than relying solely on public benchmarks like MTEB [12]
- A Weights & Biases chatbot analysis found that the original embedding model (text-embedding-3-small) performed worst and voyage-3-large performed best, highlighting the importance of data-driven evaluation [17][18]

Output Analysis and Product Development
- Extract structured data from user conversations (summaries, tools used, errors, satisfaction, frustration) to identify patterns and inform product development [28][29]
- Extracted metadata can be clustered to identify segments for targeted improvements, much as marketing uses user segmentation [29][26]
- The Kura library enables summarization, clustering, and aggregation of conversations so evals can be compared across different KPIs, helping to identify areas for improvement [32]
- Focus on providing the right infrastructure and tools to support AI agents, rather than solely on improving the AI itself [39]
- Define evals, find clusters, and compare KPIs across clusters to make informed decisions about what to build, what to fix, and what to ignore [40][41]
- Monitor query types and performance over time to understand how the product is being used and to identify opportunities for improvement [45]
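The "fast eval" idea above can be sketched as a recall@k check over labeled query–document pairs. This is a minimal illustration, not the speakers' implementation: `embed` here is a hashed bag-of-words stand-in for a real embedding model (you would swap in an API call), and all names and data are hypothetical.

```python
import hashlib
import numpy as np

def embed(text, dim=256):
    """Toy stand-in for a real embedding model call: deterministic
    hashed bag-of-words, normalized to unit length."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def recall_at_k(queries, docs, relevant, k=3):
    """queries/docs: id -> text; relevant: query id -> the doc id known
    to answer it. Returns the fraction of queries whose relevant doc
    appears in the top-k results by cosine similarity."""
    doc_ids = list(docs)
    doc_mat = np.stack([embed(docs[d]) for d in doc_ids])
    hits = 0
    for qid, qtext in queries.items():
        sims = doc_mat @ embed(qtext)          # cosine scores vs. all docs
        top = [doc_ids[i] for i in np.argsort(-sims)[:k]]
        hits += relevant[qid] in top
    return hits / len(queries)
```

Because the harness only needs (query, relevant-doc) pairs and an `embed` function, re-running it with a candidate embedding model gives a cheap, data-specific comparison in seconds.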
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer · 2025-06-27 09:59
Retrieval-Augmented Generation (RAG) & Large Language Models (LLMs)
- RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3]
- RAG is more reliable, faster, and cheaper than fine-tuning or long context windows for utilizing external knowledge [7]
- Retrieval accuracy has improved significantly over the past 18 months, driven by advances in embedding models [11][12]
- Accuracy averages roughly 80% across 100 datasets, leaving about 20 percentage points of headroom on retrieval tasks [12][13]

Vector Embeddings & Storage Optimization
- Techniques like Matryoshka representation learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17]
- Domain-specific embeddings, such as models customized for code, offer better trade-offs between storage cost and accuracy [21]

RAG Enhancement Techniques
- Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18]
- Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20]

Future of RAG
- A shift is predicted toward more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30]
- Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating separate data-extraction and embedding steps [32]
- Context-aware and auto-chunking embeddings aim to automate the chunking process and incorporate cross-chunk information, optimizing retrieval and cost [33][36]
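The storage arithmetic behind claims like the 100x reduction can be illustrated by combining Matryoshka-style truncation (keep only the leading dimensions) with binary quantization (one bit per remaining dimension). This is a sketch under assumed sizes, not a measured result; real deployments would evaluate the accuracy cost of each step.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy corpus: 1,000 float32 vectors of 2,048 dims (sizes are illustrative).
full = rng.standard_normal((1000, 2048)).astype(np.float32)

def shrink(vectors, keep_dims=256):
    """Matryoshka-style truncation: keep the leading dims and renormalize,
    then binary-quantize (sign bit per dimension, packed 8 to a byte)."""
    v = vectors[:, :keep_dims]
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return np.packbits((v > 0).astype(np.uint8), axis=1)

packed = shrink(full)
# 2048 dims * 4 bytes vs. 256 bits = 32 bytes per vector: a 256x reduction.
print(full.nbytes / packed.nbytes)
```

Truncation alone gives 8x here (2048 → 256 dims) and binarization another 32x (float32 → 1 bit), which is why the combination can exceed the 100x figure; milder settings land in the 100x range quoted in the talk.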