Vector Database
KIOXIA AiSAQ™ Technology Integrated into Milvus Vector Database
Businesswire· 2025-12-17 02:51
Core Insights
- Kioxia Corporation has integrated its KIOXIA AiSAQ™ into the open-source vector database Milvus starting with version 2.6.4 [1]

Company Summary
- Kioxia Corporation is enhancing its product offerings by integrating KIOXIA AiSAQ™ into Milvus, a significant step in expanding its capabilities in the database sector [1]
X @Avi Chawla
Avi Chawla· 2025-11-15 19:12
RT Avi Chawla (@_avichawla): How to build a RAG app on AWS! The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know. At its core, RAG is a two-stage pattern:
- Ingestion (prepare knowledge)
- Querying (use knowledge)
Below is how each stage works in practice.
> Ingestion: Turning raw data into searchable knowledge
- Your documents live in S3 or any internal data source.
- Whenever something new is added, a Lambda ingestion function kicks in.
- It cleans, proce ...
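As a rough illustration of that ingestion stage, here is a minimal sketch of what such a Lambda handler could look like, assuming plain-text documents in S3; `embed()` and `store()` are hypothetical stand-ins for whichever embedding model and vector store the stack actually uses.

```python
# Minimal sketch of the S3-triggered ingestion Lambda described above.
# Assumptions: documents are plain text; embed() and store() are
# hypothetical stand-ins for the embedding model and vector store.
import boto3

s3 = boto3.client("s3")
CHUNK_SIZE = 1000  # characters per chunk; tune for your embedding model


def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Naive fixed-size chunking; real pipelines often split on structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str) -> list[float]:
    raise NotImplementedError  # e.g. call an embedding model via Bedrock


def store(doc_key: str, text: str, vector: list[float]) -> None:
    raise NotImplementedError  # e.g. upsert into OpenSearch or pgvector


def handler(event, context):
    # Each S3 put-event record identifies one newly added object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        cleaned = " ".join(raw.split())  # the "cleans" step: collapse whitespace noise
        for piece in chunk(cleaned):
            store(key, piece, embed(piece))
```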
X @Avi Chawla
Avi Chawla· 2025-10-12 19:29
Core Problem of Traditional RAG
- Most retrieved chunks in traditional RAG setups do not effectively aid the LLM, driving up computational cost, latency, and context-processing overhead [1][5]
- Classic RAG fetches similar chunks from a vector database and feeds the retrieved context directly into the LLM [5]

REFRAG Solution by Meta AI
- Meta AI's REFRAG introduces a novel approach that compresses and filters context at the vector level, focusing on relevance [1][2]
- REFRAG combines chunk compression, a relevance policy (RL-trained), and selective expansion to process only essential information [2]
- The process involves encoding documents, finding relevant chunks, using the relevance policy to select chunks, and concatenating token-level representations [3][4]

Performance Metrics of REFRAG
- REFRAG outperforms LLaMA on 16 RAG benchmarks [5][7]
- REFRAG achieves 30.85x faster time-to-first-token, significantly improving processing speed [5][7]
- REFRAG handles 16x larger context windows, allowing far more information to be processed [5][7]
- REFRAG uses 2-4x fewer tokens, reducing computational resource consumption [5][7]
- REFRAG shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [7]
X @Avi Chawla
Avi Chawla· 2025-10-12 06:31
Core Innovation - REFRAG
- Meta's REFRAG fundamentally rethinks retrieval in RAG setups by compressing and filtering context at the vector level [1]
- REFRAG compresses each chunk into a single compressed embedding and uses a relevance policy trained via RL to select the most relevant chunks [1][2]
- Only selected chunks are expanded back into full embeddings and passed to the LLM, so the model processes only what matters [2]

Technical Details
- REFRAG encodes documents and stores them in a vector database [2]
- It encodes the full user query, finds relevant chunks, and computes token-level embeddings for both [3]
- A relevance policy, trained via RL, selects which chunks to keep [3][5]
- Token-level representations of the input query are concatenated with the selected chunks and a compressed single-vector representation of the rejected chunks before being sent to the LLM [3]

Performance Metrics
- REFRAG outperforms LLaMA on 16 RAG benchmarks [4][6]
- It achieves 30.85x faster time-to-first-token, 3.75x better than the previous state-of-the-art [4][6]
- REFRAG handles 16x larger context windows [4][6]
- It uses 2-4x fewer tokens [4][6]
- REFRAG shows no accuracy loss across RAG, summarization, and multi-turn conversation tasks [6]
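To make the flow concrete, here is a toy sketch of the selective-expansion idea, not Meta's implementation: plain cosine similarity stands in for the RL-trained relevance policy, and the shapes show how rejected chunks shrink to a single vector each while selected chunks keep their token-level embeddings.

```python
# Toy sketch of REFRAG-style selective expansion (not Meta's code).
# Each chunk is stored as one compressed vector; a relevance policy
# (here: cosine similarity, standing in for the RL-trained policy)
# picks the top-k chunks to expand back to token-level embeddings.
# Rejected chunks contribute only their single compressed vector.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_llm_input(query_tokens, chunks, k=2):
    """query_tokens: (q_len, d); chunks: list of (token_embs, compressed) pairs."""
    q_vec = query_tokens.mean(axis=0)  # crude single-vector query summary
    scores = [cosine(q_vec, comp) for _, comp in chunks]
    keep = set(np.argsort(scores)[-k:])  # indices the policy expands

    parts = [query_tokens]
    for i, (token_embs, compressed) in enumerate(chunks):
        # Selected chunks pass full token-level embeddings; rejected ones
        # are represented by one compressed vector, so the LLM sees far
        # fewer positions overall.
        parts.append(token_embs if i in keep else compressed[None, :])
    return np.concatenate(parts, axis=0)

# Example: 4 chunks of 50 tokens each, d=64; only 2 expand.
rng = np.random.default_rng(0)
d = 64
chunks = [(rng.normal(size=(50, d)), rng.normal(size=d)) for _ in range(4)]
query = rng.normal(size=(8, d))
print(build_llm_input(query, chunks).shape)  # (8 + 2*50 + 2*1, 64) = (110, 64)
```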
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
Embedding Models
- A new embedding model reduces vector DB costs by approximately 200x [1]
- The new model outperforms OpenAI and Cohere models [1]

Industry Focus
- The report provides a complete breakdown with visuals on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAGs (Retrieval-Augmented Generation) [1]
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
Cost Efficiency
- Voyage-context-3 (512-dim) reduces vector database costs by 99.48% compared to OpenAI-v3-large (3072-dim) [1]

Retrieval Quality
- Voyage-context-3 (binary, 512-dim) improves retrieval quality by 0.73% [1]
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Capabilities
- Voyage-context-3 supports 2048, 1024, 512, and 256 dimensions and offers quantization [1]

Cost Efficiency
- Voyage-context-3 (int8, 2048-dim) reduces vector database costs by 83% compared to OpenAI-v3-large (float, 3072-dim) [1]

Performance
- Voyage-context-3 delivers 860% better retrieval quality [1]
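These percentages, and the "~200x" figure in the adjacent posts, fall straight out of per-vector storage size; a quick sanity check of the arithmetic (storage only, ignoring index overhead):

```python
# Sanity check of the cost figures from raw per-vector storage size:
# float32 = 4 bytes/dim, int8 = 1 byte/dim, binary = 1 bit/dim.
baseline = 3072 * 4          # OpenAI-v3-large, float32: 12,288 bytes/vector
int8_2048 = 2048 * 1         # voyage-context-3, int8:    2,048 bytes/vector
binary_512 = 512 / 8         # voyage-context-3, binary:     64 bytes/vector

print(f"int8, 2048-dim:  {1 - int8_2048 / baseline:.2%} smaller")   # 83.33%
print(f"binary, 512-dim: {1 - binary_512 / baseline:.2%} smaller")  # 99.48%
print(f"ratio: {baseline / binary_512:.0f}x")  # 192x, i.e. the "~200x" claim
```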
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Performance
- A new embedding model outperforms OpenAI and Cohere models [1]
- The new model reduces vector database costs by approximately 200 times [1]
Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
AI Engineer· 2025-07-29 15:30
Overview of Alice and 11X
- 11X is building digital workers for the go-to-market organization, including Alice, an AI SDR, and Julian, a voice agent [2]
- Alice sends approximately 50,000 emails per day, far more than a human SDR's 20-50, and runs campaigns for about 300 business organizations [6]
- The knowledge base centralizes seller information, allowing users to upload source material for message generation [18]

Technical Architecture and Pipeline
- The knowledge base pipeline consists of parsing, chunking, storage, retrieval, and visualization [22]
- Parsing converts non-text resources into text, making them legible to large language models [23]
- Chunking breaks markdown down into semantic entities for embedding in the vector DB, preserving markdown structure (see the sketch after this summary) [37][38]
- Pinecone was selected as the vector database because it is a well-known, cloud-hosted solution that is easy to use, bundles embedding models, and offers strong customer support [46][47][48][49]
- A deep research agent, built using Leta, handles retrieval, creating a plan with one or more context-retrieval steps [51][52]

Vendor Selection and Considerations
- The company chose to work with vendors for parsing, prioritizing speed to market and confidence in outcomes over building in-house [26][27]
- Llama Parse was selected for documents and images for its support for numerous file types and its customer support [32]
- Firecrawl was chosen for websites due to familiarity and the availability of its crawl endpoint [33][34]
- Cloudglue was selected for audio and video because it supports both formats and extracts information from the video itself [36]

Lessons Learned and Future Plans
- RAG (Retrieval-Augmented Generation) is complex, requiring many micro-decisions and technology evaluations [58]
- The company recommends getting to production first before benchmarking and improving [59]
- Future plans include tracking and addressing hallucinations, evaluating parsing vendors on accuracy and completeness, experimenting with hybrid RAG, and reducing costs [60][61]
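As referenced above, a minimal sketch of the chunk-and-store step under stated assumptions: markdown headings delimit the semantic entities, the Pinecone index already exists, and `embed()`, the index name, and the source file are hypothetical (the talk notes Pinecone also bundles hosted embedding models that could fill that role).

```python
# Sketch of the chunking + storage steps described in the talk. Assumptions:
# markdown headings delimit the "semantic entities", the index already
# exists, and embed() stands in for whichever embedding model is used.
import re
from pinecone import Pinecone

def markdown_chunks(md: str) -> list[str]:
    """Split on headings so each chunk keeps its markdown structure."""
    pieces = re.split(r"(?m)^(?=#{1,6} )", md)
    return [p.strip() for p in pieces if p.strip()]

def embed(text: str) -> list[float]:
    raise NotImplementedError  # call your embedding model here

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("knowledge-base")    # hypothetical index name

doc = open("seller_notes.md").read()  # hypothetical uploaded source material
index.upsert(vectors=[
    {"id": f"seller_notes-{i}", "values": embed(chunk),
     "metadata": {"text": chunk}}
    for i, chunk in enumerate(markdown_chunks(doc))
])
```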
Practical GraphRAG: Making LLMs smarter with Knowledge Graphs — Michael, Jesus, and Stephen, Neo4j
AI Engineer· 2025-07-22 17:59
Graph RAG Overview
- Graph RAG aims to enhance LLMs by incorporating knowledge graphs, addressing limitations like lack of domain knowledge, unverifiable answers, hallucinations, and biases [1][3][4][5][9][10]
- Graph RAG leverages knowledge graphs (collections of nodes, relationships, and properties) to provide more relevant, contextual, and explainable results than basic RAG systems built on vector databases alone [8][9][10][12][13][14]
- Microsoft research indicates Graph RAG can achieve better results at lower token cost, supported by studies showing capability improvements and analyst trends [15][16]

Knowledge Graph Construction
- Knowledge graph construction involves structuring unstructured information, extracting entities and relationships, and enriching the graph with algorithms [19][20][21][22]
- Lexical graphs represent documents and their elements (chunks, sections, paragraphs) with relationships based on document structure, temporal sequence, and similarity [25][26]
- Entity extraction uses LLMs with graph schemas to identify entities and relationships in text, potentially integrating with existing knowledge graphs or structured data such as CRM systems [27][28][29][30]
- Graph algorithms (clustering, link prediction, PageRank) enrich the knowledge graph, enabling cross-document topic identification and summarization [20][30][34]

Graph RAG Retrieval and Applications
- Graph RAG retrieval starts with an initial index search (vector, full-text, or hybrid), then traverses relationships to fetch additional context, taking user context into account for tailored results (a sketch of this pattern follows below) [32][33]
- Modern LLMs are increasingly trained on graph processing, allowing them to effectively use node-relationship-node patterns provided as context [34]
- Tools and libraries are available for knowledge graph construction from various sources (PDFs, YouTube transcripts, web articles), with open-source options for implementation [35][36][39][43][45]
- Agentic approaches in Graph RAG break user questions down into tasks, using domain-specific retrievers and tools in sequence or in loops to generate comprehensive answers and visualizations [42][44]
- Industry leaders are adopting Graph RAG for production applications, such as LinkedIn's customer support, which saw a 28.6% reduction in median per-issue resolution time [17][18]
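As referenced in the retrieval bullet above, here is a minimal sketch of that index-search-then-traverse pattern using the neo4j Python driver; the vector index name, node labels, and relationship types are illustrative assumptions, not the speakers' schema.

```python
# Sketch of the Graph RAG retrieval pattern: vector search finds entry
# chunks, then graph traversal pulls connected context. The index name,
# node labels, and relationship types below are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

QUERY = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $embedding)
YIELD node AS chunk, score
// Traverse from each hit to the entities it mentions and their neighbours,
// so the LLM receives connected context rather than isolated chunks.
MATCH (chunk)-[:MENTIONS]->(e:Entity)-[r]-(nbr:Entity)
RETURN chunk.text AS text, score,
       collect(DISTINCT e.name + ' -[' + type(r) + ']- ' + nbr.name) AS facts
ORDER BY score DESC
"""

def retrieve(question_embedding: list[float], k: int = 5):
    """Return (chunk text, similarity score, neighbouring graph facts) triples."""
    records, _, _ = driver.execute_query(QUERY, k=k, embedding=question_embedding)
    return [(r["text"], r["score"], r["facts"]) for r in records]
```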