Vector Database
X @Avi Chawla
Avi Chawla· 2025-10-12 06:31
Researchers from Meta built a new RAG approach that:
- outperforms LLaMA on 16 RAG benchmarks.
- has 30.85x faster time-to-first-token.
- handles 16x larger context windows.
- utilizes 2-4x fewer tokens.

Here's the core problem with a typical RAG setup that Meta solves: most of what we retrieve in RAG setups never actually helps the LLM.

In classic RAG, when a query arrives:
- You encode it into a vector.
- Fetch similar chunks from the vector DB.
- Dump the retrieved context into the LLM.

It typically works, but a ...
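The classic loop above can be sketched in a few lines; the toy `embed`, the in-memory "vector DB" (a plain list), and all names here are illustrative stand-ins, not Meta's implementation:

```python
import math

# Tiny fixed vocabulary stands in for a real embedding model.
VOCAB = ["capital", "france", "paris", "cell", "mitochondria", "borders"]

def embed(text):
    # Bag-of-words vector over the toy vocabulary.
    words = text.lower().replace(".", "").replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Encode the query, score every stored chunk, return the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "France borders Spain and Germany.",
]
context = retrieve("What is the capital of France?", chunks)
# The retrieved chunks are then dumped into the LLM prompt verbatim:
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: What is the capital of France?"
```

Note that the second retrieved chunk ("France borders ...") matches the query lexically but does not help answer it: that is exactly the wasted context the tweet is pointing at.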
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
Embedding Models
- A new embedding model reduces vector DB costs by approximately 200x [1]
- The new model outperforms OpenAI and Cohere models [1]

Industry Focus
- The report provides a complete breakdown with visuals on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAG (Retrieval-Augmented Generation) [1]
X @Avi Chawla
Avi Chawla· 2025-08-14 06:34
voyage-context-3 (binary, 512d) compared to OpenAI-v3-large (float, 3072d):
- 99.48% lower vector DB costs.
- 0.73% better retrieval quality.

Check this 👇 https://t.co/7pLYG2Vkot ...
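The cost figure follows directly from bytes-per-vector arithmetic, which is worth checking:

```python
def bytes_per_vector(dims, bits_per_dim):
    # Storage footprint of a single vector, in bytes.
    return dims * bits_per_dim / 8

openai_large = bytes_per_vector(3072, 32)   # float32, 3072 dims -> 12288 bytes
voyage_binary = bytes_per_vector(512, 1)    # binary,  512 dims  ->    64 bytes

ratio = openai_large / voyage_binary        # 192x smaller, i.e. ~200x
savings = 1 - voyage_binary / openai_large  # 0.9948 -> 99.48% lower storage cost
```

So the "~200x" and "99.48% lower" claims are the same 192:1 ratio stated two ways; actual vector DB bills also include compute and query costs, which this sketch ignores.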
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Capabilities
- Voyage-context-3 supports 2048, 1024, 512, and 256 dimensions, with quantization options [1]

Cost Efficiency
- Voyage-context-3 (int8, 2048 dims) reduces vector database costs by 83% compared to OpenAI-v3-large (float, 3072 dims) [1]

Performance
- Voyage-context-3 delivers 860% better retrieval quality [1]
X @Avi Chawla
Avi Chawla· 2025-08-14 06:33
Model Performance
- A new embedding model outperforms OpenAI and Cohere models [1]
- The new model reduces vector database costs by approximately 200x [1]
Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
AI Engineer· 2025-07-29 15:30
Overview of Alice and 11X
- 11X is building digital workers for the go-to-market organization, including Alice, an AI SDR, and Julian, a voice agent [2]
- Alice sends approximately 50,000 emails per day, significantly more than a human SDR's 20-50 emails, and runs campaigns for about 300 business organizations [6]
- The knowledge base centralizes seller information, allowing users to upload source material for message generation [18]

Technical Architecture and Pipeline
- The knowledge base pipeline consists of parsing, chunking, storage, retrieval, and visualization [22]
- Parsing converts non-text resources into text, making them legible to large language models [23]
- Chunking breaks markdown down into semantic entities for embedding in the vector DB, preserving markdown structure [37][38]
- Pinecone was selected as the vector database because it is a well-known, cloud-hosted solution that is easy to use, bundles embedding models, and offers strong customer support [46][47][48][49]
- A deep research agent, built using Leta, handles retrieval, creating a plan with one or many context retrieval steps [51][52]

Vendor Selection and Considerations
- The company chose to work with vendors for parsing, prioritizing speed to market and confidence in the outcome over building in-house [26][27]
- Llama Parse was selected for documents and images due to its support for numerous file types [32]
- Firecrawl was chosen for websites due to familiarity and the availability of its crawl endpoint [33][34]
- Cloudglue was selected for audio and video because it supports both formats and extracts information from the video itself [36]

Lessons Learned and Future Plans
- RAG (Retrieval-Augmented Generation) is complex, requiring many micro-decisions and technology evaluations [58]
- The company recommends getting to production first before benchmarking and improving [59]
- Future plans include tracking and addressing hallucinations, evaluating parsing vendors on accuracy and completeness, experimenting with hybrid RAG, and reducing costs [60][61]
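The structure-preserving chunking step described above can be sketched as a heading-aware splitter; this is an illustrative sketch, not 11x's implementation, and the `path` metadata scheme is an assumption:

```python
def chunk_markdown(md):
    # Split a markdown document on headings, keeping each section's
    # heading path as metadata so structure survives into the vector DB.
    chunks, path, body = [], [], []

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in md.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            del path[level - 1:]      # pop deeper/equal headings off the path
            path.append(title)
        else:
            body.append(line)
    flush()
    return chunks

doc = "# Product\nAlice is an AI SDR.\n## Pricing\nContact sales."
chunks = chunk_markdown(doc)
```

Each chunk carries its full heading trail (e.g. "Product > Pricing"), so the retriever can show where in the source a passage came from instead of returning context-free fragments.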
Practical GraphRAG: Making LLMs smarter with Knowledge Graphs — Michael, Jesus, and Stephen, Neo4j
AI Engineer· 2025-07-22 17:59
Graph RAG Overview
- Graph RAG aims to enhance LLMs by incorporating knowledge graphs, addressing limitations like lack of domain knowledge, unverifiable answers, hallucinations, and biases [1][3][4][5][9][10]
- Graph RAG leverages knowledge graphs (collections of nodes, relationships, and properties) to provide more relevant, contextual, and explainable results than basic RAG systems built on vector databases [8][9][10][12][13][14]
- Microsoft research indicates Graph RAG can achieve better results at lower token cost, supported by studies showing improvements in capabilities and analyst trends [15][16]

Knowledge Graph Construction
- Knowledge graph construction involves structuring unstructured information, extracting entities and relationships, and enriching the graph with algorithms [19][20][21][22]
- Lexical graphs represent documents and their elements (chunks, sections, paragraphs) with relationships based on document structure, temporal sequence, and similarity [25][26]
- Entity extraction uses LLMs with graph schemas to identify entities and relationships in text, potentially integrating with existing knowledge graphs or structured data like CRM systems [27][28][29][30]
- Graph algorithms (clustering, link prediction, PageRank) enrich the knowledge graph, enabling cross-document topic identification and summarization [20][30][34]

Graph RAG Retrieval and Applications
- Graph RAG retrieval starts with an index search (vector, full-text, or hybrid), then traverses relationships to fetch additional context, taking user context into account for tailored results [32][33]
- Modern LLMs are increasingly trained on graph processing, allowing them to effectively use node-relationship-node patterns provided as context [34]
- Tools and libraries are available for knowledge graph construction from various sources (PDFs, YouTube transcripts, web articles), with open-source options for implementation [35][36][39][43][45]
- Agentic approaches in Graph RAG break down user questions into tasks, using domain-specific retrievers and tools in sequence or loops to generate comprehensive answers and visualizations [42][44]
- Industry leaders are adopting Graph RAG in production; LinkedIn's customer support saw a 28.6% reduction in median per-issue resolution time [17][18]
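The two-phase retrieval pattern above (index search, then relationship traversal) can be shown with a toy in-memory graph; a real system would use a graph database such as Neo4j with a vector index, and every node, edge, and query here is invented for illustration:

```python
# Toy graph: node id -> text, plus (source, RELATION, target) edges.
graph = {
    "nodes": {
        "acme": "Acme Corp, a CRM customer since 2023.",
        "ticket_7": "Ticket 7: login failures after the v2 upgrade.",
        "fix_v2_1": "Patch v2.1 resolves the login regression.",
    },
    "edges": [
        ("acme", "FILED", "ticket_7"),
        ("ticket_7", "RESOLVED_BY", "fix_v2_1"),
    ],
}

def index_search(query):
    # Phase 1: stand-in for a vector/full-text/hybrid index
    # (here just keyword matching on node text).
    terms = query.lower().split()
    return [nid for nid, text in graph["nodes"].items()
            if any(t in text.lower() for t in terms)]

def expand(node_ids):
    # Phase 2: traverse relationships touching the hits to fetch
    # the additional context a plain vector search would miss.
    return [(s, r, d) for s, r, d in graph["edges"]
            if s in node_ids or d in node_ids]

hits = index_search("login failures")
context = [f"{graph['nodes'][s]} -{r}-> {graph['nodes'][d]}"
           for s, r, d in expand(hits)]
```

The traversal is what surfaces the node-relationship-node triples ("Acme FILED Ticket 7", "Ticket 7 RESOLVED_BY Patch v2.1") that the talk says modern LLMs handle well as context.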
Elastic(ESTC) - 2025 Q4 - Earnings Call Transcript
2025-05-29 22:02
Financial Data and Key Metrics Changes
- Total revenue in Q4 was $388 million, growing 16% year-over-year on an as-reported and constant currency basis [30]
- Subscription revenue in Q4 totaled $362 million, also growing 16% as reported and 17% in constant currency [30]
- Elastic Cloud revenue grew 23% on an as-reported and constant currency basis [30]
- Non-GAAP operating margin for Q4 was 15%, with a gross margin of 77% [35][36]
- Adjusted free cash flow margin improved by approximately 600 basis points to end the year at 19% [36]

Business Line Data and Key Metrics Changes
- The number of customers with over $1 million in annual contract value grew approximately 27%, adding about 45 net new customers [34]
- Customers with over $100,000 in annual contract value grew approximately 14%, adding about 180 net new customers [34]
- Subscription revenue excluding Monthly Cloud was $315 million, growing 19% in Q4 [32]

Market Data and Key Metrics Changes
- Strong growth was observed in the APJ region, followed by EMEA and the Americas, while some pressure was noted in the U.S. public sector [34]
- Over 2,000 Elastic Cloud customers are using Elastic for Gen AI use cases, with over 30% of these customers spending $100,000 or more annually [12]

Company Strategy and Development Direction
- The company is focusing on leveraging AI to automate business processes and drive innovation, positioning itself as a strategic partner for enterprises [11][18]
- Elastic aims to strengthen its position as the preferred vector database, enhancing its offerings with new technologies like better binary quantization [13][19]
- The company is committed to balancing growth and profitability while continuing to innovate and expand its product offerings [40][43]

Management's Comments on Operating Environment and Future Outlook
- Management acknowledged potential uncertainty in the macro environment but expressed confidence in the healthy pipeline and demand signals [39]
- The company expects continued growth and strong margins in FY 2026, projecting total revenue of $1.655 billion to $1.670 billion [42]

Other Important Information
- Elastic Cloud now accounts for over 50% of subscription revenue, with strong growth in cloud adoption [18]
- The company announced a strategic collaboration agreement with AWS to enhance solution integrations and accelerate AI innovation [25]

Q&A Session Summary

Question: Guidance and Metrics
- Inquiry about the conservativeness of guidance and leading indicators of business performance [45]
- Response highlighted the balance of positive demand signals with macro uncertainty, emphasizing the importance of CRPO and subscription revenue metrics [46][49]

Question: Partnerships and Market Opportunities
- Question regarding the impact of recent partnerships, particularly with AWS and NVIDIA, on market opportunities [53]
- Management noted the growing acceptance of Elastic as a leading vector database and the importance of partnerships for driving cloud adoption [54]

Question: Retrieval Augmented Generation (RAG)
- Inquiry about the durability of RAG architectures and Elastic's positioning [59]
- Management affirmed the critical role of retrieval in enterprise data management and the growing adoption of their vector database for RAG use cases [60][61]

Question: Cloud Performance and Consumption Hesitation
- Question about the sequential growth in cloud performance and the impact of the leap year [62]
- Management clarified that the leap year and fewer days in Q4 affected consumption rates, but normalized growth rates remained strong [64][66]

Question: Go-to-Market Strategy and Changes
- Inquiry about the effectiveness of go-to-market changes made in the previous fiscal year [69]
- Management confirmed that the changes have settled and are yielding positive results, with plans to continue hiring sales capacity [70][72]

Question: AI Commitments and Emerging Use Cases
- Question about the $1 million AI commitments and emerging use cases [93]
- Management clarified that 25% of $1 million customers are using Elastic for AI workloads, with a variety of sophisticated use cases emerging across industries [94][96]
Storage Vendors in Trouble
半导体行业观察· 2025-05-28 01:36
Core Viewpoint
- The primary challenge for storage vendors is how to store data for artificial intelligence (AI) access, ensuring that AI models and agents can quickly retrieve this data through efficient data pipelines [1][3]

Group 1: AI Integration in Storage
- AI is being utilized in storage management to enhance efficiency and is crucial for cybersecurity [1]
- Storage hardware and software vendors are adopting Nvidia GPUDirect support to expedite raw data transmission to GPUs, which has expanded from file support to include object storage via RDMA [3][4]
- Data management software can transition from storage array controllers to databases or data lakes, and can be hosted in public clouds like AWS, Azure, or GCP [3][4]

Group 2: Data Processing and Storage Solutions
- Data must be identified, located, selected, and vectorized before being usable by large language models (LLMs), with vector storage options including specialized vector databases [4][5]
- Vendors like VAST Data are developing their own AI pipelines, in contrast with companies like Qumulo that focus on internal operations enhancement without GPUDirect support [5][10]
- Major storage vendors such as Cloudian, Dell, and IBM support GPUDirect for file and object storage, although support may vary across product lines [8][9]

Group 3: Advanced AI Capabilities
- Nvidia's BasePOD and SuperPOD GPU server systems have been certified by several vendors, indicating a trend toward deeper integration with Nvidia's AI software [9][10]
- Companies like Hammerspace and VAST Data support Nvidia GPU servers' key-value (KV) cache offloading, which is essential for optimizing AI model performance [11]
- Cloud file service providers are also exploring AI data pipelines to support GPU-based inference, although collaboration with Nvidia remains limited [12]

Group 4: Challenges in Data Accessibility
- Backup and archive data pose challenges for AI model access, as many backup vendors are reluctant to provide API access to their stored data [13][14]
- Organizations with diverse storage vendors and systems may face difficulties in creating a unified strategy for AI model data accessibility, potentially leading to vendor consolidation [14]