RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval Augmented Generation (RAG) & Large Language Models (LLMs)
- RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3]
- RAG is considered more reliable, faster, and cheaper than fine-tuning or long context windows for utilizing external knowledge [7]
- Retrieval accuracy has improved significantly over the past 18 months, driven by advances in embedding models [11][12]
- The industry averages approximately 80% accuracy across 100 datasets, leaving roughly 20 percentage points of headroom in retrieval tasks [12][13]

Vector Embeddings & Storage Optimization
- Techniques such as Matryoshka representation learning and quantization can cut vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17]
- Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21]

RAG Enhancement Techniques
- Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18]
- Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20]

Future of RAG
- The industry predicts a shift toward more capable models that reduce the need for manual "tricks" to improve RAG performance [29][30]
- Multimodal embeddings that can process screenshots, PDFs, and videos simplify workflows by eliminating separate data-extraction and embedding steps [32]
- Context-aware and auto-chunking embeddings aim to automate the chunking process and incorporate cross-chunk information, optimizing retrieval and cost [33][36]
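A back-of-the-envelope sketch of the storage arithmetic behind the truncation-plus-quantization claim above (the dimensions, corpus size, and helper names here are illustrative, not from the talk). Matryoshka-trained embeddings concentrate information in the leading dimensions, so truncation is just a prefix slice; scalar int8 quantization then replaces 4-byte floats with 1-byte codes:

```python
import numpy as np

# Illustrative numbers: 1M vectors, 1024 dims float32, truncated to 256 dims int8.
n_vectors, full_dim, trunc_dim = 1_000_000, 1024, 256

full_bytes = n_vectors * full_dim * 4      # float32: 4 bytes per dim
small_bytes = n_vectors * trunc_dim * 1    # int8:    1 byte per dim

def quantize_int8(v):
    """Symmetric scalar quantization: int8 codes plus one float scale per vector."""
    scale = max(float(np.abs(v).max()) / 127.0, 1e-12)
    return np.round(v / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
vec = rng.normal(size=full_dim).astype(np.float32)[:trunc_dim]  # keep leading dims
q, scale = quantize_int8(vec)
max_err = float(np.abs(q.astype(np.float32) * scale - vec).max())

print(f"storage reduction: {full_bytes // small_bytes}x")  # 4x truncation * 4x int8 = 16x
print(f"max reconstruction error: {max_err:.4f}")
```

Truncation alone gives 4x here and int8 another 4x; the talk's up-to-100x figure presumably stacks more aggressive truncation with binary quantization.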
Cerence(CRNC) - 2025 FY - Earnings Call Transcript
2025-05-29 15:50
Financial Data and Key Metrics Changes
- The company has seen a significant shift in operational efficiency and profitability, with a focus on reducing costs and increasing cash flow [44][46][61]
- The company's technology is reported to be in over 50% of cars produced, with a goal of increasing price per unit (PPU) as more advanced technologies are adopted [20][21][58]

Business Line Data and Key Metrics Changes
- The company is transitioning to a multimodal AI interaction platform, Cerence XUI, expected to be completed by the end of the calendar year, with in-vehicle implementation anticipated in early 2026 [61][62]
- The company is also enhancing its technology stack within existing vehicles to increase revenue opportunities [21][61]

Market Data and Key Metrics Changes
- Adoption rates for AI and voice interaction technologies are similar across European and US manufacturers, with notable consumer demand for these features [29][30]
- In China, the company faces challenges selling into the domestic market but performs well with Chinese manufacturers exporting to Europe and other regions [28][29]

Company Strategy and Development Direction
- The company aims to leverage its trusted relationships with OEMs and tier-one suppliers to strengthen its competitive position against big tech companies [13][26]
- There is a strategic focus on expanding into non-automotive sectors while maintaining profitability and cash flow [34][35]

Management's Comments on Operating Environment and Future Outlook
- Management acknowledges that the complexity of software integration within vehicles is a challenge for automakers and may delay the adoption of new technologies [16][17]
- The company is optimistic about the future, emphasizing AI advancements and the growing demand for connected vehicle technologies [19][22]

Other Important Information
- The company has established partnerships with key SoC providers such as NVIDIA and Arm to enhance software performance and execution [26]
- The company is exploring opportunities in non-automotive sectors, including transportation and consumer electronics, while ensuring high margins and profitability [33][34]

Q&A Session Summary
Question: What are the most important metrics to track for the company and the industry?
- Key metrics include overall IHS volumes, penetration rates, connectivity rates, and adjusted cash flow, which provide visibility into future connected revenue [57][58][59]
Question: How is the company addressing operational efficiencies?
- The company has completed a restructuring to rationalize expenses, focusing on improving efficiency across all departments [44][45][46]
Question: What is the company's approach to pricing discussions with automakers?
- The company is engaged in discussions to help manufacturers reduce costs while also aiming to increase revenue through broader adoption of its technology stack [50][52]
Building Scalable Foundations for Large Language Models
DDN· 2025-05-27 22:00
AI Infrastructure & Market Trends
- Modern AI applications are expanding across sectors such as finance, energy, healthcare, and research [3]
- The industry is evolving from initial LLM training to Retrieval Augmented Generation (RAG) pipelines and agentic AI [3]
- Vultr is positioned as an alternative hyperscaler, offering cloud infrastructure with 50-90% cost savings compared to traditional providers [4]
- A new 10-year cycle requires rethinking infrastructure to support global AI model deployment, necessitating AI-native architectures [4]

Vultr & DDN Partnership
- Vultr and DDN share a vision of radically rethinking the infrastructure landscape to support global AI deployment [4]
- The partnership aims to build a data pipeline that brings data to GPU clusters for training, tuning, and deploying models [4]
- Vultr provides the compute infrastructure, while DDN provides the data intelligence platform that moves the data [4]

Scalability & Flexibility
- Enterprises need composable infrastructure for cost-efficient AI model delivery at scale, including automated provisioning of GPUs, models, networking, and storage [2]
- Elasticity is crucial for scaling GPU and storage resources up and down with demand, avoiding over-provisioning [3]
- Vultr's worldwide serverless inference infrastructure scales GPU resources to meet peak demand in different regions, optimizing costs [3]

Performance & Customer Experience
- Improving customer experience requires fast, relevant responses, making time to first token and tokens per second the critical metrics [4]
- Consistent response times are essential, even with thousands of concurrent users [4]
- The fastest response for a customer is the ultimate measure of customer satisfaction [4]

Data Intelligence Platform
- DDN's EXAScaler offers high throughput for training, with up to 16x faster data loading and checkpointing than other parallel file systems [5]
- DDN's Infinia provides low latency for tokenization, vector search, and RAG lookups, with up to 30% lower latency [5]
- The DDN data intelligence platform speeds up data response times, keeping saturated GPUs fed and responding quickly [6]
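The two serving metrics named above are easy to pin down concretely. A minimal sketch of how time-to-first-token (TTFT) and decode throughput are typically measured against a streaming endpoint (the stream and its timings here are simulated, not from DDN or Vultr):

```python
import time

def stream_metrics(token_stream):
    """Measure time-to-first-token and decode throughput (tokens/sec)
    for an iterable that yields tokens as they are generated."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # TTFT: prefill + first decode step
        count += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    decode_time = end - first_token_at
    tps = (count - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, tps

# Simulated stream: ~0.2 s prefill, then 50 tokens at ~10 ms each.
def fake_stream():
    time.sleep(0.2)
    for i in range(50):
        if i:
            time.sleep(0.01)
        yield "tok"

ttft, tps = stream_metrics(fake_stream())
print(f"TTFT: {ttft:.3f}s, throughput: {tps:.0f} tok/s")
```

TTFT is dominated by prefill (and, in RAG, by the retrieval lookup that precedes it), while tokens per second reflects sustained decode speed; the two are optimized by different parts of the stack, which is why the summary tracks them separately.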
BERNSTEIN: The Future of Technology — Key Takeaways from the Conference on Embodied Intelligence and Large Language Models
2025-05-16 05:29
Summary of Key Points from the Conference on Agentic AI and LLMs

Industry Overview
- The conference focused on the **Technology, Media & Internet** sector, specifically discussing **Agentic AI** and **Large Language Models (LLMs)** and their implications for the future of technology [1][2].

Core Insights
- **Transformation of the Tech Stack**: Agentic AI is expected to redefine productivity by moving from static APIs to dynamic, goal-driven systems that leverage the capabilities of LLMs [2][6].
- **Adoption Trends**: LLM adoption is following a trajectory similar to cloud computing, with initial skepticism giving way to increased uptake due to proven ROI and flexible deployment options [2][16].
- **Benchmarking Models**: A comparative analysis of open-source versus proprietary LLMs highlighted that models such as **GPT-4** and **Claude 3 Opus** excel in enterprise readiness and agentic strength [3][39].
- **Impact on IT Services and SaaS**: The IT services sector, particularly labor-intensive models, is at risk as AI takes over basic coding tasks. This shift may reduce user counts for SaaS models, pushing providers toward value-based billing [4][31].

Evolution of AI Applications
- **From Cost-Cutting to Revenue Generation**: Initial enterprise use of LLMs focused on cost-cutting, but the consensus is that they will evolve to drive revenue through hyper-personalization and AI-native product experiences [5][44].
- **AI Agents vs. Traditional Interfaces**: AI agents are transforming user interactions by replacing traditional UX/UI with conversational interfaces, making services more intuitive and scalable [20][21].

Investment Implications
- The **India IT Services industry** is expected to benefit from Agentic AI in the medium term, although short-term efficiency-led growth may be impacted. Companies such as **Infosys** and **TCS** are well positioned in this evolving landscape [8][41].

Key Takeaways
- **Adoption Curve**: AI adoption is anticipated to mirror the cloud's trajectory, with initial hesitation followed by mainstream integration driven by value [6][16].
- **Disruption of Traditional Models**: The rise of Agentic AI may disrupt traditional IT service models, particularly in labor-intensive sectors, as automation increases efficiency [41][31].
- **Future of SaaS**: As AI agents take over tasks, SaaS companies must adapt to pricing models based on usage and outcomes rather than per-seat licensing [31][32].

Additional Insights
- **Open-source vs. Proprietary LLMs**: The choice between open-source and proprietary models involves trade-offs in cost, control, and scalability, with open-source models offering customization at the expense of requiring in-house expertise [32][39].
- **Multi-Modal Capabilities**: Leading LLMs increasingly offer multi-modal capabilities, broadening their applicability across use cases [39][40].

This summary encapsulates the critical discussions and insights from the conference, highlighting the transformative potential of Agentic AI and LLMs in the technology sector.
Uber(UBER) - 2025 Q1 - Earnings Call Transcript
2025-05-07 13:00
Uber (UBER) Q1 2025 Earnings Call May 07, 2025 12:00 PM ET Speaker0 and welcome to the Uber First Quarter twenty twenty five Earnings Conference Call. All lines have been placed on mute to prevent any background noise. After the speakers' remarks, there will be a question and answer session. I would now like to turn the conference over to Balaji Krishnamurti, Vice President, Strategic Finance and Investor Relations. You may begin. Speaker1 Thank you, operator. Thank you for joining us today, and welcome to ...
Can Large Models That Claim to Know Everything Save Clumsy Robots?
Hu Xiu· 2025-05-06 00:48
From Shanghai to New York, robots can be seen cooking in restaurants around the world. They make burgers, dosas, pizza, and stir-fries, and they work on the same principle robots have used to manufacture other products for the past 50 years: execute instructions precisely, repeating the same steps over and over.

But Ishika Singh does not want this "assembly line" kind of robot; she wants one that can truly "make dinner." It should be able to walk into a kitchen, rummage through the fridge and cabinets, take out ingredients and combine them, cook a tasty meal, and then set the table. For a child this might be simple, but no robot can do it. It requires too much knowledge about kitchens, and beyond that, common sense, flexibility, and adaptability, all of which lie outside the scope of traditional robot programming.

Singh, a computer science PhD student at the University of Southern California, points out that the crux of the problem is the classical planning pipeline roboticists use. "They need to define every action, along with its preconditions and expected effects," she explains. "That requires specifying in advance everything that could possibly happen in the environment." Yet even after countless rounds of trial and error and thousands of lines of code, such a robot still cannot handle situations its program does not anticipate.

A dinner-serving robot, when formulating a "policy" (the action plan for carrying out instructions), must know not only the local food culture (what "spicy" actually means locally) but also the specific kitchen environment (whether the rice cooker sits on a high shelf) and the particular circumstances of the people it serves (Hec ...
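The classical planning pipeline Singh describes, where every action carries explicit preconditions and effects, is essentially STRIPS-style planning. A minimal sketch (the kitchen facts and action names are illustrative, not from the article) shows both how it works and why it is brittle:

```python
# STRIPS-style sketch: each action declares its preconditions and effects
# up front; the world state is just a set of facts.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    preconditions: frozenset
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.del_effects) | self.add_effects

open_fridge = Action("open_fridge",
                     preconditions=frozenset({"at_kitchen", "fridge_closed"}),
                     add_effects=frozenset({"fridge_open"}),
                     del_effects=frozenset({"fridge_closed"}))
take_rice = Action("take_rice",
                   preconditions=frozenset({"fridge_open"}),
                   add_effects=frozenset({"holding_rice"}))

state = frozenset({"at_kitchen", "fridge_closed"})
plan = []
for action in (open_fridge, take_rice):
    if not action.applicable(state):
        raise RuntimeError(f"{action.name}: unmet preconditions")
    state = action.apply(state)
    plan.append(action.name)

print(plan)  # ['open_fridge', 'take_rice']
# Any fact not modeled (e.g. the rice cooker being on a high shelf) is
# invisible to the planner -- exactly the brittleness Singh describes.
```

Everything the robot can reason about must be enumerated as a fact or an effect in advance, which is why thousands of lines of such definitions still fail on the first unmodeled surprise.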
Compress a Model to 70% of Its Size While Keeping 100% Accuracy: the Lossless Compression Framework DFloat11 Arrives
机器之心· 2025-04-28 04:32
机器之心 report. Editors: Chen Ping, +0

Large language models (LLMs) have shown remarkable capabilities across a wide range of natural language processing (NLP) tasks. However, their rapidly growing size poses major obstacles to efficient deployment and inference, especially in compute- or memory-constrained environments.

For example, Llama-3.1-405B has 405 billion parameters in BFloat16 (16-bit Brain Float) format and requires roughly 810 GB of memory for full inference, exceeding the capacity of a typical high-end GPU server (e.g., a DGX A100/H100 with 8x 80 GB GPUs). Deploying the model therefore requires multiple nodes, which makes it expensive and hard to access.

In this work, researchers from Rice University and other institutions propose a solution that can compress any BFloat16 model to 70% of its original size while maintaining 100% accuracy on tasks.

Paper title: 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

To cope with the ever-growing size of LLMs, quantization is commonly applied, converting high-precision weights to lower-bit representations. This significantly reduces memory ...
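The intuition behind lossless compression of BFloat16 weights can be checked with a short entropy calculation. This is an illustrative sketch, not the DFloat11 implementation from the paper: BFloat16 has 1 sign + 8 exponent + 7 mantissa bits, and in trained models the exponent distribution is highly skewed, so entropy-coding the exponents alone already shrinks the format well below 16 bits per weight (the synthetic weight distribution below is an assumption):

```python
import numpy as np

# Synthetic stand-in for trained weights: small, roughly normal values.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, 100_000).astype(np.float32)

# Extract the 8 exponent bits from the float32 bit pattern; bfloat16
# uses the same 8-bit exponent field as float32.
bits = weights.view(np.uint32)
exponents = ((bits >> 23) & 0xFF).astype(np.uint8)

# Shannon entropy of the exponent distribution, in bits per weight.
counts = np.bincount(exponents, minlength=256)
p = counts[counts > 0] / counts.sum()
exp_entropy = float(-(p * np.log2(p)).sum())

# Sign (1 bit) and mantissa (7 bits) stay verbatim; exponents get
# entropy-coded (e.g. with a Huffman code) down to ~their entropy.
bits_per_weight = 1 + 7 + exp_entropy
print(f"exponent entropy: {exp_entropy:.2f} bits")
print(f"~{bits_per_weight:.1f} bits/weight vs 16 for BFloat16")
```

On this synthetic distribution the exponent entropy comes out near 2-3 bits, i.e. roughly 10-11 bits per weight, consistent with the ~70% (about 11.2 of 16 bits) figure the paper reports; because only the encoding changes and every bit is recoverable, accuracy is untouched.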
Google GenAI, AI Cloud Services Drive Analyst Confidence In Long-Term Growth
Benzinga· 2025-04-16 18:02
Over the next three to five years, Google's primary upside valuation driver will be its proprietary large language models (LLMs). That's according to Needham analyst Laura Martin, who reiterated Alphabet Inc. GOOGL, Google's parent company, with a Buy and a $178 price target on Wednesday. She expects GenAI to aid Google's internal operations and increase revenue growth. Martin adds that Google Cloud will generate revenue from both LLMs and the applications built upon them. Also Read: Google Undercuts Microsof ...