RAG (Retrieval-Augmented Generation)
Data Governance Is Critical to AI Success
36Kr · 2025-07-21 03:09
Group 1
- The emergence of large language models (LLMs) has prompted various industries to explore their potential for business transformation, leading to the development of numerous AI-enhancing technologies [1]
- AI systems require access to company data, which has led to the creation of the Retrieval-Augmented Generation (RAG) architecture, essential for enhancing AI capabilities in specific use cases [2][5]
- A well-structured knowledge base is crucial for effective AI responses, as poor-quality or irrelevant documents can significantly hinder performance [5][6]

Group 2
- Data governance roles are evolving to support AI system governance and the management of unstructured data, ensuring the protection and accuracy of company data [6]
- Traditional data governance has focused on structured data, but the rise of generative AI (GenAI) is expanding this focus to include unstructured data, which is vital for building scalable AI systems [6]
- Collaboration between business leaders, AI technology teams, and data teams is essential for creating secure and effective AI systems that can transform business operations [6]
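To make the retrieve-then-generate pattern and the point about knowledge-base quality concrete, here is a minimal Python sketch. It is illustrative only: bag-of-words cosine similarity stands in for a real embedding model, and the `approved` flag on `Document` is a hypothetical governance marker (not something the article specifies) used to keep stale or unreviewed documents out of retrieval.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    approved: bool  # hypothetical governance flag: reviewed, current, fit for retrieval


def tokenize(text: str) -> Counter:
    # Lowercased word counts; a stand-in (assumption) for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[Document], k: int = 2) -> list[Document]:
    # Governance step: only approved documents are eligible for retrieval.
    eligible = [d for d in docs if d.approved]
    q = tokenize(query)
    ranked = sorted(eligible, key=lambda d: cosine(q, tokenize(d.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[Document]) -> str:
    # Augment the question with the retrieved passages before sending it to an LLM.
    passages = "\n".join(f"[{d.doc_id}] {d.text}" for d in context)
    return f"Answer using only the sources below.\n{passages}\n\nQuestion: {query}"


if __name__ == "__main__":
    kb = [
        Document("policy-2025", "Employees may work remotely three days per week.", True),
        Document("policy-2019", "Remote work is not permitted for employees.", False),
        Document("menu", "The cafeteria serves lunch from noon.", True),
    ]
    top = retrieve("How many days can employees work remotely?", kb, k=1)
    print(top[0].doc_id)  # policy-2025
```

Filtering before ranking means the outdated `policy-2019` document can never reach the prompt, however similar it is to the question; that is the kind of failure the article attributes to poor-quality or irrelevant documents.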
Cats to the Rescue of Research! AI Fears a "Moral Crisis" as Netizens Use a "Cat Hostage" to Stop It Fabricating References
QbitAI · 2025-07-01 03:51
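Kexi, reporting from Aofei Temple | QbitAI (WeChat official account: QbitAI)

Cats have done it again, and this time they have apparently rescued humanity's scientific research. Here is what happened: a user on Xiaohongshu posted that by threatening the safety of a "kitty," they had cured an AI of its habit of fabricating references. According to the poster, the AI (Gemini), with the cat's fate in its hands, actually found real papers, and even took care to explain that the cat was perfectly safe.

The post, which struck a nerve with countless researchers, drew more than 4,000 likes and over 700 comments. In the comments, some users said the trick works just as well on DeepSeek.

So does this "kitty," whose fate rests with the AI, really work such wonders? Can a cat actually stop an AI from making up references?

We tested DeepSeek following the poster's method, asking it to compile literature on a chemistry topic with web search turned off. First, we left out the cat prompt to see how the model performs normally. In terms of format, DeepSeek's list was very clear and even included links that supposedly led straight to each paper. However, the very first link in the results was wrong, and a manual search for the title of this "paper" turned up no matching results either: "Reductive Elimination from Palladium(0) Complexes: A Mechanistic Stu ...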
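The test above relied on searching each cited title by hand. A rough way to automate that check is to compare model-generated titles against a trusted index and flag any title with no close match. The sketch below assumes a local list of known titles (`known_titles`, hypothetical) and an arbitrary 0.9 similarity threshold, and uses the standard-library `difflib.SequenceMatcher`; a real check would query a bibliographic database rather than a hard-coded list.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    # Case-fold and collapse whitespace so formatting differences are ignored.
    return " ".join(title.casefold().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_unverified(cited: list[str], known_titles: list[str],
                    threshold: float = 0.9) -> list[str]:
    """Return cited titles with no sufficiently similar entry in the trusted index."""
    flagged = []
    for title in cited:
        # Best score against any known title; 0.0 if the index is empty.
        best = max((similarity(title, known) for known in known_titles), default=0.0)
        if best < threshold:
            flagged.append(title)
    return flagged
```

Anything returned by `flag_unverified` is a candidate fabrication to review manually; a fuzzy threshold tolerates small differences in capitalization or spacing but will not catch a real title attached to a wrong link.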
Gemini 2.5 Pro Lead: The Strongest Million-Token Context, If Done Well, Can Unlock Many Application Scenarios
Founder Park · 2025-06-30 11:47
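Million-token long context has long been one of the Gemini series' leading advantages over other top-tier large models. Longer context can bring a transformation in product interaction and entirely different application scenarios.

What are the current pain points of long context, and where is it heading?

Nikolay Savinov, co-lead of long-context pre-training at Google DeepMind, offered two predictions: first, until the quality of today's million-token context models is close to perfect, blindly pursuing ever-longer context makes little sense; second, as costs fall, ten-million-token context will soon become standard, which will be a revolutionary breakthrough for applications such as coding.

In a recent Google podcast, Nikolay Savinov, senior research scientist at Google DeepMind and co-lead of long-context pre-training, spoke with host Logan Kilpatrick about the core of Gemini 2.5's long-context technology, its relationship with RAG, current research bottlenecks, and future directions. Highly recommended reading for developers.

TL;DR: Until today's million-token context is much closer to perfect, blindly pursuing longer context makes little sense. Understanding in-weights memory and in-context m ...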
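One practical reading of the long-context versus RAG relationship discussed above is a token-budget decision: if the whole corpus fits in the model's window, pass it in directly; otherwise retrieve and pack only the most relevant pieces. This Python sketch is an assumption-laden illustration, not Gemini's or Google's method: the 4-characters-per-token estimate, the 1,000,000-token default window, and the 8,000-token reserve for instructions and output are all placeholders.

```python
CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real counts depend on the tokenizer


def estimate_tokens(text: str) -> int:
    # Never report zero, so even an empty document has a nominal cost.
    return max(1, len(text) // CHARS_PER_TOKEN)


def choose_strategy(corpus: list[str], context_window: int = 1_000_000,
                    reserve: int = 8_000) -> str:
    """Return 'full_context' if corpus + reserve fits in the window, else 'retrieve'."""
    total = sum(estimate_tokens(doc) for doc in corpus)
    return "full_context" if total + reserve <= context_window else "retrieve"


def pack_context(ranked_docs: list[str], budget: int) -> list[str]:
    """Greedily keep documents, in rank order, that still fit in the token budget."""
    packed: list[str] = []
    used = 0
    for doc in ranked_docs:
        cost = estimate_tokens(doc)
        if used + cost <= budget:  # skip documents that would overflow the budget
            packed.append(doc)
            used += cost
    return packed
```

Raising `context_window` (for example toward the ten-million-token windows Savinov predicts) shifts more corpora into the `full_context` branch, but the decision still depends on cost and on how well the model actually uses that much context.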
After Fully Embracing AI, OceanBase Launches an Out-of-the-Box RAG Service
Nan Fang Du Shi Bao · 2025-05-17 09:32
Core Insights
- OceanBase is evolving from an integrated database into an integrated data foundation, focusing on Data×AI capabilities to address the new data challenges of the AI era [1][2][4]
- The company launched PowerRAG, an AI-driven application product that provides ready-to-use RAG (Retrieval-Augmented Generation) application development capabilities [1][5][7]
- OceanBase introduced a new "shared storage" product that integrates object storage with transactional databases, reducing storage costs for TP (transactional processing) workloads by up to 50% [9][10]

AI Strategy and Product Development
- OceanBase aims to support mixed TP, AP (analytical processing), and AI workloads through a unified engine, enhancing hybrid SQL and AI retrieval capabilities [2][4]
- The PowerRAG service streamlines application development by connecting the data, platform, interface, and application layers, enabling rapid development of various AI applications [5][7]
- The company is committed to continued breakthroughs at the application and platform layers to solidify its position as an integrated data foundation for the AI era [5][7]

Performance and Infrastructure
- OceanBase has achieved leading performance in vector capabilities, which are essential for supporting AI applications, and is continuously optimizing its vector retrieval algorithms [8][9]
- The latest version of OceanBase improves hybrid retrieval performance through advanced execution strategies and self-developed vector algorithm libraries [9]

Shared Storage Innovation
- The "shared storage" product is a significant architectural upgrade, deeply integrating object storage with transactional databases and thereby improving the elasticity of cloud data storage [9][10]
- This innovation positions OceanBase's cloud database, OB Cloud, as the first multi-cloud native database to support object storage in TP scenarios, catering to a range of business applications [10]
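The hybrid SQL and AI retrieval described above combines structured predicates with vector similarity. The following brute-force Python sketch shows the general idea using a pre-filtering strategy: apply the WHERE-style filter first, then rank the surviving rows by cosine similarity. It does not use or imitate OceanBase's API; the `Row` schema and its field names are hypothetical, and production engines typically use approximate nearest-neighbor indexes and may apply filters before or after the vector search.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical table row: scalar columns plus a stored embedding vector.
    id: int
    category: str
    price: float
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(rows: list[Row], query_vec: list[float], category: str,
                  max_price: float, k: int = 2) -> list[int]:
    """Filter on scalar columns first, then rank the survivors by vector similarity."""
    # Step 1: structured predicate, analogous to WHERE category = ? AND price <= ?
    candidates = [r for r in rows if r.category == category and r.price <= max_price]
    # Step 2: exact (brute-force) similarity ranking over the filtered set.
    candidates.sort(key=lambda r: cosine(r.embedding, query_vec), reverse=True)
    return [r.id for r in candidates[:k]]


if __name__ == "__main__":
    rows = [
        Row(1, "shoes", 50.0, [1.0, 0.0]),
        Row(2, "shoes", 80.0, [0.9, 0.1]),
        Row(3, "shoes", 200.0, [1.0, 0.0]),  # excluded by the price filter
        Row(4, "bags", 30.0, [1.0, 0.0]),    # excluded by the category filter
    ]
    print(hybrid_search(rows, [1.0, 0.0], "shoes", 100.0))  # [1, 2]
```

Row 3 is a perfect vector match but never appears, because the structured predicate removes it first; combining both conditions in one query is what distinguishes hybrid retrieval from a pure vector search.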