Hallucinations

X @Avi Chawla
Avi Chawla· 2025-08-20 06:30
DeepMind built a simple RAG technique that:
- reduces hallucinations by 40%
- improves answer relevancy by 50%
Let's understand how to use it in RAG systems (with code): ...
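The thread's code and the specific DeepMind technique are truncated above. For context only, here is a minimal retrieve-then-generate RAG scaffold in Python; it is not the thread's method, and the model names and tiny in-memory corpus are illustrative assumptions:

```python
# Generic retrieve-then-generate RAG scaffold (illustrative only; the
# thread's specific technique is truncated above and not reproduced here).
import numpy as np
from openai import OpenAI  # assumes an OpenAI-compatible API key is configured

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = [
    "Knowledge graphs store data as nodes and relationships.",
    "Vector databases index embeddings for similarity search.",
]
doc_vecs = embed(docs)

def answer(question: str, k: int = 1) -> str:
    q = embed([question])[0]
    # These embeddings are unit-normalized, so dot product = cosine similarity.
    scores = doc_vecs @ q
    context = "\n".join(docs[i] for i in np.argsort(scores)[::-1][:k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("How do knowledge graphs store data?"))
```

Grounding answers in retrieved context, as in the system prompt above, is the basic mechanism by which RAG reduces hallucinations.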
What's with these OpenAI charts?
The Verge· 2025-08-08 14:07
Hallucination & Deception Rate Analysis
- GPT-5 scored 50% on deception rate, while o3 scored 47.4%, but the bar chart visually misrepresented the data, making GPT-5 appear better when it is actually worse [1]
- The report lacks clarity on what constitutes a good deception rate; ideally it should approach zero, especially when the model is used for medical advice [2][3]
- Chart inconsistencies include boxes of different sizes representing percentages that don't align with their visual representation, such as 69.1%, 30.8%, and 52% [2]
Data Presentation Issues
- The charts lack consistency, with the highest numbers not always positioned in the highest spot [1]
- Random coloring and inconsistent sizing of chart elements further add to the confusion [2]
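To make the criticism concrete, here is a minimal sketch of how those two deception rates render when bar height is drawn to scale from a zero baseline, the property the criticized chart violated (values are the ones cited above; matplotlib assumed):

```python
# Bar chart where bar height is proportional to the value, from a zero
# baseline -- a 50% bar must be visibly taller than a 47.4% bar.
import matplotlib.pyplot as plt

models = ["GPT-5", "o3"]
deception_rate = [50.0, 47.4]  # percentages cited in the summary above

fig, ax = plt.subplots()
bars = ax.bar(models, deception_rate, color="tab:blue")  # one color, one scale
ax.set_ylim(0, 100)             # zero-based axis keeps proportions honest
ax.set_ylabel("Deception rate (%)")
ax.bar_label(bars, fmt="%.1f")  # print values so readers need not guess
ax.set_title("Deception rate, drawn to scale (lower is better)")
plt.show()
```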
Practical GraphRAG: Making LLMs smarter with Knowledge Graphs — Michael, Jesus, and Stephen, Neo4j
AI Engineer· 2025-07-22 17:59
Graph RAG Overview
- Graph RAG aims to enhance LLMs by incorporating knowledge graphs, addressing limitations like lack of domain knowledge, unverifiable answers, hallucinations, and biases [1][3][4][5][9][10]
- Graph RAG leverages knowledge graphs (collections of nodes, relationships, and properties) to provide more relevant, contextual, and explainable results than basic RAG systems built on vector databases [8][9][10][12][13][14]
- Microsoft research indicates Graph RAG can achieve better results at lower token cost, supported by studies showing capability improvements and by analyst trends [15][16]
Knowledge Graph Construction
- Knowledge graph construction involves structuring unstructured information, extracting entities and relationships, and enriching the graph with algorithms [19][20][21][22]
- Lexical graphs represent documents and their elements (chunks, sections, paragraphs) with relationships based on document structure, temporal sequence, and similarity [25][26]
- Entity extraction uses LLMs guided by a graph schema to identify entities and relationships in text, potentially integrating with existing knowledge graphs or structured data such as CRM systems [27][28][29][30]
- Graph algorithms (clustering, link prediction, PageRank) enrich the knowledge graph, enabling cross-document topic identification and summarization [20][30][34]
Graph RAG Retrieval and Applications
- Graph RAG retrieval starts with an initial index search (vector, full-text, hybrid), then traverses relationships to fetch additional context, taking user context into account for tailored results (see the sketch after this summary) [32][33]
- Modern LLMs are increasingly trained on graph processing, allowing them to effectively use node-relationship-node patterns provided as context [34]
- Tools and libraries are available for knowledge graph construction from various sources (PDFs, YouTube transcripts, web articles), with open-source options for implementation [35][36][39][43][45]
- Agentic approaches in Graph RAG break a user question into tasks, invoking domain-specific retrievers and tools in sequence or in loops to generate comprehensive answers and visualizations [42][44]
- Industry leaders are adopting Graph RAG in production; LinkedIn's customer support saw a 28.6% reduction in median per-issue resolution time [17][18]
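A minimal sketch of the retrieve-then-traverse pattern described above, using the Neo4j Python driver. It assumes a Neo4j 5.x instance with a vector index named chunk_embeddings and a precomputed question embedding; the index name, node labels, and relationship types are hypothetical and should be adapted to your schema:

```python
# Retrieve-then-traverse Graph RAG: vector index search finds seed chunks,
# then graph traversal pulls connected entity facts as extra context.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# 'chunk_embeddings', :Entity, and :MENTIONS are assumed schema names.
QUERY = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $embedding)
YIELD node AS chunk, score
MATCH (chunk)-[:MENTIONS]->(e:Entity)-[r]->(neighbor:Entity)
RETURN chunk.text AS text, score,
       collect(DISTINCT e.name + ' -' + type(r) + '-> ' + neighbor.name) AS facts
ORDER BY score DESC
"""

def retrieve(question_embedding: list[float], k: int = 3) -> list[dict]:
    with driver.session() as session:
        result = session.run(QUERY, embedding=question_embedding, k=k)
        return [record.data() for record in result]

# Each row pairs a matched chunk with node-relationship-node facts that can
# be appended to the LLM prompt as the structured context described above.
```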
Taming Rogue AI Agents with Observability-Driven Evaluation — Jim Bennett, Galileo
AI Engineer· 2025-06-27 10:27
AI Agent Evaluation & Observability
- The industry emphasizes the necessity of observability in AI development, particularly for evaluation-driven development [1]
- AI trustworthiness is a significant concern, highlighting the need for robust evaluation methods [1]
- Detecting problems in AI is challenging because of its non-deterministic nature, which makes traditional unit testing difficult [1]
AI-Driven Evaluation
- The industry suggests using AI to evaluate AI, leveraging its ability to understand and identify issues in AI systems [1]
- LLMs can score the performance of other LLMs; the recommendation is to use a better (potentially more expensive or custom-trained) LLM for evaluation than the one used in the primary application [2]
- Galileo offers a custom-trained small language model (SLM) designed for effective AI evaluations [2]
Implementation & Metrics
- Evaluations should be integrated from the beginning of AI application development, including prompt engineering and model selection [2]
- Granularity is crucial: evaluate each step of the AI workflow to identify failure points [2]
- Key metrics include action completion (did the agent complete the task?) and action advancement (did it move toward the goal?) [2]
Continuous Improvement & Human Feedback
- AI can provide insights and suggestions for improving agent performance based on evaluation data [3]
- Human feedback is essential to validate and refine AI-generated metrics, ensuring accuracy and continuous learning [4]
- Real-time prevention and alerting are necessary to rein in rogue AI agents and prevent issues in production [8]
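A minimal LLM-as-judge sketch of the per-step evaluation pattern described above, assuming an OpenAI-compatible client; the judge prompt, model names, and 0-1 scoring scheme are assumptions drawn from the summary's metric names, not Galileo's actual product or SLM:

```python
# LLM-as-judge: a stronger model scores one agent step on the two metrics
# named above (action completion, action advancement). Illustrative only.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are evaluating one step of an AI agent.
Task: {task}
Agent output: {output}

Return JSON with two fields, each a float from 0 to 1:
- "action_completion": did the output complete the task?
- "action_advancement": did the output move toward the goal?"""

def judge_step(task: str, output: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed; the judge should outclass the app model
        response_format={"type": "json_object"},  # force parseable scores
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(task=task, output=output)}],
    )
    return json.loads(resp.choices[0].message.content)

scores = judge_step(
    task="Refund order #1234 and notify the customer.",
    output="Refund issued; notification email drafted but not sent.",
)
print(scores)  # e.g. {"action_completion": 0.5, "action_advancement": 0.9}
```

Running a judge like this at every workflow step, rather than only on final answers, is what gives the granularity the talk calls for.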