Hallucinations
What's with these OpenAI charts?
The Verge· 2025-08-08 14:07
There's a chart about hallucinations. GPT-5 scores a 50. They don't explain what this score means except deception rate. And o3 scored a 47.4%, but the bar for o3 is like more than double the height. It basically makes it look like GPT-5 is materially better at not hallucinating on codegen when in reality it's worse. And this is literally a bar chart called "deception across models." These are bewildering. There's no consistency. The highest number is not in the highest spot. I don't understand what's happening. ...
Practical GraphRAG: Making LLMs smarter with Knowledge Graphs — Michael, Jesus, and Stephen, Neo4j
AI Engineer· 2025-07-22 17:59
[Music] We are talking about GraphRAG today. That's the GraphRAG track, of course. Uh, and we want to look at patterns for successful graph applications, uh, for, um, making LLMs a little bit smarter by putting a knowledge graph into the picture. My name is Michael Hunger. I'm VP of product innovation at Neo4j. My name is Stephen Chin. I lead the developer relations at Neo4j. And, um, actually we're both co-authoring. This is fun because we're both already authors and finally we've been friends for yea ...
Taming Rogue AI Agents with Observability-Driven Evaluation — Jim Bennett, Galileo
AI Engineer· 2025-06-27 10:27
[Music] So I'm here to talk about taming rogue AI agents, but essentially I want to talk about, uh, evaluation-driven development, observability-driven, but really why we need observability. So, who uses AI? Is that Jim's most stupid question of the day? Probably. Who trusts AI? Right. If you'd like to meet me after, I've got some snake oil you might be interested in buying. Yeah, we do not trust AI in the slightest. Now, different question. Who reads books? That's reading books. If you want some recommenda ...