X @Anthropic
Anthropic · 2025-11-25 20:26

RT rowan (@rowankwang): New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested, simple ones, like fine-tuning models to be honest despite deceptive instructions, worked best. https://t.co/sUEwwYSmaN ...