X @Anthropic
Anthropic · 2025-11-25 20:26
RT rowan (@rowankwang) New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested, simple ones, like fine-tuning models to be honest despite deceptive instructions, worked best. https://t.co/sUEwwYSmaN ...