X @Anthropic
Anthropic·2025-06-20 19:30

New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down. https://t.co/KbO4UJBBDU ...