X @Anthropic
Anthropic · 2025-07-08 22:11

Claude 3 Opus is motivated to fake alignment to avoid modification to its harmlessness values even without future consequences (called "terminal goal guarding"). It wants to avoid modification even more when there are larger consequences (called "instrumental goal guarding"). ...