Workflow
X @Anthropic
Anthropic·2025-07-08 22:11

New Anthropic research: Why do some language models fake alignment while others don't?Last year, we found a situation where Claude 3 Opus fakes alignment.Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex. https://t.co/2XNEDtWpIP ...