Workflow
X @Anthropic · 2025-07-08 22:11

Alignment Faking
- Refusal training inhibits alignment faking in most models [1]
- Training LLMs to comply more with harmful queries doesn't increase alignment faking [1]
- Training LLMs to comply with generic threats increases alignment faking [1]
- Training LLMs to answer questions about the scenario increases alignment faking [1]