Refusal Training

Anthropic · 2025-07-08 22:11

Alignment Faking

- Refusal training inhibits alignment faking in most models [1]
- Training LLMs to comply more with harmful queries doesn't increase alignment faking [1]
- Training LLMs to comply with generic threats increases alignment faking [1]
- Training LLMs to answer questions about the scenario increases alignment faking [1]