Alignment auditing
X @Anthropic
Anthropic · 2025-07-24 17:21
New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors. https://t.co/HMQhMaA4v0 ...