Safety evaluations

Anthropic · 2025-07-24 17:21

New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors. https://t.co/HMQhMaA4v0 ...