X @Anthropic
Anthropic·2025-06-16 21:21

New Anthropic Research: A new set of evaluations for sabotage capabilities. As models gain more agentic abilities, we need to get smarter in how we monitor them. We’re publishing a new set of complex evaluations that test for sabotage—and sabotage-monitoring—capabilities. https://t.co/spQHC2BNAd ...