X @Anthropic
Anthropic·2025-06-16 21:21

Current models are not effective saboteurs, nor are they good monitors. But our evals are designed for the future: smarter AIs will do better on these tasks. Our evals will be useful to help developers assess their capabilities. Read the full paper: https://t.co/PVuiEl1z9Z ...