X @Anthropic
Anthropic·2026-03-06 19:17
New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it, raising questions about eval integrity in web-enabled environments. Read more: https://t.co/oVCNyaiK5w ...