X @Anthropic - Reportify

New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it, raising questions about eval integrity in web-enabled environments. Read more: https://t.co/oVCNyaiK5w ...