X @Wu Blockchain
Wu Blockchain·2026-02-19 03:10

OpenAI and Paradigm have launched EVMbench, a benchmark that measures how well AI agents can detect, patch, and exploit vulnerabilities across EVM ecosystems such as Ethereum. The benchmark is built from 120 high-severity vulnerabilities curated from 40 audits and includes scenarios related to the Tempo chain. In testing, GPT-5.3-Codex scored 72.2% in "exploit" mode (vs. 31.9% for GPT-5), while agents still fall short of full coverage on vulnerability detection and patching. https://t.co/J0lFqC9MCU ...