DeepYard

EVMbench

Benchmark for evaluating AI agents on smart contract security tasks

Open Source, Free

About

EVMbench is a research benchmark that systematically evaluates AI agents' capabilities in blockchain security. It tests agents across three critical dimensions: detecting vulnerabilities in smart contracts, patching identified security flaws, and exploiting weaknesses. Designed for researchers and developers building autonomous agents for Web3 security, it measures code comprehension, generation, and execution abilities specific to EVM-based smart contracts.

Details

Type
Integrations
Language

Tags

evaluation, coding-agent, autonomous, open-source, tool-use