DeepYard
CapCode

Cheat-proof coding evaluation framework with performance-capped randomized tests

Open Source, Free

About

A research framework for building coding benchmarks that prevent agents from achieving perfect scores through memorization or shortcuts. It uses randomized test cases with performance caps so that evaluation scores reflect genuine problem-solving ability rather than dataset exploitation. It is designed for rigorous assessment of code generation models and autonomous coding agents.
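
The listing does not say how the randomized tests or the performance caps are implemented, so the following is only a minimal sketch of one possible reading: test cases are regenerated from a seed on every run, so memorized answers do not transfer, and the reported score is clipped at a cap below 1.0. Every name here (`make_cases`, `capped_score`, the `cap` parameter, the sorting task) is hypothetical and is not taken from CapCode.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical illustration only; not CapCode's actual API.
Case = Tuple[List[int], List[int]]  # (input list, expected sorted output)


def make_cases(seed: int, n: int = 20) -> List[Case]:
    """Generate n random sorting cases from a seed, so each seed gives fresh inputs."""
    rng = random.Random(seed)
    cases: List[Case] = []
    for _ in range(n):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 30))]
        cases.append((xs, sorted(xs)))
    return cases


def capped_score(solution: Callable[[List[int]], List[int]],
                 seed: int, cap: float = 0.95) -> float:
    """Return the pass rate on seeded random cases, clipped at `cap` (assumed policy)."""
    cases = make_cases(seed)
    passed = 0
    for xs, expected in cases:
        try:
            # Pass a copy so the solution cannot mutate the stored input.
            if solution(list(xs)) == expected:
                passed += 1
        except Exception:
            pass  # a crashing solution simply fails that case
    return min(passed / len(cases), cap)
```

Under these assumptions, a correct solution such as Python's built-in `sorted` reaches the cap and no higher, while a solution that replays answers recorded under one seed has no advantage when the evaluator draws a different seed.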

Details

Type
Integrations
Language

Tags

evaluation, coding-agent, open-source, framework, autonomous