SlopCodeBench
Benchmark for measuring coding agent performance degradation across iterative tasks
Open Source · Free
About
Language-agnostic evaluation benchmark that measures how well AI coding agents maintain code quality over extended, iterative development sessions. Features 20 real-world programming problems with 93 checkpoints for tracking how code quality evolves and detecting performance-degradation patterns. Useful for researchers and developers building autonomous coding agents who need to assess long-horizon performance beyond single-shot code generation.
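Because the benchmark is checkpoint-based, the core analysis is a pass-rate curve over checkpoint positions: if quality degrades over a session, later checkpoints pass less often than earlier ones. The sketch below is a minimal illustration of that idea, not SlopCodeBench's actual harness (its API and result format are not documented here); the function names, the result layout, and the toy data are all assumptions.

```python
# Hypothetical sketch: computes a per-checkpoint pass-rate curve and a
# linear-fit slope as a degradation signal. The result format is assumed.
from statistics import mean

def degradation_curve(results: dict[str, list[bool]]) -> list[float]:
    """results maps task id -> ordered checkpoint outcomes (True = passed).
    Returns the mean pass rate at each checkpoint index; shorter tasks
    simply stop contributing past their last checkpoint."""
    max_len = max(len(v) for v in results.values())
    curve = []
    for i in range(max_len):
        outcomes = [v[i] for v in results.values() if i < len(v)]
        curve.append(mean(1.0 if ok else 0.0 for ok in outcomes))
    return curve

def degradation_slope(curve: list[float]) -> float:
    """Least-squares slope of pass rate vs. checkpoint index;
    a negative slope indicates quality decay over the session."""
    xs = range(len(curve))
    x_bar, y_bar = mean(xs), mean(curve)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, curve))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den if den else 0.0

if __name__ == "__main__":
    # Toy data: an agent that starts strong and decays across checkpoints.
    results = {
        "task-01": [True, True, True, False, False],
        "task-02": [True, True, False, False, False],
    }
    curve = degradation_curve(results)
    print(curve)                     # [1.0, 1.0, 0.5, 0.0, 0.0]
    print(degradation_slope(curve))  # -0.3, i.e. degradation over the run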
Details
| Type | Benchmark |
| Integrations | |
| Language | Agnostic |
Tags
evaluation · coding-agent · open-source · autonomous · framework
Quick Info
- Organization: Research Collaboration
- Pricing: open-source
- Free Tier: Yes
- Updated: Mar 27, 2026
Also in Dev Tools
Crawl4AI
Open-source web crawler optimized for LLMs and AI agents — 62K+ stars
OSS · Free
unclecode
Firecrawl
Web scraping API built for LLMs — turn any website into LLM-ready data — 89K+ stars
OSS · Freemium
Mendable
Headroom Context Optimization
Reduce LLM API costs by 50-90% through advanced context compression
OSS · Free
Shubham Saboo