DeepYard

EpiBench

Verifiable benchmark for testing AI agents on epigenomics analysis workflows

Open Source, Free

About

EpiBench is a research benchmark that evaluates how well AI agents make analysis decisions in epigenomics workflows. Unlike typical benchmarks, it grades deterministically: agents are tested on well-defined workflow states drawn from realistic biological data analysis scenarios. It is particularly useful for researchers building scientific AI agents that must navigate complex bioinformatics pipelines.
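
To make the idea of deterministic grading on fixed workflow states concrete, here is a minimal sketch in Python. It is not EpiBench's actual API or data format: the `WorkflowState` class, the `grade` function, the state IDs, and the decision labels are all hypothetical, chosen only to illustrate how an exact-match check against a known answer gives the same score on every run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowState:
    """Hypothetical benchmark item: a fixed pipeline state with one accepted decision."""
    state_id: str
    description: str
    expected_decision: str


def normalize(answer: str) -> str:
    # Ignore case and surrounding whitespace so only the chosen decision matters.
    return answer.strip().lower()


def grade(states, agent_answers):
    """Return the fraction of states where the agent's decision matches exactly."""
    if not states:
        return 0.0
    correct = sum(
        1
        for s in states
        if normalize(agent_answers.get(s.state_id, "")) == normalize(s.expected_decision)
    )
    return correct / len(states)


# Illustrative (invented) states and agent answers.
states = [
    WorkflowState("s1", "Peaks called, replicates not yet compared", "assess_reproducibility"),
    WorkflowState("s2", "Raw reads with adapter contamination", "trim_adapters"),
]
answers = {"s1": "Assess_Reproducibility ", "s2": "call_peaks"}

score = grade(states, answers)
print(score)
```

Because the check is a plain comparison with no model-based judge or sampling, grading the same answers twice always yields the same score.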

Details

Type
Integrations
Language

Tags

evaluation, open-source, autonomous, framework