DeepYard

AgentDS

Benchmark framework measuring AI agent performance vs human experts on data science tasks

Free

About

AgentDS is an academic research framework for evaluating AI agent capabilities in domain-specific data science workflows. It provides standardized benchmarks and metrics to assess agent performance against human experts, with a focus on the effectiveness of human-AI collaboration. It is designed for researchers studying autonomous agents in data analysis, modeling, and interpretation tasks.
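To make the idea of expert-relative metrics concrete, the short Python sketch below shows one way an agent's per-task scores could be normalized against human-expert baselines. The function and variable names are purely illustrative assumptions and are not part of the AgentDS API.

from statistics import mean

def relative_performance(agent_scores, expert_scores):
    # Hypothetical helper (not from AgentDS): mean ratio of agent score to
    # human-expert score across the same benchmark tasks. A value of 1.0
    # means parity with the experts; higher means the agent did better on
    # average (higher scores are assumed to be better).
    ratios = [a / e for a, e in zip(agent_scores, expert_scores) if e > 0]
    return mean(ratios)

# Example: per-task accuracy on three benchmark tasks (made-up numbers).
agent = [0.81, 0.74, 0.90]
expert = [0.88, 0.79, 0.85]
print(f"relative performance: {relative_performance(agent, expert):.2f}")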

Details

Type:
Integrations:
Language: Python

Tags

evaluation, autonomous, open-source, framework, python