TML-Bench
Benchmark for evaluating data science agents on Kaggle-style tabular ML tasks
About
TML-Bench is an evaluation benchmark specifically designed to test autonomous coding agents on tabular machine learning tasks. It simulates Kaggle-style competitions with varying time budgets, assessing agents' ability to handle end-to-end data science workflows including data preprocessing, feature engineering, model selection, and hyperparameter tuning. The benchmark measures both correctness and reliability under realistic resource constraints, providing standardized metrics for comparing agent performance on practical ML tasks.
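The workflow stages named above (preprocessing, feature engineering, model fitting, scoring under a time budget) can be sketched as a toy end-to-end run. This is an illustrative example only, not TML-Bench's actual harness or API; the dataset, budget, and metric are invented for the sketch, and only the Python standard library is used.

```python
import time
import statistics

# Toy tabular task: predict y from x; None marks missing cells.
rows = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (4.0, 7.9), (None, 10.1)]

TIME_BUDGET_S = 5.0  # hypothetical per-task budget, as in a timed run
start = time.monotonic()

# 1) Preprocessing: impute missing values with the column mean.
xs = [r[0] for r in rows if r[0] is not None]
ys = [r[1] for r in rows if r[1] is not None]
x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
data = [((x if x is not None else x_mean),
         (y if y is not None else y_mean)) for x, y in rows]

# 2) Model fitting: ordinary least squares for y = a*x + b.
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# 3) Scoring: RMSE on the (toy) training data, checked against the budget.
rmse = (sum((a * x + b - y) ** 2 for x, y in data) / n) ** 0.5
elapsed = time.monotonic() - start
assert elapsed < TIME_BUDGET_S, "ran past the simulated time budget"
print(f"fit y = {a:.2f}x + {b:.2f}, RMSE {rmse:.3f}")
```

A real agent run would replace each stage with richer choices (encoders, model families, hyperparameter search), but the evaluation loop — transform data, fit, score, stay inside the budget — has this shape.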
Quick Info
- Organization: Independent
- Pricing: Open-source
- Free Tier: Yes
- Updated: Mar 9, 2026