DeepYard

Braintrust

AI evaluation and experiment tracking platform for production LLM apps

commercial, freemium

About

Braintrust is a developer-first evaluation and experiment tracking platform built for LLM applications. It lets teams define scoring functions, run evaluations against golden datasets, compare prompt and model variants side by side, and track quality metrics over time. The platform integrates directly into CI/CD pipelines, so regressions are caught before they reach production.
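
The workflow described above (scoring function, golden dataset, variant comparison, CI gate) can be illustrated with a minimal plain-Python sketch. This does not use the Braintrust SDK; every name here (`Case`, `exact_match`, `run_eval`, `ci_gate`, and the two stand-in variants) is hypothetical and chosen only for illustration, and the "models" are simple lookup tables rather than real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One golden-dataset entry: an input and its expected answer."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Scoring function: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    dataset: list[Case],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every case and return the mean score."""
    scores = [scorer(task(case.input), case.expected) for case in dataset]
    return sum(scores) / len(scores)


def ci_gate(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Return True if the candidate has not regressed beyond the tolerance."""
    return candidate >= baseline - tolerance


# Hypothetical golden dataset.
GOLDEN = [
    Case("2+2", "4"),
    Case("capital of France", "Paris"),
    Case("3*3", "9"),
]

# Stand-ins for two prompt/model variants (assumption: real variants
# would call an LLM; lookup tables keep the sketch deterministic).
_ANSWERS_A = {"2+2": "4", "capital of France": "paris", "3*3": "9"}
_ANSWERS_B = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}


def variant_a(question: str) -> str:
    return _ANSWERS_A.get(question, "")


def variant_b(question: str) -> str:
    return _ANSWERS_B.get(question, "")


if __name__ == "__main__":
    score_a = run_eval(variant_a, GOLDEN, exact_match)
    score_b = run_eval(variant_b, GOLDEN, exact_match)
    print(f"variant_a: {score_a:.2f}  variant_b: {score_b:.2f}")
    # A CI job could exit non-zero when the gate fails.
    print("gate passed:", ci_gate(baseline=score_a, candidate=score_b))
```

In a CI pipeline, a script like this would typically exit with a non-zero status when `ci_gate` returns `False`, which fails the build before the regressed variant ships.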

Details

Type: evaluation
Integrations: openai, anthropic, langchain, litellm
Language: python, typescript

Tags

evaluation, experiment-tracking, llm-ops, testing, ci-cd